
Data Types

Numeric (integer and double)

i1 = 10
i2 = 2.34
i3 = 5/2
i4 = -90
i5 = 5L          # the L suffix tells R to store the value as an integer
i6 = 3+6i        # a complex number
print(class(i1))
print(typeof(i1))
print(class(i2))
print(class(i3))
print(class(i4))
print(class(i5))
print(class(i6))
# class() reports the high-level class, typeof() the internal storage type;
# numeric values are stored as doubles unless written with the L suffix
[1] "numeric"
[1] "double"
[1] "numeric"
[1] "numeric"
[1] "numeric"
[1] "integer"
[1] "complex"
print(typeof(i1))
print(typeof(i2))
print(typeof(i3))
print(typeof(i4))
print(typeof(i5))
[1] "double"
[1] "double"
[1] "double"
[1] "double"
[1] "integer"
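
The difference between integer and double storage also shows up in arithmetic. The following is a minimal sketch using only base R; the variable names are illustrative and not part of the original notebook.

```r
# Integer literals need the L suffix; plain numbers are stored as doubles.
n_int <- 5L
n_dbl <- 5

int_check <- is.integer(n_int)   # TRUE
dbl_check <- is.integer(n_dbl)   # FALSE
num_check <- is.numeric(n_int)   # TRUE: integers and doubles are both "numeric"

# Arithmetic between integers stays integer, except for division with /
sum_type  <- typeof(n_int + 2L)    # "integer"
div_type  <- typeof(n_int / 2L)    # "double"
idiv_type <- typeof(n_int %/% 2L)  # "integer"

# Explicit conversion to integer truncates toward zero
truncated <- as.integer(2.99)      # 2
```
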

complex

i5 = 2+7i
print(class(i5))
[1] "complex"

character and factor

c1 = "c"
c2 = "data"
c3 = 'R-Programming'
print(class(c1))
print(class(c2))
print(class(c3))
v1 <- c("data","science","R-Programming")
print(class(v1))
[1] "character"
[1] "character"
[1] "character"
[1] "character"

Factors: A factor object encodes a data vector as a set of unique values (levels), recording for each element which level it belongs to.

v1 <- c("data","science","R-Programming")
f1 = factor(v1)
print(class(v1))
print(v1)
print(f1)
v2 <- c("medium","High","low")
f2 = factor(v2)    # levels are sorted alphabetically by default
print(f2)
v3 <- c("medium","high","low")
f3 = factor(v3, ordered = TRUE)    # ordered, but the order is still alphabetical
print(f3)
v4 <- c("medium","high","low")
f4 = factor(v4, ordered = TRUE, levels = c("low", "medium", "high"))    # explicit level order
print(f4)
[1] "character"
[1] "data"          "science"       "R-Programming"
[1] data          science       R-Programming
Levels: data R-Programming science
[1] medium High   low   
Levels: High low medium
[1] medium high   low   
Levels: high < low < medium
[1] medium high   low   
Levels: low < medium < high
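
The level order matters once the factor is used in comparisons or summaries. The sketch below, with illustrative variable names, shows how the integer codes and the level order of an ordered factor can be inspected.

```r
sizes <- factor(c("medium", "high", "low"),
                ordered = TRUE,
                levels = c("low", "medium", "high"))

lvls     <- levels(sizes)        # "low" "medium" "high"
codes    <- as.integer(sizes)    # 2 3 1: position of each value in levels()
n_levels <- nlevels(sizes)       # 3

# Ordered factors support comparisons that follow the level order
above_low <- sizes > "low"       # TRUE TRUE FALSE

# table() counts the observations per level, in level order
counts <- table(sizes)
```
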

logical

l1 = TRUE
l2 = FALSE
print(class(l1))
print(class(l2))
[1] "logical"
[1] "logical"

date and time

c1 <- '2021-6-16'
d1 <- as.Date('2021-6-16')
print(c1)
print(d1)
print(class(c1))
print(class(d1))
[1] "2021-6-16"
[1] "2021-06-16"
[1] "character"
[1] "Date"

as.Date('1/15/2001',format='%m/%d/%Y')
# "2001-01-15"
as.Date('April 26, 2001',format='%B %d, %Y')
# "2001-04-26"
as.Date('22JUN01',format='%d%b%y')   # %y is system-specific; use with caution
# "2001-06-22"

as.POSIXct: Date-time Conversion

Sys.Date()
Sys.time()
[1] "2022-08-07 10:01:29 IST"
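
Sys.time() returns a POSIXct value. As a minimal sketch, a date-time string can be converted with as.POSIXct() as follows; the time zone "UTC" and the example timestamps are assumptions chosen only to make the result reproducible.

```r
# Parse a date-time string into a POSIXct object
t1 <- as.POSIXct("2021-06-16 10:30:00", tz = "UTC")
t1_class <- class(t1)            # "POSIXct" "POSIXt"

# POSIXct stores seconds since 1970-01-01 00:00:00 UTC, so adding a number adds seconds
t2 <- t1 + 3600
hour_later <- format(t2, "%Y-%m-%d %H:%M")   # "2021-06-16 11:30"

# Date objects count days, so subtracting two dates gives a difference in days
d1 <- as.Date("2021-06-16")
d2 <- as.Date("2021-07-01")
gap_days <- as.numeric(d2 - d1)  # 15
```
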

Raw

r1 <- charToRaw("R programming")   # named r1 to avoid masking the base function raw()
print(r1)
 [1] 52 20 70 72 6f 67 72 61 6d 6d 69 6e 67

Raw is an unusual data type that stores individual bytes. For instance, you can convert a character string to a raw vector with charToRaw(), and an integer value to a raw vector of bits with intToBits().

a <- charToRaw("data science")
print(a) # [1] 64 61 74 61 20 73 63 69 65 6e 63 65
class(a) # "raw"
print(is.vector(a))
b <- intToBits(6)
print(b)
class(b) # "raw"
 [1] 64 61 74 61 20 73 63 69 65 6e 63 65
'raw'
[1] TRUE
 [1] 00 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[26] 00 00 00 00 00 00 00
'raw'
rawToChar(a, multiple = TRUE)
 [1] "d" "a" "t" "a" " " "s" "c" "i" "e" "n" "c" "e"
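
The conversions to raw can be reversed. The sketch below uses base R only, with illustrative variable names, and round-trips a string and an integer through their raw representations.

```r
bytes <- charToRaw("data")
ascii <- as.integer(bytes)            # 100 97 116 97 (ASCII codes)
back  <- rawToChar(bytes)             # "data"

# intToBits() returns 32 raw values, least significant bit first
bits        <- intToBits(6L)
first_three <- as.integer(bits[1:3])  # 0 1 1
restored    <- packBits(bits, type = "integer")  # 6
```
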

    Objects:
    In every computer language, variables provide a means of accessing the data stored in memory. R does not provide direct access to the computer’s memory; instead it provides a number of specialized data structures that we will refer to as objects. These objects are referred to through symbols or variables.

    Everything in R is an object. Objects are further categorized into the data types listed above and the data structures described below.

    In this chapter we give preliminary descriptions of the data structures R provides, along with how to extract elements from them and operate on them.

    Data Structures

    vectors

    Atomic vectors are one of the basic types of objects in R. An atomic vector stores elements of a single (homogeneous) type: character, double, integer, raw, logical, or complex. A single value is also a vector, of length one.

    To create a vector, we use the c() function, with the elements separated by commas.
    Eg:

    1. x <- c(2,7,1,7,1,6,80,0,1)
    2. z <- c("Alec", "Dan", "Rob", "Rich")
    3. y <- c(TRUE, TRUE, FALSE, FALSE)
    x <- c(2,7,1,7,1,6,80,0,1)
    y <- c(TRUE, TRUE, FALSE, FALSE)
    z <- c("Alec", "Dan", "Rob", "Rich")
    
    print(class(x)) # numeric vector
    print(class(y)) # logical vector
    print(class(z)) # character vector
    
    [1] "numeric"
    [1] "logical"
    [1] "character"
    

    Indexing

    Positive

    print(x) #entire vector
    print(x[1])
    print(x[2])
    # to get range of elements
    print(x) #entire vector
    print(x[1:5])
    print(x[3:8])
    
    [1]  2  7  1  7  1  6 80  0  1
    [1] 2
    [1] 7
    [1]  2  7  1  7  1  6 80  0  1
    [1] 2 7 1 7 1
    [1]  1  7  1  6 80  0
    

    Negative

    print(x) #entire vector
    print(x[-1]) #ignore/exclude first element
    print(x[-4]) #ignore/exclude fourth value
    
    [1]  2  7  1  7  1  6 80  0  1
    [1]  7  1  7  1  6 80  0  1
    [1]  2  7  1  1  6 80  0  1
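
Besides positive and negative positions, vectors can also be indexed by a logical condition or by name. The following is a minimal sketch; the named vector `ages` and its values are hypothetical and used only for illustration.

```r
x <- c(2, 7, 1, 7, 1, 6, 80, 0, 1)

big          <- x[x > 5]                 # 7 7 6 80: keep elements where the condition is TRUE
positions    <- which(x == 1)            # 3 5 9: positions of the matching elements
without_ends <- x[-c(1, length(x))]      # 7 1 7 1 6 80 0: drop the first and last element

# Elements of a named vector can be extracted by name
ages    <- c(Alec = 30, Dan = 45)
dan_age <- ages[["Dan"]]                 # 45
```
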
    

    matrices

    Matrices: Matrices store values as a 2-dimensional array. The data and the number of rows and columns are passed to the matrix() function.

    To create a matrix, we use the matrix() function with the nrow and ncol arguments.
    Eg:

    1. m1 <- matrix(c(2,7,1,7,1,6,80,0,1),nrow=3,ncol=3)
    m1 <- matrix(c(2,7,1,7,1,6,80,0,1),nrow=3,ncol=3)
    print(m1)
    
         [,1] [,2] [,3]
    [1,]    2    7   80
    [2,]    7    1    0
    [3,]    1    6    1
    

    Indexing

    print(m1[1,1]) #first row first column element
    print(m1[2,3]) #second row third column element
    print(m1[3,3]) #third row third column value
    print(m1[2,]) #entire second row
    print(m1[,3]) #entire third column
    print(m1[,3,drop=FALSE]) #entire third column, kept as a one-column matrix
    
    [1] 2
    [1] 0
    [1] 1
    [1] 7 1 0
    [1] 80  0  1
         [,1]
    [1,]   80
    [2,]    0
    [3,]    1
    
    thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"), nrow = 3, ncol =2)
    print(thismatrix)
    
         [,1]     [,2]       
    [1,] "apple"  "orange"   
    [2,] "banana" "mango"    
    [3,] "cherry" "pineapple"
    
    #Remove the first row and the first column
    thismatrix <- thismatrix[-c(1), -c(1)]
    
    print(thismatrix)
    
    [1] "mango"     "pineapple"
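
The result above is a plain character vector because R drops dimensions of length one by default. As a sketch with illustrative variable names, drop = FALSE keeps the matrix structure:

```r
fruit <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"),
                nrow = 3, ncol = 2)

as_vector <- fruit[-1, -1]                 # dimensions dropped: character vector of length 2
as_matrix <- fruit[-1, -1, drop = FALSE]   # dimensions kept: 2 x 1 matrix

vec_is_matrix <- is.matrix(as_vector)      # FALSE
mat_dim       <- dim(as_matrix)            # 2 1
```
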
    

    create a matrix using cbind and rbind

    m <- cbind(Index = c(1:3), Age = c(30, 45, 34), Salary = c(500, 600, 550)) 
    class(m)
    print(m)
    
    'matrix'
         Index Age Salary
    [1,]     1  30    500
    [2,]     2  45    600
    [3,]     3  34    550
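
The same matrix can be assembled row by row with rbind(). This is a minimal sketch; the row names and the `m_mixed` example are illustrative additions.

```r
# Each argument to rbind() becomes one row
m_rows <- rbind(first  = c(1, 30, 500),
                second = c(2, 45, 600),
                third  = c(3, 34, 550))
colnames(m_rows) <- c("Index", "Age", "Salary")

m_dim      <- dim(m_rows)              # 3 3
second_age <- m_rows["second", "Age"]  # 45: index by row and column name

# A matrix holds a single type, so mixing numbers and strings coerces everything to character
m_mixed    <- cbind(id = 1:2, label = c("a", "b"))
mixed_type <- typeof(m_mixed)          # "character"
```
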
    

    arrays

    Arrays: The array() function creates an n-dimensional array. Its dim argument is a vector giving the length of each dimension, and the array is filled with the supplied data.

    a <- array(data = 1:27, dim=c(3,3,3))
    print(class(a))
    print(a)
    
    [1] "array"
    , , 1
    
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    
    , , 2
    
         [,1] [,2] [,3]
    [1,]   10   13   16
    [2,]   11   14   17
    [3,]   12   15   18
    
    , , 3
    
         [,1] [,2] [,3]
    [1,]   19   22   25
    [2,]   20   23   26
    [3,]   21   24   27
    
    

    Indexing

    print(a[2,3,2]) # extract single element
    
    [1] 17
    
    print(a[,,2]) # A two-dimensional array is the same thing as a matrix.
    
         [,1] [,2] [,3]
    [1,]   10   13   16
    [2,]   11   14   17
    [3,]   12   15   18
    
    print(a[1,,2]) # extract one row
    
    [1] 10 13 16
    
    print(a[, 3, 2, drop = FALSE])  # extract one column
    
    , , 1
    
         [,1]
    [1,]   16
    [2,]   17
    [3,]   18
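
Arrays can also be summarised along a dimension. The sketch below rebuilds the same 3 x 3 x 3 array and uses base R functions only; the result names are illustrative.

```r
a <- array(data = 1:27, dim = c(3, 3, 3))

# apply() over margin 3 runs the function once per 3 x 3 slice
slice_sums <- apply(a, 3, sum)        # 45 126 207

# A single slice is a matrix, so matrix helpers work on it
row_sums_1 <- rowSums(a[, , 1])       # 12 15 18

# Without drop = FALSE the extracted column loses its dimensions
dropped_dim <- dim(a[, 3, 2])                 # NULL
kept_dim    <- dim(a[, 3, 2, drop = FALSE])   # 3 1 1
```
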
    
    

    data frames

    Data Frames: A data frame is a 2-dimensional tabular data object in R. It consists of multiple columns, each of which is a vector of the same length. Unlike the columns of a matrix, the columns of a data frame can hold different modes of data.

    Name <- c("Ramesh", "Tarun", "Shekar")
    age <- c(23, 54, 32)
    height <- c(4.6,5.4,6.2)
    df <- data.frame(Name, age, height)
    df
    
        Name age height
    1 Ramesh  23    4.6
    2  Tarun  54    5.4
    3 Shekar  32    6.2
    
    df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    print(df)
    
      x y
    1 1 a
    2 2 b
    3 3 c
    

    Create dataframe using cbind and rbind methods

    df1 <- cbind(1, df)
    print(class(df1))
    print(df1)
    # Using rbind
    df <- data.frame(a = c(1:5), b = (1:5)^2)
    df
    df2 = rbind(df, c(2, 3), c(5, 6))
    print(df2)
    
    [1] "data.frame"
      1 x y
    1 1 1 a
    2 1 2 b
    3 1 3 c
    
      a  b
    1 1  1
    2 2  4
    3 3  9
    4 4 16
    5 5 25
      a  b
    1 1  1
    2 2  4
    3 3  9
    4 4 16
    5 5 25
    6 2  3
    7 5  6
    

    Indexing and Extracting Elements:

    Name <- c("Ramesh", "Tarun", "Shekar")
    age <- c(23, 54, 32)
    height <- c(4.6,5.4,6.2)
    df <- data.frame(Name, age, height)
    df
    
        Name age height
    1 Ramesh  23    4.6
    2  Tarun  54    5.4
    3 Shekar  32    6.2
    
    df[1,1]
    
    [1] Ramesh
    Levels: Ramesh Shekar Tarun
    
    print(df$Name)
    
    [1] Ramesh Tarun  Shekar
    Levels: Ramesh Shekar Tarun
    
    print(df[,'Name'])
    
    [1] Ramesh Tarun  Shekar
    Levels: Ramesh Shekar Tarun
    
    print(df[,c('Name','age')])
    
        Name age
    1 Ramesh  23
    2  Tarun  54
    3 Shekar  32
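
The "Levels:" lines in the output above show that Name was stored as a factor, which was the default for string columns before R 4.0.0. The sketch below sets stringsAsFactors = FALSE explicitly so that it behaves the same in any R version, and shows row filtering by condition; the column `age_months` is an illustrative addition.

```r
df <- data.frame(Name = c("Ramesh", "Tarun", "Shekar"),
                 age = c(23, 54, 32),
                 height = c(4.6, 5.4, 6.2),
                 stringsAsFactors = FALSE)  # keep Name as a character column

older      <- df[df$age > 30, ]             # rows where age exceeds 30 (Tarun and Shekar)
tall_names <- df$Name[df$height > 5]        # "Tarun" "Shekar"
mean_age   <- mean(df[["age"]])             # [[ ]] extracts a single column as a vector

# Assigning to a new name with $ adds a column
df$age_months <- df$age * 12
n_cols <- ncol(df)                          # 4
```
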
    

    Create a dataframe using expand.grid() method

    eg <- expand.grid(pants = c("blue", "black"), shirt = c("white", "grey", "plaid"))
    print(class(eg))
    print(eg)
    
    [1] "data.frame"
      pants shirt
    1  blue white
    2 black white
    3  blue  grey
    4 black  grey
    5  blue plaid
    6 black plaid
    

    lists

    Lists: A list is another type of object in R. A list can contain heterogeneous data, such as vectors or other lists.

    • A list is a special vector. Each element can be of a different class.
    • Lists act as containers.
    • Unlike atomic vectors, their contents are not restricted to a single type.
    • A list element can be anything, and two elements within a list can be of different types.
    • Lists are sometimes called recursive vectors, because a list can contain other lists.

    Create a list using the list() function:
    x <- list(1, "a", TRUE, 1+4i)

    lst1 <- list(1, "a", TRUE, 1+4i)
    print(lst1)
    
    [[1]]
    [1] 1
    
    [[2]]
    [1] "a"
    
    [[3]]
    [1] TRUE
    
    [[4]]
    [1] 1+4i
    
    

    Indexing

    print(lst1[1])
    
    [[1]]
    [1] 1
    
    
    class(lst1[1])
    
    'list'

    print(lst1[1]*2) # raises an error: lst1[1] is still a list, not a number

    class(lst1[[1]])
    
    'numeric'
    print(lst1[[1]])
    
    [1] 1
    
    lst1[[1]]*2
    
    2
    v <- c("apple", "banana", "cherry", "orange", "mango", "pineapple")
    print(v)
    #creating matrix
    m <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"), nrow = 3, ncol =2)
    print(m)
    #creating array
    a <- array(c("apple", "banana", "cherry", "orange", "mango", "pineapple"), dim = c(2,2,2))
    print(a)
    #creating dataframe
    Name <- c("Ramesh", "Tarun", "Shekar")
    age <- c(23, 54, 32)
    height <- c(4.6,5.4,6.2)
    df <- data.frame(Name, age, height)
    print(df)
    
    [1] "apple"     "banana"    "cherry"    "orange"    "mango"     "pineapple"
         [,1]     [,2]       
    [1,] "apple"  "orange"   
    [2,] "banana" "mango"    
    [3,] "cherry" "pineapple"
    , , 1
    
         [,1]     [,2]    
    [1,] "apple"  "cherry"
    [2,] "banana" "orange"
    
    , , 2
    
         [,1]        [,2]    
    [1,] "mango"     "apple" 
    [2,] "pineapple" "banana"
    
        Name age height
    1 Ramesh  23    4.6
    2  Tarun  54    5.4
    3 Shekar  32    6.2
    
    lst2 <- list(v,m,a,df, lst1)
    print(class(lst2))
    print(lst2)
    
    [1] "list"
    [[1]]
    [1] "apple"     "banana"    "cherry"    "orange"    "mango"     "pineapple"
    
    [[2]]
         [,1]     [,2]       
    [1,] "apple"  "orange"   
    [2,] "banana" "mango"    
    [3,] "cherry" "pineapple"
    
    [[3]]
    , , 1
    
         [,1]     [,2]    
    [1,] "apple"  "cherry"
    [2,] "banana" "orange"
    
    , , 2
    
         [,1]        [,2]    
    [1,] "mango"     "apple" 
    [2,] "pineapple" "banana"
    
    
    [[4]]
        Name age height
    1 Ramesh  23    4.6
    2  Tarun  54    5.4
    3 Shekar  32    6.2
    
    [[5]]
    [[5]][[1]]
    [1] 1
    
    [[5]][[2]]
    [1] "a"
    
    [[5]][[3]]
    [1] TRUE
    
    [[5]][[4]]
    [1] 1+4i
    
    
    
    lst2[[1]] # a vector stored in a list
    
    [1] "apple"     "banana"    "cherry"    "orange"    "mango"     "pineapple"
    
    lst2[[1]][2]
    
    'banana'
    lst2[[2]] # a matrix stored in a list
    
         [,1]     [,2]       
    [1,] "apple"  "orange"   
    [2,] "banana" "mango"    
    [3,] "cherry" "pineapple"
    
    lst2[[2]][2,2]
    
    'mango'
    print(lst2[[3]])
    
    , , 1
    
         [,1]     [,2]    
    [1,] "apple"  "cherry"
    [2,] "banana" "orange"
    
    , , 2
    
         [,1]        [,2]    
    [1,] "mango"     "apple" 
    [2,] "pineapple" "banana"
    
    
    print(lst2[[3]][,,2])
    
         [,1]        [,2]    
    [1,] "mango"     "apple" 
    [2,] "pineapple" "banana"
    
    print(lst2[[3]][2,1,2])
    
    [1] "pineapple"
    
    print(lst2[[4]])
    
        Name age height
    1 Ramesh  23    4.6
    2  Tarun  54    5.4
    3 Shekar  32    6.2
    
    print(lst2[[4]][2,1])
    
    [1] Tarun
    Levels: Ramesh Shekar Tarun
    
    print(lst2[[5]])
    
    [[1]]
    [1] 1
    
    [[2]]
    [1] "a"
    
    [[3]]
    [1] TRUE
    
    [[4]]
    [1] 1+4i
    
    
    print(lst2[[5]][[2]])
    
    [1] "a"
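
List elements can also be given names, which allows access with $ in addition to [[ ]]. The following is a minimal sketch; the list `person` and all of its values are hypothetical.

```r
person <- list(name = "Ramesh", age = 23, scores = c(80, 92, 75))

by_dollar    <- person$name              # "Ramesh"
second_score <- person[["scores"]][2]    # 92
single_class <- class(person["age"])     # "list": single brackets keep the list wrapper
double_class <- class(person[["age"]])   # "numeric": double brackets return the element itself

person$city <- "Pune"                    # assigning to a new name adds an element
person$age  <- NULL                      # assigning NULL removes an element
remaining   <- names(person)             # "name" "scores" "city"

# sapply() applies a function to every element of the list
element_lengths <- sapply(person, length)
```
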
    

    Reshaping R Objects

    vec <- 1:12 # a vector
    print(vec)
    mat <- matrix( vec, nrow=2) # a matrix
    print(mat)
    dim(mat) <- NULL
    print(mat) # back to vector
    
     [1]  1  2  3  4  5  6  7  8  9 10 11 12
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    3    5    7    9   11
    [2,]    2    4    6    8   10   12
     [1]  1  2  3  4  5  6  7  8  9 10 11 12
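
Assigning to dim() works in the other direction too, turning a vector into a matrix in place. A short sketch with illustrative variable names, which also contrasts the default column-wise fill with byrow = TRUE:

```r
vec <- 1:12

mat <- vec
dim(mat) <- c(3, 4)           # reshape in place; values fill column by column
col_fill <- mat[2, 3]         # 8

by_row   <- matrix(vec, nrow = 3, byrow = TRUE)   # fill row by row instead
row_fill <- by_row[2, 3]      # 7

t_dim <- dim(t(mat))          # 4 3: t() transposes rows and columns
flat  <- as.vector(mat)       # back to the original vector
```
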
    
    print(mtcars)
    ULmtcars <- unlist(mtcars) # produces a vector from the dataframe
    
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    
    print(ULmtcars)
    UCmtcars <- unclass(mtcars) # removes the class attribute, turning the dataframe into a list (the row.names attribute is kept)
    
       mpg1    mpg2    mpg3    mpg4    mpg5    mpg6    mpg7    mpg8    mpg9   mpg10 
     21.000  21.000  22.800  21.400  18.700  18.100  14.300  24.400  22.800  19.200 
      mpg11   mpg12   mpg13   mpg14   mpg15   mpg16   mpg17   mpg18   mpg19   mpg20 
     17.800  16.400  17.300  15.200  10.400  10.400  14.700  32.400  30.400  33.900 
      mpg21   mpg22   mpg23   mpg24   mpg25   mpg26   mpg27   mpg28   mpg29   mpg30 
     21.500  15.500  15.200  13.300  19.200  27.300  26.000  30.400  15.800  19.700 
      mpg31   mpg32    cyl1    cyl2    cyl3    cyl4    cyl5    cyl6    cyl7    cyl8 
     15.000  21.400   6.000   6.000   4.000   6.000   8.000   6.000   8.000   4.000 
       cyl9   cyl10   cyl11   cyl12   cyl13   cyl14   cyl15   cyl16   cyl17   cyl18 
      4.000   6.000   6.000   8.000   8.000   8.000   8.000   8.000   8.000   4.000 
      cyl19   cyl20   cyl21   cyl22   cyl23   cyl24   cyl25   cyl26   cyl27   cyl28 
      4.000   4.000   4.000   8.000   8.000   8.000   8.000   4.000   4.000   4.000 
      cyl29   cyl30   cyl31   cyl32   disp1   disp2   disp3   disp4   disp5   disp6 
      8.000   6.000   8.000   4.000 160.000 160.000 108.000 258.000 360.000 225.000 
      disp7   disp8   disp9  disp10  disp11  disp12  disp13  disp14  disp15  disp16 
    360.000 146.700 140.800 167.600 167.600 275.800 275.800 275.800 472.000 460.000 
     disp17  disp18  disp19  disp20  disp21  disp22  disp23  disp24  disp25  disp26 
    440.000  78.700  75.700  71.100 120.100 318.000 304.000 350.000 400.000  79.000 
     disp27  disp28  disp29  disp30  disp31  disp32     hp1     hp2     hp3     hp4 
    120.300  95.100 351.000 145.000 301.000 121.000 110.000 110.000  93.000 110.000 
        hp5     hp6     hp7     hp8     hp9    hp10    hp11    hp12    hp13    hp14 
    175.000 105.000 245.000  62.000  95.000 123.000 123.000 180.000 180.000 180.000 
       hp15    hp16    hp17    hp18    hp19    hp20    hp21    hp22    hp23    hp24 
    205.000 215.000 230.000  66.000  52.000  65.000  97.000 150.000 150.000 245.000 
       hp25    hp26    hp27    hp28    hp29    hp30    hp31    hp32   drat1   drat2 
    175.000  66.000  91.000 113.000 264.000 175.000 335.000 109.000   3.900   3.900 
      drat3   drat4   drat5   drat6   drat7   drat8   drat9  drat10  drat11  drat12 
      3.850   3.080   3.150   2.760   3.210   3.690   3.920   3.920   3.920   3.070 
     drat13  drat14  drat15  drat16  drat17  drat18  drat19  drat20  drat21  drat22 
      3.070   3.070   2.930   3.000   3.230   4.080   4.930   4.220   3.700   2.760 
     drat23  drat24  drat25  drat26  drat27  drat28  drat29  drat30  drat31  drat32 
      3.150   3.730   3.080   4.080   4.430   3.770   4.220   3.620   3.540   4.110 
        wt1     wt2     wt3     wt4     wt5     wt6     wt7     wt8     wt9    wt10 
      2.620   2.875   2.320   3.215   3.440   3.460   3.570   3.190   3.150   3.440 
       wt11    wt12    wt13    wt14    wt15    wt16    wt17    wt18    wt19    wt20 
      3.440   4.070   3.730   3.780   5.250   5.424   5.345   2.200   1.615   1.835 
       wt21    wt22    wt23    wt24    wt25    wt26    wt27    wt28    wt29    wt30 
      2.465   3.520   3.435   3.840   3.845   1.935   2.140   1.513   3.170   2.770 
       wt31    wt32   qsec1   qsec2   qsec3   qsec4   qsec5   qsec6   qsec7   qsec8 
      3.570   2.780  16.460  17.020  18.610  19.440  17.020  20.220  15.840  20.000 
      qsec9  qsec10  qsec11  qsec12  qsec13  qsec14  qsec15  qsec16  qsec17  qsec18 
     22.900  18.300  18.900  17.400  17.600  18.000  17.980  17.820  17.420  19.470 
     qsec19  qsec20  qsec21  qsec22  qsec23  qsec24  qsec25  qsec26  qsec27  qsec28 
     18.520  19.900  20.010  16.870  17.300  15.410  17.050  18.900  16.700  16.900 
     qsec29  qsec30  qsec31  qsec32     vs1     vs2     vs3     vs4     vs5     vs6 
     14.500  15.500  14.600  18.600   0.000   0.000   1.000   1.000   0.000   1.000 
        vs7     vs8     vs9    vs10    vs11    vs12    vs13    vs14    vs15    vs16 
      0.000   1.000   1.000   1.000   1.000   0.000   0.000   0.000   0.000   0.000 
       vs17    vs18    vs19    vs20    vs21    vs22    vs23    vs24    vs25    vs26 
      0.000   1.000   1.000   1.000   1.000   0.000   0.000   0.000   0.000   1.000 
       vs27    vs28    vs29    vs30    vs31    vs32     am1     am2     am3     am4 
      0.000   1.000   0.000   0.000   0.000   1.000   1.000   1.000   1.000   0.000 
        am5     am6     am7     am8     am9    am10    am11    am12    am13    am14 
      0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000 
       am15    am16    am17    am18    am19    am20    am21    am22    am23    am24 
      0.000   0.000   0.000   1.000   1.000   1.000   0.000   0.000   0.000   0.000 
       am25    am26    am27    am28    am29    am30    am31    am32   gear1   gear2 
      0.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   4.000   4.000 
      gear3   gear4   gear5   gear6   gear7   gear8   gear9  gear10  gear11  gear12 
      4.000   3.000   3.000   3.000   3.000   4.000   4.000   4.000   4.000   3.000 
     gear13  gear14  gear15  gear16  gear17  gear18  gear19  gear20  gear21  gear22 
      3.000   3.000   3.000   3.000   3.000   4.000   4.000   4.000   3.000   3.000 
     gear23  gear24  gear25  gear26  gear27  gear28  gear29  gear30  gear31  gear32 
      3.000   3.000   3.000   4.000   5.000   5.000   5.000   5.000   5.000   4.000 
      carb1   carb2   carb3   carb4   carb5   carb6   carb7   carb8   carb9  carb10 
      4.000   4.000   1.000   1.000   2.000   1.000   4.000   2.000   2.000   4.000 
     carb11  carb12  carb13  carb14  carb15  carb16  carb17  carb18  carb19  carb20 
      4.000   3.000   3.000   3.000   4.000   4.000   4.000   1.000   2.000   1.000 
     carb21  carb22  carb23  carb24  carb25  carb26  carb27  carb28  carb29  carb30 
      1.000   2.000   2.000   4.000   2.000   1.000   2.000   2.000   4.000   6.000 
     carb31  carb32 
      8.000   2.000 
    
    head(mtcars)
    
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    
    print(UCmtcars)
    print(c(mtcars))  # similar to unclass but without the attributes
    
    $mpg
     [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
    [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
    [31] 15.0 21.4
    
    $cyl
     [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
    
    $disp
     [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
    [13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
    [25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
    
    $hp
     [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
    [20]  65  97 150 150 245 175  66  91 113 264 175 335 109
    
    $drat
     [1] 3.90 3.90 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 3.07 2.93
    [16] 3.00 3.23 4.08 4.93 4.22 3.70 2.76 3.15 3.73 3.08 4.08 4.43 3.77 4.22 3.62
    [31] 3.54 4.11
    
    $wt
     [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
    [13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
    [25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780
    
    $qsec
     [1] 16.46 17.02 18.61 19.44 17.02 20.22 15.84 20.00 22.90 18.30 18.90 17.40
    [13] 17.60 18.00 17.98 17.82 17.42 19.47 18.52 19.90 20.01 16.87 17.30 15.41
    [25] 17.05 18.90 16.70 16.90 14.50 15.50 14.60 18.60
    
    $vs
     [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1
    
    $am
     [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
    
    $gear
     [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
    
    $carb
     [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
    
    attr(,"row.names")
     [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
     [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
     [7] "Duster 360"          "Merc 240D"           "Merc 230"           
    [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
    [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
    [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
    [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
    [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
    [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
    [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
    [31] "Maserati Bora"       "Volvo 142E"         
    $mpg
     [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
    [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
    [31] 15.0 21.4
    
    $cyl
     [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
    
    $disp
     [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6 275.8
    [13] 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0 304.0 350.0
    [25] 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
    
    $hp
     [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52
    [20]  65  97 150 150 245 175  66  91 113 264 175 335 109
    
    $drat
     [1] 3.90 3.90 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 3.92 3.07 3.07 3.07 2.93
    [16] 3.00 3.23 4.08 4.93 4.22 3.70 2.76 3.15 3.73 3.08 4.08 4.43 3.77 4.22 3.62
    [31] 3.54 4.11
    
    $wt
     [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
    [13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
    [25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780
    
    $qsec
     [1] 16.46 17.02 18.61 19.44 17.02 20.22 15.84 20.00 22.90 18.30 18.90 17.40
    [13] 17.60 18.00 17.98 17.82 17.42 19.47 18.52 19.90 20.01 16.87 17.30 15.41
    [25] 17.05 18.90 16.70 16.90 14.50 15.50 14.60 18.60
    
    $vs
     [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1
    
    $am
     [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
    
    $gear
     [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
    
    $carb
     [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
    
    

    Operators

    Assignment Operators (<- = <<- -> ->>)
    Arithmetic Operators (+ - * / %% (modulo) %/% (integer divide) ^ (raised to the power of))
    Relational Operators (> < <= >= == !=)
    Logical Operators (& | ! && ||)
    Miscellaneous Operators (: %in% %*%)

    Assignment operators:

    a <- 5.67
    print(a)
    b = 'data'
    print(b)
    6+5i -> c
    print(c)
    
    [1] 5.67
    [1] "data"
    [1] 6+5i
    
    make.accumulator<-function(){
                 a<-0
                 function(x) {
                      a<-a+x
                       a
                }
    }
    f<-make.accumulator()
    print(f(1))
    print(f(2))
    
    [1] 1
    [1] 2
    

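    <<- is the 'superassignment' operator. It does the assignment in the enclosing environment: starting with the enclosing frame, it works its way up towards the global environment until it finds an existing variable with that name, and creates one in the global environment if none is found. With plain <-, as in the previous example, a is a new local variable on every call, so nothing accumulates.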

    make.accumulator<-function(){
                 a<-0
                 function(x) {
                      a<<-a+x
                       a
                }
    }
    f<-make.accumulator()
    print(f(1))
    print(f(2))
    
    [1] 1
    [1] 3
    

    Arithmetic operators:

    Operator   Description        Example
    -          Subtraction        5 - 1 = 4
    +          Addition           5 + 1 = 6
    *          Multiplication     5 * 3 = 15
    /          Division           10 / 2 = 5
    ^ or **    Exponentiation     2^5 (2*2*2*2*2, i.e. 2 to the power of 5)
    x %% y     Modulus            5 %% 2 is 1
    x %/% y    Integer Division   5 %/% 2 is 2

    print(5 + 1)
    print(5 - 1)
    print(5 * 3)
    print(2^5)
    print(2**5)
    print(5 / 2) 
    print(5%/%2) # Integer Division
    print(5%%2) # Modulus or remainder
    
    [1] 6
    [1] 4
    [1] 15
    [1] 32
    [1] 32
    [1] 2.5
    [1] 2
    [1] 1
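
Arithmetic operators work element by element on vectors, and a few edge cases are worth knowing. The sketch below uses illustrative values only.

```r
v <- c(1, 2, 3, 4)

doubled  <- v * 2             # 2 4 6 8: the scalar is applied to every element
recycled <- v + c(10, 20)     # 11 22 13 24: the shorter vector is recycled

# %/% rounds down (toward minus infinity), and %% takes the sign of the divisor
neg_div <- (-5) %/% 2         # -3
neg_mod <- (-5) %% 2          # 1

# ^ binds more tightly than unary minus
no_parens   <- -2^2           # -4
with_parens <- (-2)^2         # 4
```
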
    

    Relational operators:

    Description
    Binary operators which allow the comparison of values in atomic vectors.

    Operator   Description                 Example
    <          less than                   5 < 10
    <=         less than or equal to       <= 5
    >          greater than                10 > 5
    >=         greater than or equal to    >= 10
    ==         exactly equal to            == 10
    !=         not equal to                != 5

    x <- 5
    y <- -3
    print(x < y)
    print(x > y)
    print(x <= y)
    print(x >= y)
    print(x == y)
    print(x != y)
    
    [1] FALSE
    [1] TRUE
    [1] FALSE
    [1] TRUE
    [1] FALSE
    [1] TRUE
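
Relational operators are vectorised, and == on doubles compares exact binary values. A minimal sketch with illustrative values:

```r
x <- c(5, -3, 10)

positive <- x > 0             # TRUE FALSE TRUE: one result per element
n_pos    <- sum(positive)     # 2: TRUE counts as 1 when summed

# Floating-point rounding can make == fail for values that look equal
naive    <- (0.1 + 0.2) == 0.3                    # FALSE
tolerant <- isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE: compares within a small tolerance
```
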
    

    Logical operators:

    Operator   Description   Example
    !x         not x         x <- c(5), !x
    x | y      x or y        x <- c(5), y <- c(10), x | y
    x & y      x and y       x <- c(5), y <- c(10), x & y

    Logical AND (&&) and Logical OR (||)

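    v <- c(3,0,TRUE,2+2i)
    t <- c(1,3,TRUE,2+3i)
    # && and || compare single values; since R 4.3.0 passing longer vectors is an error,
    # so the first elements are selected explicitly
    print(v[1] && t[1])
    
    [1] TRUE
    
    v <- c(0,0,TRUE,2+2i)
    t <- c(0,3,TRUE,2+3i)
    print(v[1] || t[1])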
    
    [1] FALSE
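
The single-character forms & and | work element by element, while && and || expect single values and stop evaluating as soon as the result is known. A minimal sketch with illustrative values:

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

and_each <- p & q     # TRUE FALSE FALSE
or_each  <- p | q     # TRUE TRUE TRUE
not_p    <- !p        # FALSE TRUE FALSE

# any() and all() reduce a logical vector to a single value
some_true <- any(p)   # TRUE
all_true  <- all(p)   # FALSE

# && short-circuits: the right-hand side is skipped when the left side is FALSE
x    <- NULL
safe <- !is.null(x) && x > 0   # FALSE, and x > 0 is never evaluated
```
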
    

    Miscellaneous operators:

    (: %in% %*%)

    : Operator

    print(2:10)
    print(-2:-10)
    print(2:10)
    
    [1]  2  3  4  5  6  7  8  9 10
    [1]  -2  -3  -4  -5  -6  -7  -8  -9 -10
    [1]  2  3  4  5  6  7  8  9 10
    

    %in% Operator

    v1 <- 8
    v2 <- 12
    t <- 1:10
    print(v1 %in% t) 
    print(v2 %in% t)
    
    [1] TRUE
    [1] FALSE
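
%in% is vectorised over its left-hand side, so several values can be tested at once. A short sketch with illustrative values, alongside the related base function match():

```r
wanted <- c(3, 8, 12)
pool   <- 1:10

found   <- wanted %in% pool      # TRUE TRUE FALSE
present <- wanted[found]         # 3 8: keep only the values that occur in pool
where   <- match(wanted, pool)   # 3 8 NA: position of each value in pool, NA if absent
```
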
    

    %*% Operator

    a <- matrix(1:9, 3, 3)
    b <- matrix(-1:-9, 3, 3)
    print(a)
    print(b)
    
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
         [,1] [,2] [,3]
    [1,]   -1   -4   -7
    [2,]   -2   -5   -8
    [3,]   -3   -6   -9
    
    print(a*b)
    
         [,1] [,2] [,3]
    [1,]   -1  -16  -49
    [2,]   -4  -25  -64
    [3,]   -9  -36  -81
    
    print(a%*%b)
    
         [,1] [,2] [,3]
    [1,]  -30  -66 -102
    [2,]  -36  -81 -126
    [3,]  -42  -96 -150
    
    print(1*-1 +   4*-2 +    7*-3) # row 1 of a times column 1 of b
    print(1*-4 +   4*-5 +    7*-6) # row 1 of a times column 2 of b
    
    [1] -30
    [1] -66
    

    Special values
    NA, NULL, ±Inf and NaN

    # NA Stands for not available
    # NA is a placeholder for a missing value
    
    print(NA + 2)
    print(sum(c(NA, 4, 6)))
    print(median(c(NA, 4, 8, 4), na.rm = TRUE))
    print(length(c(NA, 2, 3, 4)))
    print(5 == NA)
    print(NA == NA)
    print(TRUE | NA)
    
    [1] NA
    [1] NA
    [1] 4
    [1] 4
    [1] NA
    [1] NA
    [1] TRUE
    
    x <- c(2,NA,5,4.89,10,TRUE,6/7)
    is.na(x)
    
    [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
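
Once missing values have been located with is.na(), they can be counted, removed, or replaced. A minimal sketch; the vector `scores` and the replacement value 0 are illustrative choices.

```r
scores <- c(4, NA, 8, NA, 6)

n_missing <- sum(is.na(scores))            # 2
complete  <- scores[!is.na(scores)]        # 4 8 6
avg       <- mean(scores, na.rm = TRUE)    # 6: NA values are ignored
filled    <- ifelse(is.na(scores), 0, scores)   # 4 0 8 0 6
still_na  <- anyNA(filled)                 # FALSE
```
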

    NULL

    • The class of NULL is "NULL", and it has length 0.
    • It does not take up any space in a vector.
    • The function is.null() can be used to detect NULL variables.
    print(length(c(3, 4, NULL, 1)))
    print(sum(c(5, 1, NULL, 4)))
    
    x <- NULL
    print(c(x, 5))
    
    [1] 3
    [1] 10
    [1] 5
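
The is.null() function mentioned above can be sketched as follows. The names label and default_label are hypothetical and only illustrate a common fall-back pattern.

```r
x <- NULL
y <- c(1, NA, 3)

print(is.null(x))   # x holds no object at all
print(is.null(y))   # y is a vector, even though one value is missing
print(length(x))
print(class(x))

# A common pattern: fall back to a default when an optional value is NULL
label <- NULL
default_label <- "untitled"
final_label <- if (is.null(label)) default_label else label
print(final_label)
```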
    

    Inf

    • Inf is a valid numeric value that results from calculations such as dividing a nonzero number by zero.
    • Since Inf is a numeric, operations between Inf and a finite numeric are well-defined and comparison operators work as expected.
    print(32/0)
    print(5 * Inf)
    print(Inf - 2e+10)
    print(Inf + Inf)
    8 < -Inf
    print(Inf == Inf)
    
    [1] Inf
    [1] Inf
    [1] Inf
    [1] Inf
    
    FALSE
    [1] TRUE
    

    NaN

    • Stands for "not a number".
    • Represents an undefined result of a numeric calculation.
    • For example, 0/0, Inf-Inf and Inf/Inf all result in NaN.
    • Computations involving numbers and NaN generally result in NaN.
    NaN + 1
    exp(NaN)
    
    NaN
    NaN
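
To tell the special values apart, base R offers is.na(), is.nan(), is.finite() and is.infinite(). A minimal sketch on a made-up vector:

```r
x <- c(1, NA, 0/0, Inf, -Inf)

na_flags       <- is.na(x)        # TRUE for both NA and NaN
nan_flags      <- is.nan(x)       # TRUE only for NaN
finite_flags   <- is.finite(x)    # FALSE for NA, NaN, Inf and -Inf
infinite_flags <- is.infinite(x)  # TRUE only for Inf and -Inf

print(na_flags)
print(nan_flags)
print(finite_flags)
print(infinite_flags)
```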

    Coercion and Testing an Object:

    • Internal (implicit) coercion
    • External coercion and testing objects

    Internal (implicit) coercion:
    If the two arguments are atomic vectors of different modes, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical

    Guess what the following do without running them first

    xx <- c(1.7, "a")
    xx <- c(TRUE, 2)
    xx <- c("a", TRUE)
    This is called implicit coercion.
    print(c(1, FALSE))
    # numeric 1, 0
    print(mode(c(1, FALSE)))
    
    
    print(c("a", 1))
    # character 'a', '1'
    print(mode(c("a", 1)))
    
    print(c(TRUE, 1L))
    print(mode(c(TRUE, 1L)))
    # numeric 1, 1
    print(c(3.427, 1L))
    print(mode(c(3.427, 1L)))
    
    [1] 1 0
    [1] "numeric"
    [1] "a" "1"
    [1] "character"
    [1] 1 1
    [1] "numeric"
    [1] 3.427 1.000
    [1] "numeric"
    

    External coercion and testing objects:
    These functions test what kind of object you have (is.*) or convert it to another type (as.*).

    When testing objects:
    Use is.atomic() to test whether an object is an atomic vector, and is.recursive() or is.list() to test for a recursive object such as a list.
    is.vector() returns TRUE only when the object has no attributes other than names, so is.atomic(x) || is.list(x) is a more reliable test of whether an object is a vector.
    is.list() tests whether an object is truly a list.
    is.numeric(), similarly, is TRUE for either integer or double vectors, but not for lists.
    Help: https://bookdown.org/
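
A small sketch of explicit coercion with the as.* functions together with the is.* tests described above. The input values are made up for illustration.

```r
x <- c("1", "2.5", "abc")

print(is.character(x))
# "abc" cannot be converted, so it becomes NA (the warning is silenced here)
num <- suppressWarnings(as.numeric(x))
print(num)
print(is.numeric(num))

int_value <- as.integer(3.9)                    # the fractional part is dropped
lgl_value <- as.logical(c("TRUE", "yes", "F"))  # unrecognised strings become NA
chr_value <- as.character(1:3)
print(int_value)
print(lgl_value)
print(chr_value)

l <- list(1, "a", TRUE)
print(is.list(l))
print(is.atomic(l))
print(is.atomic(1:3))
```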

    Control Structures & Loops

    (decision making statements)

    if

    If Condition

    An if statement associates a condition with a sequence of statements. The statements are executed only if the condition is TRUE; if it is FALSE, the if statement does nothing. In either case, control then passes to the next statement. The condition must evaluate to a single TRUE or FALSE: NA or NULL raises an error.

    num1=10
    num2=20 
    if(num1<=num2)
    {
    print("Num1 is less or equal to Num2")
    }
    
    [1] "Num1 is less or equal to Num2"
    

    if else

    x <- 1:15
    value <- sample(x, 1)   # draw one value from x at random
    if (value <= 10) {
        print("value is at most 10")
    } else {
        print("value is greater than 10")
    }
    
    [1] "value is at most 10"
    

    A vectorized alternative, ifelse():

    x <- 1:15
    ifelse(x <= 10, "x less than 10", "x greater than 10")
    
     [1] "x less than 10"    "x less than 10"    "x less than 10"
     [4] "x less than 10"    "x less than 10"    "x less than 10"
     [7] "x less than 10"    "x less than 10"    "x less than 10"
    [10] "x less than 10"    "x greater than 10" "x greater than 10"
    [13] "x greater than 10" "x greater than 10" "x greater than 10"


    If else if

    x <- c("what","is","truth")
    
    if("Truth" %in% x) {
       print("Truth is found the first time")
    } else if ("truth" %in% x) {
       print("truth is found the second time")
    } else {
       print("No truth found")
    }
    
    [1] "truth is found the second time"
    

    for

    For Loop:
    Repeats a statement or group of statements a fixed number of times.

    vector <- c("aaa","bbb","ccc")
    for(i in vector){   
       print(i)   
    }
    
    [1] "aaa"
    [1] "bbb"
    [1] "ccc"
    
    for (year in c(2010,2011,2012,2013,2014,2015)){
      print(paste("The year is", year))
    }
    
    [1] "The year is 2010"
    [1] "The year is 2011"
    [1] "The year is 2012"
    [1] "The year is 2013"
    [1] "The year is 2014"
    [1] "The year is 2015"
    
    for(i in 2:5){
        z <- i +1
        print(z)
    }
    
    [1] 3
    [1] 4
    [1] 5
    [1] 6
    
    mymat <- matrix(1:9,3,3)
    print(mymat)
    
    for (i in seq_len(nrow(mymat))){
        for (j in seq_len(ncol(mymat))){
            print(mymat[i,j])
        }
    }
    
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    [1] 1
    [1] 4
    [1] 7
    [1] 2
    [1] 5
    [1] 8
    [1] 3
    [1] 6
    [1] 9
    

    while

    While Loop:
    Repeats a block of code as long as a condition remains TRUE.

    i <- 1
    
    while (i < 6) {
       print(i)
       i = i+1
    }
    
    [1] 1
    [1] 2
    [1] 3
    [1] 4
    [1] 5
    

    break

    break Statement:
    break is used inside a loop (repeat, for or while) to stop the iterations and pass control to the first statement after the loop.

    x <- 1:5
    
    for (val in x) {
        if (val == 3){
            break
        }
        print(val)
    }
    
    [1] 1
    [1] 2
    
    x <- 1:10
    for (num in x){
        if (num==6) break
        mynum <- paste(num, "and so on. ", sep = " ")
        print(mynum)
    }
    
    [1] "1 and so on. "
    [1] "2 and so on. "
    [1] "3 and so on. "
    [1] "4 and so on. "
    [1] "5 and so on. "
    

    repeat

    Repeat statement:
    Repeats a block of code indefinitely; the loop ends only when a break statement is reached.

    x <- 1
    
    repeat {
       print(x)
       x = x+1
       if (x == 6){
           break
       }
    }
    
    [1] 1
    [1] 2
    [1] 3
    [1] 4
    [1] 5
    

    next

    next Statement:

    • Skips the rest of the current iteration and moves on to the next one
    • Typically used inside for and while loops
    x <- 1:5
    
    for (val in x) {
        if (val == 3){
            next
        }
        print(val)
    }
    
    [1] 1
    [1] 2
    [1] 4
    [1] 5
    

    switch

    Switch Statement:
    A switch statement tests a value for equality against a list of alternatives. Each alternative is called a case.

    number1 <- 30
    number2 <- 20
    operator <- readline(prompt="Please enter any ARITHMETIC OPERATOR You wish!: ")
    
    switch(operator,
           "+" = print(paste("Addition of two numbers is: ", number1 + number2)),
           "-" = print(paste("Subtraction of two numbers is: ", number1 - number2)),
           "*" = print(paste("Multiplication of two numbers is: ", number1 * number2)),
           "^" = print(paste("Exponent of two numbers is: ", number1 ^ number2)),
           "/" = print(paste("Division of two numbers is: ", number1 / number2)),
           "%/%" = print(paste("Integer Division of two numbers is: ", number1 %/% number2)),
           "%%" = print(paste("Remainder of two numbers is: ", number1 %% number2))
    )
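
Since readline() only prompts in an interactive session, here is a non-interactive sketch of the same idea. The function apply_operator is a hypothetical helper, not part of the original example; it also shows how an unnamed final argument acts as the default case.

```r
# `apply_operator` is a hypothetical helper used only for illustration
apply_operator <- function(operator, a, b) {
  switch(operator,
         "+"  = a + b,
         "-"  = a - b,
         "*"  = a * b,
         "/"  = a / b,
         "%%" = a %% b,
         # an unnamed final argument is used when no case matches
         stop("Unsupported operator: ", operator))
}

print(apply_operator("+", 30, 20))
print(apply_operator("%%", 30, 20))

# Unsupported input reaches the default branch; tryCatch captures the error message
result <- tryCatch(apply_operator("?", 30, 20),
                   error = function(e) conditionMessage(e))
print(result)
```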
    

    **Conclusion**

    Loops are best avoided unless they are really needed, because most R operations are vectorized.

    Vectorization concept

    vect <- c(1,2,3,4,5,6,7,9)
    # multiply each element of vect by 5
    print(vect * 5)
    # add 5 to each element of vect
    print(vect + 5)
    # subtract 5 from each element of vect
    print(vect - 5)
    
    [1]  5 10 15 20 25 30 35 45
    [1]  6  7  8  9 10 11 12 14
    [1] -4 -3 -2 -1  0  1  2  4
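
To make the comparison with loops concrete, the sketch below computes the same result twice, once with an explicit for loop and once with a vectorized expression. The variable names are illustrative.

```r
vect <- c(1, 2, 3, 4, 5, 6, 7, 9)

# Loop version: fill a pre-allocated result one element at a time
loop_result <- numeric(length(vect))
for (i in seq_along(vect)) {
  loop_result[i] <- vect[i] * 5
}

# Vectorized version: one expression operates on the whole vector
vectorized_result <- vect * 5

print(identical(loop_result, vectorized_result))
```

Both approaches give the same values; the vectorized form is shorter and is usually faster on long vectors.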
    

    recycling concept

    a <- 1:10
    b <- 1:5
    a + b
    
    [1]  2  4  6  8 10  7  9 11 13 15
    
    a <- 1:10
    b <- 5
    a * b # here b  is a vector of length 1
    
    [1]  5 10 15 20 25 30 35 40 45 50
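
In the examples above the longer length is a multiple of the shorter one. As a sketch of what happens otherwise, R still recycles the shorter vector but issues a warning, which is captured here so the example runs quietly.

```r
a <- 1:6
b <- 1:4

# 6 is not a multiple of 4, so b is recycled as 1 2 3 4 1 2 and R warns about it
result <- withCallingHandlers(
  a + b,
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(result)
```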

    Functions

    Functions:

    Functions can be described as "black boxes" that take an input and return an output based on the logic inside the function.

    Built-in functions

    Built-in functions ship with R and make common tasks easier, e.g. mean(x), sum(x), sqrt(x), toupper(x).

    Some Built-in functions for the Objects

    Function Description
    c() combines values, vectors, and/or lists to create new objects
    unique() returns a vector containing one element for each unique value in the vector
    duplicated() returns a logical vector which tells if elements of a vector are duplicated with regard to previous one
    rev() reverse the order of element in a vector
    sort() sorts the elements in a vector
    append() append or insert elements in a vector.
    sum() sum of the elements of a vector
    min() minimum value in a vector
    max() maximum value in a vector
    cumsum() cumulative sum
    diff() lagged differences, x[i+1] - x[i]
    prod() product of the elements
    cumprod() cumulative product
    sample() random sample
    mean() arithmetic mean (average)
    median() median
    var() variance
    sd() standard deviation
    Function Description
    abs(x) absolute value (the magnitude of a number regardless of its sign, e.g. abs(-21) is 21)
    sqrt(x) square root
    floor(x) floor(3.975) is 3
    ceiling(x) ceiling(3.475) is 4
    trunc(x) trunc(5.99) is 5
    round(x, digits=n) round(3.475, digits=2) is 3.48
    exp(x) e^x (calculate the power of e i.e. e^x)
    log10(x) common logarithm (base 10)
    log(x) natural logarithm (base e)
    strsplit(x, split) splits the elements of character vector x at split; strsplit("abc", "") returns a list whose single element is the vector "a","b","c"
    toupper(x) Uppercase
    tolower(x) Lowercase

    Functions Examples

    (x <- c(sort(sample(1:20, 9)), NA))
    (y <- c(sort(sample(3:23, 7)), NA))
    which.min(x)
    which.max(x)
    union(x, y)
    intersect(x, y)
    setdiff(x, y)
    setdiff(y, x)
    match(x,y)

    User defined functions

    A user-defined function is a set of instructions that you want to use repeatedly: a piece of code written to carry out a specific task, created by the user to meet a specific requirement. It has four parts:

    • Function Name
    • Arguments
    • Function Body
    • Return Value

    Objects & Functions
    To understand computations in R, two slogans are helpful:

    1. Everything that exists is an object
    2. Everything that happens is a function call
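
Both slogans can be illustrated with a short sketch: operators and indexing are ordinary functions that can be called by name, and functions themselves are objects. The list name ops is made up for this example.

```r
# Operators are functions that can be called by name using backticks
sum_result <- `+`(3, 4)        # same as 3 + 4
print(sum_result)

# Indexing is a function call too
x <- c(10, 20, 30)
second <- `[`(x, 2)            # same as x[2]
print(second)

# Functions are objects: they have a class and can be stored in a list
print(class(sum))
ops <- list(add = `+`, multiply = `*`)
product <- ops$multiply(6, 7)
print(product)
```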

    syntax

    function_name <- function(arg_1, arg_2, ...) {
        # function body
    }
    
    sum_of_squares <- function(x,y) 
    {
    x^2 + y^2
    }
    
    sum_of_squares(3,4)
    
    25
    pow <- function(x, y) 
    {
       result <- x^y
       print(paste(x,"raised to the power", y, "is", result))
    }
    
    pow(3,5)
    
    [1] "3 raised to the power 5 is 243"
    

    Default Arguments:

    new.function <- function(a = 3, b = 6) {
       result <- a * b
       print(result)
    }
    
    # Call the function without giving any argument.
    new.function()
    
    # Call the function with giving new values of the argument.
    new.function(9,5)
    
    [1] 18
    [1] 45
    
    # Sets default of exponent to 2 (just square)
    MyThirdFun <- function(n, y = 2) 
    {
      # Compute the power of n to the y
      n^y  
    }
    
    # Specify both args
    MyThirdFun(2,3) 
    
    # Just specify the first arg
    MyThirdFun(2)   
    
    # Specify no argument: error!
    # MyThirdFun()
    
    8
    4

    Named Arguments:

    pow <- function(x, y) {
       # function to print x raised to the power y
    
       result <- x^y
       print(paste(x,"raised to the power", y, "is", result))
    }
    
    pow(8, 2)
    # 8 raised to the power 2 is 64
    pow(x = 8, y = 2)
    # 8 raised to the power 2 is 64
    pow(y = 2, x = 8)
    
    [1] "8 raised to the power 2 is 64"
    [1] "8 raised to the power 2 is 64"
    [1] "8 raised to the power 2 is 64"
    

    partial matching:

    ?round
    
    Round {base} R Documentation

    Rounding of Numbers

    Description

    ceiling takes a single numeric argument x and returns a numeric vector containing the smallest integers not less than the corresponding elements of x.

    floor takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.

    trunc takes a single numeric argument x and returns a numeric vector containing the integers formed by truncating the values in x toward 0.

    round rounds the values in its first argument to the specified number of decimal places (default 0). See ‘Details’ about “round to even” when rounding off a 5.

    signif rounds the values in its first argument to the specified number of significant digits.

    Usage

    ceiling(x)
    floor(x)
    trunc(x, ...)
    
    round(x, digits = 0)
    signif(x, digits = 6)
    

    Arguments

    x

    a numeric vector. Or, for round and signif, a complex vector.

    digits

    integer indicating the number of decimal places (round) or significant digits (signif) to be used. Negative values are allowed (see ‘Details’).

    ...

    arguments to be passed to methods.

    Details

    These are generic functions: methods can be defined for them individually or via the Math group generic.

    Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on OS services and on representation error (since e.g. 0.15 is not represented exactly, the rounding rule applies to the represented number and not to the printed number, and so round(0.15, 1) could be either 0.1 or 0.2).

    Rounding to a negative number of digits means rounding to a power of ten, so for example round(x, digits = -2) rounds to the nearest hundred.

    For signif the recognized values of digits are 1...22, and non-missing values are rounded to the nearest integer in that range. Complex numbers are rounded to retain the specified number of digits in the larger of the components. Each element of the vector is rounded individually, unlike printing.

    These are all primitive functions.

    S4 methods

    These are all (internally) S4 generic.

    ceiling, floor and trunc are members of the Math group generic. As an S4 generic, trunc has only one argument.

    round and signif are members of the Math2 group generic.

    Warning

    The realities of computer arithmetic can cause unexpected results, especially with floor and ceiling. For example, we ‘know’ that floor(log(x, base = 8)) for x = 8 is 1, but 0 has been seen on an R platform. It is normally necessary to use a tolerance.

    References

    Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

    The ISO/IEC/IEEE 60559:2011 standard is available for money from https://www.iso.org.

    The IEEE 754:2008 standard is more openly documented, e.g, at https://en.wikipedia.org/wiki/IEEE_754.

    See Also

    as.integer.

    Examples

    round(.5 + -2:4) # IEEE / IEC rounding: -2  0  0  2  2  4  4
    ## (this is *good* behaviour -- do *NOT* report it as bug !)
    
    ( x1 <- seq(-2, 4, by = .5) )
    round(x1) #-- IEEE / IEC rounding !
    x1[trunc(x1) != floor(x1)]
    x1[round(x1) != floor(x1 + .5)]
    (non.int <- ceiling(x1) != floor(x1))
    
    x2 <- pi * 100^(-1:3)
    round(x2, 3)
    signif(x2, 3)
    

    [Package base version 3.6.1 ]
    print(round(9.523, d=2))
    print(round(9.523, di=2))
    print(round(9.523, dig=2))
    
    [1] 9.52
    [1] 9.52
    [1] 9.52
    
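    testFun <- function(axb, bcd = 1, axdk) {
      return(axb + axdk)
    }
    
    # Error: "ax" partially matches both axb and axdk, so this call is ambiguous
    # testFun(ax=2,ax=3)
    
    # axb matches exactly, so the remaining "ax" can only match axdk
    testFun(axb=2,ax = 3)
    
    5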

    Functions in R are first class objects

    • Can be treated much like any other object
    • Can be passed as arguments to other functions
    • Can be nested, so that you can define a function inside another function
    • The return value is the value of the last expression evaluated in the function body
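
A minimal sketch of these points. The helper names apply_twice and make_multiplier are hypothetical and chosen only for illustration.

```r
# Pass a function as an argument
apply_twice <- function(f, x) f(f(x))
twice_sqrt <- apply_twice(sqrt, 16)   # sqrt(sqrt(16))
print(twice_sqrt)

# Define a function inside another function and return it
make_multiplier <- function(k) {
  multiply <- function(x) x * k       # the inner function remembers k
  multiply                            # last expression, so this is the return value
}
triple <- make_multiplier(3)
print(triple(5))
```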

    Lazy evaluation

    • Arguments are evaluated only when they are actually used
    • A value is not computed until it is needed
    • This can increase speed by saving unnecessary computations
    fun <- function(a,b){
        a^2
    }
    fun(2,x/0)   # b is never used inside fun, so x/0 is never evaluated
    
    4
    fun <- function(x){
        10
    }
    fun("hello")
    
    10
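
The sketch below shows lazy evaluation from both sides: an argument that would raise an error is harmless while it stays unused, and it fails only once the function actually touches it. The function lazy_demo is a made-up example.

```r
lazy_demo <- function(a, b) {
  if (a > 0) {
    return("b was never evaluated")
  }
  b   # b is evaluated only when this line is reached
}

first <- lazy_demo(1, stop("this error is never triggered"))
print(first)

# With a negative a the function reaches b, so the error is raised and captured
outcome <- tryCatch(lazy_demo(-1, stop("now b is evaluated")),
                    error = function(e) conditionMessage(e))
print(outcome)
```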

    Automatic Returns:

    In R, it is not necessary to include a return statement. A function automatically returns the value of the last expression evaluated in its body. Alternatively, we can call return() explicitly.

    add <- function(x,y=1,z=2){
        x+y
        x+z
    }
    add(5)
    
    7
    add <- function(x,y=1,z=2){
        x+y
        x+z
        return(x+y)
    }
    add(5)
    
    6
    fahr_to_kelvin <- function(temp) {
      kelvin <- ((temp - 32) * (5 / 9)) + 273.15
      return(kelvin)
    }
    
    fahr_to_kelvin(32)
    # boiling point of water
    fahr_to_kelvin(212)
    
    273.15
    373.15

    Ellipsis or three dots (...)

    • The three dots (an ellipsis) act as a placeholder for any extra arguments given to the function
    • Accept any number of named or unnamed arguments
    • Pass those arguments on to another function
    • Especially useful for creating customized versions of existing functions or for providing additional options to end users
    printDots <- function(...) {
      myDots <- list(...)
      paste(myDots)
    }
     
    printDots("how", "is", "your", "health")
    
    [1] "how"    "is"     "your"   "health"
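
As a sketch of passing extra arguments on to another function, the hypothetical wrapper mean_rounded below forwards anything in ... to mean(). The function names and data are assumptions made for illustration.

```r
# `mean_rounded` is a hypothetical wrapper: extra arguments in ... go straight to mean()
mean_rounded <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits = digits)
}

values <- c(2.25, 4.5, NA, 8.75)
plain   <- mean_rounded(values)                 # the missing value propagates
cleaned <- mean_rounded(values, na.rm = TRUE)   # na.rm is forwarded to mean()
print(plain)
print(cleaned)

# Counting how many extra arguments were supplied
count_dots <- function(...) length(list(...))
print(count_dots("a", "b", "c"))
```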

    Named AND Anonymous (nameless) functions:

    named <- function(x) x*10
    # calling a named function
        named(6)
    
    60
    (function(x) x*10)(6) # calling an anonymous function directly
    
    60
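
Anonymous functions are most often written inline as arguments to other functions. A short sketch using sapply() and Filter() from base R:

```r
# Square each number without naming the function
squares <- sapply(1:5, function(x) x^2)
print(squares)

# Keep only the even numbers
evens <- Filter(function(x) x %% 2 == 0, 1:10)
print(evens)
```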

    Apply family functions

    The apply family of functions belongs to the R base package and contains functions that manipulate slices of data from matrices, arrays, lists and data frames in a repetitive way. These functions let you traverse the data in a number of ways while avoiding explicit loop constructs.

    head(mtcars)
    
                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
    dim(mtcars)
    
    [1] 32 11
    str(mtcars)
    
    'data.frame':	32 obs. of  11 variables:
     $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
     $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
     $ disp: num  160 160 108 258 360 ...
     $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
     $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
     $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
     $ qsec: num  16.5 17 18.6 19.4 17 ...
     $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
     $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
     $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
     $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
    
    # one method
    max(mtcars[,1])
    max(mtcars[,2])
    max(mtcars[,3])
    max(mtcars[,4])
    max(mtcars[,5])
    #...etc
    
    33.9
    8
    472
    335
    4.93
    for (i in seq_len(ncol(mtcars))) {
        col_values <- mtcars[,i]
        col_max <- max(col_values)   # avoid naming a variable max, which would mask the function name
        print(col_max)
    }
    
    [1] 33.9
    [1] 8
    [1] 472
    [1] 335
    [1] 4.93
    [1] 5.424
    [1] 22.9
    [1] 1
    [1] 1
    [1] 5
    [1] 8
    
    • apply - apply over the margins of an array (e.g. the rows or columns of a matrix)
    • lapply - apply a function to each element of a list in turn and get a list back.
    • sapply - apply a function to each element of a list and get a simplified object like vector back, rather than a list.
    • tapply - apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.
    • mapply - apply a function to the 1st elements of each, and then the 2nd elements of each, etc


    apply

    apply function:

    • Use it when you want to apply a function to the rows or columns of a matrix (or to higher-dimensional analogues)
    ?apply #take help
    
    apply {base} R Documentation

    Apply Functions Over Array Margins

    Description

    Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.

    Usage

    apply(X, MARGIN, FUN, ...)
    

    Arguments

    X

    an array, including a matrix.

    MARGIN

    a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.

    FUN

    the function to be applied: see ‘Details’. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.

    ...

    optional arguments to FUN.

    Details

    If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

    FUN is found by a call to match.fun and typically is either a function or a symbol (e.g., a backquoted name) or a character string specifying a function to be searched for from the environment of the call to apply.

    Arguments in ... cannot have the same name as any of the other arguments, and care may be needed to avoid partial matching to MARGIN or FUN. In general-purpose code it is good practice to name the first three arguments if ... is passed through: this both avoids partial matching to MARGIN or FUN and ensures that a sensible error message is given if arguments named X, MARGIN or FUN are passed through ....

    Value

    If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise. If n is 0, the result has length 0 but not necessarily the ‘correct’ dimension.

    If the calls to FUN return vectors of different lengths, apply returns a list of length prod(dim(X)[MARGIN]) with dim set to MARGIN if this has length greater than one.

    In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.

    References

    Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

    See Also

    lapply and there, simplify2array; tapply, and convenience functions sweep and aggregate.

    Examples

    ## Compute row and column sums for a matrix:
    x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
    dimnames(x)[[1]] <- letters[1:8]
    apply(x, 2, mean, trim = .2)
    col.sums <- apply(x, 2, sum)
    row.sums <- apply(x, 1, sum)
    rbind(cbind(x, Rtot = row.sums), Ctot = c(col.sums, sum(col.sums)))
    
    stopifnot( apply(x, 2, is.vector))
    
    ## Sort the columns of a matrix
    apply(x, 2, sort)
    
    ## keeping named dimnames
    names(dimnames(x)) <- c("row", "col")
    x3 <- array(x, dim = c(dim(x),3),
    	    dimnames = c(dimnames(x), list(C = paste0("cop.",1:3))))
    identical(x,  apply( x,  2,  identity))
    identical(x3, apply(x3, 2:3, identity))
    
    ##- function with extra args:
    cave <- function(x, c1, c2) c(mean(x[c1]), mean(x[c2]))
    apply(x, 1, cave,  c1 = "x1", c2 = c("x1","x2"))
    
    ma <- matrix(c(1:4, 1, 6:8), nrow = 2)
    ma
    apply(ma, 1, table)  #--> a list of length 2
    apply(ma, 1, stats::quantile) # 5 x n matrix with rownames
    
    stopifnot(dim(ma) == dim(apply(ma, 1:2, sum)))
    
    ## Example with different lengths for each call
    z <- array(1:24, dim = 2:4)
    zseq <- apply(z, 1:2, function(x) seq_len(max(x)))
    zseq         ## a 2 x 3 matrix
    typeof(zseq) ## list
    dim(zseq) ## 2 3
    zseq[1,]
    apply(z, 3, function(x) seq_len(max(x)))
    # a list without a dim attribute
    

    [Package base version 3.6.1 ]
    apply(mtcars, 2, max)
    
    mpg   33.9
    cyl   8
    disp  472
    hp    335
    drat  4.93
    wt    5.424
    qsec  22.9
    vs    1
    am    1
    gear  5
    carb  8
    
    apply(mtcars, 1, max)
    
    Mazda RX4            160
    Mazda RX4 Wag        160
    Datsun 710           108
    Hornet 4 Drive       258
    Hornet Sportabout    360
    Valiant              225
    Duster 360           360
    Merc 240D            146.7
    Merc 230             140.8
    Merc 280             167.6
    Merc 280C            167.6
    Merc 450SE           275.8
    Merc 450SL           275.8
    Merc 450SLC          275.8
    Cadillac Fleetwood   472
    Lincoln Continental  460
    Chrysler Imperial    440
    Fiat 128             78.7
    Honda Civic          75.7
    Toyota Corolla       71.1
    Toyota Corona        120.1
    Dodge Challenger     318
    AMC Javelin          304
    Camaro Z28           350
    Pontiac Firebird     400
    Fiat X1-9            79
    Porsche 914-2        120.3
    Lotus Europa         113
    Ford Pantera L       351
    Ferrari Dino         175
    Maserati Bora        335
    Volvo 142E           121
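
The MARGIN argument and the passing of extra arguments to FUN can be sketched on a small made-up matrix:

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

# MARGIN = 1: one result per row
row_sums <- apply(m, 1, sum)

# MARGIN = 2: one result per column, here with an anonymous function
col_ranges <- apply(m, 2, function(column) max(column) - min(column))

print(row_sums)
print(col_ranges)

# Arguments given after FUN are passed on to it (here na.rm goes to mean)
m_na <- m
m_na[1, 1] <- NA
col_means <- apply(m_na, 2, mean, na.rm = TRUE)
print(col_means)
```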

    lapply

    • When you want to apply a function to each element of a list/vector in turn and get a list back
    lapply(1:3, function(x) x^2)
    
    1. 1
    2. 4
    3. 9
    class(lapply(1:3, function(x) x^2) )
    
    'list'
    CAGO.list <- list(Diet1 = c(2,5,4,3,5,3), Diet2 =c(8,5,6,5,7,7), Diet3 =c(3,4,2,5,2,6) , Diet4 = c(2,2,3,2,5,2))
    
    lapply(CAGO.list, mean)
    
    $Diet1
    3.66666666666667
    $Diet2
    6.33333333333333
    $Diet3
    3.66666666666667
    $Diet4
    2.66666666666667
    • convert list to data frame and check whether lapply() is working for data frames or not
    CAGO.df <- as.data.frame(CAGO.list)
    
    CAGO.df
    
    Diet1 Diet2 Diet3 Diet4
    2 8 3 2
    5 5 4 2
    4 6 2 3
    3 5 5 2
    5 7 2 5
    3 7 6 2
    lapply(CAGO.df, mean) # a data frame is a list of columns, so the mean is computed column by column
    
    $Diet1
    3.66666666666667
    $Diet2
    6.33333333333333
    $Diet3
    3.66666666666667
    $Diet4
    2.66666666666667
    • lapply() also works on vectors
    Random <- c("This", "Is", "a", "Random", "Vector")
    lapply(Random,nchar) # number of characters in each element of the vector
    
    1. 4
    2. 2
    3. 1
    4. 6
    5. 6
    lapply(Random,toupper)
    
    1. 'THIS'
    2. 'IS'
    3. 'A'
    4. 'RANDOM'
    5. 'VECTOR'

    sapply

    • When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list
    sapply(1:3, function(x) x^2)
    
    [1] 1 4 9
    
    class(sapply(1:3, function(x) x^2))
    
    'numeric'
    • apply the sapply() function on the CAGO.list and CAGO.df
    print(sapply(CAGO.list, mean)) # output as a vector
    
       Diet1    Diet2    Diet3    Diet4 
    3.666667 6.333333 3.666667 2.666667 
    
    print(sapply(CAGO.df, mean)) # output as a vector
    
       Diet1    Diet2    Diet3    Diet4 
    3.666667 6.333333 3.666667 2.666667 
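
Because sapply() decides the shape of its result from the data, it can return a vector in one case and a list in another. The sketch below contrasts it with vapply(), which requires the expected type and length to be declared. The list scores is made-up data.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20))

# sapply simplifies to a named vector when every result has length 1
simplified <- sapply(scores, mean)
print(simplified)

# vapply does the same but checks each result against FUN.VALUE
checked <- vapply(scores, mean, FUN.VALUE = numeric(1))
print(checked)

# When results differ in length, sapply falls back to a list
unsimplified <- sapply(scores, rev)
print(class(unsimplified))
```

Declaring the result type with vapply() is a common defensive choice inside functions and scripts.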
    

    tapply

    Applies a function to subsets of a vector, where the subsets are defined by a factor variable.

    To make this concrete, imagine you have the heights of 1000 people (500 males and 500 females) and you want to know the average height of males and of females in this sample. You can group the heights by gender (the heights of the 500 males and the heights of the 500 females) and then calculate the average height within each group.

    tapply(mtcars$wt,mtcars$cyl,mean)
    
    4  2.28572727272727
    6  3.11714285714286
    8  3.99921428571429
    
    head(iris)
    
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    5.1 3.5 1.4 0.2 setosa
    4.9 3.0 1.4 0.2 setosa
    4.7 3.2 1.3 0.2 setosa
    4.6 3.1 1.5 0.2 setosa
    5.0 3.6 1.4 0.2 setosa
    5.4 3.9 1.7 0.4 setosa
    tapply(iris$Sepal.Length,iris$Species,mean)
    
    setosa      5.006
    versicolor  5.936
    virginica   6.588
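
The height example described earlier in this section can be sketched with a handful of made-up values (six people instead of 1000; the numbers are purely illustrative):

```r
# Made-up illustrative data: heights in cm for six people
height <- c(178, 165, 182, 160, 175, 168)
gender <- factor(c("male", "female", "male", "female", "male", "female"))

# Group the heights by gender and average within each group
avg_height <- tapply(height, gender, mean)
print(avg_height)
```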

    mapply

    mapply() is a multivariate version of sapply().
    It applies FUN to the first elements of each (…) argument, then to the second elements, the third elements, and so on.
    Note that the first argument of mapply() is the function to apply.
    It is advisable when you have several data structures (e.g. vectors, lists) and want to apply a function to their elements in parallel.

    l1 <- list(a = c(1:10), b = c(11:20))
    l2 <- list(c = c(21:30), d = c(31:40))
    # sum the corresponding elements of l1 and l2
    print(mapply(sum, l1$a, l1$b, l2$c, l2$d))
    
     [1]  64  68  72  76  80  84  88  92  96 100
    
    print(mapply(sum, l1))
    #sum(c(1:10))
    
      a   b 
     55 155 
    
    print(mapply(sum, l1,l2))
    #sum(c(1:10),c(21:30))
    
      a   b 
    310 510 
    
    print(mapply(sum, l1$a, l1$b))
    
     [1] 12 14 16 18 20 22 24 26 28 30
    
    Q1 <- matrix(c(rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4)),4,4)
    
    # Print `Q1`
    print(Q1)
    
    # Or use `mapply()`
    Q2 <- mapply(rep,1:4,4)
    
    # Print `Q2`
    print(Q2)
    
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    1    2    3    4
    [3,]    1    2    3    4
    [4,]    1    2    3    4
         [,1] [,2] [,3] [,4]
    [1,]    1    2    3    4
    [2,]    1    2    3    4
    [3,]    1    2    3    4
    [4,]    1    2    3    4
    
    mapply(rep, 1:4, 4:1)
    
    [[1]]
    [1] 1 1 1 1
    
    [[2]]
    [1] 2 2 2
    
    [[3]]
    [1] 3 3
    
    [[4]]
    [1] 4
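
Two further uses of mapply(), sketched with made-up inputs: pairing two vectors element by element with an anonymous function, and using MoreArgs for an argument that stays the same in every call.

```r
# Pair up two vectors element by element
widths  <- c(2, 3, 4)
heights <- c(5, 6, 7)
areas <- mapply(function(w, h) w * h, widths, heights)
print(areas)

# MoreArgs holds arguments that are identical for every call
labels <- mapply(paste, c("a", "b"), c("x", "y"), MoreArgs = list(sep = "-"))
print(labels)
```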

    Packages/Modules/Libraries

    R has many packages, each written for a specific kind of task and continuously maintained and upgraded.
    Eg:
    the stringr package provides various string-related operations
    the dplyr package is used for data manipulation/cleaning/analysis
    the ggplot2 package is used for data visualization
    ....etc

    To install a package, the machine needs internet connectivity; use install.packages("package name")

    Eg:
    install.packages("stringr")

    To load an installed package for the current session, use library(Package_Name)

    Eg:
    library(stringr)
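
A common pattern is to check whether a package is already installed before installing it. The sketch below assumes a hypothetical helper named is_installed and leaves the install step commented out so that it runs without a network connection.

```r
# `is_installed` is a hypothetical helper, not a base R function
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# The stats package ships with R, so this check succeeds
print(is_installed("stats"))

# Install only when missing (commented out so the sketch runs offline):
# if (!is_installed("stringr")) install.packages("stringr")

# pkg::fun calls a function without attaching the whole package
middle <- stats::median(c(3, 1, 2))
print(middle)
```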