2016-06-21

Data Frames

  • Data frames are the workhorse of R objects
  • Structured by rows and columns and can be indexed
  • Each column holds a single variable type
  • Column names can be used to index a variable
  • Advice for naming variables applies to editing column names
  • Can be specified by grouping vectors of equal length as columns
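
A minimal sketch of the points above, using made-up vectors (the names id, height and name are only for illustration). It builds a data frame from three vectors of equal length, checks the type of each column, and shows that vectors whose lengths do not fit together are rejected:

```r
# Three vectors of equal length (hypothetical example data)
id     <- 1:3
height <- c(1.62, 1.75, 1.80)
name   <- c("Ann", "Ben", "Cyd")

# Group the vectors as columns; the column names are taken from the vector names
people <- data.frame(id, height, name)

# Each column has a single type; sapply() reports the class of every column
col_types <- sapply(people, class)
col_types

# A column can be pulled out by its name
people$height

# Vectors of length 3 and 2 cannot be combined into one data frame
failed <- inherits(try(data.frame(x = 1:3, y = 1:2), silent = TRUE), "try-error")
failed
```

The class of the name column depends on the R version: character in R >= 4.0, factor in older versions unless stringsAsFactors = FALSE is set.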

  • Follow along (copy & paste the code into the console):

curl::curl_download(
  "https://raw.githubusercontent.com/heike/rwrks/gh-pages/summerschool/01-Introduction-to-R/code/3-r-objects.R",
  "3-r-objects.R"
)
file.edit("3-r-objects.R")

Data Frame Indexing

  • Elements are indexed similarly to a vector using [ ]
  • df[i,j] will select the element in the \(i^{th}\) row and \(j^{th}\) column
  • df[ ,j] will select the entire \(j^{th}\) column and treat it as a vector
  • df[i, ] will select the entire \(i^{th}\) row and return it as a one-row data frame
  • Logical vectors can be used in place of i and j to subset the rows and columns
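
The indexing rules can be tried on a small data frame. This is a sketch with a hypothetical data frame df (columns x and y are made up). Note that a selected column comes back as a plain vector, while a selected row keeps its data frame class:

```r
# Small hypothetical data frame for experimenting with indexing
df <- data.frame(x = c(10, 20, 30), y = c("a", "b", "c"))

one_cell <- df[2, 1]     # single element: row 2, column 1
col_vec  <- df[, 1]      # entire first column, returned as a numeric vector
row_df   <- df[2, ]      # entire second row, still a data frame with one row

keep <- df$x > 15        # logical vector: FALSE TRUE TRUE
big  <- df[keep, ]       # keeps only the rows where keep is TRUE

is.data.frame(col_vec)   # FALSE
is.data.frame(row_df)    # TRUE
```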

Adding a new Variable to a Data Frame

  • Create a new vector that is the same length as other columns
  • Append the new column to the data frame using the $ operator
  • The new column takes the name written after the $ (rate in the example below)

This is what we did before in the tips data set:

# create rate variable in the tips data set:
tips$rate <- tips$tip / tips$total_bill
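
The tips data is not built into R, so here is the same idea sketched with the built-in iris data. The copy flowers and the column name Petal.Ratio are made up for this illustration:

```r
# Work on a copy so the built-in iris data stays unchanged
flowers <- iris

# The new vector has one value per row, so it fits as a column
ratio <- flowers$Petal.Length / flowers$Petal.Width
length(ratio) == nrow(flowers)

# Append it with the $ operator; the column is called Petal.Ratio
flowers$Petal.Ratio <- ratio

names(flowers)
head(flowers, n = 3)
```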

Data Frame Demo

Use Edgar Anderson's Iris Data:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Select Species column (5th column):

iris[,5]
##   [1] setosa     setosa     setosa     setosa     setosa     setosa    
##   [7] setosa     setosa     setosa     setosa     setosa     setosa    
##  [13] setosa     setosa     setosa     setosa     setosa     setosa    
##  [19] setosa     setosa     setosa     setosa     setosa     setosa    
##  [25] setosa     setosa     setosa     setosa     setosa     setosa    
##  [31] setosa     setosa     setosa     setosa     setosa     setosa    
##  [37] setosa     setosa     setosa     setosa     setosa     setosa    
##  [43] setosa     setosa     setosa     setosa     setosa     setosa    
##  [49] setosa     setosa     versicolor versicolor versicolor versicolor
##  [55] versicolor versicolor versicolor versicolor versicolor versicolor
##  [61] versicolor versicolor versicolor versicolor versicolor versicolor
##  [67] versicolor versicolor versicolor versicolor versicolor versicolor
##  [73] versicolor versicolor versicolor versicolor versicolor versicolor
##  [79] versicolor versicolor versicolor versicolor versicolor versicolor
##  [85] versicolor versicolor versicolor versicolor versicolor versicolor
##  [91] versicolor versicolor versicolor versicolor versicolor versicolor
##  [97] versicolor versicolor versicolor versicolor virginica  virginica 
## [103] virginica  virginica  virginica  virginica  virginica  virginica 
## [109] virginica  virginica  virginica  virginica  virginica  virginica 
## [115] virginica  virginica  virginica  virginica  virginica  virginica 
## [121] virginica  virginica  virginica  virginica  virginica  virginica 
## [127] virginica  virginica  virginica  virginica  virginica  virginica 
## [133] virginica  virginica  virginica  virginica  virginica  virginica 
## [139] virginica  virginica  virginica  virginica  virginica  virginica 
## [145] virginica  virginica  virginica  virginica  virginica  virginica 
## Levels: setosa versicolor virginica

Demo (Continued)

Select Species column with the $ operator:

iris$Species
##   [1] setosa     setosa     setosa     setosa     setosa     setosa    
##   [7] setosa     setosa     setosa     setosa     setosa     setosa    
##  [13] setosa     setosa     setosa     setosa     setosa     setosa    
##  [19] setosa     setosa     setosa     setosa     setosa     setosa    
##  [25] setosa     setosa     setosa     setosa     setosa     setosa    
##  [31] setosa     setosa     setosa     setosa     setosa     setosa    
##  [37] setosa     setosa     setosa     setosa     setosa     setosa    
##  [43] setosa     setosa     setosa     setosa     setosa     setosa    
##  [49] setosa     setosa     versicolor versicolor versicolor versicolor
##  [55] versicolor versicolor versicolor versicolor versicolor versicolor
##  [61] versicolor versicolor versicolor versicolor versicolor versicolor
##  [67] versicolor versicolor versicolor versicolor versicolor versicolor
##  [73] versicolor versicolor versicolor versicolor versicolor versicolor
##  [79] versicolor versicolor versicolor versicolor versicolor versicolor
##  [85] versicolor versicolor versicolor versicolor versicolor versicolor
##  [91] versicolor versicolor versicolor versicolor versicolor versicolor
##  [97] versicolor versicolor versicolor versicolor virginica  virginica 
## [103] virginica  virginica  virginica  virginica  virginica  virginica 
## [109] virginica  virginica  virginica  virginica  virginica  virginica 
## [115] virginica  virginica  virginica  virginica  virginica  virginica 
## [121] virginica  virginica  virginica  virginica  virginica  virginica 
## [127] virginica  virginica  virginica  virginica  virginica  virginica 
## [133] virginica  virginica  virginica  virginica  virginica  virginica 
## [139] virginica  virginica  virginica  virginica  virginica  virginica 
## [145] virginica  virginica  virginica  virginica  virginica  virginica 
## Levels: setosa versicolor virginica

Demo (Continued)

iris$Species == "setosa"
##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [34]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [45]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Demo (Continued)

iris[iris$Species=="setosa", ]
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
## 11          5.4         3.7          1.5         0.2  setosa
## 12          4.8         3.4          1.6         0.2  setosa
## 13          4.8         3.0          1.4         0.1  setosa
## 14          4.3         3.0          1.1         0.1  setosa
## 15          5.8         4.0          1.2         0.2  setosa
## 16          5.7         4.4          1.5         0.4  setosa
## 17          5.4         3.9          1.3         0.4  setosa
## 18          5.1         3.5          1.4         0.3  setosa
## 19          5.7         3.8          1.7         0.3  setosa
## 20          5.1         3.8          1.5         0.3  setosa
## 21          5.4         3.4          1.7         0.2  setosa
## 22          5.1         3.7          1.5         0.4  setosa
## 23          4.6         3.6          1.0         0.2  setosa
## 24          5.1         3.3          1.7         0.5  setosa
## 25          4.8         3.4          1.9         0.2  setosa
## 26          5.0         3.0          1.6         0.2  setosa
## 27          5.0         3.4          1.6         0.4  setosa
## 28          5.2         3.5          1.5         0.2  setosa
## 29          5.2         3.4          1.4         0.2  setosa
## 30          4.7         3.2          1.6         0.2  setosa
## 31          4.8         3.1          1.6         0.2  setosa
## 32          5.4         3.4          1.5         0.4  setosa
## 33          5.2         4.1          1.5         0.1  setosa
## 34          5.5         4.2          1.4         0.2  setosa
## 35          4.9         3.1          1.5         0.2  setosa
## 36          5.0         3.2          1.2         0.2  setosa
## 37          5.5         3.5          1.3         0.2  setosa
## 38          4.9         3.6          1.4         0.1  setosa
## 39          4.4         3.0          1.3         0.2  setosa
## 40          5.1         3.4          1.5         0.2  setosa
## 41          5.0         3.5          1.3         0.3  setosa
## 42          4.5         2.3          1.3         0.3  setosa
## 43          4.4         3.2          1.3         0.2  setosa
## 44          5.0         3.5          1.6         0.6  setosa
## 45          5.1         3.8          1.9         0.4  setosa
## 46          4.8         3.0          1.4         0.3  setosa
## 47          5.1         3.8          1.6         0.2  setosa
## 48          4.6         3.2          1.4         0.2  setosa
## 49          5.3         3.7          1.5         0.2  setosa
## 50          5.0         3.3          1.4         0.2  setosa
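
Logical vectors can also be combined with & (and) or | (or), and a column selection can be given at the same time. A small sketch; the object name long_setosa is made up:

```r
# Rows: setosa flowers with sepal length above 5.5
# Columns: only Sepal.Length and Species
long_setosa <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5,
                    c("Sepal.Length", "Species")]
long_setosa   # rows 15, 16 and 19 of the output above

# Summing a logical vector counts the TRUE values
sum(iris$Species == "setosa")   # 50
```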

Creating our own Data Frame

Create our own data frame using the data.frame function:

mydf <- data.frame(NUMS = 1:5, 
                   lets = letters[1:5],
                   vehicle = c("car", "boat", "car", "car", "boat"))
mydf
##   NUMS lets vehicle
## 1    1    a     car
## 2    2    b    boat
## 3    3    c     car
## 4    4    d     car
## 5    5    e    boat

expand.grid quickly creates all combinations of the levels of its arguments:

dframe <- data.frame(expand.grid(
  reps = 1:3, Type = c("Control", "Treatment")))
dframe
##   reps      Type
## 1    1   Control
## 2    2   Control
## 3    3   Control
## 4    1 Treatment
## 5    2 Treatment
## 6    3 Treatment
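
A short check of what expand.grid returns. The result is already a data frame, and the number of rows is the product of the numbers of levels. The third variable dose is a made-up addition for illustration:

```r
grid <- expand.grid(reps = 1:3, Type = c("Control", "Treatment"))
is.data.frame(grid)   # TRUE
nrow(grid)            # 3 * 2 = 6

# Adding a third variable multiplies the number of combinations
three_way <- expand.grid(reps = 1:2,
                         Type = c("Control", "Treatment"),
                         dose = c(10, 20, 30))
nrow(three_way)       # 2 * 2 * 3 = 12
```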

Renaming columns

The names function can be used on the left-hand side and the right-hand side of an assignment:

names(dframe)
## [1] "reps" "Type"
names(dframe)[1] <- "Reps"
dframe
##   Reps      Type
## 1    1   Control
## 2    2   Control
## 3    3   Control
## 4    1 Treatment
## 5    2 Treatment
## 6    3 Treatment
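
Renaming by position breaks if the column order changes. A sketch of two alternatives: renaming by matching the old name, and replacing all names at once. The new names Group and Replicate are made up:

```r
dframe <- expand.grid(reps = 1:3, Type = c("Control", "Treatment"))

# Rename a single column by matching its current name
names(dframe)[names(dframe) == "Type"] <- "Group"
names(dframe)

# Replace all names at once; the vector needs one name per column
names(dframe) <- c("Replicate", "Group")
names(dframe)
```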

Your Turn

  1. Make a data frame with column 1: 1,2,3,4,5,6 and column 2: a,b,a,b,a,b
  2. Select only rows with value "a" in column 2 using a logical vector
  3. mtcars is a built-in data set like iris: Extract the 4th row of the mtcars data.

Lists

  • Lists are a structured collection of R objects
  • R objects in a list need not be the same type
  • Create lists using the list function
  • Lists are indexed using double square brackets [[ ]] to select an object

List Example

Creating a list containing a vector and a matrix:

mylist <- list(matrix(letters[1:10], nrow = 2, ncol = 5),
               seq(0, 49, by = 7))
mylist
## [[1]]
##      [,1] [,2] [,3] [,4] [,5]
## [1,] "a"  "c"  "e"  "g"  "i" 
## [2,] "b"  "d"  "f"  "h"  "j" 
## 
## [[2]]
## [1]  0  7 14 21 28 35 42 49

Use indexing to select the second list element:

mylist[[2]]
## [1]  0  7 14 21 28 35 42 49
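
Single and double brackets behave differently on lists, and list elements can be given names. A sketch using the same list as above, with made-up element names letters_mat and sevens:

```r
mylist <- list(letters_mat = matrix(letters[1:10], nrow = 2, ncol = 5),
               sevens = seq(0, 49, by = 7))

sub_list <- mylist[2]     # single brackets: a list of length 1
element  <- mylist[[2]]   # double brackets: the numeric vector itself

class(sub_list)           # "list"
class(element)            # "numeric"

# Named elements can also be selected with $
mylist$sevens

# Indexing can be chained to reach inside an element
mylist[[2]][3]            # third value of the vector: 14
mylist[[1]][2, 4]         # row 2, column 4 of the matrix: "h"
```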

Your Turn

  1. Create a list containing a vector and a 2x3 data frame
  2. Use indexing to select the data frame from your list
  3. Use further indexing to select the first row from the data frame in your list

Examining Objects

  • head(x) - View top 6 rows of a data frame
  • tail(x) - View bottom 6 rows of a data frame
  • summary(x) - Summary statistics
  • str(x) - View structure of object
  • dim(x) - View dimensions of object
  • length(x) - Returns the length of a vector

Examining Objects Example

We can examine the first two rows of a data frame by passing the n parameter to the head function:

head(iris, n = 2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa

What's its structure?

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
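
The remaining functions from the list can be tried on iris in the same way. One detail worth noting in this sketch: a data frame is stored as a list of columns, so length() on a data frame counts columns, not rows:

```r
dim(iris)              # rows and columns: 150 5
nrow(iris)             # 150
ncol(iris)             # 5

length(iris)           # 5, the number of columns
length(iris$Species)   # 150, the length of one column

class(iris)            # "data.frame"
tail(iris, n = 3)      # last three rows
summary(iris$Sepal.Length)
```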

Your Turn

  1. View the top 8 rows of mtcars data
  2. What type of object is the mtcars data set?
  3. How many rows are in the mtcars data set? (try finding this using dim or indexing + length)
  4. Summarize the values in each column of the mtcars data set

Working with Output from a Function

  • Can save output from a function as an object
  • Object is generally a list of output objects
  • Can pull off items from the output for further computing
  • Examine object using functions like str(x)

Saving Output Demo

  • t-test using iris data to see if petal lengths for setosa and versicolor are the same
  • The formula interface of t.test can only handle two groups, so we drop the virginica species

t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])
## 
##  Welch Two Sample t-test
## 
## data:  Petal.Length by Species
## t = -39.493, df = 62.14, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.939618 -2.656382
## sample estimates:
##     mean in group setosa mean in group versicolor 
##                    1.462                    4.260

Demo (Continued)

Save the output of the t-test function to an object

tout <- t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])

Let's look at the structure of this object:

str(tout)
## List of 9
##  $ statistic  : Named num -39.5
##   ..- attr(*, "names")= chr "t"
##  $ parameter  : Named num 62.1
##   ..- attr(*, "names")= chr "df"
##  $ p.value    : num 9.93e-46
##  $ conf.int   : atomic [1:2] -2.94 -2.66
##   ..- attr(*, "conf.level")= num 0.95
##  $ estimate   : Named num [1:2] 1.46 4.26
##   ..- attr(*, "names")= chr [1:2] "mean in group setosa" "mean in group versicolor"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "difference in means"
##  $ alternative: chr "two.sided"
##  $ method     : chr "Welch Two Sample t-test"
##  $ data.name  : chr "Petal.Length by Species"
##  - attr(*, "class")= chr "htest"

Demo: Extracting the P-Value

Since this is simply a list, we can use our regular indexing:

tout$p.value
## [1] 9.934433e-46
tout[[3]]
## [1] 9.934433e-46
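
The other components can be pulled off in the same way. A sketch that recreates tout and uses a few of its pieces for further computing; the name mean_diff is made up. The exact set of component names can differ between R versions, so names(tout) is shown without expected output:

```r
tout <- t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])

names(tout)       # all components that can be extracted

tout$conf.int     # confidence interval, with the level stored as an attribute
tout$estimate     # the two group means

# Further computing: difference between the two group means
mean_diff <- unname(tout$estimate[2] - tout$estimate[1])
mean_diff

# Compare the p-value to a significance level of 0.05
tout$p.value < 0.05
```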

Importing Data

We often need to import our own data rather than just using built-in datasets.

  • First need to tell R where the data is saved (either by specifying the full path or by changing the working directory with setwd())
  • To find a file's path, open a file browser by calling file.choose()
  • Data read in using R functions such as:
    • read.table() for reading in .txt files
    • read.csv() for reading in .csv files
  • Assign the data to a new R object when reading in the file

write.csv(iris, "iris.csv", row.names = FALSE)
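
A sketch of the full round trip: write a csv file, then read it back and assign it to a new object. To avoid cluttering the working directory, this version writes to a temporary folder; the names csv_path and iris_in are made up:

```r
# Build a path inside R's temporary directory
csv_path <- file.path(tempdir(), "iris.csv")

# Write the data, then read it back into a new object
write.csv(iris, csv_path, row.names = FALSE)
iris_in <- read.csv(csv_path)

dim(iris_in)    # same dimensions as iris
str(iris_in)    # check the column types after import
```

In R >= 4.0, read.csv reads text columns such as Species as character; pass stringsAsFactors = TRUE to get a factor as in the original iris data.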

Exporting Data and Objects

  • Similarly to importing, exporting is supported using functions:

    • for data frames use write.csv or write.table
    • for more general objects such as lists use save (see ?save). Objects saved with save can be loaded using the function load()
    • save writes the object in the rda format (short for R data), an R-specific binary format that is small, keeps types and loads fast

iris_rda <- iris
save(iris_rda, file="iris-data.rda")
rm(iris_rda) # object is gone from environment

load("iris-data.rda") # and now it is back!
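
The same steps work for a list. A sketch with a made-up list called results, saved to a temporary folder. load() puts the object back under its original name and returns that name as a character vector:

```r
rda_path <- file.path(tempdir(), "objects.rda")

# A general object: a list with two elements of different types
results  <- list(n = nrow(iris), species = levels(iris$Species))
original <- results            # keep a copy for comparison

save(results, file = rda_path)
rm(results)                    # results is gone from the environment

loaded <- load(rda_path)       # restores the object under its saved name
loaded                         # "results"

identical(results, original)   # TRUE: the restored list matches the copy
```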

Your Turn

  • Write the iris data set into a csv file on your machine. Check where it appears. For a challenge try to change the location.
  • Export the tout object using the function save. Delete tout from your working environment by running rm(tout). Load the previously saved object using load. If everything went alright, tout will be back in your working environment!