class: center, middle, inverse, title-slide

# Data Structures
### Haley Jeppson, Sam Tyner

---

## Data Frames

- Data frames are the workhorse of R objects
- Structured by rows and columns and can be indexed
- Each column is a variable of one type
- Column names can be used to index a variable
- Advice for naming variables applies to naming columns
- Can be created by combining vectors of equal length as columns

---

## Data Frame Indexing

- Elements are indexed like a vector, using `[` `]`
- `df[i,j]` will select the element in the `\(i^{th}\)` row and `\(j^{th}\)` column
- `df[ ,j]` will select the entire `\(j^{th}\)` column and treat it as a vector
- `df[i ,]` will select the entire `\(i^{th}\)` row and return it as a one-row data frame
- Logical or integer vectors can also be used in place of `i` and `j` to subset rows and columns

---

## Adding a new Variable to a Data Frame

- Create a new vector that is the same length as the other columns
- Append the new column to the data frame using the `$` operator
- The new column takes the name given after the `$`

---

## Data Frame Demo

Use [Edgar Anderson's Iris Data](https://en.wikipedia.org/wiki/Iris_flower_data_set):

```r
flower <- iris
```

Select Species column (5th column):

```r
flower[,5]
```

```
##   [1] setosa     setosa     setosa     setosa     setosa     setosa
##   [7] setosa     setosa     setosa     setosa     setosa     setosa
##  [13] setosa     setosa     setosa     setosa     setosa     setosa
##  [19] setosa     setosa     setosa     setosa     setosa     setosa
##  [25] setosa     setosa     setosa     setosa     setosa     setosa
##  [31] setosa     setosa     setosa     setosa     setosa     setosa
##  [37] setosa     setosa     setosa     setosa     setosa     setosa
##  [43] setosa     setosa     setosa     setosa     setosa     setosa
##  [49] setosa     setosa     versicolor versicolor versicolor versicolor
##  [55] versicolor versicolor versicolor versicolor versicolor versicolor
##  [61] versicolor versicolor versicolor versicolor versicolor versicolor
##  [67] versicolor versicolor versicolor versicolor versicolor versicolor
##  [73] versicolor versicolor versicolor versicolor versicolor versicolor
##  [79] versicolor versicolor versicolor versicolor versicolor versicolor
##  [85] versicolor versicolor versicolor versicolor versicolor versicolor
##  [91] versicolor versicolor versicolor versicolor versicolor versicolor
##  [97] versicolor versicolor versicolor versicolor virginica  virginica
## [103] virginica  virginica  virginica  virginica  virginica  virginica
## [109] virginica  virginica  virginica  virginica  virginica  virginica
## [115] virginica  virginica  virginica  virginica  virginica  virginica
## [121] virginica  virginica  virginica  virginica  virginica  virginica
## [127] virginica  virginica  virginica  virginica  virginica  virginica
## [133] virginica  virginica  virginica  virginica  virginica  virginica
## [139] virginica  virginica  virginica  virginica  virginica  virginica
## [145] virginica  virginica  virginica  virginica  virginica  virginica
## Levels: setosa versicolor virginica
```

---

## Demo (Continued)

Select Species column with the `$` operator:

```r
flower$Species
```

```
##   [1] setosa     setosa     setosa     setosa     setosa     setosa
##   [7] setosa     setosa     setosa     setosa     setosa     setosa
##  [13] setosa     setosa     setosa     setosa     setosa     setosa
##  [19] setosa     setosa     setosa     setosa     setosa     setosa
##  [25] setosa     setosa     setosa     setosa     setosa     setosa
##  [31] setosa     setosa     setosa     setosa     setosa     setosa
##  [37] setosa     setosa     setosa     setosa     setosa     setosa
##  [43] setosa     setosa     setosa     setosa     setosa     setosa
##  [49] setosa     setosa     versicolor versicolor versicolor versicolor
##  [55] versicolor versicolor versicolor versicolor versicolor versicolor
##  [61] versicolor versicolor versicolor versicolor versicolor versicolor
##  [67] versicolor versicolor versicolor versicolor versicolor versicolor
##  [73] versicolor versicolor versicolor versicolor versicolor versicolor
##  [79] versicolor versicolor versicolor versicolor versicolor versicolor
##  [85] versicolor versicolor versicolor versicolor versicolor versicolor
##  [91] versicolor versicolor versicolor versicolor versicolor versicolor
##  [97] versicolor versicolor versicolor versicolor virginica  virginica
## [103] virginica  virginica  virginica  virginica  virginica  virginica
## [109] virginica  virginica  virginica  virginica  virginica  virginica
## [115] virginica  virginica  virginica  virginica  virginica  virginica
## [121] virginica  virginica  virginica  virginica  virginica  virginica
## [127] virginica  virginica  virginica  virginica  virginica  virginica
## [133] virginica  virginica  virginica  virginica  virginica  virginica
## [139] virginica  virginica  virginica  virginica  virginica  virginica
## [145] virginica  virginica  virginica  virginica  virginica  virginica
## Levels: setosa versicolor virginica
```

---

## Demo (Continued)

```r
flower$Species == "setosa"
```

```
##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [34]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [45]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```

---

## Demo (Continued)

```r
flower[flower$Species=="setosa", ]
```

```
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
## 11          5.4         3.7          1.5         0.2  setosa
## 12          4.8         3.4          1.6         0.2  setosa
## 13          4.8         3.0          1.4         0.1  setosa
## 14          4.3         3.0          1.1         0.1  setosa
## 15          5.8         4.0          1.2         0.2  setosa
## 16          5.7         4.4          1.5         0.4  setosa
## 17          5.4         3.9          1.3         0.4  setosa
## 18          5.1         3.5          1.4         0.3  setosa
## 19          5.7         3.8          1.7         0.3  setosa
## 20          5.1         3.8          1.5         0.3  setosa
## 21          5.4         3.4          1.7         0.2  setosa
## 22          5.1         3.7          1.5         0.4  setosa
## 23          4.6         3.6          1.0         0.2  setosa
## 24          5.1         3.3          1.7         0.5  setosa
## 25          4.8         3.4          1.9         0.2  setosa
## 26          5.0         3.0          1.6         0.2  setosa
## 27          5.0         3.4          1.6         0.4  setosa
## 28          5.2         3.5          1.5         0.2  setosa
## 29          5.2         3.4          1.4         0.2  setosa
## 30          4.7         3.2          1.6         0.2  setosa
## 31          4.8         3.1          1.6         0.2  setosa
## 32          5.4         3.4          1.5         0.4  setosa
## 33          5.2         4.1          1.5         0.1  setosa
## 34          5.5         4.2          1.4         0.2  setosa
## 35          4.9         3.1          1.5         0.2  setosa
## 36          5.0         3.2          1.2         0.2  setosa
## 37          5.5         3.5          1.3         0.2  setosa
## 38          4.9         3.6          1.4         0.1  setosa
## 39          4.4         3.0          1.3         0.2  setosa
## 40          5.1         3.4          1.5         0.2  setosa
## 41          5.0         3.5          1.3         0.3  setosa
## 42          4.5         2.3          1.3         0.3  setosa
## 43          4.4         3.2          1.3         0.2  setosa
## 44          5.0         3.5          1.6         0.6  setosa
## 45          5.1         3.8          1.9         0.4  setosa
## 46          4.8         3.0          1.4         0.3  setosa
## 47          5.1         3.8          1.6         0.2  setosa
## 48          4.6         3.2          1.4         0.2  setosa
## 49          5.3         3.7          1.5         0.2  setosa
## 50          5.0         3.3          1.4         0.2  setosa
```

---

## Creating our own Data Frame

Create our own data frame using the `tibble` function (it replaces the deprecated `data_frame`):

```r
library(tidyverse)
mydf <- tibble(NUMS = 1:5,
               lets = letters[1:5],
               vehicle = c("car", "boat", "car", "car", "boat"))
mydf
```

```
## # A tibble: 5 x 3
##    NUMS lets  vehicle
##   <int> <chr> <chr>
## 1     1 a     car
## 2     2 b     boat
## 3     3 c     car
## 4     4 d     car
## 5     5 e     boat
```

---

## Renaming columns

We can use the `names` function to set that first column to lowercase:

```r
names(mydf)[1] <- "nums"
mydf
```

```
## # A tibble: 5 x 3
##    nums lets  vehicle
##   <int> <chr> <chr>
## 1     1 a     car
## 2     2 b     boat
## 3     3 c     car
## 4     4 d     car
## 5     5 e     boat
```

---
class: inverse

## Your Turn

1. Make a data frame with column 1: `1,2,3,4,5,6` and column 2: `a,b,a,b,a,b`
2. Select only the rows with value `"a"` in column 2 using a logical vector
3. `mtcars` is a built-in data set like `iris`: Extract the 4th row of the `mtcars` data.

---

## Lists

- Lists are a structured collection of R objects
- R objects in a list need not be the same type
- Create lists using the `list` function
- Index a list with double square brackets `[[ ]]` to select a single object
- Use single square brackets to select a sub-list of one or more elements, e.g. `[c(2,4)]`
- For named lists, select an element with `$`, as with data frames

---

## List Example

Creating a list containing a vector and a matrix:

```r
mylist <- list(matrix(letters[1:10], nrow = 2, ncol = 5),
               seq(0, 49, by = 7))
mylist
```

```
## [[1]]
##      [,1] [,2] [,3] [,4] [,5]
## [1,] "a"  "c"  "e"  "g"  "i"
## [2,] "b"  "d"  "f"  "h"  "j"
##
## [[2]]
## [1]  0  7 14 21 28 35 42 49
```

Use indexing to select the second list element:

```r
mylist[[2]]
```

```
## [1]  0  7 14 21 28 35 42 49
```

---
class: inverse

## Your Turn

1. Create a list containing a vector and a 2x3 data frame
2. Use indexing to select the data frame from your list
3. Use further indexing to select the first row from the data frame in your list

---

## Examining Objects

- `head(x)` - View top 6 rows of a data frame
- `tail(x)` - View bottom 6 rows of a data frame
- `summary(x)` - Summary statistics
- `str(x)` - View structure of object
- `dim(x)` - View dimensions of object
- `length(x)` - Returns the length of a vector

---

## Examining Objects Example

We can examine the first two rows of a data frame by passing the `n` parameter to the `head` function:

```r
head(iris, n = 2)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
```

What's its structure?

```r
str(iris)
```

```
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
```

---
class: inverse

## Your Turn

1. View the top 8 rows of the `mtcars` data
2. What type of object is the `mtcars` data set?
3. How many rows are in the `iris` data set? (try finding this using `dim` or indexing + `length`)
4. Summarize the values in each column of the `iris` data set

---

## Working with Output from a Function

- The output from a function can be saved as an object
- The object can be any type (data frame, vector, etc.) but is often a list object
- Items from the output can be extracted for further computing
- The output object can be examined using functions like `str(x)`

---

## Saving Output Demo

- Run a t-test on the iris data to test whether the mean petal lengths for setosa and versicolor are equal
- The `t.test` function handles only two groups, so we drop the virginica species

```r
t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])
```

```
##
##     Welch Two Sample t-test
##
## data:  Petal.Length by Species
## t = -39.493, df = 62.14, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.939618 -2.656382
## sample estimates:
##     mean in group setosa mean in group versicolor
##                    1.462                    4.260
```

---

## Demo (Continued)

Save the output of the t-test to an object:

```r
tout <- t.test(Petal.Length ~ Species, data = iris[iris$Species != "virginica", ])
```

Let's look at the structure of this object:

```r
str(tout)
```

```
## List of 9
##  $ statistic  : Named num -39.5
##   ..- attr(*, "names")= chr "t"
##  $ parameter  : Named num 62.1
##   ..- attr(*, "names")= chr "df"
##  $ p.value    : num 9.93e-46
##  $ conf.int   : atomic [1:2] -2.94 -2.66
##   ..- attr(*, "conf.level")= num 0.95
##  $ estimate   : Named num [1:2] 1.46 4.26
##   ..- attr(*, "names")= chr [1:2] "mean in group setosa" "mean in group versicolor"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "difference in means"
##  $ alternative: chr "two.sided"
##  $ method     : chr "Welch Two Sample t-test"
##  $ data.name  : chr "Petal.Length by Species"
##  - attr(*, "class")= chr "htest"
```

---

## Demo: Extracting the P-Value

Since this is simply a list, we can use our regular indexing:

```r
tout$p.value
```

```
## [1] 9.934433e-46
```

```r
tout[[3]]
```

```
## [1] 9.934433e-46
```

---

## Importing Data

We often need to import our own data rather than just using built-in datasets.

- First tell R where the data is saved, for example by setting the working directory with `setwd()`
- Data is read in using R functions such as:
    - `read_table()` for reading in .txt files
    - `read_csv()` for reading in .csv files
- Assign the data to a new R object when reading in the file

---

## Importing Data Demo

We first create a CSV file (using a text editor or MS Excel), then load it in:

```r
littledata <- read_csv("PretendData.csv")
```

---
class: inverse

## Your Turn

- Make 5 rows of data in an Excel spreadsheet and save it as a *tab-delimited txt file*.
- Import this new .txt file into R with `read_table`. You may need to look at the help page for `read_table` in order to properly do this.
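
---

## Importing Data: A Sketch

The import steps above can be tried end to end without an existing file. This is a minimal sketch rather than the authors' demo: the file contents, the `csv_path` and `tsv_path` names, and the use of `tempdir()` are all assumptions made so the example is self-contained. It passes a full file path instead of calling `setwd()`, and it reads the tab-delimited file with `read_tsv()`, which is one option alongside `read_table()`.

```r
library(readr)

# Hypothetical data, written to a temporary folder so the example
# does not depend on any file already on disk.
pretend  <- data.frame(id = 1:3, group = c("a", "b", "a"))
csv_path <- file.path(tempdir(), "PretendData.csv")
tsv_path <- file.path(tempdir(), "PretendData.txt")

write.csv(pretend, csv_path, row.names = FALSE)
write.table(pretend, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# A full path makes a prior setwd() call unnecessary.
littledata <- read_csv(csv_path)

# read_tsv() splits fields on tabs only.
tabdata <- read_tsv(tsv_path)

# Check what was read in.
dim(littledata)
names(tabdata)
```

If the files sit in the working directory instead, the bare file name (as in `read_csv("PretendData.csv")`) is enough.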