class: center, middle, inverse, title-slide

# Basics
### Haley Jeppson, Sam Tyner

---

## Overgrown Calculator

```r
# Addition and Subtraction
2 + 5 - 1
```

```
## [1] 6
```

```r
# Multiplication
109*23452
```

```
## [1] 2556268
```

```r
# Division
3/7
```

```
## [1] 0.4285714
```

---

## More Calculator Operations

```r
# Integer division
7 %/% 2
```

```
## [1] 3
```

```r
# Modulo operator (Remainder)
7 %% 2
```

```
## [1] 1
```

```r
# Powers
1.5^3
```

```
## [1] 3.375
```

---

## Even More Functions

- Exponentiation
    - `exp(x)`
- Logarithms
    - `log(x)`
    - `log(x, base = 10)`
- Trigonometric functions
    - `sin(x)`
    - `asin(x)`
    - `cos(x)`
    - `tan(x)`

---

## Creating Variables

We can create variables using the assignment operator `<-`:

```r
x <- 5
MyAge <- 26
```

We can then perform any of the functions on the variables:

```r
log(x)
```

```
## [1] 1.609438
```

```r
MyAge^2
```

```
## [1] 676
```

---

## Rules for Variable Creation

- Variable names can't start with a number
- Variables in `R` are case-sensitive
- Some common letters are used internally by R and should be avoided as variable names (c, q, t, C, D, F, T, I)
- There are reserved words that R won't let you use for variable names. (for, in, while, if, else, repeat, break, next)
- R *will* let you use the name of a predefined function. Try not to overwrite those though!

---

## Vectors

A variable does not need to be a single value. We can create a **vector** using the `c` (combine) function:

```r
y <- c(1, 5, 3, 2)
```

Operations will then be done element-wise:

```r
y / 2
```

```
## [1] 0.5 2.5 1.5 1.0
```

---

## Getting Help

We will talk MUCH more about vectors in a bit, but for now, let's talk about a couple ways to get help. The primary function to use is the `help` function. Just pass in the name of the function you need help with:

```r
help(head)
```

The `?` function also works:

```r
?head
```

Googling for help can be difficult at first.
You might need to search for R + CRAN + \<your query\> to get good results.

Stack Overflow is VERY helpful.

---

## Getting Help

**R Reference Card**

We will hand out a copy, but you can download the reference card from:

Having this open or printed off and near you while working is helpful.

<br>

**RStudio cheatsheets**

The RStudio cheatsheets are VERY helpful.

---
class: inverse

## Your Turn

Using the R Reference Card (and the Help pages, if needed), do the following:

Find out how many rows and columns the `iris` data set has. Figure out at least 2 ways to do this.
**Hint**: "Variable Information" section on the first page of the reference card!

Use the `rep` function to construct the following vector: `1 1 2 2 3 3 4 4 5 5`
**Hint**: "Data Creation" section of the reference card

Use `rep` to construct this vector: `1 2 3 4 5 1 2 3 4 5 1 2 3 4 5`

---

## Data Frames: Introduction

- `final_shed` is a data frame.
- Data frames hold data sets
- Not every column needs to be the same type
    - like an Excel spreadsheet
- Each column in a data frame is a vector <sup>1</sup>
    - so each column needs to have values that are all the same type.
- We can access different columns using the `$` operator.

```r
shedding <- final_shed$total_shedding
treatment <- final_shed$treatment
```

.footnote[
[1] a column can also be a list! This is a more advanced topic that will be saved for later.
]

---

## More about Vectors

A vector is an ordered collection of values that are all the same type. We have seen that we can create them using the `c` or the `rep` function. We can also use the `:` operator if we wish to create consecutive values:

```r
a <- 10:15
a
```

```
## [1] 10 11 12 13 14 15
```

We can extract the different elements of the vector like so:

```r
shedding[3]
```

```
## [1] 59.04973
```

---

## Indexing Vectors

We saw that we can access individual elements of the vector.
But **indexing** is a lot more powerful than that:

```r
head(shedding)
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
```

```r
shedding[c(1, 3, 5)]
```

```
## [1] 37.14022 59.04973 38.74342
```

```r
shedding[1:5]
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342
```

---

## Logical Values

- R has built-in support for logical values
- `TRUE` and `FALSE` are built in. `T` (for `TRUE`) and `F` (for `FALSE`) are supported but, unlike `TRUE` and `FALSE`, can be overwritten
- Logicals can result from a comparison using
    - `<` : "less than"
    - `>` : "greater than"
    - `<=` : "less than or equal to"
    - `>=` : "greater than or equal to"
    - `==` : "is equal to"
    - `!=` : "not equal to"

---

## Indexing with Logicals

We can index vectors using logical values as well:

```r
x <- c(2, 3, 5, 7)
x[c(TRUE, FALSE, FALSE, TRUE)]
```

```
## [1] 2 7
```

```r
x > 3.5
```

```
## [1] FALSE FALSE  TRUE  TRUE
```

```r
x[x > 3.5]
```

```
## [1] 5 7
```

---

## Logical Examples

```r
bad_shedder <- shedding > 50
shedding[bad_shedder]
```

```
## [1] 59.04973 56.12656 66.20657 51.98984 58.53921 64.74017 64.27066 56.06566
## [9] 53.76049
```

---
class: inverse

## Your Turn

1. Find out how many pigs had a total shedding value of less than 30 log10 CFUs.
**Hint**: if you use the `sum` function on a logical vector, it'll return how many TRUEs are in the vector:

```r
sum(c(TRUE, TRUE, FALSE, TRUE, FALSE))
```

```
## [1] 3
```

2. **More Challenging**: Calculate the sum of the total shedding log10 CFUs of all pigs with a total shedding value of less than 30 log10 CFUs.

---

## Element-wise Logical Operators

- `&` (elementwise AND)
- `|` (elementwise OR)

```r
c(T, T, F, F) & c(T, F, T, F)
```

```
## [1]  TRUE FALSE FALSE FALSE
```

```r
c(T, T, F, F) | c(T, F, T, F)
```

```
## [1]  TRUE  TRUE  TRUE FALSE
```

```r
# Which are high shedders in the control group?
id <- (shedding > 50 & treatment == "control")
final_shed[id,]
```

```
## # A tibble: 4 x 7
##   pignum time_point pig_weight daily_shedding treatment total_shedding
##    <int>      <int>      <dbl>          <dbl> <chr>              <dbl>
## 1    122         21       33.9           5.01 control             59.0
## 2    224         21       22.9           3.91 control             56.1
## 3    337         21       29.5           5.52 control             66.2
## 4    419         21       31.0           6.21 control             52.0
## # ... with 1 more variable: gain <dbl>
```

---

## Modifying Vectors

We can modify vectors using indexing as well:

```r
x <- shedding[1:5]
x
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342
```

```r
x[1] <- 20
x
```

```
## [1] 20.00000 43.88073 59.04973 44.96963 38.74342
```

---

## Vector Elements

Elements of a vector must all be the same type:

```r
head(shedding)
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
```

```r
shedding[bad_shedder] <- ":-("
head(shedding)
```

```
## [1] "37.1402150411922" "43.8807276727966" ":-("
## [4] "44.9696314253854" "38.7434232007542" ":-("
```

By changing some values to strings, all the other values were converted to strings as well.

---

## Data Types in R

- Can use `mode` or `class` to find out information about variables
- `str` is useful to find information about the structure of your data
- Many data types exist; numeric, integer, character, Date, and factor are the most common

```r
str(final_shed)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame': 59 obs. of 7 variables:
##  $ pignum        : int 77 87 122 160 191 224 337 345 419 458 ...
##  $ time_point    : int 21 21 21 21 21 21 21 21 21 21 ...
##  $ pig_weight    : num 25.4 23.9 33.9 28.4 28.9 ...
##  $ daily_shedding: num 4.61 3.91 5.01 3.91 3.91 ...
##  $ treatment     : chr "control" "control" "control" "control" ...
##  $ total_shedding: num 37.1 43.9 59 45 38.7 ...
##  $ gain          : num 13.9 11.7 16.8 15.1 14.6 ...
```

---

## Converting Between Types

We can convert between different types using the `as` series of functions:

```r
pignum <- head(final_shed$pignum)
pignum
```

```
## [1]  77  87 122 160 191 224
```

```r
as.character(pignum)
```

```
## [1] "77"  "87"  "122" "160" "191" "224"
```

```r
as.numeric("77")
```

```
## [1] 77
```

---

## Some useful functions

There is a wide variety of useful functions that operate on vectors. Two of the more common ones are `length`, which returns the length (number of elements) of a vector, and `sum`, which adds up all the elements of a vector.

```r
pig_weight <- final_shed$pig_weight
x <- pig_weight[1:5]
length(x)
```

```
## [1] 5
```

```r
sum(x)
```

```
## [1] 140.36
```

---

## Statistical Functions

Using the basic functions we've learned, it wouldn't be hard to compute some basic statistics.

```r
(n <- length(pig_weight))
```

```
## [1] 59
```

```r
(meanweight <- sum(pig_weight) / n)
```

```
## [1] 28.82305
```

```r
(standdev <- sqrt(sum((pig_weight - meanweight)^2) / (n - 1)))
```

```
## [1] 4.10429
```

But we don't have to.

---

## Built-in Statistical Functions

```r
mean(pig_weight)
```

```
## [1] 28.82305
```

```r
sd(pig_weight)
```

```
## [1] 4.10429
```

```r
summary(pig_weight)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   19.50   25.79   28.80   28.82   32.24   36.30
```

```r
quantile(pig_weight, c(.025, .975))
```

```
##   2.5%  97.5%
## 22.279 35.952
```

<!--
class: inverse

## Your Turn

1. Read up on the diamonds dataset (`?diamonds`)
2. Plot price by carat (use qplot - go back to the motivating example for help with the syntax)
3. Create a variable `ppc` for price/carat. Store this variable as a column in the diamonds data
4. Make a histogram of all ppc values that exceed $10000 per carat.
5. Explore any other interesting relationships you find
-->
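
---

## Checking the Manual Formulas

The hand-written mean and standard deviation formulas should agree with the built-in `mean` and `sd`. The sketch below checks this on a small vector `w` of made-up weights; these numbers are invented for illustration and are not taken from `final_shed`.

```r
# Hypothetical weights, invented for this example (not the pig data)
w <- c(20, 25, 30, 35, 40)

# Manual versions, following the formulas from the earlier slide
n <- length(w)
manual_mean <- sum(w) / n
manual_sd <- sqrt(sum((w - manual_mean)^2) / (n - 1))

# all.equal() compares numbers up to a small tolerance,
# which is safer than == for floating-point values
all.equal(manual_mean, mean(w))
all.equal(manual_sd, sd(w))
```

Both comparisons should return `TRUE`. Note that `sd` divides by `n - 1`, which is why the manual formula does too.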