
class: center, middle, inverse, title-slide

# Basics
### Haley Jeppson, Sam Tyner

---

## Overgrown Calculator

```r
# Addition and Subtraction
2 + 5 - 1
```

```
## [1] 6
```

```r
# Multiplication
109*23452
```

```
## [1] 2556268
```

```r
# Division
3/7
```

```
## [1] 0.4285714
```
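
Operator precedence follows the usual arithmetic rules. The sketch below (variable names are only for illustration) shows two cases that are easy to misread:

```r
# Multiplication binds tighter than addition
no_parens   <- 2 + 3 * 4      # 14
with_parens <- (2 + 3) * 4    # 20

# Exponentiation binds tighter than the unary minus
neg_sq <- -2^2      # -4, read as -(2^2)
sq_neg <- (-2)^2    # 4

c(no_parens, with_parens, neg_sq, sq_neg)
```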

---
## More Calculator Operations

```r
# Integer division
7 %/% 2
```

```
## [1] 3
```

```r
# Modulo operator (Remainder)
7 %% 2
```

```
## [1] 1
```

```r
# Powers
1.5^3
```

```
## [1] 3.375
```
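
Integer division and the modulo operator fit together: the quotient times the divisor, plus the remainder, gives back the original number. A small sketch with illustrative variable names:

```r
n <- 7
d <- 2
quotient  <- n %/% d    # 3
remainder <- n %% d     # 1
quotient * d + remainder == n    # TRUE

# With a negative number, %/% rounds down (toward -Inf),
# and the remainder takes the sign of the divisor
neg_quotient  <- (-7) %/% 2    # -4
neg_remainder <- (-7) %% 2     # 1
```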

---
## Even More Functions

- Exponentiation 
    - `exp(x)`
- Logarithms
    - `log(x)`
    - `log(x, base = 10)`
- Trigonometric functions
    - `sin(x)`
    - `asin(x)`
    - `cos(x)`
    - `tan(x)`
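
A quick sketch of these functions in use, with illustrative variable names. Note that `log` is the natural logarithm unless a `base` is given, and the trigonometric functions work in radians:

```r
x <- 2
e_x  <- exp(x)      # e squared
back <- log(e_x)    # the natural log undoes exp(), giving 2 again

log10_1000 <- log(1000, base = 10)    # 3

sin_half_pi <- sin(pi / 2)    # 1, because pi / 2 radians is 90 degrees
asin(sin_half_pi)             # back to pi / 2
```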
    
---
## Creating Variables

We can create variables using the assignment operator `<-`:

```r
x <- 5
MyAge <- 26
```

We can then perform any of the functions on the variables:

```r
log(x)
```

```
## [1] 1.609438
```

```r
MyAge^2
```

```
## [1] 676
```
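
Assignment on its own prints nothing. A minimal sketch of the two usual ways to see a value (the names `double_x` and `triple_x` are made up for this example):

```r
x <- 5
double_x <- x * 2      # stored silently; nothing is printed
double_x               # typing the name prints the value

(triple_x <- x * 3)    # wrapping in parentheses assigns *and* prints

x <- x + 1             # a variable can be updated from its own value
x
```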

---
## Rules for Variable Creation

- Variable names can't start with a number

- Variables in `R` are case-sensitive

- Some common letters are used internally by R and should be avoided as variable names (c, q, t, C, D, F, T, I)

- There are reserved words that R won't let you use for variable names (for, in, while, if, else, repeat, break, next, among others).

- R *will* let you use the name of a predefined function.  Try not to overwrite those though!
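
A short sketch of two of these rules in action. The variable names are only for illustration:

```r
# Case sensitivity: these are two separate variables
myage <- 26
MyAge <- 27
myage == MyAge    # FALSE

# A variable named `c` does not break calls to c(), because R looks
# for a function when a name is followed by parentheses.
# It is still confusing to read, so avoid it.
c <- 10
still_works <- c(1, 2, 3)
rm(c)    # remove our variable again
```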

---
## Vectors

A variable does not need to be a single value. We can create a **vector** using the `c` (combine) function:

```r
y <- c(1, 5, 3, 2)
```

Operations will then be done element-wise:

```r
y / 2
```

```
## [1] 0.5 2.5 1.5 1.0
```
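
Element-wise operations also work between two vectors of the same length. A small sketch, where `z` is a made-up second vector:

```r
y <- c(1, 5, 3, 2)
z <- c(10, 20, 30, 40)

sum_yz  <- y + z    # 11 25 33 42
prod_yz <- y * z    # 10 100 90 80

# A single value is reused for every element
shifted <- y + 100  # 101 105 103 102
```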

---
## Getting Help

We will talk MUCH more about vectors in a bit, but for now, let's talk about a couple of ways to get help. The primary function to use is the `help` function. Just pass in the name of the function you need help with:

```r
help(head)
```

The `?` function also works:

```r
?head
```

Googling for help can be difficult at first. You might need to search for R + CRAN + \<your query\> to get good results.

Stack Overflow is VERY helpful.
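
When you do not know the exact name of a function, a couple of other built-in tools can help. This is a small sketch and not a complete list:

```r
# List objects whose names contain "mean"
mean_like <- apropos("mean")
head(mean_like)

# Show the arguments of a function without opening its help page
args(log)

# These are best run interactively:
# ??"standard deviation"   # search the help pages by keyword
# example(mean)            # run the examples from a help page
```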

---
## Getting Help

**R Reference Card**

We will hand out a copy, but you can also download the reference card from:

http://cran.r-project.org/doc/contrib/Short-refcard.pdf

Having this open or printed off and near you while working is helpful.

<br>

**RStudio cheatsheets**

The RStudio cheatsheets are VERY helpful.

https://www.rstudio.com/resources/cheatsheets/

---
class: inverse

## Your Turn

Using the R Reference Card (and the Help pages, if needed), do the following:

Find out how many rows and columns the `iris` data set has. Figure out at least 2 ways to do this.

**Hint**: "Variable Information" section on the first page of the reference card!

Use the `rep` function to construct the following vector: `1 1 2 2 3 3 4 4 5 5`

**Hint**: "Data Creation" section of the reference card

Use `rep` to construct this vector: `1 2 3 4 5 1 2 3 4 5 1 2 3 4 5`

---
## Data Frames: Introduction

- `final_shed` is a data frame.

- Data frames hold data sets

- Not every column needs to be the same type, much like an Excel spreadsheet

- Each column in a data frame is a vector <sup>1</sup> - so each column needs to have values that are all the same type.

- We can access different columns using the `$` operator.

```r
shedding <- final_shed$total_shedding
treatment <- final_shed$treatment
```
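
If `final_shed` is not loaded, the same ideas can be tried on a small made-up data frame. The name `pigs` and its values are hypothetical stand-ins, not the real data:

```r
# Hypothetical toy data standing in for final_shed
pigs <- data.frame(
  pignum = c(1, 2, 3),
  treatment = c("control", "control", "treated"),
  total_shedding = c(37.1, 59.0, 28.4),
  stringsAsFactors = FALSE    # keep text as character in older R versions
)

names(pigs)                # the column names
pigs$treatment             # a character vector
pigs$total_shedding / 2    # vector operations work on a column
```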

.footnote[
[1] A column can also be a list! This is a more advanced topic that will be saved for later.
]
---
## More about Vectors

A vector is a sequence of values that are all the same type. We have seen that we can create them using the `c` or the `rep` function. We can also use the `:` operator if we wish to create consecutive values:

```r
a <- 10:15
a
```

```
## [1] 10 11 12 13 14 15
```

We can extract the different elements of the vector like so:

```r
shedding[3]
```

```
## [1] 59.04973
```
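
For sequences that are not consecutive, `seq` is a more flexible relative of `:`. A short sketch, which also shows that R counts positions from 1:

```r
a <- 10:15
all(a == seq(10, 15))    # TRUE: same values as 10:15

evens <- seq(10, 20, by = 2)    # 10 12 14 16 18 20

first <- a[1]            # 10: indexing starts at 1, not 0
last  <- a[length(a)]    # 15
```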

---
## Indexing Vectors

We saw that we can access individual elements of the vector. But **indexing** is a lot more powerful than that:

```r
head(shedding)
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
```

```r
shedding[c(1, 3, 5)]
```

```
## [1] 37.14022 59.04973 38.74342
```

```r
shedding[1:5]
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342
```
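
Indexing can also drop elements or reorder them. A sketch on a made-up vector `v` (the values are rounded stand-ins, not the real data):

```r
v <- c(37.1, 43.9, 59.0, 45.0, 38.7)

drop_first <- v[-1]        # everything except the first element
drop_two   <- v[-(1:2)]    # everything except the first two
reordered  <- v[c(5, 1)]   # the order of the indices sets the order of the result
past_end   <- v[10]        # NA: there is no tenth element
```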

---
## Logical Values

- R has built in support for logical values

- `TRUE` and `FALSE` are built in. `T` and `F` are supported as shorthand, but they are ordinary variables that can be overwritten, so prefer `TRUE` and `FALSE`

- Logicals can result from a comparison using
    - `<` : "less than"
    - `>` : "greater than"
    - `<=` : "less than or equal to"
    - `>=` : "greater than or equal to"
    - `==` : "is equal to"
    - `!=` : "not equal to"
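
A few comparisons to try, followed by a demonstration of why `T` is riskier than `TRUE`. The variable names are only for illustration:

```r
3 < 5                      # TRUE
"control" == "Control"     # FALSE: text comparison is case-sensitive
at_least_3 <- c(1, 5, 3) >= 3    # FALSE TRUE TRUE

# T is an ordinary variable, so it can be overwritten by accident
T <- 0
t_is_true <- (T == TRUE)   # FALSE now
rm(T)                      # remove our copy; T means TRUE again
```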
    
---
## Indexing with Logicals

We can index vectors using logical values as well:

```r
x <- c(2, 3, 5, 7)
x[c(TRUE, FALSE, FALSE, TRUE)]
```

```
## [1] 2 7
```

```r
x > 3.5
```

```
## [1] FALSE FALSE  TRUE  TRUE
```

```r
x[x > 3.5]
```

```
## [1] 5 7
```
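
Two related tools: `which` turns a logical vector into positions, and `!` flips it. A short sketch, where `keep` is an illustrative name:

```r
x <- c(2, 3, 5, 7)
keep <- x > 3.5

positions <- which(keep)    # 3 4: where the TRUEs are
kept      <- x[positions]   # 5 7, the same as x[keep]
dropped   <- x[!keep]       # 2 3: ! turns TRUE into FALSE and vice versa
```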

---
## Logical Examples

```r
bad_shedder <- shedding > 50
shedding[bad_shedder]
```

```
## [1] 59.04973 56.12656 66.20657 51.98984 58.53921 64.74017 64.27066 56.06566
## [9] 53.76049
```

---
class: inverse

## Your Turn

1. Find out how many pigs had a total shedding value of less than 30 log10 CFUs.

**Hint**: if you use the `sum` function on a logical vector, it'll return how many TRUEs are in the vector:

```r
sum(c(TRUE, TRUE, FALSE, TRUE, FALSE))
```

```
## [1] 3
```

2. **More Challenging**: Calculate the sum of the total shedding log10 CFUs of all pigs with a total shedding value of less than 30 log10 CFUs.

---
## Element-wise Logical Operators

- `&` (elementwise AND)
- `|` (elementwise OR)

```r
c(TRUE, TRUE, FALSE, FALSE) & c(TRUE, FALSE, TRUE, FALSE)
```

```
## [1]  TRUE FALSE FALSE FALSE
```

```r
c(TRUE, TRUE, FALSE, FALSE) | c(TRUE, FALSE, TRUE, FALSE)
```

```
## [1]  TRUE  TRUE  TRUE FALSE
```

```r
# Which are high shedders in the control group?
id <- (shedding > 50 & treatment == "control")
final_shed[id,]
```

```
## # A tibble: 4 x 7
##   pignum time_point pig_weight daily_shedding treatment total_shedding
##    <int>      <int>      <dbl>          <dbl> <chr>              <dbl>
## 1    122         21       33.9           5.01 control             59.0
## 2    224         21       22.9           3.91 control             56.1
## 3    337         21       29.5           5.52 control             66.2
## 4    419         21       31.0           6.21 control             52.0
## # ... with 1 more variable: gain <dbl>
```
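
Conditions can be combined and negated freely. A sketch on made-up vectors (`shed` and `trt` are hypothetical stand-ins for `shedding` and `treatment`):

```r
shed <- c(37.1, 59.0, 28.4, 66.2)
trt  <- c("control", "control", "treated", "treated")

high_control    <- shed > 50 & trt == "control"    # FALSE TRUE FALSE FALSE
high_or_treated <- shed > 50 | trt == "treated"    # FALSE TRUE TRUE TRUE
not_control     <- !(trt == "control")             # FALSE FALSE TRUE TRUE

shed[high_control]    # 59
```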

---

## Modifying Vectors

We can modify vectors using indexing as well:

```r
x <- shedding[1:5]
x
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342
```

```r
x[1] <- 20
x
```

```
## [1] 20.00000 43.88073 59.04973 44.96963 38.74342
```
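
Several elements can be changed at once, by position or by condition. A sketch on a made-up vector (the values are rounded stand-ins):

```r
x <- c(37.1, 43.9, 59.0, 45.0, 38.7)

x[c(2, 4)] <- 0     # replace the second and fourth elements
x[x > 50] <- 50     # cap every value above 50 at 50
x                   # 37.1 0.0 50.0 0.0 38.7

x[7] <- 1           # assigning past the end extends the vector...
x[6]                # ...and fills the gap with NA
```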

---
## Vector Elements

Elements of a vector must all be the same type:

```r
head(shedding)
```

```
## [1] 37.14022 43.88073 59.04973 44.96963 38.74342 56.12656
```

```r
shedding[bad_shedder] <- ":-("
head(shedding)
```

```
## [1] "37.1402150411922" "43.8807276727966" ":-("             
## [4] "44.9696314253854" "38.7434232007542" ":-("
```

By changing some values to a string, all the other values were converted to strings as well, because a vector can hold only one type.
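
The same conversion happens whenever types are mixed inside `c`. A short sketch showing which type wins in each case (variable names are only for illustration):

```r
num_only  <- class(c(1, 2, 3))       # "numeric"
num_chr   <- class(c(1, "a"))        # "character": the number becomes "1"
lgl_num   <- class(c(TRUE, 2))       # "numeric": TRUE becomes 1
lgl_chr   <- class(c(TRUE, "a"))     # "character": TRUE becomes "TRUE"

c(TRUE, 2)    # 1 2
```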

---
## Data Types in R

- Can use `mode` or `class` to find out information about variables

- `str` is useful to find information about the structure of your data

- Many data types: numeric, integer, character, Date, and factor are the most common

```r
str(final_shed)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame':	59 obs. of  7 variables:
##  $ pignum        : int  77 87 122 160 191 224 337 345 419 458 ...
##  $ time_point    : int  21 21 21 21 21 21 21 21 21 21 ...
##  $ pig_weight    : num  25.4 23.9 33.9 28.4 28.9 ...
##  $ daily_shedding: num  4.61 3.91 5.01 3.91 3.91 ...
##  $ treatment     : chr  "control" "control" "control" "control" ...
##  $ total_shedding: num  37.1 43.9 59 45 38.7 ...
##  $ gain          : num  13.9 11.7 16.8 15.1 14.6 ...
```
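
To see how `class` and `mode` report the common types, here is a sketch on small made-up values:

```r
class(3.14)                     # "numeric"
class(2L)                       # "integer": the L suffix marks an integer
class("pig")                    # "character"
class(as.Date("2020-01-31"))    # "Date"

f <- factor(c("control", "treated", "control"))
class(f)     # "factor"
levels(f)    # "control" "treated"
mode(f)      # "numeric": factors are stored as integer codes
```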

---
## Converting Between Types

We can convert between different types using the `as.*` family of functions:

```r
pignum <- head(final_shed$pignum)
pignum
```

```
## [1]  77  87 122 160 191 224
```

```r
as.character(pignum)
```

```
## [1] "77"  "87"  "122" "160" "191" "224"
```

```r
as.numeric("77")
```

```
## [1] 77
```
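
Conversions do not always behave as you might expect. A sketch of three common surprises, using made-up values:

```r
truncated <- as.integer(3.9)    # 3: the decimal part is dropped, not rounded

# Text that is not a number becomes NA (R also gives a warning)
not_a_number <- suppressWarnings(as.numeric("seven"))

# Converting a factor directly gives its level codes, not its labels
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                  # 1 2 1
values <- as.numeric(as.character(f))    # 10 20 10
```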

---
## Some useful functions

There is a whole variety of useful functions to operate on vectors.

A couple of the more common ones are `length`, which returns the length (number of elements) of a vector, and `sum`, which adds up all the elements of a vector.

```r
pig_weight <- final_shed$pig_weight
x <- pig_weight[1:5]
length(x)
```

```
## [1] 5
```

```r
sum(x)
```

```
## [1] 140.36
```
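
A few more vector functions in the same spirit, sketched on a made-up vector `w` (rounded stand-in values, not the real data):

```r
w <- c(25.4, 23.9, 33.9, 28.4, 28.9)

smallest <- min(w)          # 23.9
largest  <- max(w)          # 33.9
range(w)                    # smallest and largest together
sorted   <- sort(w)         # 23.9 25.4 28.4 28.9 33.9
heaviest <- which.max(w)    # 3: the position of the largest value
cumsum(w)                   # running total
```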

---
## Statistical Functions

Using the basic functions we've learned, it wouldn't be hard to compute some basic statistics.

```r
(n <- length(pig_weight))
```

```
## [1] 59
```

```r
(meanweight <- sum(pig_weight) / n)
```

```
## [1] 28.82305
```

```r
(standdev <- sqrt(sum((pig_weight - meanweight)^2) / (n - 1)))
```

```
## [1] 4.10429
```

But we don't have to.

---
## Built-in Statistical Functions

```r
mean(pig_weight)
```

```
## [1] 28.82305
```

```r
sd(pig_weight)
```

```
## [1] 4.10429
```

```r
summary(pig_weight)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.50   25.79   28.80   28.82   32.24   36.30
```

```r
quantile(pig_weight, c(.025, .975))
```

```
##   2.5%  97.5% 
## 22.279 35.952
```

<!--

class: inverse
## Your Turn

1. Read up on the diamonds dataset (`?diamonds`)

2. Plot price by carat (use qplot - go back to the motivating example for help with the syntax)

3. Create a variable `ppc` for price/carat. Store this variable as a column in the diamonds data

4. Make a histogram of all ppc values that exceed $10000 per carat.

5. Explore any other interesting relationships you find
-->