- Follow along (copy & paste the code into the console):
curl::curl_download(
  "https://raw.githubusercontent.com/heike/summerschool-2022/master/01-Introduction-to-R/code/2-basics.R",
  "2-basics.R"
)
file.edit("2-basics.R")
2022-05-16
# Addition and Subtraction
2 + 5 - 1
## [1] 6
# Multiplication
109 * 23452
## [1] 2556268
# Division
3 / 7
## [1] 0.4285714
# Integer division
7 %/% 2
## [1] 3
# Modulo operator (Remainder)
7 %% 2
## [1] 1
# Powers
1.5^3
## [1] 3.375
exp(x)
log(x)
log(x, base = 10)
sin(x)
asin(x)
cos(x)
tan(x)
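The functions listed above all take a numeric argument. A minimal sketch of how they behave, assuming an illustrative value for x (the slides do not define x at this point, so the value 2 is only a placeholder):

```r
# Illustrative input; the value 2 is an assumption, not from the slides
x <- 2

exp(x)             # e raised to the power x
log(x)             # natural logarithm
log(x, base = 10)  # logarithm to base 10
sin(x)             # trigonometric functions expect radians
asin(sin(pi / 6))  # asin undoes sin here, giving pi / 6 up to rounding
```

Note that log defaults to the natural logarithm, so exp(log(x)) returns x again.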
We create an object using the assignment operator <-
x <- 5
y <- 21
We then perform any operations on these objects:
log(x)
## [1] 1.609438
y^2
## [1] 441
Pro-tip: before introducing a new object, type it in the console to check that it is not yet taken
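Besides typing the name in the console, one possible way to check whether a name is already in use is the base R function exists(). A small sketch; the last name is hypothetical and chosen only because it is unlikely to be defined:

```r
# exists() reports whether a name is already bound to an object
exists("c")      # TRUE: c is the built-in combine function
exists("mean")   # TRUE: mean is a built-in function

# A hypothetical name that is unlikely to be defined anywhere
exists("my_brand_new_object_name")
```

Reusing a name such as c or mean for your own data is allowed, but it tends to make later code confusing to read.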
A variable usually consists of more than a single value. We create a vector using the c (combine) function:
y <- c(1, 5, 3, 2)
Operations will then be done element-wise:
y / 2
## [1] 0.5 2.5 1.5 1.0
We will talk MUCH more about vectors in a bit, but for now, let’s talk about a couple of ways to get help. The primary function to use is the help function. Just pass in the name of the function you need help with:
help(head)
The ? function also works:
?head
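When the exact function name is not known, a few other helpers from base R can narrow the search. This is a sketch of some options that are not covered in the slides themselves:

```r
# List objects whose names match a pattern
apropos("^mean")

# Show a function's arguments without opening its help page
args(head)

# Run the examples from the bottom of a help page
example(head)
```

In an interactive session, ??keyword searches the installed help pages for a keyword rather than an exact function name.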
Googling for help is a bit hard. Searches of the form R + CRAN +
Download the reference card from:
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
Having this open or printed off and near you while working is helpful.
Using the R Reference Card at https://cran.r-project.org/doc/contrib/Short-refcard.pdf (and the Help pages, if needed), do the following:
Hint: “Variable Information” section on the first page of the reference card!
Use the rep function to construct the following vector: 1 1 2 2 3 3 4 4 5 5 (Hint: “Data Creation” section of the reference card)
Give this vector the name x
x, then calculate the average value.
penguins is a data frame.
Columns of a data frame can be accessed with the $ operator.
penguins <- read.csv(
  "https://raw.githubusercontent.com/heike/summerschool-2022/main/01-Introduction-to-R/data/penguins.csv",
  stringsAsFactors = TRUE
)
species <- penguins$species
bill_length <- penguins$bill_length_mm
A vector is an ordered collection of values that are all the same type. We have seen that we can create them using the c or the rep function. We can also use the : operator if we wish to create consecutive values:
a <- 10:15
a
## [1] 10 11 12 13 14 15
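The : operator always moves in steps of 1. For other step sizes, base R offers seq(), which the slides do not cover. A short sketch as a possible complement:

```r
# The : operator steps by 1, or by -1 when counting down
10:15
15:10

# seq() allows a different step size
seq(10, 15, by = 2)        # 10 12 14

# or a fixed number of evenly spaced values
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
```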
We can extract the different elements of the vector like so (note that, unlike in Python, indexing starts at 1):
bill_length[3]
## [1] 40.3
We have seen that we can access individual elements of the vector. But indexing is a lot more powerful than that:
head(bill_length)
## [1] 39.1 39.5 40.3 NA 36.7 39.3
bill_length[c(1, 3, 5)]
## [1] 39.1 40.3 36.7
bill_length[1:5]
## [1] 39.1 39.5 40.3 NA 36.7
We can index vectors using logical values as well:
x <- c(2, 3, 5, 7)
x[c(TRUE, FALSE, FALSE, TRUE)]
## [1] 2 7
x > 3.5
## [1] FALSE FALSE TRUE TRUE
x[x > 3.5]
## [1] 5 7
bill_length <- penguins$bill_length_mm
head(bill_length)
## [1] 39.1 39.5 40.3 NA 36.7 39.3
short_bills <- bill_length < 35
species[short_bills]
##  [1] <NA>   Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [11] <NA>
## Levels: Adelie Chinstrap Gentoo
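The <NA> entries in this output come from missing bill lengths: comparing NA with 35 gives NA, and indexing with NA returns NA. A small sketch with made-up values (not the penguins data) shows the effect, along with one possible way around it using which():

```r
# Made-up values for illustration only
bills <- c(39.1, 34.1, NA, 33.5)
kinds <- c("Adelie", "Adelie", "Gentoo", "Chinstrap")

short <- bills < 35
short                 # FALSE TRUE NA TRUE

kinds[short]          # the NA in the index produces an NA in the result
kinds[which(short)]   # which() keeps only the positions that are TRUE
```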
Read up on the tips data.
tips <- read.csv("https://raw.githubusercontent.com/heike/summerschool-2022/master/01-Introduction-to-R/data/tips.csv")
The tips data set consists of 244 parties being served at a restaurant.
Calculate the rate that each party tipped (in percent), i.e. fill the blanks in the statement: tips$tip_pct <- ___ / ___ * 100
Find out how many people tipped over 20%.
Hint: if you use the sum function on a logical vector, it’ll return how many TRUEs are in the vector.
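A small sketch of this hint on a toy vector (the values are made up and deliberately unrelated to the tips exercise):

```r
# Toy logical vector, unrelated to the tips data
answers <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

sum(answers)    # TRUE counts as 1 and FALSE as 0, so this is 3
mean(answers)   # proportion of TRUEs: 0.6
```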
Use mode or class to find out information about variables
str is useful to find information about the structure of your data
str(penguins)
## 'data.frame':    344 obs. of  8 variables:
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num  39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num  18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int  181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int  3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
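To complement the str output, here is a short sketch of what mode and class report for a few simple values. The example values are invented for illustration:

```r
class(1:3)         # "integer"
mode(1:3)          # "numeric"

class(c(1.5, 2))   # "numeric"
class("penguin")   # "character"
class(TRUE)        # "logical"

# A factor is stored as integer codes with labels attached
f <- factor(c("male", "female", "male"))
class(f)           # "factor"
mode(f)            # "numeric"
```

The factor case shows why the two functions can disagree: class describes how R treats the object, while mode describes how it is stored.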
There is a whole variety of useful functions to operate on vectors. A couple of the more common ones are length, which returns the length (number of elements) of a vector, and sum, which adds up all the elements of a vector.
x <- bill_length[1:5]
length(x)
## [1] 5
sum(x)
## [1] NA
Using the basic functions we’ve learned, it wouldn’t be hard to compute some basic statistics.
(n <- length(bill_length))
## [1] 344
(meanlength <- sum(bill_length) / n)
## [1] NA
(standdev <- sqrt(sum((bill_length - meanlength)^2) / (n - 1)))
## [1] NA
But we don’t have to.
mean(bill_length)
## [1] NA
sd(bill_length)
## [1] NA
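These NA results arise because bill_length contains missing values, and most summary functions return NA when any input is NA. One common way to handle this is the na.rm = TRUE argument, which also appears in the quantile call in these slides. A sketch using the first five bill lengths printed earlier:

```r
# First five bill lengths as printed earlier in the slides
x <- c(39.1, 39.5, 40.3, NA, 36.7)

sum(x)                  # NA, because one value is missing
sum(x, na.rm = TRUE)    # 155.6, missing value dropped first
mean(x, na.rm = TRUE)   # 38.9
sd(x, na.rm = TRUE)

# Count the non-missing values instead of using length()
sum(!is.na(x))          # 4
```

When computing a mean by hand, the denominator should be this count of non-missing values rather than length(x).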
summary(bill_length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   32.10   39.23   44.45   43.92   48.50   59.60       2
quantile(bill_length, c(.025, .975), na.rm = TRUE)
##   2.5%  97.5%
## 34.810 53.085
& (elementwise AND)
| (elementwise OR)
c(T, T, F, F) & c(T, F, T, F)
## [1] TRUE FALSE FALSE FALSE
c(T, T, F, F) | c(T, F, T, F)
## [1] TRUE TRUE TRUE FALSE
# How many of the short billed penguins are male?
id <- (bill_length < 35 & penguins$sex == "male")
penguins[id, ]
##      species    island bill_length_mm bill_depth_mm flipper_length_mm
## NA      <NA>      <NA>             NA            NA                NA
## NA.1    <NA>      <NA>             NA            NA                NA
## 15    Adelie Torgersen           34.6          21.1               198
## NA.2    <NA>      <NA>             NA            NA                NA
##      body_mass_g  sex year
## NA            NA <NA>   NA
## NA.1          NA <NA>   NA
## 15          4400 male 2007
## NA.2          NA <NA>   NA
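The rows of NAs appear because id is NA wherever bill length or sex is missing. The sketch below uses a small made-up data frame (the column names mirror the penguins data, the values are invented) to show two possible ways to avoid those rows and to answer the "how many" question directly:

```r
# Invented data for illustration; not the real penguins values
df <- data.frame(
  bill_length_mm = c(34.6, NA, 33.1, 41.0),
  sex = c("male", "male", NA, "female")
)

id <- df$bill_length_mm < 35 & df$sex == "male"
id                      # TRUE NA NA FALSE

df[which(id), ]         # which() drops the NA positions
subset(df, id)          # subset() also keeps only rows where id is TRUE

sum(id, na.rm = TRUE)   # number of matching rows: 1
```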
diamonds (?diamonds) in the ggplot2 package
qplot - go back to the motivating example for help with the syntax)
ppc for price per carat. Store this variable as a column in the diamonds data