- Follow along (copy & paste the code into the console):
curl::curl_download("https://raw.githubusercontent.com/heike/rwrks/gh-pages/summerschool/01-Introduction-to-R/code/2-basics.R", "2-basics.R")
file.edit("2-basics.R")
2016-06-21
# Addition and Subtraction
2 + 5 - 1
## [1] 6
# Multiplication
109*23452
## [1] 2556268
# Division
3/7
## [1] 0.4285714
# Integer division
7 %/% 2
## [1] 3
# Modulo operator (Remainder)
7 %% 2
## [1] 1
# Powers
1.5^3
## [1] 3.375
exp(x)
log(x)
log(x, base = 10)
sin(x)
asin(x)
cos(x)
tan(x)
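Each of the functions above takes a numeric argument x. A quick sketch evaluating a few of them, with x set to 100 purely for illustration (note that log() is the natural logarithm unless a base is given, and the trigonometric functions work in radians):

```r
x <- 100
exp(0)              # e^0 = 1
log(x)              # natural log: about 4.605
log(x, base = 10)   # base-10 log: 2
log10(x)            # shortcut for base 10: 2
sin(pi / 2)         # radians, so this is 1
asin(1)             # inverse sine, in radians: pi / 2
```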
We can create an object using the assignment operator (<-):
x <- 5
todays_date <- 21
We can then apply any of the functions above to these objects:
log(x)
## [1] 1.609438
todays_date^2
## [1] 441
A variable usually consists of more than a single value. We can create a vector using the c (combine) function:
y <- c(1, 5, 3, 2)
Operations will then be done element-wise:
y / 2
## [1] 0.5 2.5 1.5 1.0
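Element-wise arithmetic also works between two vectors of the same length, and R recycles a shorter vector to match a longer one. A small sketch (the vector z is introduced here only for illustration):

```r
y <- c(1, 5, 3, 2)
z <- c(10, 20, 30, 40)
y + z          # first with first, second with second, ...: 11 25 33 42
y * c(1, -1)   # the shorter vector is recycled: 1 -5 3 -2
y^2            # each element squared: 1 25 9 4
```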
We will talk MUCH more about vectors in a bit, but for now, let's talk about a couple of ways to get help. The primary function to use is the help function. Just pass in the name of the function you need help with:
help(head)
The ? function also works:
?head
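When you don't know the exact name of a function, two other base R helpers can be handy: apropos() lists functions whose names match a pattern, and args() prints a function's arguments without opening the full help page. A short sketch:

```r
apropos("^sum")   # functions whose names start with "sum", e.g. "sum", "summary"
args(rep)         # the argument list of rep()
# help.search("standard deviation") searches the text of all installed help pages
```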
Googling for help is a bit hard. Searches of the form R + CRAN +
We will hand out a copy, but you can also download the reference card from:
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
Having this open or printed off and near you while working is helpful.
Using the R Reference Card (and the Help pages, if needed), do the following:
Find out how many rows and columns the iris data set has. Figure out at least 2 ways to do this. Hint: "Variable Information" section on the first page of the reference card!
Use the rep function to construct the following vector: 1 1 2 2 3 3 4 4 5 5 (Hint: "Data Creation" section of the reference card)
Use rep to construct this vector: 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
tips is a data frame. We can extract its columns with the $ operator:
tip <- tips$tip
bill <- tips$total_bill
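The tips data itself is not created in this section, so here is a self-contained sketch on a small toy data frame (two bills and tips typed in from values printed elsewhere in this section). It shows that $, [[ ]] and [ , ] all extract the same column; [[ ]] is the one to use when the column name is stored in a variable:

```r
# Toy stand-in for tips: two rows, two columns
toy <- data.frame(total_bill = c(16.99, 10.34),
                  tip        = c(1.01, 1.66))
toy$tip          # by name with $
toy[["tip"]]     # by name with [[ ]]
toy[, "tip"]     # [row, column] indexing: all rows, column "tip"
col <- "total_bill"
toy[[col]]       # works; toy$col would look for a column literally named "col"
```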
A vector is a sequence of values that all have the same type. We have seen that we can create them using the c or the rep function. We can also use the : operator if we wish to create consecutive values:
a <- 10:15
a
## [1] 10 11 12 13 14 15
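The : operator always steps by 1. For other step sizes or a fixed number of values, seq() is the more general tool; a brief sketch:

```r
15:10                        # counts down: 15 14 13 12 11 10
seq(10, 15, by = 0.5)        # steps of 0.5: 10.0 10.5 ... 15.0
seq(0, 1, length.out = 5)    # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
seq_len(4)                   # 1 2 3 4
```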
We can extract the different elements of the vector like so:
bill[3]
## [1] 21.01
We have seen that we can access individual elements of the vector. But indexing is a lot more powerful than that:
head(tip)
## [1] 1.01 1.66 3.50 3.31 3.61 4.71
tip[c(1, 3, 5)]
## [1] 1.01 3.50 3.61
tip[1:5]
## [1] 1.01 1.66 3.50 3.31 3.61
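Negative indices work the other way round: they drop elements instead of keeping them. A sketch on a short vector holding the first six tips printed above (typed in by hand so it runs without the tips data):

```r
v <- c(1.01, 1.66, 3.50, 3.31, 3.61, 4.71)  # first six tips from above
v[-1]           # everything except the first: 1.66 3.50 3.31 3.61 4.71
v[-c(1, 3)]     # drop the 1st and 3rd elements: 1.66 3.31 3.61 4.71
v[length(v)]    # the last element: 4.71
v[10]           # past the end: NA
```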
We can index vectors using logical values as well:
x <- c(2, 3, 5, 7)
x[c(TRUE, FALSE, FALSE, TRUE)]
## [1] 2 7
x > 3.5
## [1] FALSE FALSE TRUE TRUE
x[x > 3.5]
## [1] 5 7
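If you need the positions of the TRUE values rather than the values themselves, which() turns a logical vector into indices. Continuing with the same x:

```r
x <- c(2, 3, 5, 7)
which(x > 3.5)       # positions 3 and 4
x[which(x > 3.5)]    # same result as x[x > 3.5]: 5 7
```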
rate <- tips$rate
head(rate)
## [1] 0.05944673 0.16054159 0.16658734 0.13978041 0.14680765 0.18623962
sad_tip <- rate < 0.10
rate[sad_tip]
##  [1] 0.05944673 0.07180385 0.07892660 0.05679667 0.09935739 0.05643341
##  [7] 0.09553024 0.07861635 0.07296137 0.08146640 0.09984301 0.09452888
## [13] 0.07717751 0.07398274 0.06565988 0.09560229 0.09001406 0.07745933
## [19] 0.08364236 0.06653360 0.08527132 0.08329863 0.07936508 0.03563814
## [25] 0.07358352 0.08822232 0.09820426
Hint: if you use the sum function on a logical vector, it'll return how many TRUEs are in the vector:
sum(c(TRUE, TRUE, FALSE, TRUE, FALSE))
## [1] 3
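Along the same lines, mean() of a logical vector gives the proportion of TRUEs, because TRUE counts as 1 and FALSE as 0:

```r
flags <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
sum(flags)    # number of TRUEs: 3
mean(flags)   # proportion of TRUEs: 3 / 5 = 0.6
```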
We can modify vectors using indexing as well:
x <- bill[1:5]
x
## [1] 16.99 10.34 21.01 23.68 24.59
x[1] <- 20
x
## [1] 20.00 10.34 21.01 23.68 24.59
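Assignment can also be combined with a logical condition or several positions to change many elements at once. A sketch using the five bills printed above, capping every bill at 22 (the cap is an arbitrary value chosen for illustration):

```r
x <- c(16.99, 10.34, 21.01, 23.68, 24.59)  # bill[1:5] from above
x[x > 22] <- 22          # replace every value above 22 with 22
x                        # 16.99 10.34 21.01 22.00 22.00
x[c(1, 2)] <- c(0, 1)    # replace positions 1 and 2 with new values
x                        # 0.00 1.00 21.01 22.00 22.00
```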
Elements of a vector must all be the same type:
head(rate)
## [1] 0.05944673 0.16054159 0.16658734 0.13978041 0.14680765 0.18623962
rate[sad_tip] <- ":-("
head(rate)
## [1] ":-("               "0.160541586073501" "0.166587339362208"
## [4] "0.139780405405405" "0.146807645384303" "0.186239620403321"
By assigning a string to some of the elements, all the other values were converted to strings as well.
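This behaviour is called coercion: when values of different types are combined, R converts them all to the most flexible type, in the order logical, integer, double, character. A brief sketch:

```r
c(TRUE, 2L)               # logical -> integer: 1 2
c(1L, 2.5)                # integer -> double: 1.0 2.5
c(2.5, "a")               # double -> character: "2.5" "a"
class(c(TRUE, 2.5, "a"))  # "character"
```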
Use mode or class to find out information about variables.
str is useful to find information about the structure of your data:
str(tips)
## 'data.frame':    244 obs. of  8 variables:
##  $ total_bill: num  17 10.3 21 23.7 24.6 ...
##  $ tip       : num  1.01 1.66 3.5 3.31 3.61 4.71 2 3.12 1.96 3.23 ...
##  $ sex       : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 2 2 2 2 ...
##  $ smoker    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ day       : Factor w/ 4 levels "Fri","Sat","Sun",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ time      : Factor w/ 2 levels "Dinner","Lunch": 1 1 1 1 1 1 1 1 1 1 ...
##  $ size      : int  2 3 3 2 4 4 2 4 2 2 ...
##  $ rate      : num  0.0594 0.1605 0.1666 0.1398 0.1468 ...
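mode, class and typeof can give different answers for the same object, which is easiest to see on a factor such as the sex column above. A small sketch with a factor built by hand:

```r
f <- factor(c("Female", "Male", "Male"))
class(f)    # "factor": how R treats the object
mode(f)     # "numeric": the storage mode
typeof(f)   # "integer": a factor is stored as integer codes
levels(f)   # the labels behind the codes: "Female" "Male"
str(f)      # Factor w/ 2 levels "Female","Male": 1 2 2
```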
We can convert between different types using the as. series of functions (as.character, as.numeric, ...):
size <- head(tips$size)
size
## [1] 2 3 3 2 4 4
as.character(size)
## [1] "2" "3" "3" "2" "4" "4"
as.numeric("2")
## [1] 2
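Two pitfalls when converting: text that is not a number becomes NA (with a warning), and as.numeric() on a factor returns its internal codes rather than its labels. A sketch with a hand-built factor:

```r
suppressWarnings(as.numeric("two"))  # not a number: NA
f <- factor(c("10", "5", "10"))      # levels are sorted as text: "10" "5"
as.numeric(f)                        # internal codes: 1 2 1
as.numeric(as.character(f))          # convert the labels instead: 10 5 10
```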
There is a wide variety of useful functions that operate on vectors. Two of the more common ones are length, which returns the length (number of elements) of a vector, and sum, which adds up all the elements of a vector.
x <- tip[1:5]
length(x)
## [1] 5
sum(x)
## [1] 13.09
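A few more functions in the same spirit, shown on the same five tips (tip[1:5], typed in by hand here so the sketch runs on its own):

```r
x <- c(1.01, 1.66, 3.50, 3.31, 3.61)  # tip[1:5] from above
min(x); max(x)   # 1.01 and 3.61
range(x)         # both at once: 1.01 3.61
sort(x)          # 1.01 1.66 3.31 3.50 3.61
rev(x)           # reversed order: 3.61 3.31 3.50 1.66 1.01
cumsum(x)        # running total; the last value equals sum(x)
```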
Using the basic functions we've learned, it wouldn't be hard to compute some basic statistics:
(n <- length(tip))
## [1] 244
(meantip <- sum(tip) / n)
## [1] 2.998279
(standdev <- sqrt(sum((tip - meantip)^2) / (n - 1)))
## [1] 1.383638
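The two manual computations above implement the usual formulas for the sample mean and the sample standard deviation. Note the n - 1 in the denominator of s, which is also what sd() uses, so the results match:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$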
But we don't have to.
mean(tip)
## [1] 2.998279
sd(tip)
## [1] 1.383638
summary(tip)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   2.900   2.998   3.562  10.000
quantile(tip, c(.025, .975))
##   2.5%  97.5%
## 1.1760 6.4625
& (elementwise AND)
| (elementwise OR)
c(T, T, F, F) & c(T, F, T, F)
## [1] TRUE FALSE FALSE FALSE
c(T, T, F, F) | c(T, F, T, F)
## [1] TRUE TRUE TRUE FALSE
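A few related helpers: ! negates, any() and all() collapse a logical vector to a single value, and && / || are the single-value versions meant for one TRUE/FALSE condition (for example inside if). In R 4.3.0 and later, && and || give an error when handed vectors longer than one. A sketch:

```r
b <- c(TRUE, TRUE, FALSE, FALSE)
!b                 # FALSE FALSE TRUE TRUE
any(b)             # TRUE: at least one element is TRUE
all(b)             # FALSE: not every element is TRUE
xor(TRUE, FALSE)   # exclusive OR: TRUE
TRUE && FALSE      # single-value AND: FALSE
```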
# Which are big bills with a poor tip rate?
id <- (bill > 40 & rate < .10)
tips[id, ]
##     total_bill tip    sex smoker day   time size       rate
## 103      44.30 2.5 Female    Yes Sat Dinner    3 0.05643341
## 183      45.35 3.5   Male    Yes Sun Dinner    3 0.07717751
## 185      40.55 3.0   Male    Yes Sun Dinner    2 0.07398274
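The same kind of row selection can also be written with subset(), which evaluates the condition inside the data frame so column names can be used directly. A sketch on a hypothetical three-row stand-in for tips, using bill, tip and rate values printed in this section:

```r
# Hypothetical stand-in for tips: three rows whose values appear above
toy <- data.frame(total_bill = c(16.99, 44.30, 45.35),
                  tip        = c(1.01, 2.50, 3.50),
                  rate       = c(0.05944673, 0.05643341, 0.07717751))
# Same kind of condition as tips[id, ], written with [ ]
toy[toy$total_bill > 40 & toy$rate < 0.10, ]
# subset() looks the column names up inside toy
subset(toy, total_bill > 40 & rate < 0.10)
# select= keeps only some of the columns
subset(toy, total_bill > 40 & rate < 0.10, select = c(total_bill, tip))
```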
?diamonds
Create a variable ppc for price/carat. Store this variable as a column in the diamonds data.
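Storing a new variable as a column works by assigning to a new name with $. A sketch on a hypothetical toy data frame (not the diamonds data), computing a ratio of two existing columns element-wise. As it happens, 1.01 / 16.99 is about 0.0594, the first rate value printed earlier, which suggests (though the section does not say so) that tips$rate was built this way:

```r
# Hypothetical toy data frame with two columns
toy <- data.frame(total_bill = c(16.99, 10.34),
                  tip        = c(1.01, 1.66))
toy$rate <- toy$tip / toy$total_bill  # new column, computed element-wise
str(toy)                              # now 3 variables
```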