Chapter 2 R basics
Let's go through some R basics.
You'll see a > when you open R, or in the console in the bottom-left corner of RStudio. This means that R is waiting for a command. Let's try some simple calculations in R. Copy the code below and run it in R.
5 + 19
## [1] 24
R sure is good at calculating. We also see a [1], which indicates that the number to its right is the first element of the output. Not very useful in this case, but we'll see later that it can be useful.
We can perform some more complicated calculations:
2^3 * 5 + pi + log(9)
## [1] 45.33882
In this case, we have made use of a built-in constant (pi) and a built-in function (log, which computes the natural logarithm).
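As a small sketch of how log() behaves: with a single argument it returns the natural logarithm, and a different base can be requested explicitly. The object names below are only illustrative.

```r
# log() with one argument is the natural logarithm (base e)
natural_log <- log(9)

# A different base can be supplied through the `base` argument
log_base3 <- log(9, base = 3)

# log10() is a shortcut for base-10 logarithms
log_base10 <- log10(1000)

natural_log
## [1] 2.197225
log_base3
## [1] 2
log_base10
## [1] 3
```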
2.1 Storing objects
An important part of R is storing information in objects.
a <- 5
a gets assigned the value 5. We see that a appears in the 'global environment' in the upper-right corner of RStudio. This means that we can use it now!
4 * a
## [1] 20
We could of course have stored that result as well:
answer <- 4 * a
And we can call on the answer.
answer
## [1] 20
We can do many other things with this object, for instance:
- is answer identical to the value 20?
- plot the object (when it is a numeric value)
plot(answer)
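The first question in the list above can be asked directly in R. The following is a minimal sketch, assuming a and answer are defined as earlier in this section:

```r
a <- 5
answer <- 4 * a

# == asks a question and returns TRUE or FALSE
answer == 20
## [1] TRUE

# identical() is a stricter check that compares both value and type
identical(answer, 20)
## [1] TRUE
```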
Some notes on objects:
- a = 5 does exactly the same thing as a <- 5; the latter is considered better practice, because = is also used to pass arguments to functions and is easily confused with ==, which checks whether an object has a particular value (e.g., a == 5 asks whether a equals 5).
- There are rules for the names that you can give these objects: you can't use spaces or special characters (e.g., ^&"'*+?); you can't start a name with a number (i.e., you can use object1 as an object name, but not 1object); capitalization matters (i.e., myobject is different from myObject, which is different from Myobject).
- Strive for consistency in naming; I typically use lowercase and underscores (e.g., education_male, income_female, year_birth).
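To illustrate the naming rules, here is a short sketch. It uses the base R function make.names(), which shows how R would repair a name that breaks the rules; the object names are only examples.

```r
# Capitalization matters: these are three different objects
myobject <- 1
myObject <- 2
Myobject <- 3
c(myobject, myObject, Myobject)
## [1] 1 2 3

# make.names() turns an invalid name into a valid one
make.names("1object")    # names cannot start with a number
## [1] "X1object"
make.names("my object")  # names cannot contain spaces
## [1] "my.object"
```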
2.2 Vectors
An important concept in R is the vector. A vector is an ordered sequence of data elements of the same basic type. You can create a vector by enclosing the elements in c(...). For instance, this is a vector with 3 whole numbers (which R stores as ordinary numeric values unless you write them as 5L, 3L, 1L):
c(5, 3, 1)
## [1] 5 3 1
This is a vector with 5 logical values:
c(TRUE, TRUE, TRUE, FALSE, FALSE)
## [1] TRUE TRUE TRUE FALSE FALSE
This is a vector with 7 string values:
c("Groningen", "Amsterdam", "Zaandam", "Wassenaar", "Roelofarendsveen", "Leeuwarden", "Middelburg")
## [1] "Groningen" "Amsterdam" "Zaandam" "Wassenaar"
## [5] "Roelofarendsveen" "Leeuwarden" "Middelburg"
What happens when we mix up different types?
c("Groningen", FALSE, 5)
## [1] "Groningen" "FALSE" "5"
R has still produced a vector in which all elements have the same type; in this case, it has converted every element to character!
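You can ask R which type a vector has with class(). The sketch below checks the vectors from this section; the object name mixed is only illustrative.

```r
class(c(5, 3, 1))
## [1] "numeric"
class(c(TRUE, TRUE, FALSE))
## [1] "logical"
class(c("Groningen", "Amsterdam"))
## [1] "character"

# Mixing types forces everything into one common type
mixed <- c("Groningen", FALSE, 5)
class(mixed)
## [1] "character"

# Mixing only logical and numeric values gives numbers (TRUE becomes 1)
c(TRUE, 5)
## [1] 1 5
```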
Of course, we can also store the results of these vectors as objects:
vector1 <- c(5, 3, 1, 4, 5, 9, 0)
R puts a strong focus on vectors and handles them in particular ways (R is built to deal with them efficiently). As an example:
vector1 + 7
## [1] 12 10 8 11 12 16 7
R has added 7 to each element in the vector. (Other programming languages would typically require saying something like: for each element in the vector vector1, add 7.)
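To make the contrast concrete, here is a sketch of the explicit "for each element" approach next to the vectorized one. The names result_loop and result_vectorized are only illustrative.

```r
vector1 <- c(5, 3, 1, 4, 5, 9, 0)

# Explicit loop: visit every position and add 7 to that element
result_loop <- numeric(length(vector1))
for (i in seq_along(vector1)) {
  result_loop[i] <- vector1[i] + 7
}

# Vectorized: R applies the addition to every element at once
result_vectorized <- vector1 + 7

identical(result_loop, result_vectorized)
## [1] TRUE
```

Both give the same result, but the vectorized version is shorter and is the usual way to write this in R.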
We can do more complex operations:
log( ( (vector1 + 7) * 25) / 16 )
## [1] 2.931194 2.748872 2.525729 2.844182 2.931194 3.218876 2.392197
Again, the calculations are done for each element!
Many (built-in) functions in R expect vectors. Let's take a look at the average:
mean( vector1 )
## [1] 3.857143
We can use these functions in our calculations. For instance, we can easily standardize the vector (i.e., subtract the mean and divide by the standard deviation):
( vector1 - mean( vector1 ) ) / sd( vector1 )
## [1] 0.3850488 -0.2887866 -0.9626219 0.0481311 0.3850488 1.7327194 -1.2995396
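As a quick check of the standardization, the sketch below confirms that the result has a mean of (essentially) 0 and a standard deviation of 1. It also compares the result with the base R function scale(), which is not used elsewhere in this chapter and is shown here only as a cross-check.

```r
vector1 <- c(5, 3, 1, 4, 5, 9, 0)
standardized <- (vector1 - mean(vector1)) / sd(vector1)

# The mean is 0 up to tiny floating point error; the sd is 1
all.equal(mean(standardized), 0)
## [1] TRUE
all.equal(sd(standardized), 1)
## [1] TRUE

# scale() centers and scales in the same way; as.numeric() drops the
# matrix structure that scale() returns
all.equal(as.numeric(scale(vector1)), standardized)
## [1] TRUE
```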
2.2.1 Accessing elements in vectors
We can also access each element within a vector. Let's try and get the first element by using [] (we'll see this [] much more often later):
vector1[1]
## [1] 5
Third element:
vector1[3]
## [1] 1
Second and fifth:
vector1[c(2,5)]
## [1] 3 5
Third to sixth:
vector1[c(3:6)]
## [1] 1 4 5 9
How about the eighth?
vector1[8]
## [1] NA
It does not exist; NA stands for a missing value (Not Available)!
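A short sketch of how to avoid asking for an element that does not exist: length() reports how many elements a vector has, and is.na() tests whether a value is missing.

```r
vector1 <- c(5, 3, 1, 4, 5, 9, 0)

# The vector has 7 elements, so position 8 does not exist
length(vector1)
## [1] 7

# is.na() returns TRUE for a missing value
is.na(vector1[8])
## [1] TRUE

# The last element can be reached without knowing the length in advance
vector1[length(vector1)]
## [1] 0
```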
2.3 Functions
Above we have made use of several built-in functions in R, including mean() and sd(). Functions take arguments and do something with them. For instance, mean() takes a vector of numbers as its argument and calculates the mean of those numbers.
Functions always look something like name_function(argument1, argument2, ...). The parentheses mean that we are dealing with a function; the arguments are what the function expects as input. Sometimes the arguments are mandatory, sometimes they are optional. Let's look at another function in R, the round() function, which lets you round numbers.
Let's call on round without any arguments:
round()
## Error in eval(expr, envir, enclos): 0 arguments passed to 'round' which requires 1 or 2 arguments
Clearly, at least one argument is required. Let's round the number pi, which is stored in R as pi:
round( pi )
## [1] 3
We can also add another argument, namely the number of decimal places we want to round to:
round( pi, digits = 2 )
## [1] 3.14
[round( pi, 2 ) would also have worked, because unnamed arguments are matched by their position.]
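The sketch below shows that naming the digits argument and passing it by position give the same result, and that round(), like the other functions in this chapter, works on a whole vector at once. The example numbers are made up for illustration.

```r
# Named and positional forms are equivalent
identical(round(pi, digits = 2), round(pi, 2))
## [1] TRUE

# round() is applied to each element of a vector
round(c(1.234, 5.678), digits = 1)
## [1] 1.2 5.7
```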
One more example of a function: rnorm(), which allows you to draw random numbers from a normal distribution. rnorm() requires a number of observations (n):
rnorm( n = 20 )
## [1] 0.491378183 -0.001509984 -1.312455845 0.156278217 -0.144945100
## [6] 0.055041144 1.503009992 0.499850907 -0.994378088 0.618798783
## [11] -0.313706083 0.449166159 0.725980528 1.107495430 -0.930523104
## [16] 0.015236345 0.395417956 -0.361503212 -1.312717054 1.003455574
We now have twenty random numbers from a normal distribution. Because we haven't specified any other arguments, the function uses its defaults: a normal distribution with a mean of 0 and a standard deviation of 1. We can of course change this:
rnorm( n = 20, mean = 170, sd = 7 )
## [1] 176.5699 175.1067 161.7839 174.0213 169.2736 176.0136 161.3376 157.7434
## [9] 182.5918 165.2178 171.8070 170.8431 169.1359 165.9095 174.5427 164.2350
## [17] 174.5399 162.3372 176.9677 168.5011
We have now sampled 20 numbers from a distribution with a mean of 170 and a standard deviation of 7 (more or less the distribution of height in women in the Netherlands). If we run the code again, we'll get different numbers:
rnorm( n = 20, mean = 170, sd = 7 )
## [1] 168.2476 166.5353 174.4087 168.7173 177.1894 166.9483 169.9977 153.9398
## [9] 169.0740 180.3988 172.5061 162.8835 168.0021 165.4467 174.4625 168.9048
## [17] 166.7838 178.2742 171.2319 177.9186
[note that rnorm(20,170,7) also works].
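Because every call to rnorm() gives new numbers, results are hard to reproduce. One common way to get the same draws each time is to fix the random seed with set.seed() first. The sketch below assumes an arbitrary seed of 123; any whole number works, and the object names are only illustrative.

```r
# Fix the seed, then draw
set.seed(123)
first_draw <- rnorm(n = 5, mean = 170, sd = 7)

# Resetting the same seed reproduces exactly the same numbers
set.seed(123)
second_draw <- rnorm(5, 170, 7)  # positional arguments, same meaning

identical(first_draw, second_draw)
## [1] TRUE
```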
2.3.1 Help on functions
For more help on functions, you can type ?functionname. A help page will appear in the bottom-right corner of RStudio. Try, for instance, ?round. Sometimes, these built-in R help files are a bit difficult to understand. I typically use Google and search for something like "R function rnorm help".
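If you only want a quick reminder of which arguments a function takes, you can also inspect the function from the console. This is a small sketch using the base R functions args() and formals() on rnorm():

```r
# help("rnorm") opens the same page as ?rnorm

# args() prints the argument list, including the default values
args(rnorm)

# formals() returns the arguments as a list; its names are the
# argument names (n, mean and sd for rnorm)
names(formals(rnorm))

# The defaults that rnorm() uses when you only supply n
formals(rnorm)$mean
## [1] 0
formals(rnorm)$sd
## [1] 1
```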
2.3.2 Creating your own functions
The beauty of R is that you can create your own functions. Once you have moderate proficiency in R, you will create many functions yourself.
Let’s create our own function to calculate the mean. Our function needs as input a vector with numbers.
mean2 <- function(input) {
  sum <- sum(input)     # calculate sum of all numbers in vector
  n <- length(input)    # calculate length of vector = number of numbers, n
  mean <- sum / n       # calculate mean
  mean                  # return mean as output
}
Let’s check:
mean2(input = c(1, 2, 3))
## [1] 2
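As a further check, the sketch below compares a version of mean2() with the built-in mean() on the vector used earlier in this chapter. It then reuses mean2() inside a second function. Note that standardize() is a hypothetical helper written for this illustration, not a base R function, and the local name total is chosen here so that it does not hide the built-in sum().

```r
# A version of mean2() whose local names do not shadow sum() and mean()
mean2 <- function(input) {
  total <- sum(input)   # sum of all numbers in the vector
  n <- length(input)    # number of elements
  total / n             # the last expression is returned as output
}

# Hypothetical helper: subtract the mean and divide by the sd
standardize <- function(input) {
  (input - mean2(input)) / sd(input)
}

vector1 <- c(5, 3, 1, 4, 5, 9, 0)

# Our own function agrees with the built-in one
all.equal(mean2(vector1), mean(vector1))
## [1] TRUE

# Standardized values, rounded to two decimals for readability
round(standardize(vector1), 2)
```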
2.4 Further reading
The "Base R Cheat Sheet" on https://www.rstudio.com/resources/cheatsheets/ is useful.