Chapter 3 Dataframes
Our data will typically be stored in a dataframe. Let’s make one:
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))
A nice feature of RStudio is that when you click on a dataframe in the global environment (the pane in the top right corner), you can look at the data it contains. This is obviously much more convenient when the dataframes are very large!
3.1 Accessing elements in a dataframe
Accessing elements in a dataframe is similar to that in a matrix:
dataframe1[1, 2]
## [1] TRUE
Because the columns now have names, you can also use them:
dataframe1[1, "Bool"]
## [1] TRUE
dataframe1[3, "Char"]
## [1] "mies"
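The matrix-style indexing above also works with an empty index or with vectors of indices. The sketch below is an illustration, not part of the original chapter: the object names `row2` and `selection` are hypothetical, and it assumes R 4.0 or later, where text columns stay character rather than becoming factors (consistent with the `str` output shown later in this chapter).

```r
# Recreate the example dataframe (assumes R >= 4.0: Char stays character)
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Leaving the column index empty selects the whole second row;
# the result is still a (one-row) dataframe
row2 <- dataframe1[2, ]
row2

# Vectors of row numbers and column names select several elements at once
selection <- dataframe1[c(1, 3), c("Int", "Char")]
selection
```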
We can also refer to entire columns in the following way:
dataframe1[, "Int"]
## [1] 1 2 3
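Selecting a single column this way simplifies the result to a plain vector. If you want to keep it as a one-column dataframe instead, the `[` operator accepts a `drop = FALSE` argument. A minimal sketch; `col_vector` and `col_df` are illustrative names only.

```r
# Recreate the example dataframe
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# A single column selected with [ , ] is simplified to a vector
col_vector <- dataframe1[, "Int"]
class(col_vector)

# drop = FALSE keeps the result as a one-column dataframe
col_df <- dataframe1[, "Int", drop = FALSE]
class(col_df)
```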
There is also another way of doing this. Because R puts so much emphasis on vectors, and each column is a vector(!), there is a shorthand using the $ sign (which also looks slightly better):
dataframe1$Int
## [1] 1 2 3
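To check that the `$` shorthand really returns the same vector as the bracket notation, we can compare them with `identical`. The sketch also shows double square brackets, `[[ ]]`, a third way of extracting a column that this chapter does not otherwise cover; the object names are illustrative.

```r
# Recreate the example dataframe
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Three ways to extract the Int column
via_brackets <- dataframe1[, "Int"]
via_double   <- dataframe1[["Int"]]
via_dollar   <- dataframe1$Int

# All three give the same plain vector
identical(via_brackets, via_dollar)
identical(via_double, via_dollar)
is.vector(via_dollar)
```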
If we want the third element of the third column, we can also do:
dataframe1$Char[3]
## [1] "mies"
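Because `dataframe1$Char` is an ordinary vector, any vector indexing works on it, including indexing with a logical vector of the same length, such as the `Bool` column. A short sketch; the name `selected` is illustrative.

```r
# Recreate the example dataframe (assumes R >= 4.0: Char stays character)
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Keep only the elements of Char where Bool is TRUE
selected <- dataframe1$Char[dataframe1$Bool]
selected

# The same logical vector can also select whole rows of the dataframe
dataframe1[dataframe1$Bool, ]
```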
3.2 Getting a quick glimpse of the dataframe
Often you want a quick look at the variables in the dataframe. We can do that with the function str:
str(dataframe1)
## 'data.frame': 3 obs. of 3 variables:
## $ Int : num 1 2 3
## $ Bool: logi TRUE FALSE FALSE
## $ Char: chr "aap" "noot" "mies"
We haven’t really learned anything new here, but we would have if the dataframe had been bigger.
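Besides `str`, a few other base R functions are handy for a first look at a larger dataframe. This is a sketch of some common options, not an exhaustive list.

```r
# Recreate the example dataframe
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

nrow(dataframe1)          # number of rows (observations)
ncol(dataframe1)          # number of columns (variables)
names(dataframe1)         # column names
head(dataframe1, n = 2)   # only the first two rows
summary(dataframe1)       # a short summary of every column
```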
3.3 Transforming a dataframe
Transforming (or manipulating, or wrangling) a dataframe is a huge part of what we will do today. Most data science projects and data analyses involve dataframes that need to be cleaned up and transformed (e.g., by creating new variables) before we can visualise and analyse them.
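As a small taste of what transforming can look like, here is a minimal base R sketch that creates a new variable and keeps a subset of rows. The column name `Double` and the object `larger` are hypothetical, chosen only for illustration; other tools for the same job exist and are not assumed here.

```r
# Recreate the example dataframe
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Create a new variable from an existing one
dataframe1$Double <- dataframe1$Int * 2

# Keep only the rows where Int is larger than 1
larger <- dataframe1[dataframe1$Int > 1, ]
larger
```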
3.4 Assignments
The dataset ToothGrowth is built into R, so if you just type ToothGrowth into the console, you’ll see a dataset appear.

1. Get a quick glimpse of the dataset. You could assign ToothGrowth to an object, so that it becomes available in your global environment, or you could use str.
2. Try to select the first column of the ToothGrowth dataset, called len.
3. Select the fifth element of the len variable.
4. Calculate the average of the len variable with the mean function.
5. Calculate the average of the supp variable with the mean function. Why doesn’t it work?