Chapter 3 Dataframes
Our data will typically be stored in a dataframe. Let’s make one:
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))
The beauty of dataframes in RStudio is that when you click on them in the global environment in the top-right corner, you can look at the data within the dataframe (obviously much more convenient when the dataframes are very large!).
3.1 Accessing elements in a dataframe
Accessing elements in a dataframe is similar to that in a matrix:
dataframe1[1, 2]
## [1] TRUE
Because the columns now have names, you can also use them:
dataframe1[1, "Bool"]
## [1] TRUE
dataframe1[3, "Char"]
## [1] "mies"
We can also refer to entire columns in the following way:
dataframe1[, "Int"]
## [1] 1 2 3
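The same bracket notation also works for entire rows and for several rows or columns at once. The following is a minimal sketch; the object names `second_row` and `corner_cells` are only chosen here for illustration.

```r
# Recreate the example dataframe so this block runs on its own
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Leave the column index empty to get all columns of the second row
second_row <- dataframe1[2, ]
second_row

# Pass vectors to select rows 1 and 3 of the columns "Int" and "Char"
corner_cells <- dataframe1[c(1, 3), c("Int", "Char")]
corner_cells
```

Note that both results are themselves dataframes, whereas selecting a single column as in `dataframe1[, "Int"]` gives back a plain vector.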
There is also another way of doing this. Given that R puts so much emphasis on vectors, and that each column is a vector(!), we can use a shorthand with the $ sign, which also looks slightly better:
dataframe1$Int
## [1] 1 2 3
If we want the third element of the third column, we can also do:
dataframe1$Char[3]
## [1] "mies"
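To convince ourselves that these notations really return the same vector, we can compare them with `identical`. This sketch also uses double brackets (`[[ ]]`), a third base R way of extracting a column that has not been introduced above; the object names are hypothetical.

```r
# Recreate the example dataframe so this block runs on its own
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Three ways to extract the column Int as a vector
by_bracket <- dataframe1[, "Int"]
by_dollar  <- dataframe1$Int
by_double  <- dataframe1[["Int"]]

identical(by_bracket, by_dollar)  # TRUE
identical(by_dollar, by_double)   # TRUE

# Because the result is a vector, we can index it like any other vector
dataframe1[["Char"]][3]           # "mies"
```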
3.2 Getting a quick glimpse of the dataframe
Often you want a quick look at the variables in the dataframe. We can do that with the function str:
str(dataframe1)
## 'data.frame': 3 obs. of 3 variables:
## $ Int : num 1 2 3
## $ Bool: logi TRUE FALSE FALSE
## $ Char: chr "aap" "noot" "mies"
We have not really learned anything new here, but we would have if the dataframe had been bigger.
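To see why this matters, here is a small sketch with a larger dataframe. The dataframe `big_dataframe` and its columns are made up for this illustration, and `dim` and `head` are two further base R functions for getting a quick overview.

```r
# A hypothetical larger dataframe with 1000 rows and 3 columns
big_dataframe <- data.frame(Id = 1:1000,
                            Value = sqrt(1:1000),
                            Even = (1:1000) %% 2 == 0)

str(big_dataframe)   # compact overview: one line per variable
dim(big_dataframe)   # number of rows and number of columns
head(big_dataframe)  # only the first six rows
```

Printing all 1000 rows would flood the console, whereas `str` still needs only one line per variable.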
3.3 Transforming a dataframe
Transforming (or manipulating or wrangling) a dataframe is a huge part of what we will do today. Most data science and data analysis projects involve dataframes that need to be cleaned up and transformed (e.g., new variables need to be created) before we can visualise and analyse them.
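As a first impression of what such a transformation can look like, here is a minimal sketch in base R using only the notation from this chapter. The column name `IntSquared` and the object name `false_rows` are hypothetical, and this is just one of several ways to do it.

```r
# Recreate the example dataframe so this block runs on its own
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Create a new variable from an existing one
dataframe1$IntSquared <- dataframe1$Int^2

# Keep only the rows where Bool is FALSE (the column index stays empty)
false_rows <- dataframe1[dataframe1$Bool == FALSE, ]

str(dataframe1)
false_rows
```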
3.4 Assignments
The dataset ToothGrowth is built into R. So if you just type ToothGrowth into the console, you’ll see a dataset appear.
Let’s get a quick glimpse of the dataset. You can think of assigning ToothGrowth to an object, so that it will become available in our global environment. Or you can think of using str.
Try to select the first column of the ToothGrowth dataset, called len.
Select the fifth element of the len variable.
Calculate the average for the len variable with the mean function.
Calculate the average for the supp variable with the mean function. Why doesn’t it work?