Chapter 3 Dataframes

Our data will typically be stored in a dataframe. Let’s make one:

dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies")) 

The beauty of dataframes in RStudio is that when you click on one in the Environment pane in the top-right corner, you can look at the data within the dataframe (obviously much more convenient when the dataframes are very large!).
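If you prefer typing to clicking, the same quick look is available from the console. The following is a minimal sketch using only base R functions; the variable name dims is made up for illustration.

```r
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Print the first rows in the console (head() shows at most six by default)
head(dataframe1)

# Number of rows and number of columns, in that order
dims <- dim(dataframe1)
dims

# In an interactive RStudio session, View(dataframe1) opens the same viewer
# that you get by clicking on the dataframe in the Environment pane.
```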

3.1 Accessing elements in a dataframe

Accessing elements in a dataframe works much like in a matrix, with the row index first and the column index second:

dataframe1[1,2]
## [1] TRUE

Because the columns now have names, you can also use them:

dataframe1[1, "Bool"]
## [1] TRUE
dataframe1[3, "Char"]
## [1] "mies"
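The same bracket notation also reaches beyond single cells. As a small sketch (the object names second_row and first_two are only for illustration), leaving the column position empty gives a whole row, and vectors of positions or names select several rows and columns at once:

```r
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Leaving the column position empty selects an entire row
second_row <- dataframe1[2, ]
second_row

# Vectors of positions or names select several rows and columns at once
first_two <- dataframe1[1:2, c("Int", "Char")]
first_two
```

Both results are themselves dataframes, because they can contain columns of different types.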

We can also refer to entire columns in the following way:

dataframe1[, "Int"]
## [1] 1 2 3
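Note that the output above is a plain vector, not a dataframe: when you select a single column with brackets, R simplifies the result. If you would rather keep a one-column dataframe, the drop argument switches that off. A short sketch, with illustrative object names:

```r
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Default behaviour: a single column is simplified to a vector
as_vector <- dataframe1[, "Int"]
is.data.frame(as_vector)
## [1] FALSE

# drop = FALSE keeps the result as a dataframe with one column
as_dataframe <- dataframe1[, "Int", drop = FALSE]
is.data.frame(as_dataframe)
## [1] TRUE
```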

There is also another way of selecting a column. Because R puts so much emphasis on vectors, and each column is a vector(!), we can use a shorthand with the $ sign (which also looks slightly better):

dataframe1$Int
## [1] 1 2 3

If we want the third element of the third column, we can also do:

dataframe1$Char[3]
## [1] "mies"
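The different notations all lead to the same element. The sketch below compares them and also shows double brackets, which are handy when the column name is itself stored in a variable (column_name is a made-up name for illustration):

```r
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Three ways to reach the third element of the Char column
via_dollar   <- dataframe1$Char[3]
via_brackets <- dataframe1[["Char"]][3]
via_matrix   <- dataframe1[3, "Char"]

# All three give exactly the same result
identical(via_dollar, via_brackets)
identical(via_dollar, via_matrix)

# Double brackets also accept a column name stored in a variable,
# which the $ shorthand cannot do
column_name <- "Char"
via_variable <- dataframe1[[column_name]][3]
```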

3.2 Getting a quick glimpse of the dataframe

Often you want a quick look at the variables in the dataframe. We can do that with the function str:

str(dataframe1)
## 'data.frame':    3 obs. of  3 variables:
##  $ Int : num  1 2 3
##  $ Bool: logi  TRUE FALSE FALSE
##  $ Char: chr  "aap" "noot" "mies"

We haven’t really learned anything new, but we could have if the dataframe had been bigger.
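To see why this pays off on larger data, here is a sketch on iris, another dataset that ships with R (chosen purely as an illustration; it is not used elsewhere in this chapter). It has 150 rows, so printing it in full is not very helpful, but str still fits on a few lines:

```r
# One line per variable: name, type and the first few values
str(iris)

# Number of rows and columns
iris_dims <- dim(iris)
iris_dims

# head() is a useful companion: it prints only the first rows
head(iris, 3)
```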

3.3 Transforming a dataframe

Transforming (or manipulating or wrangling) a dataframe is a huge part of what we will do today. Most data science and data analysis projects involve dataframes that need to be cleaned up and transformed (e.g., new variables need to be created) before we can visualise and analyse them.
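As a first taste, here is a minimal sketch of two common transformations using only the base R notation from this chapter. The new column name Double, the object name larger_than_one and the cut-off value are made up for illustration, and other tools may be used for this later on.

```r
dataframe1 <- data.frame(Int = c(1, 2, 3),
                         Bool = c(TRUE, FALSE, FALSE),
                         Char = c("aap", "noot", "mies"))

# Create a new variable from an existing one
dataframe1$Double <- dataframe1$Int * 2

# Keep only the rows that meet a condition (here: Int larger than 1)
larger_than_one <- dataframe1[dataframe1$Int > 1, ]

str(dataframe1)
larger_than_one
```

The condition inside the brackets is a logical vector with one value per row; rows where it is TRUE are kept.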

3.4 Assignments

  1. The dataset ToothGrowth is built into R, so if you just type ToothGrowth into the console, you’ll see the dataset appear. Let’s get a quick glimpse of it. You could assign ToothGrowth to an object, so that it becomes available in the global environment, or you could use str.

  2. Try to select the first column of the ToothGrowth dataset, called len.

  3. Select the fifth element of the len-variable.

  4. Calculate the average for the len-variable with the mean-function.

  5. Calculate the average for the supp-variable with the mean-function. Why doesn’t it work?