Chapter 5 Import data

Now that we know what dataframes are, we will import data from a file into a dataframe. Getting your own data into R is not always easy, but several useful packages can help you import your csv, Excel, or SPSS file.

5.1 Reading in different file types

5.1.1 To read in a csv-file:

install.packages("readr")  # only needed once per computer
library("readr")
my_data <- read_csv("/some_path_to_file/file.csv", col_names = TRUE)
# col_names = TRUE (the default) means that the first row contains the column names
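
To make the effect of col_names concrete, here is a minimal sketch that writes a tiny made-up csv-file to a temporary location and reads it in twice. The file contents and the object names (csv_path, with_header, no_header) are assumptions for illustration only.

```r
library(readr)

# Write a tiny example file to a temporary location (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age", "1,34", "2,51"), csv_path)

# col_names = TRUE: the first row supplies the column names.
with_header <- read_csv(csv_path, col_names = TRUE)
names(with_header)   # "id" "age"
nrow(with_header)    # 2

# col_names = FALSE: the first row is treated as data and
# readr generates the names X1, X2, ...
no_header <- read_csv(csv_path, col_names = FALSE)
names(no_header)     # "X1" "X2"
nrow(no_header)      # 3
```

In the second call the header row ends up as a data row, which is usually a sign that col_names was set incorrectly.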

5.1.2 To read in an Excel-file:

install.packages("readxl")
library("readxl")
my_data <- read_xlsx("/some_path_to_file/file.xlsx")
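
Excel files can contain several worksheets. As a sketch of how you might pick one, the example below uses the workbook datasets.xlsx that is installed together with readxl; the object names xlsx_path and mtcars_data are only illustrative.

```r
library(readxl)

# Path to an example workbook that ships with the readxl package.
xlsx_path <- readxl_example("datasets.xlsx")

# List the worksheets in the workbook.
excel_sheets(xlsx_path)

# Read one specific sheet by name; without the sheet argument
# the first sheet is read.
mtcars_data <- read_xlsx(xlsx_path, sheet = "mtcars")
str(mtcars_data)
```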

5.1.3 To read in an SPSS-file:

install.packages("haven")
library("haven")
my_data <- read_sav("/some_path_to_file/file.sav")

Later, we’ll talk more about importing SPSS-data.
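
If you have no SPSS-file at hand, the following sketch creates one from made-up data with write_sav() and reads it back in. The dataset, the value labels, and the object names are assumptions for illustration only.

```r
library(haven)

# A small hypothetical dataset; `sex` carries SPSS-style value labels.
example_data <- data.frame(id = 1:3)
example_data$sex <- labelled(c(0, 1, 1), labels = c(woman = 0, man = 1))

# Write it to a temporary .sav file and read it back in.
sav_path <- tempfile(fileext = ".sav")
write_sav(example_data, sav_path)
my_data <- read_sav(sav_path)

# The labelled column keeps its numeric codes plus the value labels.
class(my_data$sex)

# as_factor() turns the codes into a factor that uses the labels as levels.
as_factor(my_data$sex)
```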

5.1.4 Two ways to reduce frustration:

  1. If you work in an RStudio project and put your data file in the same folder as the .Rproj file, you do not need to specify the path; you can simply write read_sav("file.sav").

  2. You can also use read_sav( file.choose() ), which opens up a window that lets you select your datafile with point-and-click.

The first method is a bit better because it is more ‘reproducible’: the script runs the same way every time, without any manual clicking.
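
When a file cannot be found, it often helps to check where R is looking. A minimal sketch using base R only; "file.sav" is a placeholder name, not a file that comes with this chapter.

```r
# Show the folder R is currently working in. Inside an RStudio project
# this is normally the folder that contains the .Rproj file.
getwd()

# Check whether the file can be found there before trying to read it.
# "file.sav" is a placeholder name.
file.exists("file.sav")

# List the data files R can see in the working directory.
list.files(pattern = "\\.(csv|xlsx|sav)$")
```

If file.exists() returns FALSE, the file is probably in a different folder than the one shown by getwd().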

5.1.5 A cautionary note

Note that the import functions treat variables differently depending on whether they are numeric or not (e.g., factors, categorical variables, string variables). A variable that uses zeros and ones to denote women and men will be read as a continuous (numeric) variable; a numeric variable that contains even a single character value will be read as a character/string variable. Always check whether your variables have been read in properly!
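
Both pitfalls can be reproduced with a tiny made-up csv-file. The sketch below shows one way to spot and repair them; the data, the "unknown" code, and the object names are assumptions for illustration only.

```r
library(readr)

# Hypothetical data: `sex` is coded 0/1, and `income` contains one
# non-numeric entry ("unknown").
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sex,income", "0,2500", "1,unknown", "1,3100"), csv_path)

my_data <- read_csv(csv_path)
class(my_data$sex)      # "numeric": treated as a continuous variable
class(my_data$income)   # "character": one text value changes the whole column

# Repair 1: turn the 0/1 codes into a categorical variable (a factor).
my_data$sex <- factor(my_data$sex, levels = c(0, 1), labels = c("woman", "man"))

# Repair 2: declare "unknown" as a missing-value code while reading,
# so that the column is read as numeric.
my_data_fixed <- read_csv(csv_path, na = c("", "NA", "unknown"))
class(my_data_fixed$income)   # "numeric"
```

Functions such as str() and summary() give a quick overview of how every column was read.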

5.2 Assignments

  1. Download the file from the following path: “http://stulp.gmw.rug.nl/24-06-2019/Rworkshop/test.xlsx” and put the file in your RStudio project folder.

  2. Now try to import the datafile into R via read_xlsx("name_file.xlsx"). Don’t forget to install the package readxl first (and to tell R that you will be using the package by running library(readxl)).

When the read-in is successful, you’ll see an object called my_data (or whatever you called the object) in the global environment.

  1. Examine the data. You can click on the object, or use the str() or summary() function to get a quick glimpse.

  2. Now try the same for an SPSS-file ( http://stulp.gmw.rug.nl/24-06-2019/Rworkshop/test.sav ) and a csv-file ( http://stulp.gmw.rug.nl/24-06-2019/Rworkshop/test.csv )

5.3 Further reading

All the above packages have excellent websites for further guidance.

Also, see the “Data Import Cheat Sheet” on https://www.rstudio.com/resources/cheatsheets/.

5.3.1 References

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Wickham, Hadley, and Jennifer Bryan. 2018. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.

Wickham, Hadley, and Evan Miller. 2018. Haven: Import and Export ‘Spss’, ‘Stata’ and ‘Sas’ Files. https://CRAN.R-project.org/package=haven.