Chapter 5 Import data
Now that we have learned what dataframes are, we will import data from a file into a dataframe. Getting your own data into R is not always easy, but several recent packages can help you import your csv, Excel, or SPSS files.
5.1 Reading in different file types
5.1.1 To read in a csv-file:
install.packages("readr")  # only needed once per computer
library("readr")           # needed in every new R session
my_data <- read_csv("/some_path_to_file/file.csv", col_names = TRUE)  # col_names = TRUE means the first row contains the column names
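To try read_csv() without a file of your own, the sketch below uses readr_example(), a helper in readr that returns the path to a small example file shipped with the package (assuming your installed readr includes mtcars.csv). Looking at the dimensions and the guessed column types right after importing is a quick sanity check.

```r
library("readr")

# Path to an example csv file that ships with readr
path <- readr_example("mtcars.csv")

# The first row of the file holds the column names
my_data <- read_csv(path, col_names = TRUE)

dim(my_data)   # number of rows and columns
spec(my_data)  # the column types that read_csv() guessed
```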
5.1.2 To read in an Excel-file:
install.packages("readxl")  # only needed once per computer
library("readxl")
my_data <- read_xlsx("/some_path_to_file/file.xlsx")
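An Excel workbook can contain several sheets, and read_xlsx() reads the first one unless you tell it otherwise. The sketch below uses readxl_example(), which returns the path to an example workbook bundled with readxl (assumed here to be datasets.xlsx), to list the sheets with excel_sheets() and to read a specific sheet via the sheet argument.

```r
library("readxl")

# Path to an example workbook that ships with readxl
path <- readxl_example("datasets.xlsx")

# List the names of all sheets in the workbook
sheets <- excel_sheets(path)
sheets

# Without 'sheet', the first sheet is read; here we ask for a named sheet
my_data <- read_xlsx(path, sheet = "mtcars")
```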
5.1.3 To read in an SPSS-file:
install.packages("haven")  # only needed once per computer
library("haven")
my_data <- read_sav("/some_path_to_file/file.sav")
Later, we’ll talk more about importing SPSS-data.
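If you want to practise before you have an SPSS file of your own, haven ships a small example file. The sketch below assumes it is available at the path returned by system.file("examples", "iris.sav", package = "haven").

```r
library("haven")

# Path to an example SPSS file that ships with haven
path <- system.file("examples", "iris.sav", package = "haven")

my_data <- read_sav(path)
str(my_data)  # check how each variable was read in
```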
5.1.4 Two ways to reduce frustration:
When you work in an RStudio project and put your data file in the same folder as the .Rproj file, you do not need to specify the full path, because R looks for files in the project folder. You can simply write
read_sav("file.sav")
You can also use
read_sav(file.choose())
which opens a window that lets you select your data file by pointing and clicking.
The first method is better because it is more ‘reproducible’: the script finds the file by itself every time it runs, whereas file.choose() requires you to click through the window again each time.
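Why the short path works can be seen in the sketch below. Opening an RStudio project sets R's working directory to the project folder, and relative paths such as "file.sav" are looked up there. Because this example cannot open a real project, it assumes a temporary folder as a stand-in for the project folder and uses setwd() to imitate what RStudio does; the folder name my_project_ and the small dataset are made up for illustration.

```r
library("haven")

# A temporary folder stands in for your RStudio project folder (illustration only)
project_dir <- tempfile("my_project_")
dir.create(project_dir)

# Put a small, made-up data file in that folder
write_sav(data.frame(id = 1:3, score = c(10, 12, 9)),
          file.path(project_dir, "file.sav"))

# Opening an RStudio project sets the working directory to the project folder;
# setwd() imitates that here and returns the previous working directory
old_wd <- setwd(project_dir)
getwd()                  # the folder R uses for relative paths
file.exists("file.sav")  # TRUE: no full path needed
my_data <- read_sav("file.sav")
setwd(old_wd)            # restore the previous working directory
```

In a real project you never need setwd(); the project takes care of the working directory for you.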
5.1.5 A cautionary note
Note that these functions guess the type of each variable (e.g., numeric, factor/categorical, or character/string), and the guess is not always what you intend. A variable with zeros and ones to signify women and men will be read as a continuous (numeric) variable; a numeric variable that contains even one non-numeric value will be read as a character/string variable. Always check whether your variables have been read in properly!
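Both pitfalls are reproduced in the sketch below with a tiny made-up csv file. It assumes that 0 codes women and 1 codes men, and that the non-numeric entry in age is the text "n/a". The fix for sex is to convert it to a factor after reading; the fix for age is to tell read_csv(), through its na argument, that "n/a" means a missing value.

```r
library("readr")

# A tiny csv file written to a temporary location (made-up example data)
path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,0,34",
             "2,1,41",
             "3,0,n/a"), path)

my_data <- read_csv(path)
str(my_data)  # sex is read as numeric; age as character because of "n/a"

# Tell R that the 0/1 values in sex are categories, not numbers
my_data$sex <- factor(my_data$sex, levels = c(0, 1),
                      labels = c("woman", "man"))

# Declare "n/a" as a missing-value code so that age is read as numeric
my_data2 <- read_csv(path, na = c("", "NA", "n/a"))
str(my_data2)
```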
5.2 Assignments
Download the file from the following path: “http://stulp.gmw.rug.nl/24-06-2019/Rworkshop/test.xlsx” and put the file in your RStudio project folder.
Now try to import the data file into R via my_data <- read_xlsx("name_file.xlsx"). Don’t forget to install the package readxl first (and to tell R that you will be using the package by running library(readxl)).
If the import was successful, you’ll see an object called my_data (or whatever you called it) in the global environment.
Examine the data. You can click on the object, or use the str() or summary() function to get a quick glimpse.
Now try the same for an SPSS-file ( http://stulp.gmw.rug.nl/24-06-2019/Rworkshop/test.sav ) and a csv-file ( http://stulp.gmw.rug.nl/24-06-2019/Rworkshop/test.csv ).
5.3 Further reading
All the above packages have excellent websites for further guidance:
readr: http://readr.tidyverse.org/ (Wickham, Hester, and Francois 2018)
readxl: http://readxl.tidyverse.org/ (Wickham and Bryan 2018)
haven: http://haven.tidyverse.org/ (Wickham and Miller 2018)
Also, see the “Data Import Cheat Sheet” on https://www.rstudio.com/resources/cheatsheets/.
5.3.1 References
Wickham, Hadley, Jim Hester, and Romain Francois. 2018. readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, and Jennifer Bryan. 2018. readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, and Evan Miller. 2018. haven: Import and Export ‘SPSS’, ‘Stata’ and ‘SAS’ Files. https://CRAN.R-project.org/package=haven.