Chapter 14 Class assignment 3
We’ll now spend the rest of class on a class assignment, with the aim of getting to know more about data transformation, learning to calculate means for visualizations, and using multiple datasets for one graph.
- Load data set & packages
For this visualization we’re using data from the 2016 General Social Survey. More information about this dataset is available here: https://rdrr.io/github/kjhealy/socviz/man/gss_sm.html
library(ggplot2)
library(tidyverse)
library(socviz) # install.packages("socviz") if you haven't done so
data <- gss_sm # gss_sm is a dataset from the package socviz
data
## # A tibble: 2,867 × 32
##     year    id ballot       age childs sibs   degree race  sex   region income16
##    <dbl> <dbl> <labelled> <dbl>  <dbl> <labe> <fct>  <fct> <fct> <fct>  <fct>
##  1  2016     1 1             47      3 2      Bache… White Male  New E… $170000…
##  2  2016     2 2             61      0 3      High … White Male  New E… $50000 …
##  3  2016     3 3             72      2 3      Bache… White Male  New E… $75000 …
##  4  2016     4 1             43      4 3      High … White Fema… New E… $170000…
##  5  2016     5 3             55      2 2      Gradu… White Fema… New E… $170000…
##  6  2016     6 2             53      2 2      Junio… White Fema… New E… $60000 …
##  7  2016     7 1             50      2 2      High … White Male  New E… $170000…
##  8  2016     8 3             23      3 6      High … Other Fema… Middl… $30000 …
##  9  2016     9 1             45      3 5      High … Black Male  Middl… $60000 …
## 10  2016    10 3             71      4 1      Junio… White Male  Middl… $60000 …
## # ℹ 2,857 more rows
## # ℹ 21 more variables: relig <fct>, marital <fct>, padeg <fct>, madeg <fct>,
## # partyid <fct>, polviews <fct>, happy <fct>, partners <fct>, grass <fct>,
## # zodiac <fct>, pres12 <labelled>, wtssall <dbl>, income_rc <fct>,
## # agegrp <fct>, ageq <fct>, siblings <fct>, kids <fct>, religion <fct>,
## # bigregion <fct>, partners_rc <fct>, obama <dbl>
- Get a quick overview of the dataset by running:
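The command for this step is not shown here. One possibility, assuming that `glimpse()` from dplyr (loaded with tidyverse) and base R's `summary()` are acceptable choices, is sketched below. The choice of functions is an assumption, not necessarily what the assignment intended.

```r
library(tidyverse)
library(socviz)

data <- gss_sm

# glimpse() prints one line per variable with its type and first few values
glimpse(data)

# summary() shows the range and the number of missing values
# for the two variables used in the exercises
summary(data$age)
summary(data$childs)
```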
- Create a jitter plot that shows the relationship between the number of children somebody has (on the x-axis) and the age of that respondent (on the y-axis).
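A minimal sketch of such a plot could look as follows. The object name `p_jitter` and the values for `width`, `height` and `alpha` are illustrative assumptions; adjust them to taste.

```r
library(tidyverse)
library(socviz)

data <- gss_sm

# Jittered scatterplot: number of children on the x-axis, age on the y-axis.
# Rows with a missing value on either variable are dropped first,
# so ggplot2 does not have to remove them with a warning.
p_jitter <- data %>%
  filter(!is.na(childs), !is.na(age)) %>%
  ggplot(aes(x = childs, y = age)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.3)

p_jitter
```

Setting `height = 0` jitters the points only horizontally, so the ages themselves are shown without added noise.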
- Create a dataframe with the average age for every number of children (i.e., average age for people with no children, average age for people with one child, et cetera). Because there are missing values in the variable age, we need to tell the mean() function to ignore these missing values by giving na.rm = TRUE as an extra argument (i.e., mean(age, na.rm = TRUE)).
- Add the averages from your dataframe to the graph you created in 3.
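One way to approach these two steps is sketched below, assuming `group_by()` and `summarise()` for the averages and a second `geom_point()` layer that reads from the summary dataframe. The names `age_means`, `mean_age` and `p_means`, and the styling values, are hypothetical.

```r
library(tidyverse)
library(socviz)

data <- gss_sm

# One row per number of children, with the mean age in that group
age_means <- data %>%
  filter(!is.na(childs)) %>%
  group_by(childs) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))

# The jitter layer uses the raw observations in `data`;
# the second layer uses `age_means` and overrides the y aesthetic
p_means <- ggplot(data, aes(x = childs, y = age)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.3) +
  geom_point(data = age_means, aes(y = mean_age), colour = "red", size = 3)

p_means
```

Passing `data = age_means` to a single layer is what lets one graph draw on two datasets: the x mapping is inherited from `ggplot()`, while y is replaced by the group mean.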
- Using case_when(), transform the variable childs into another variable, childs_rec, such that the following categories exist: 0, 1, 2, 3, 4, 5+.
- Recreate the graph in 6, but now with the variable childs_rec.
- Using case_when(), transform the variable childs into another variable, childs_3cat, such that the following categories exist: “no children”, “1 child”, “multiple children”.
- Recreate the graph in 6, but now with the variable childs_3cat.
- Add error bars to the graph in 9.
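The recoding and error-bar steps can be sketched roughly as follows. Several choices here are assumptions rather than requirements of the assignment: the error bars show an approximate 95% confidence interval (mean ± 1.96 standard errors), rows with a missing value for childs are dropped, and the names `data_rec`, `age_3cat`, `se_age` and `p_err` are hypothetical.

```r
library(tidyverse)
library(socviz)

# Recode childs in two ways; rows with a missing childs value are dropped first
data_rec <- gss_sm %>%
  filter(!is.na(childs)) %>%
  mutate(
    childs_rec = case_when(
      childs >= 5 ~ "5+",
      TRUE ~ as.character(childs)
    ),
    childs_3cat = case_when(
      childs == 0 ~ "no children",
      childs == 1 ~ "1 child",
      childs >= 2 ~ "multiple children"
    ),
    # A factor keeps the categories in a meaningful order on the x-axis
    childs_3cat = factor(
      childs_3cat,
      levels = c("no children", "1 child", "multiple children")
    )
  )

# Mean age and its standard error per category
age_3cat <- data_rec %>%
  group_by(childs_3cat) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    se_age = sd(age, na.rm = TRUE) / sqrt(sum(!is.na(age)))
  )

# Jittered observations, group means, and error bars around the means
p_err <- ggplot(data_rec, aes(x = childs_3cat, y = age)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.3) +
  geom_point(data = age_3cat, aes(y = mean_age), colour = "red", size = 3) +
  geom_errorbar(
    data = age_3cat,
    aes(
      x = childs_3cat,
      ymin = mean_age - 1.96 * se_age,
      ymax = mean_age + 1.96 * se_age
    ),
    inherit.aes = FALSE,
    colour = "red",
    width = 0.2
  )

p_err
```

The error-bar layer sets `inherit.aes = FALSE` because the summary dataframe has no `age` column, so the y mapping from `ggplot()` cannot be evaluated on it. Swapping `childs_3cat` for `childs_rec` in the grouping and in the x aesthetic gives the six-category version of the graph.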