Chapter 14 Class assignment 3

We’ll now spend the rest of class on a class assignment, with the aim of getting to know more about data transformation, learning to calculate means for visualizations, and using multiple datasets for one graph.

  1. Load data set & packages
    For this visualization we’re using data from the 2016 General Social Survey. You can find more information about this dataset here: https://rdrr.io/github/kjhealy/socviz/man/gss_sm.html
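library(ggplot2) # plotting (also attached by tidyverse, loaded explicitly here)
library(tidyverse) # dplyr and the other core tidyverse packages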
library(socviz) # install.packages("socviz") if you haven't done so
data <- gss_sm # gss_sm is a dataset from the package socviz
data
## # A tibble: 2,867 × 32
##     year    id ballot       age childs sibs   degree race  sex   region income16
##    <dbl> <dbl> <labelled> <dbl>  <dbl> <labe> <fct>  <fct> <fct> <fct>  <fct>   
##  1  2016     1 1             47      3 2      Bache… White Male  New E… $170000…
##  2  2016     2 2             61      0 3      High … White Male  New E… $50000 …
##  3  2016     3 3             72      2 3      Bache… White Male  New E… $75000 …
##  4  2016     4 1             43      4 3      High … White Fema… New E… $170000…
##  5  2016     5 3             55      2 2      Gradu… White Fema… New E… $170000…
##  6  2016     6 2             53      2 2      Junio… White Fema… New E… $60000 …
##  7  2016     7 1             50      2 2      High … White Male  New E… $170000…
##  8  2016     8 3             23      3 6      High … Other Fema… Middl… $30000 …
##  9  2016     9 1             45      3 5      High … Black Male  Middl… $60000 …
## 10  2016    10 3             71      4 1      Junio… White Male  Middl… $60000 …
## # ℹ 2,857 more rows
## # ℹ 21 more variables: relig <fct>, marital <fct>, padeg <fct>, madeg <fct>,
## #   partyid <fct>, polviews <fct>, happy <fct>, partners <fct>, grass <fct>,
## #   zodiac <fct>, pres12 <labelled>, wtssall <dbl>, income_rc <fct>,
## #   agegrp <fct>, ageq <fct>, siblings <fct>, kids <fct>, religion <fct>,
## #   bigregion <fct>, partners_rc <fct>, obama <dbl>
  2. Get a quick overview of the dataset by running:
head(data) # print the first six rows of the dataset
summary(data) # summary statistics for every variable
  3. Create a jitter plot that shows the relationship between the number of children a respondent has (on the x-axis) and the age of that respondent (on the y-axis).
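
One possible way to build this plot is sketched below. The object name p_jitter and the width, height, and alpha settings are illustrative choices, not part of the assignment. Rows with a missing value for age or childs are dropped by ggplot2 with a warning.

```r
library(ggplot2)
library(socviz)

data <- gss_sm

# Jitter only horizontally so the ages themselves are not distorted
p_jitter <- ggplot(data, aes(x = childs, y = age)) +
  geom_jitter(width = 0.25, height = 0, alpha = 0.3)

p_jitter
```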

  4. Create a dataframe with the average age for every number of children (i.e., the average age for people with no children, the average age for people with one child, et cetera). Because the variable age contains missing values, we need to tell the mean function to ignore them by passing na.rm = TRUE as an extra argument, i.e. mean(age, na.rm = TRUE).
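
A minimal sketch of this step, assuming a group_by() and summarise() approach. The names age_by_childs and mean_age are made up for illustration, and dropping respondents with a missing childs value is an extra assumption that the assignment does not ask for.

```r
library(dplyr)
library(socviz)

data <- gss_sm

age_by_childs <- data %>%
  filter(!is.na(childs)) %>%                     # assumption: drop unknown number of children
  group_by(childs) %>%                           # one group per number of children
  summarise(mean_age = mean(age, na.rm = TRUE))  # ignore missing ages

age_by_childs
```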

  5. Add the averages from your dataframe to the jitter plot you created earlier.
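
Combining two datasets in one graph could look like the sketch below: the jitter layer uses the full data, while a second layer gets its own data argument. The names age_by_childs, mean_age, and p_means, as well as the colour and size, are illustrative assumptions.

```r
library(dplyr)
library(ggplot2)
library(socviz)

data <- gss_sm

# Average age per number of children (hypothetical helper dataframe)
age_by_childs <- data %>%
  filter(!is.na(childs)) %>%
  group_by(childs) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))

p_means <- ggplot(data, aes(x = childs, y = age)) +
  geom_jitter(width = 0.25, height = 0, alpha = 0.3) +
  # Second layer: different dataset, x is inherited, y is overridden
  geom_point(data = age_by_childs, aes(y = mean_age),
             colour = "red", size = 3)

p_means
```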

  6. Using case_when(), transform the variable childs into a new variable childs_rec with the following categories: 0, 1, 2, 3, 4, 5+.
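
One way to write this recoding is sketched below. It assumes that storing the categories as character strings is acceptable; respondents with a missing childs value match neither condition and therefore stay NA.

```r
library(dplyr)
library(socviz)

data <- gss_sm

data <- data %>%
  mutate(childs_rec = case_when(
    childs >= 5 ~ "5+",                  # five or more children share one category
    childs >= 0 ~ as.character(childs)   # 0 to 4 keep their own value as text
  ))

count(data, childs_rec)
```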

  7. Recreate the graph with the averages, but now with the variable childs_rec on the x-axis.
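
A sketch of the recreated graph, repeating the recoding so the example runs on its own. The names age_by_rec, mean_age, and p_rec are illustrative, and removing respondents with a missing childs value is an assumption made to avoid an NA category on the x-axis.

```r
library(dplyr)
library(ggplot2)
library(socviz)

data_rec <- gss_sm %>%
  filter(!is.na(childs)) %>%             # assumption: drop unknown number of children
  mutate(childs_rec = case_when(
    childs >= 5 ~ "5+",
    childs >= 0 ~ as.character(childs)
  ))

# The averages now have to be computed per recoded category
age_by_rec <- data_rec %>%
  group_by(childs_rec) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))

p_rec <- ggplot(data_rec, aes(x = childs_rec, y = age)) +
  geom_jitter(width = 0.25, height = 0, alpha = 0.3) +
  geom_point(data = age_by_rec, aes(y = mean_age),
             colour = "red", size = 3)

p_rec
```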

  8. Using case_when(), transform the variable childs into a new variable childs_3cat with the following categories: “no children”, “1 child”, “multiple children”.
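
The three-category version can be sketched in the same way. Converting the result to a factor with explicit levels is an optional extra, assumed here so that the categories later appear in a logical rather than alphabetical order on the x-axis.

```r
library(dplyr)
library(socviz)

data <- gss_sm

cat_levels <- c("no children", "1 child", "multiple children")

data <- data %>%
  mutate(
    childs_3cat = case_when(
      childs == 0 ~ "no children",
      childs == 1 ~ "1 child",
      childs > 1  ~ "multiple children"   # missing values match nothing and stay NA
    ),
    childs_3cat = factor(childs_3cat, levels = cat_levels)
  )

count(data, childs_3cat)
```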

  9. Recreate the graph with the averages, but now with the variable childs_3cat on the x-axis.

  10. Add error bars to the childs_3cat graph you just created.
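
The assignment does not say what the error bars should represent. The sketch below assumes the mean plus or minus one standard error, which is only one common choice. All object and column names (data_3cat, age_summary, mean_age, se_age, p_err) are illustrative. Each layer gets its own data and aesthetics so that the summary layers do not look for the age column.

```r
library(dplyr)
library(ggplot2)
library(socviz)

cat_levels <- c("no children", "1 child", "multiple children")

data_3cat <- gss_sm %>%
  filter(!is.na(childs)) %>%              # assumption: drop unknown number of children
  mutate(childs_3cat = factor(case_when(
    childs == 0 ~ "no children",
    childs == 1 ~ "1 child",
    childs > 1  ~ "multiple children"
  ), levels = cat_levels))

# Mean and standard error of age per category (assumed error-bar definition)
age_summary <- data_3cat %>%
  group_by(childs_3cat) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    n_age    = sum(!is.na(age)),
    se_age   = sd(age, na.rm = TRUE) / sqrt(n_age)
  )

p_err <- ggplot() +
  geom_jitter(data = data_3cat, aes(x = childs_3cat, y = age),
              width = 0.25, height = 0, alpha = 0.3) +
  geom_errorbar(data = age_summary,
                aes(x = childs_3cat,
                    ymin = mean_age - se_age,
                    ymax = mean_age + se_age),
                width = 0.2, colour = "red") +
  geom_point(data = age_summary, aes(x = childs_3cat, y = mean_age),
             colour = "red", size = 3)

p_err
```

With nearly three thousand respondents the standard errors are small, so the bars may look short next to the spread of the jittered points; a standard deviation or a 95% confidence interval would be alternative definitions.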