Chapter 22 Class assignment 5
Let’s create a visualization of the association between two categorical variables. Along the way we’ll change some colours, annotate some plots, and combine multiple graphs into one.
- Load data set & packages. For this visualization we’re using data from the General Social Survey (2016). You can find more information about this dataset here: https://rdrr.io/github/kjhealy/socviz/man/gss_sm.html
library(tidyverse)
library(patchwork)
library(socviz) # install.packages("socviz") if you haven't done so
data <- gss_sm # gss_sm is a dataset from the package socviz
- Get a quick overview of the dataset by running:
head(data)
## # A tibble: 6 × 32
## year id ballot age childs sibs degree race sex region income16 relig
## <dbl> <dbl> <labe> <dbl> <dbl> <lab> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 2016 1 1 47 3 2 Bache… White Male New E… $170000… None
## 2 2016 2 2 61 0 3 High … White Male New E… $50000 … None
## 3 2016 3 3 72 2 3 Bache… White Male New E… $75000 … Cath…
## 4 2016 4 1 43 4 3 High … White Fema… New E… $170000… Cath…
## 5 2016 5 3 55 2 2 Gradu… White Fema… New E… $170000… None
## 6 2016 6 2 53 2 2 Junio… White Fema… New E… $60000 … None
## # ℹ 20 more variables: marital <fct>, padeg <fct>, madeg <fct>, partyid <fct>,
## # polviews <fct>, happy <fct>, partners <fct>, grass <fct>, zodiac <fct>,
## # pres12 <labelled>, wtssall <dbl>, income_rc <fct>, agegrp <fct>,
## # ageq <fct>, siblings <fct>, kids <fct>, religion <fct>, bigregion <fct>,
## # partners_rc <fct>, obama <dbl>
We are going to examine the relationship between one’s degree (degree) and one’s marital status (marital).
The variable marital consists of five categories (“Married”, “Widowed”, “Divorced”, “Separated”, “Never Married”). Create a new variable in which marital is recoded into three categories: “Married”, “Ever Married”, and “Never Married”.
Let’s see if education is related to marital status. Make a stacked bar chart with education on the x-axis.
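One possible approach to these two steps is sketched below; it is not the only way to do it. The variable name marit_3 is the one used later in this assignment. The use of fct_collapse() and of position = "fill" (so that every bar shows proportions) are assumptions about how the chart is meant to look.

```r
library(tidyverse)
library(socviz)

data <- gss_sm

# Collapse "Widowed", "Divorced" and "Separated" into "Ever Married";
# "Married" and "Never Married" are left unchanged, NAs stay NA
data <- data %>%
  mutate(marit_3 = fct_collapse(
    marital,
    "Ever Married" = c("Widowed", "Divorced", "Separated")
  ))

# Stacked bar chart: one bar per degree, segments filled by marital status.
# position = "fill" (an assumption) rescales every bar to proportions
ggplot(data, aes(x = degree, fill = marit_3)) +
  geom_bar(position = "fill")
```

Running levels(data$marit_3) afterwards is a quick way to check that the recode produced the three intended categories.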
There are certainly some things we can improve. Among others: the y-axis label is not correct, the x-axis labels overlap, an ordering from “Never Married” to “Ever Married” to “Married” would make more sense, sample sizes for each group are missing, and there are NAs for both degree and marital status. Let’s start with the last.
Remove the NAs for both degree and marit_3, but count how many cases we lose when doing so.
Calculate sample sizes for each category of degree.
Recreate the graph with the cleaned dataset and add the sample sizes to each bar.
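The three steps above could be sketched as follows. The object names data_clean, n_lost and n_degree are hypothetical, and placing the labels just above each bar (y = 1.05) is only one of several reasonable choices.

```r
library(tidyverse)
library(socviz)

# Recode marital status as in the earlier step
data <- gss_sm %>%
  mutate(marit_3 = fct_collapse(
    marital,
    "Ever Married" = c("Widowed", "Divorced", "Separated")
  ))

# Keep only rows where both degree and marit_3 are observed
data_clean <- data %>%
  filter(!is.na(degree), !is.na(marit_3))

# Number of cases lost by removing the NAs
n_lost <- nrow(data) - nrow(data_clean)
n_lost

# Sample size per degree category (count() returns the columns degree and n)
n_degree <- data_clean %>%
  count(degree)

# Proportional stacked bars with the sample size printed above each bar
ggplot(data_clean, aes(x = degree)) +
  geom_bar(aes(fill = marit_3), position = "fill") +
  geom_text(data = n_degree, aes(y = 1.05, label = paste0("n = ", n)))
```

The fill aesthetic is set inside geom_bar() rather than in ggplot() so that the text layer, which uses the summary table n_degree, does not look for a marit_3 column.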
Change the order of the bars going from “Never Married” to “Married”.
Create a sensible y-axis label and legend-title.
Save your graph to an object so you can call upon it later.
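A minimal sketch of these three steps, assuming the recode and cleaning from before. The object name p_degree and the label texts are hypothetical choices, and formatting the y-axis as percentages is an addition that the assignment does not ask for.

```r
library(tidyverse)
library(socviz)

data_clean <- gss_sm %>%
  mutate(
    marit_3 = fct_collapse(
      marital,
      "Ever Married" = c("Widowed", "Divorced", "Separated")
    ),
    # Set the level order explicitly: Never Married, Ever Married, Married
    marit_3 = fct_relevel(marit_3, "Never Married", "Ever Married", "Married")
  ) %>%
  filter(!is.na(degree), !is.na(marit_3))

n_degree <- count(data_clean, degree)

# Save the plot to an object so it can be reused later
p_degree <- ggplot(data_clean, aes(x = degree)) +
  geom_bar(aes(fill = marit_3), position = "fill") +
  geom_text(data = n_degree, aes(y = 1.05, label = paste0("n = ", n))) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Degree", y = "Share of respondents", fill = "Marital status")

p_degree
```

The stacking order and the legend order both follow the factor levels, so changing the levels with fct_relevel() is enough to reorder both.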
Create the same bar chart, but now for the entire group: a bar chart representing marital status that does not split the data by education. Save your graph to an object so you can call upon it later.
Use patchwork to combine your plots.
Try to collapse the legend, and adjust dimensions to create a sensible overall graph.
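One way these last steps might look is sketched below. The object names p_degree, p_total and p_combined, the label "All respondents", and the 5:1 width ratio are hypothetical choices. The sample-size labels are left out here to keep the example short.

```r
library(tidyverse)
library(patchwork)
library(socviz)

data_clean <- gss_sm %>%
  mutate(
    marit_3 = fct_collapse(
      marital,
      "Ever Married" = c("Widowed", "Divorced", "Separated")
    ),
    marit_3 = fct_relevel(marit_3, "Never Married", "Ever Married", "Married")
  ) %>%
  filter(!is.na(degree), !is.na(marit_3))

# Marital status by degree
p_degree <- ggplot(data_clean, aes(x = degree, fill = marit_3)) +
  geom_bar(position = "fill") +
  labs(x = "Degree", y = "Share of respondents", fill = "Marital status")

# Marital status for the entire group: a single bar, not split by education
p_total <- ggplot(data_clean, aes(x = "All respondents", fill = marit_3)) +
  geom_bar(position = "fill") +
  labs(x = NULL, y = NULL, fill = "Marital status")

# Place the plots side by side, give the wider plot more room,
# and merge the two identical legends into one
p_combined <- p_degree + p_total +
  plot_layout(widths = c(5, 1), guides = "collect")

p_combined
```

patchwork only merges legends that are identical, which is why both plots use the same fill variable and the same legend title.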