# Chapter 3 ggplot - some theory

The “gg” in ggplot stands for the “grammar of graphics”, developed by Leland Wilkinson, which describes the “deep features that underlie all statistical graphics”. In essence, it’s a way of thinking about how to create graphs. This all sounds a bit esoteric, so let’s try to be more specific.

Wickham writes:

> In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.
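The statistical transformations mentioned in the quote are easy to overlook, because many geoms apply one silently. As a minimal sketch (assuming ggplot2 is installed and using its built-in `mpg` dataset), `geom_bar()` does not plot a column of the data directly: its default stat counts the rows for each value of `drv`, and `layer_data()` lets us inspect the transformed data that is actually drawn.

```r
library(ggplot2)

# geom_bar() uses stat = "count" by default: it tallies rows per x value
p_bar <- ggplot(mpg, aes(x = drv)) +
  geom_bar()

# layer_data() returns the data after the statistical transformation,
# including the computed `count` column that determines the bar heights
counts <- layer_data(p_bar)
counts[, c("x", "count")]
```

Writing `geom_bar(stat = "count")` would give the same plot, since that is the default stat for this geom.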

Every graph consists of data placed within a coordinate system, with the data represented by geometric objects (or geoms) such as points, lines, or bars. The variables you want to visualize are mapped to aesthetic attributes, such as shape, colour, or location.
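To make these components visible, the sketch below spells each one out explicitly instead of relying on defaults. It assumes ggplot2 and the `mpg` data; the object name `p_explicit` is just illustrative.

```r
library(ggplot2)

p_explicit <- ggplot(data = mpg) +                    # data
  geom_point(                                         # geometric object (geom)
    mapping = aes(x = displ, y = hwy, shape = drv)    # aesthetic mappings
  ) +
  coord_cartesian()                                   # coordinate system

p_explicit
```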

The ggplot2 cheatsheet is tremendously helpful in representing this process visually.

Now let’s revisit one of our earlier graphs:

```r
ggplot(mpg, aes(x = cty, y = hwy, colour = drv)) +
  geom_point()
```

The data here is the `mpg` dataset, and it is represented by points (the geom). We have mapped variables in the data to visual properties of the geom, the aesthetics: in this case the x- and y-location (based on the variables `cty` and `hwy`) and the colour (based on the variable `drv`). We haven’t specified a coordinate system, so ggplot uses the default Cartesian coordinate system (but other options are available!).
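Because ggplot falls back on `coord_cartesian()` when no coordinate system is given, we can check that default on the plot object itself. The sketch below, assuming only ggplot2 and `mpg` (the object names are illustrative), does that and then swaps in one alternative, `coord_flip()`, which exchanges the horizontal and vertical axes while leaving data, geom, and aesthetics unchanged.

```r
library(ggplot2)

p_default <- ggplot(mpg, aes(x = cty, y = hwy, colour = drv)) +
  geom_point()

# The coordinate system ggplot chose when none was specified
class(p_default$coordinates)

# Same data, geom and aesthetics, but a different coordinate system:
# cty is now drawn on the vertical axis and hwy on the horizontal one
p_flipped <- p_default + coord_flip()
p_flipped
```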

Kieran Healy’s recent (freely available) book “Data Visualization for Social Science” also provides a useful schematic of the process.

For a quick overview of ggplot, the chapter on visualization in “R for Data Science” (Wickham and Grolemund 2017) is also excellent.

## 3.1 Layers

ggplot works with layers, meaning that additional visualizations (or representations of data elements) can be added to a plot. We already saw an example of a plot with two layers:

```r
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  geom_smooth()
```

```
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```
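Each `+` appends one layer, and the plot object keeps its layers in a list, in the order they were added. A minimal sketch, assuming ggplot2 and `mpg`: it lists the geom of each layer, and it passes `method` and `formula` to `geom_smooth()` explicitly, which states the choice reported in the `geom_smooth()` message and so suppresses that message.

```r
library(ggplot2)

p_layers <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +                                   # layer 1: raw observations
  geom_smooth(method = "loess", formula = y ~ x)   # layer 2: fitted smoother

# Number of layers, and the geom used by each one
length(p_layers$layers)
vapply(p_layers$layers, function(l) class(l$geom)[1], character(1))

p_layers
```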

Importantly, each layer can have its own aesthetic mapping and can even be based on a different dataset. This makes it possible to combine visualizations of several data sources in a single plot, which opens up almost endless possibilities.
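As a minimal sketch of both ideas, assuming ggplot2 and `mpg`: the first layer maps `colour` to `drv`, the second has no colour mapping (so one smoother is fitted to all cars), and the third draws from a separate summary table of mean city and highway mileage per drive type, built here with base R’s `aggregate()`. The name `drv_means` and the choice of a linear smoother are illustrative only.

```r
library(ggplot2)

# A second, separate dataset: mean cty and hwy for each drive type
drv_means <- aggregate(cbind(cty, hwy) ~ drv, data = mpg, FUN = mean)

p_combined <- ggplot(mpg, aes(x = cty, y = hwy)) +
  # Layer 1: colour is mapped to drv for this layer only
  geom_point(aes(colour = drv), alpha = 0.4) +
  # Layer 2: no colour mapping, so a single line is fitted to all cars
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
  # Layer 3: a different dataset; x and y are inherited from ggplot(),
  # and colour is mapped again for this layer
  geom_point(data = drv_means, aes(colour = drv), size = 5, shape = 17)

p_combined
```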