Chapter 24 Advanced ggplot
Here we’ll see some more advanced features of ggplot. It is a fairly loose collection of topics, some of which can only be used after installing additional packages.
24.1 Adding layers
Often people build their graph a bit differently, to save intermediate results. So this code:
ggplot(mpg, aes(x=displ, y=cty)) +
geom_point() +
geom_smooth(method="lm", se=FALSE)
is the same as:
plot <- ggplot(mpg, aes(x=displ, y=cty))
plot2 <- plot + geom_point()
plot3 <- plot2 + geom_smooth(method="lm", se=FALSE)
plot3
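Saving intermediate results pays off most when several graphs share the same starting point. The sketch below is an illustration only: the object names base, p_linear and p_loess are made up for this example, and the loess smoother is just one possible second variant.

```r
library(ggplot2)

# Shared base: data and aesthetic mappings only, no layers yet
base <- ggplot(mpg, aes(x = displ, y = cty))

# Two variants built from the same base object
p_linear <- base + geom_point() + geom_smooth(method = "lm", se = FALSE)
p_loess  <- base + geom_point() + geom_smooth(method = "loess", se = FALSE)

# Adding a layer returns a new object; the base itself is not modified
length(base$layers)
length(p_linear$layers)
```

Printing p_linear or p_loess draws the respective graph, while base can be reused for further variants.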
24.2 Layers in code
Layers are important in ggplot, so it helps to realise that geom_point() is really shorthand for the following code:
ggplot(mpg, aes(x=cty, y=hwy)) +
layer(
mapping = NULL,
data = NULL,
geom = "point",
stat = "identity",
position = "identity"
)
or:
ggplot() +
layer(
mapping = aes(x=cty, y=hwy),
data = mpg,
geom = "point",
stat = "identity",
position = "identity"
)
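One way to check this equivalence yourself is to inspect the layer object that geom_point() returns and to compare the data both versions hand to the plot. This is a small sketch; the object names (pt, p_short, p_long) are chosen for this example only.

```r
library(ggplot2)

# geom_point() returns a layer object; its components match the
# geom, stat and position arguments passed to layer() above
pt <- geom_point()
class(pt$geom)[1]
class(pt$stat)[1]
class(pt$position)[1]

# Build the plot both ways
p_short <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()
p_long  <- ggplot(mpg, aes(x = cty, y = hwy)) +
  layer(
    mapping = NULL,
    data = NULL,
    geom = "point",
    stat = "identity",
    position = "identity"
  )

# layer_data() returns the data frame a layer draws from
d_short <- layer_data(p_short)
d_long  <- layer_data(p_long)
identical(d_short$x, d_long$x)
identical(d_short$y, d_long$y)
```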
24.3 ..count.. / ..density..
We’ve seen the ..something.. notation before. It refers to variables that are created by a stat function. Typically more than one such variable is available.
This:
ggplot(mpg, aes(x=cty)) + geom_histogram(binwidth=1)
is the same as:
ggplot(mpg, aes(x=cty, y=..count..)) + geom_histogram(binwidth=1)
But we could also have used a different computed variable, ..density..:
ggplot(mpg, aes(x=cty, y=..density..)) + geom_histogram(binwidth=1)
See also the ggplot-cheatsheet
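To see which variables a stat has computed, you can build the plot and look at the layer data. The sketch below assumes ggplot2 3.3.0 or later, where after_stat(count) is available as the newer spelling of ..count.. (the double-dot form is deprecated from ggplot2 3.4.0 onwards). The object names are made up for this example.

```r
library(ggplot2)

# Newer notation for the same two histograms
p_count <- ggplot(mpg, aes(x = cty, y = after_stat(count))) +
  geom_histogram(binwidth = 1)
p_density <- ggplot(mpg, aes(x = cty, y = after_stat(density))) +
  geom_histogram(binwidth = 1)

# layer_data() shows everything the stat calculated for each bin
computed <- layer_data(p_count)
head(computed[, c("x", "count", "density")])

# The counts add up to the number of rows in the data,
# and with binwidth = 1 the densities add up to 1
sum(computed$count)
sum(computed$density * 1)
```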
24.4 Different summary-functions
Remember how we used the stat="summary", fun.y="mean" code to let ggplot do the work for us? Since ggplot2 3.3.0 the fun.y argument is called fun; when the old name is passed through geom_bar() it is ignored with a warning and ggplot falls back to mean_se(). As you might have guessed, you can also use different functions:
ggplot(mpg, aes(x=manufacturer, y=cty)) + geom_bar(stat="summary", fun="median")
The beauty of R is that you can also define your own function. To show a proof of principle, we can create a function that returns the mean, i.e. function(x) {mean(x)}. We can use this function in our ggplot arguments. This should be identical to fun="mean".
ggplot(mpg, aes(x=manufacturer, y=cty)) + geom_bar(stat="summary", fun=function(x) {mean(x)})
It is identical. So now we can also create new functions! Let’s add all the numbers and subtract 13 with the following function: function(x) {sum(x) - 13}
ggplot(mpg, aes(x=manufacturer, y=cty)) + geom_bar(stat="summary", fun=function(x) {sum(x) - 13})
Not a very useful plot (or function), but it works.
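A custom summary function can be checked by computing the same summary by hand. The sketch below assumes ggplot2 3.3.0 or later, where the argument is named fun. The function cty_range (largest minus smallest value) is a made-up example and not part of ggplot.

```r
library(ggplot2)

# Hypothetical summary function: spread between highest and lowest value
cty_range <- function(x) max(x) - min(x)

p_range <- ggplot(mpg, aes(x = manufacturer, y = cty)) +
  geom_bar(stat = "summary", fun = cty_range)

# The same summary calculated without ggplot, one value per manufacturer
by_hand <- tapply(mpg$cty, mpg$manufacturer, cty_range)

# Bar heights as ggplot computed them, sorted by position on the x-axis
built <- layer_data(p_range)
built <- built[order(built$x), ]

all.equal(as.numeric(built$y), as.numeric(by_hand))
```

If the last line returns TRUE, the bars show exactly what the function computes for each manufacturer.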
24.5 ggplotgui
I’ve developed a Graphical User Interface (GUI) for ggplot that might be useful if you want to explore some graph options quickly.
install.packages("ggplotgui")
library("ggplotgui")
ggplot_shiny()
You can also find an online version here: https://site.shinyserver.dck.gmw.rug.nl/ggplotgui/
24.6 Showing all data in facets
One neat little trick is to show the raw data in each facet. You can do this by using a data frame that does NOT include the faceting variable. The data = transform(mpg, drv = NULL) below means that this geom uses a different dataset: the mpg dataset with the drv variable removed (setting a column to NULL drops it):
ggplot(mpg, aes(x=displ, y=cty)) +
geom_point(data = transform(mpg, drv = NULL), colour = "grey85") +
geom_point() +
geom_smooth(method="lm", se=FALSE) +
facet_wrap(~drv)
## `geom_smooth()` using formula 'y ~ x'
This also works beautifully for histograms:
ggplot(mpg, aes(x=cty)) +
geom_histogram(data = transform(mpg, manufacturer = NULL), fill="grey85", colour = "grey85", binwidth=1) +
geom_histogram(binwidth=1) +
facet_wrap(~manufacturer)
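To convince yourself that the grey layer really is repeated in every facet, you can count the rows each layer contributes per panel. This is a sketch with made-up object names; it drops the drv column with base R subsetting instead of transform(), which has the same effect here.

```r
library(ggplot2)

# Copy of mpg without the faceting variable
background <- mpg[, names(mpg) != "drv"]

p_facets <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point(data = background, colour = "grey85") +
  geom_point() +
  facet_wrap(~drv)

# Layer 1 (grey background): all rows appear in every panel
table(layer_data(p_facets, 1)$PANEL)

# Layer 2 (black points): each row appears only in its own panel
table(layer_data(p_facets, 2)$PANEL)
```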
24.7 Maps
You will need some additional packages to create maps.
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
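With the function get_map you can get all sorts of cool maps! Let’s get a map of Groningen from Google (recent versions of ggmap need a Google Maps API key for this source, which you register once per session with register_google()):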
map_gron <- get_map("Groningen", source="google")
We can visualise this map using ggmap:
ggmap(map_gron)
Cool! Let’s try to get a map of the building we’re in now. I’ve used maps.google.com to find the coordinates of the “Academiegebouw” (latitude 53.219245, longitude 6.563051). Note that get_map expects the longitude first. Let’s use this information to get a map, zoom in a bit more, and visualise the result:
map_uni <- get_map(location = c(6.563051, 53.219245), zoom = 16, source="google")
ggmap(map_uni)
Now let’s mark the Sociology department at the Faculty of Behavioural and Social Sciences, which is at the coordinates 53.222266, 6.557550:
df_sociology <- data.frame(lon = 6.557550, lat = 53.222266)
ggmap(map_uni) +
geom_label(data = df_sociology,
aes(x=lon, y=lat, label = "Sociology"),fill = "#9933FF", colour = "white", size = 3) +
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
There are also other beautiful maps you can use; how about this (from here):
map_stm <- get_map("netherlands", zoom = 7, maptype="watercolor", source="stamen")
ggmap(map_stm)
24.8 Networks
Networks are gaining in popularity. We need to put a bit more effort in to get a network graph. We will require these packages:
# install.packages("tidygraph")
# install.packages("ggraph")
library(tidygraph)
library(ggraph)
df_edges <- data.frame(from = c("Gert",
"Gert",
"Gert",
"Gert",
"Gert",
"Ben",
"Anne"),
to = c("Anne",
"Ben",
"Winy",
"Vera",
"Laura",
"Winy",
"Vera"))
df_nodes <- data.frame(name = c("Gert","Anne","Ben","Winy","Vera","Laura"),
relation = c("Ego", "Partner", "Family", "Family",
"Friend", "Friend"))
df_network <- tbl_graph(nodes = df_nodes,
edges = df_edges,
directed = FALSE)
df_network
## # A tbl_graph: 6 nodes and 7 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 6 x 2 (active)
## name relation
## <chr> <chr>
## 1 Gert Ego
## 2 Anne Partner
## 3 Ben Family
## 4 Winy Family
## 5 Vera Friend
## 6 Laura Friend
## #
## # Edge Data: 7 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 3
## 3 1 4
## # … with 4 more rows
Let’s visualise!
ggraph(df_network, layout = "kk") +
geom_node_point() +
geom_edge_link()
Let’s improve it:
ggraph(df_network, layout = "kk") +
geom_edge_link() +
geom_node_point(aes(colour = relation), size = 13) +
geom_node_text(aes(label = name), colour = "white") +
theme_void() +
scale_color_brewer(palette = "Set2")
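Node attributes can also be computed from the network itself. As a sketch, the code below adds each person's number of ties (the degree) with tidygraph's centrality_degree() and maps it to point size. It repeats the data from above so that it runs on its own; the names df_degree, p_degree and degrees are made up for this example.

```r
library(tidygraph)
library(ggraph)

# Same edges and nodes as above, repeated so this block is self-contained
df_edges <- data.frame(
  from = c("Gert", "Gert", "Gert", "Gert", "Gert", "Ben", "Anne"),
  to   = c("Anne", "Ben", "Winy", "Vera", "Laura", "Winy", "Vera")
)
df_nodes <- data.frame(
  name     = c("Gert", "Anne", "Ben", "Winy", "Vera", "Laura"),
  relation = c("Ego", "Partner", "Family", "Family", "Friend", "Friend")
)
df_network <- tbl_graph(nodes = df_nodes, edges = df_edges, directed = FALSE)

# Add the number of ties per person as a new node variable
df_degree <- df_network %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())

# Extract the node table to inspect the result
degrees <- tibble::as_tibble(df_degree)
degrees

# Use the degree for the size of the points
p_degree <- ggraph(df_degree, layout = "kk") +
  geom_edge_link() +
  geom_node_point(aes(colour = relation, size = degree)) +
  geom_node_text(aes(label = name), vjust = -1.5) +
  theme_void()
```

Printing p_degree draws the network, with the ego as the largest point because it is connected to everyone else.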
24.9 ggridges
Ridge plots, formerly known as ‘joyplots’, are pretty awesome.
#install.packages("ggridges")
#install.packages("viridis")
library(ggridges)
library(viridis)
## Loading required package: viridisLite
ggplot(lincoln_weather, aes(x = `Mean Temperature [F]`, y = `Month`, fill = ..x..)) +
geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01, gradient_lwd = 1.) +
scale_x_continuous(expand = c(0.01, 0)) +
scale_y_discrete(expand = c(0.01, 0)) +
scale_fill_viridis(name = "Temp. [F]", option = "C") +
labs(title = 'Temperatures in Lincoln NE',
subtitle = 'Mean temperatures (Fahrenheit) by month for 2016\nData: Original CSV from the Weather Underground') +
theme_ridges(font_size = 13, grid = TRUE) + theme(axis.title.y = element_blank())
## Picking joint bandwidth of 3.37
24.10 Animation
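#install.packages("gganimate")
#install.packages("gapminder")
library(gganimate)
library(gapminder)
# The frame aesthetic and gganimate(p) belong to the old GitHub-only API;
# the current CRAN release of gganimate uses transition_*() and animate()
p <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
geom_point() +
scale_x_log10() +
transition_time(year)
animate(p)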
24.11 Reactive graphs with shiny
Take a look at https://shiny.rstudio.com/! You can quite easily build reactive and interactive graphs in R (strictly speaking, we’re not talking about ggplot any more).
This is one example: https://site.shinyserver.dck.gmw.rug.nl/ggplotgui/
Also for teaching I have built some apps, e.g.: https://shiny.gmw.rug.nl/Week2/Week2a.html
Others have made much more amazing things: https://www.showmeshiny.com/
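To give an idea of what such an app looks like, here is a minimal sketch of a shiny app that redraws a ggplot whenever the user picks a different variable. The input and output names (yvar, scatter) are made up for this example, and the app is far simpler than the ones linked above.

```r
library(shiny)
library(ggplot2)

# User interface: one drop-down menu and one plot
ui <- fluidPage(
  selectInput("yvar", "Variable on the y-axis:", choices = c("cty", "hwy")),
  plotOutput("scatter")
)

# Server: the plot is re-rendered every time input$yvar changes
server <- function(input, output) {
  output$scatter <- renderPlot({
    ggplot(mpg, aes(x = displ, y = .data[[input$yvar]])) +
      geom_point() +
      labs(y = input$yvar)
  })
}

app <- shinyApp(ui = ui, server = server)

# Start the app from an interactive R session with:
# runApp(app)
```

The .data[[...]] construction is used because the variable name arrives as a character string from the drop-down menu.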
24.12 ggplot extensions
Go here and be amazed by the extensions to ggplot (some of which are shown above).