Chapter 4 Networks are special
A visualization of a network is usually not overly complicated: typically some circles on an xy-plane with some lines between those circles. Network data, though, is rather curious, because a network is typically described by two different bodies of information:
- Information on the nodes (also called vertices). An easy example is a dataset with the names of individuals and some characteristics of these individuals (e.g., age, sex).
- Information on the edges (also called links or ties). This could be the existence of a tie (A and B know one another), the existence of a directional tie (A considers B a friend), or the strength of that tie (A is “very close” to B).
To complicate things, different network data structures and packages favour different ways of storing information on edges. There is for instance the “edge list” and the “adjacency matrix”. We’ll see examples of each. A network is hardly a network without information on ties, whereas attributes on nodes (except for their “identity”) are optional.
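To make the two storage formats concrete, here is a minimal base R sketch that stores the same ties first as an edge list and then as an adjacency matrix. The three-person network (A, B, C) and the object names are made up for illustration and do not appear elsewhere in this chapter.

```r
# Hypothetical three-person network, stored as an edge list:
# one row per tie, with a sender ("from") and a receiver ("to")
edges <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "C"),
  stringsAsFactors = FALSE
)

# The same ties as an adjacency matrix:
# rows are senders, columns are receivers, 1 marks an existing tie
people <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(people), ncol = length(people),
              dimnames = list(people, people))
adj[cbind(edges$from, edges$to)] <- 1
adj
```

The edge list grows with the number of ties, while the adjacency matrix always has one cell for every pair of nodes, which is one reason packages differ in the format they prefer.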
4.1 Visualizing from an edge list
4.1.1 The simplest case
Let’s create an edge list. We’ll create a dataset called df_edges (this name is arbitrary).
df_edges <- data.frame(
  from = c("Gert","Gert","Gert","Gert","Gert","Ben","Anne"),
  to = c("Anne","Ben","Winy","Vera","Laura","Winy","Vera")
)
If you run this code, you’ll see something appear in your “global environment” [top right corner]. There is an object called df_edges, which is a dataset. Let’s turn it into a network object via the package tidygraph.
# install.packages("tidygraph")
library(tidygraph)
# Create network object from dataframe with edges
nw_obj <- as_tbl_graph(df_edges)
nw_obj
## # A tbl_graph: 6 nodes and 7 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Node Data: 6 × 1 (active)
## name
## <chr>
## 1 Gert
## 2 Ben
## 3 Anne
## 4 Winy
## 5 Vera
## 6 Laura
## #
## # Edge Data: 7 × 2
## from to
## <int> <int>
## 1 1 3
## 2 1 2
## 3 1 4
## # … with 4 more rows
The creation of the network object was a success. We see that nw_obj (of class tbl_graph) consists of Node Data and Edge Data. So it has created two datasets from our one dataset. Let’s explore both. Because nw_obj consists of two datasets, we need a way to “tell” the object which dataset we’re interested in if we want to see or change it. We can do this with activate and then specify nodes or edges.
# The `%>%` is called "the pipe" and you can read it as "and then do" or
# "send this thing on the left hand side to the function on the right hand side"
nw_obj %>% activate(nodes)
## # A tbl_graph: 6 nodes and 7 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Node Data: 6 × 1 (active)
## name
## <chr>
## 1 Gert
## 2 Ben
## 3 Anne
## 4 Winy
## 5 Vera
## 6 Laura
## #
## # Edge Data: 7 × 2
## from to
## <int> <int>
## 1 1 3
## 2 1 2
## 3 1 4
## # … with 4 more rows
nw_obj %>% activate(edges)
## # A tbl_graph: 6 nodes and 7 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Edge Data: 7 × 2 (active)
## from to
## <int> <int>
## 1 1 3
## 2 1 2
## 3 1 4
## 4 1 5
## 5 1 6
## 6 2 4
## # … with 1 more row
## #
## # Node Data: 6 × 1
## name
## <chr>
## 1 Gert
## 2 Ben
## 3 Anne
## # … with 3 more rows
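Activating a table matters most when you want to change it or pull it out. The sketch below assumes that tidygraph and tibble are installed. It adds a made-up node variable (name_length, purely for illustration) to the active node table and then extracts both tables as ordinary data frames.

```r
library(tidygraph)

# Rebuild the network object from the edge list used above
df_edges <- data.frame(
  from = c("Gert","Gert","Gert","Gert","Gert","Ben","Anne"),
  to = c("Anne","Ben","Winy","Vera","Laura","Winy","Vera")
)
nw_obj <- as_tbl_graph(df_edges)

# mutate() operates on whichever table is active, here the nodes
nw_obj <- nw_obj %>%
  activate(nodes) %>%
  mutate(name_length = nchar(name))  # illustrative variable only

# Pull each table out as a regular tibble
node_tbl <- nw_obj %>% activate(nodes) %>% tibble::as_tibble()
edge_tbl <- nw_obj %>% activate(edges) %>% tibble::as_tibble()
```

After this, node_tbl and edge_tbl can be inspected like any other dataset, which is handy for checking that the network was built the way you intended.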
Now that the network object exists, let’s try to visualize it with the ggraph package. The main function to use is ggraph, where the network object is the first “argument”.
# install.packages("ggraph")
library(ggraph)
ggraph(nw_obj)
Success! Sort of. Something happened, apparently with the sugiyama layout, but we see little evidence of a layout. That’s because we haven’t drawn any nodes or edges/ties yet. We have only defined an empty canvas to draw on!
ggraph(nw_obj) +
geom_edge_link()
## Using `sugiyama` as default layout
Looks more like a network! Let’s draw some nodes.
ggraph(nw_obj) +
geom_edge_link() +
geom_node_point()
## Using `sugiyama` as default layout
Let’s remove that grey background, which is part of the default theme, by adding a theme called theme_graph.
ggraph(nw_obj) +
geom_edge_link() +
geom_node_point() +
theme_graph()
## Using `sugiyama` as default layout
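The message about the default layout appears because no layout was specified. As a minimal sketch, assuming tidygraph and ggraph are installed, the layout can be named explicitly through the layout argument of ggraph. The "kk" layout used here is just one possible choice.

```r
library(tidygraph)
library(ggraph)

# Rebuild the network object from the edge list used above
df_edges <- data.frame(
  from = c("Gert","Gert","Gert","Gert","Gert","Ben","Anne"),
  to = c("Anne","Ben","Winy","Vera","Laura","Winy","Vera")
)
nw_obj <- as_tbl_graph(df_edges)

# Name the layout explicitly instead of relying on the default
p <- ggraph(nw_obj, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  theme_graph()
p
```

Different layouts place the same nodes and edges differently, so it is worth trying a few before settling on one.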
4.2 A slightly more difficult edge list
Let’s take data from a slightly more complicated edge list, where we also have a variable on tie strength.
df_edges2 <- data.frame(
  closeness = c("very close", "close", "close", "close",
                "somewhat close", "very close", "very close"),
  person1 = c("Gert","Gert","Gert","Gert","Gert","Ben","Anne"),
  person2 = c("Anne","Ben","Winy","Vera","Laura","Winy","Vera")
)
# Create network object from dataframe with edges
nw_obj2 <- as_tbl_graph(df_edges2)
nw_obj2
## # A tbl_graph: 6 nodes and 7 edges
## #
## # A directed acyclic multigraph with 1 component
## #
## # Node Data: 6 × 1 (active)
## name
## <chr>
## 1 very close
## 2 close
## 3 somewhat close
## 4 Gert
## 5 Ben
## 6 Anne
## #
## # Edge Data: 7 × 3
## from to person2
## <int> <int> <chr>
## 1 1 4 Anne
## 2 2 4 Ben
## 3 2 4 Winy
## # … with 4 more rows
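That’s not quite right: the closeness labels have ended up among the node names. The first two columns (closeness and person1) were treated as the two ends of each edge.
To learn what went wrong, we have to dive into the help page of as_tbl_graph:
?as_tbl_graph
The relevant bit is under edges: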
The terminal nodes of each edge must either be encoded in a to and from column, or in the two first columns, as integers. These integers refer to nodes index.
So we can fix this in two ways.
- By having columns named to and from. Let’s try and rename the columns:
library(dplyr) # rename() and select() are dplyr verbs
df_edges2a <- df_edges2 %>%
  rename(
    from = "person1",
    to = "person2"
  )
df_edges2a
## closeness from to
## 1 very close Gert Anne
## 2 close Gert Ben
## 3 close Gert Winy
## 4 close Gert Vera
## 5 somewhat close Gert Laura
## 6 very close Ben Winy
## 7 very close Anne Vera
nw_obj2a <- as_tbl_graph(df_edges2a)
nw_obj2a
## # A tbl_graph: 6 nodes and 7 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Node Data: 6 × 1 (active)
## name
## <chr>
## 1 Gert
## 2 Ben
## 3 Anne
## 4 Winy
## 5 Vera
## 6 Laura
## #
## # Edge Data: 7 × 3
## from to closeness
## <int> <int> <chr>
## 1 1 3 very close
## 2 1 2 close
## 3 1 4 close
## # … with 4 more rows
Success. The other method is to rearrange the columns so that the two columns describing the ends of each tie come first, because the first two columns will be used.
df_edges2b <- df_edges2 %>%
  select(person1, person2, closeness) # rearrange the order of the variables
df_edges2b
## person1 person2 closeness
## 1 Gert Anne very close
## 2 Gert Ben close
## 3 Gert Winy close
## 4 Gert Vera close
## 5 Gert Laura somewhat close
## 6 Ben Winy very close
## 7 Anne Vera very close
nw_obj2b <- as_tbl_graph(df_edges2b)
nw_obj2b
## # A tbl_graph: 6 nodes and 7 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Node Data: 6 × 1 (active)
## name
## <chr>
## 1 Gert
## 2 Ben
## 3 Anne
## 4 Winy
## 5 Vera
## 6 Laura
## #
## # Edge Data: 7 × 3
## from to closeness
## <int> <int> <chr>
## 1 1 3 very close
## 2 1 2 close
## 3 1 4 close
## # … with 4 more rows
Yet another success.
Let’s see if we can use the information from the closeness variable.
ggraph(nw_obj2a) +
geom_edge_link(aes(colour = closeness)) +
geom_node_point() +
theme_graph()
## Using `sugiyama` as default layout
In the above code aes refers to “aesthetics”. Within those brackets you can “map” variables to “aesthetics”. Here we “map” the variable closeness to colour. We could have also done edge_width (or edge_alpha, or edge_linetype or …, see section “Edges”):
ggraph(nw_obj2a) +
geom_edge_link(aes(edge_width = closeness)) +
geom_node_point() +
theme_graph()
## Using `sugiyama` as default layout
Well, that’s no improvement!
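One likely reason the widths look arbitrary is that closeness is stored as text, so nothing tells R that “very close” ranks above “close”. A minimal base R sketch of one possible remedy is to declare the order explicitly with an ordered factor and derive a numeric score from it. The names closeness_ord and closeness_num are made up for this illustration.

```r
closeness <- c("very close", "close", "close", "close",
               "somewhat close", "very close", "very close")

# Declare the order of the categories, from weakest to strongest tie
closeness_ord <- factor(
  closeness,
  levels = c("somewhat close", "close", "very close"),
  ordered = TRUE
)

# A numeric score that follows that order (1 = weakest, 3 = strongest)
closeness_num <- as.integer(closeness_ord)
closeness_num
```

A numeric variable like this could then be added to the edge data and mapped to edge_width, so that wider lines correspond to closer ties.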
4.3 An adjacency matrix
It works more or less similarly when providing adjacency matrices. Let’s create an undirected one:
adj1 <- matrix(c(1, 0, 1,
                 0, 1, 1,
                 1, 1, 1), nrow = 3, ncol = 3)
nw_obj3 <- as_tbl_graph(adj1, directed = FALSE)
nw_obj3
## # A tbl_graph: 3 nodes and 5 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 3 × 0 (active)
## #
## # Edge Data: 5 × 3
## from to weight
## <int> <int> <dbl>
## 1 1 1 1
## 2 1 3 1
## 3 2 2 1
## # … with 2 more rows
ggraph(nw_obj3) +
geom_edge_link() +
geom_node_point() +
theme_graph()
## Using `stress` as default layout
Let’s create a directed adjacency matrix, with tie strength information.
adj2 <- matrix(c(1, 2, 3,
                 0, 1, 4,
                 0, 4, 1), nrow = 3, ncol = 3)
nw_obj4 <- as_tbl_graph(adj2)
nw_obj4
## # A tbl_graph: 3 nodes and 7 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 3 × 0 (active)
## #
## # Edge Data: 7 × 3
## from to weight
## <int> <int> <dbl>
## 1 1 1 1
## 2 2 1 2
## 3 2 2 1
## # … with 4 more rows
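The edge from node 2 to node 1 with weight 2 may look surprising given how the matrix was typed. That is because matrix() fills column by column by default, so the numbers as laid out in the code end up transposed. A small base R sketch of the difference (the names adj_col and adj_row are made up for this illustration):

```r
values <- c(1, 2, 3,
            0, 1, 4,
            0, 4, 1)

# Default: values fill the matrix column by column
adj_col <- matrix(values, nrow = 3, ncol = 3)

# byrow = TRUE: the matrix looks the way the values are typed
adj_row <- matrix(values, nrow = 3, ncol = 3, byrow = TRUE)

adj_col[2, 1] # 2: the tie from node 2 to node 1
adj_row[1, 2] # 2: the tie from node 1 to node 2
identical(adj_col, t(adj_row)) # TRUE: one is the transpose of the other
```

For an undirected (symmetric) matrix this makes no difference, but for a directed network it reverses the direction of every tie, so it is worth checking which fill order you intended.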
ggraph(nw_obj4) +
geom_edge_link() +
geom_node_point() +
theme_graph()
## Using `stress` as default layout
Let’s do something more radical:
ggraph(nw_obj4) +
  geom_edge_arc(
    aes(colour = factor(weight)),
    arrow = arrow(length = unit(4, 'mm')),
    end_cap = circle(3, 'mm')
  ) +
  geom_node_point() +
  theme_graph()
## Using `stress` as default layout
4.4 Combining data on node attributes and an edge list
df_edges <- data.frame(
  from = c("Gert","Gert","Gert","Gert","Gert","Ben","Anne"),
  to = c("Anne","Ben","Winy","Vera","Laura","Winy","Vera"),
  closeness = c("very close", "close", "close", "close", "somewhat close",
                "very close", "very close")
)
df_nodes <- data.frame(
  name = c("Gert", "Anne", "Ben", "Winy", "Vera", "Laura"),
  relation = c("Ego", "Partner", "Family", "Family", "Friend", "Friend")
)
df_network <- tbl_graph(nodes = df_nodes, edges = df_edges, directed = FALSE)
df_network
## # A tbl_graph: 6 nodes and 7 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 6 × 2 (active)
## name relation
## <chr> <chr>
## 1 Gert Ego
## 2 Anne Partner
## 3 Ben Family
## 4 Winy Family
## 5 Vera Friend
## 6 Laura Friend
## #
## # Edge Data: 7 × 3
## from to closeness
## <int> <int> <chr>
## 1 1 2 very close
## 2 1 3 close
## 3 1 4 close
## # … with 4 more rows
The simplest possible plot:
ggraph(df_network, layout = "kk") +
geom_node_point() +
geom_edge_link() +
theme_graph()
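A more elaborate version maps closeness to the line type of the edges and relation to the colour of the nodes, and labels each node with its name:
ggraph(df_network, layout = "kk") +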
geom_edge_link(aes(linetype = closeness)) +
geom_node_point(aes(colour = relation), size = 13) +
geom_node_text(aes(label = name), colour = "white") +
theme_void() +
scale_color_brewer(palette = "Set2")