Drawing network graphs with ggraph

ggraph is a network visualization package developed by Thomas Lin Pedersen. The official documentation is at https://ggraph.data-imaginist.com/index.html.

It differs from igraph: igraph can also draw networks, but it is aimed mainly at network analysis, and its plotting is not very user-friendly.

This post is a rough record of how to use ggraph to build network graphs.

1 Installation

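Two packages need to be installed. The examples below also load tidyverse, which is assumed to be installed already.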

install.packages("ggraph")
install.packages("tidygraph")
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.1
## -- Attaching packages ------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.1
## Warning: package 'tibble' was built under R version 3.6.1
## Warning: package 'tidyr' was built under R version 3.6.1
## Warning: package 'dplyr' was built under R version 3.6.1
## -- Conflicts ---------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidygraph)
## Warning: package 'tidygraph' was built under R version 3.6.1
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
## 
##     filter
library(ggraph)
## Warning: package 'ggraph' was built under R version 3.6.1

2 Building the required data

The simplest approach is to use tabular data (a data frame) whose first two columns are from and to. Let's build an example dataset:

2.1 Building the edges data

set.seed(0)
edges <- data.frame(from = sample(1:15, 80, replace = TRUE), 
                 to = sample(1:15, 80, replace = TRUE), 
                 stringsAsFactors = FALSE) %>% 
  distinct()

This is the simplest form of edge information: each row describes one edge. Of course, we can also attach attributes to each edge.

edges <- data.frame(edges, 
                 edge.width = rnorm(n = nrow(edges), mean = 1, sd = 0.5), 
                 edge.colour = rnorm(n = nrow(edges), mean = 0, sd = 0.5),
                 stringsAsFactors = FALSE)
edges %>% head
##   from to edge.width  edge.colour
## 1   14  6  0.2382166  0.225093551
## 2    9  8  1.2969731 -0.009279916
## 3    4  7  1.1664752 -0.159034187
## 4    7 11  1.5315499 -0.464681074
## 5    1  1  0.8479080 -0.743730155
## 6    2  4  1.1850094 -0.537596148
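
One thing worth noting about the edge-building step: distinct() only removes rows that are exactly identical. For an undirected graph, 1-2 and 2-1 describe the same edge, and a row such as 1-1 is a self-loop, so both survive in the data above. That is fine for this post, but if a simple graph is wanted, a minimal base R sketch could look like this (toy_edges and simple_edges are made-up names for illustration):

```r
# Hypothetical toy edge list with a reversed duplicate (2-1) and a self-loop (3-3)
toy_edges <- data.frame(from = c(1, 2, 3, 3), to = c(2, 1, 3, 4))

# Order each pair so that the smaller id always comes first
simple_edges <- data.frame(from = pmin(toy_edges$from, toy_edges$to),
                           to   = pmax(toy_edges$from, toy_edges$to))

# Drop repeated pairs, then drop self-loops
simple_edges <- unique(simple_edges)
simple_edges <- simple_edges[simple_edges$from != simple_edges$to, ]
simple_edges
```

Only the pairs 1-2 and 3-4 remain in this toy example.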

2.2 Then build the node data

node <- unique(c(edges$from, edges$to)) %>% sort()
node
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
nodes <- data.frame(node, node.size = rnorm(n = length(node), mean = 1, sd = 0.5),
                    node.colour = sample(c("Class A", "Class B"), length(node), replace = TRUE), 
                    stringsAsFactors = FALSE)
nodes %>% head
##   node node.size node.colour
## 1    1 0.3733551     Class A
## 2    2 1.3211207     Class B
## 3    3 0.9776454     Class A
## 4    4 0.1333908     Class B
## 5    5 1.0010659     Class B
## 6    6 0.6848498     Class A

2.3 Building the data ggraph needs

Once we have edges and nodes, they need to be converted into the format ggraph requires.

graph_data <- tidygraph::tbl_graph(nodes = nodes, edges = edges, 
                                   directed = FALSE)
graph_data
## # A tbl_graph: 15 nodes and 67 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 15 x 3 (active)
##    node node.size node.colour
##   <int>     <dbl> <chr>      
## 1     1     0.373 Class A    
## 2     2     1.32  Class B    
## 3     3     0.978 Class A    
## 4     4     0.133 Class B    
## 5     5     1.00  Class B    
## 6     6     0.685 Class A    
## # ... with 9 more rows
## #
## # Edge Data: 67 x 4
##    from    to edge.width edge.colour
##   <int> <int>      <dbl>       <dbl>
## 1     6    14      0.238     0.225  
## 2     8     9      1.30     -0.00928
## 3     4     7      1.17     -0.159  
## # ... with 64 more rows
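
In the example above the from and to columns are integers, which tbl_graph() reads as row positions in the node table. That works here because the nodes are numbered 1 to 15 and sorted. When the ids are character labels instead, tbl_graph() matches them against a node column (by default the one called name, as far as I understand the node_key argument). A minimal sketch with made-up toy data (toy_nodes, toy_edges and toy_graph are illustrative names):

```r
library(tidygraph)

# Hypothetical toy data: node ids are character labels, not row numbers
toy_nodes <- data.frame(name = c("a", "b", "c"), stringsAsFactors = FALSE)
toy_edges <- data.frame(from = c("a", "b"), to = c("b", "c"),
                        stringsAsFactors = FALSE)

# Character from/to values are matched against the `name` column of the nodes
toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)

# Internally the edges are stored as row positions in the node table
toy_graph %>%
  activate(edges) %>%
  as_tibble()
```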

3 Plotting

3.1 Basic plotting

With the data in hand, we can start plotting. As in ggplot2, the plot is built up layer by layer. Let's start with a simple example.

ggraph(graph = graph_data) +
  geom_edge_fan() +
  geom_node_point()
## Using `stress` as default layout

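First, note that ggraph() starts the plot, and that the two required geoms are geom_edge_xxx and geom_node_xxx, which define the edges and the nodes respectively. They are used much like ggplot2 geoms and take similar arguments, only with the edge and node prefixes added.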

Next, let's polish the plot.

plot <- ggraph(graph = graph_data, layout = "linear", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void()
plot

As the example above shows, edge properties are set with the scale_edge_xxx family of functions, while for nodes the usual ggplot2 scale_xxx functions are used directly.

3.2 Adding text

Text can be added with the geom_node_text() and geom_node_label() functions.

For a circular layout, the angle of each node label needs to be adjusted so that the text follows the circle.

node_name <- paste("Node", node, sep = "_")
node_name
##  [1] "Node_1"  "Node_2"  "Node_3"  "Node_4"  "Node_5"  "Node_6"  "Node_7" 
##  [8] "Node_8"  "Node_9"  "Node_10" "Node_11" "Node_12" "Node_13" "Node_14"
## [15] "Node_15"
angle <- 360 * (c(1:length(node_name)) - 0.5)/length(node_name)
hjust <- ifelse(angle > 180, 1, 0)
angle <- ifelse(angle > 180, 90 - angle + 180, 90 - angle)
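
The angle arithmetic above can be wrapped into a small helper to make each step explicit. This is only a sketch: label_angles is a made-up name, and it assumes, as the formula above does, that the nodes are spread evenly around the circle starting from the top.

```r
# Hypothetical helper: text angle and justification for the labels of
# n nodes spread evenly around a circle
label_angles <- function(n) {
  # Position of each node on the circle, in degrees
  position <- 360 * (seq_len(n) - 0.5) / n
  # Labels past 180 degrees are right-justified so they extend outwards
  hjust <- ifelse(position > 180, 1, 0)
  # Labels past 180 degrees are flipped so the text is not upside down
  angle <- ifelse(position > 180, 90 - position + 180, 90 - position)
  data.frame(position = position, angle = angle, hjust = hjust)
}

label_angles(4)
```

For four nodes the positions are 45, 135, 225 and 315 degrees, which gives text angles of 45, -45, 45 and -45 and hjust values of 0, 0, 1 and 1.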

Then add the node text.

plot +
  geom_node_text(aes(x = x * 1.05,
                     y = y * 1.15,
                     label = node_name), 
                 angle = angle, 
                 hjust = hjust,
                 colour = "black",
                 size = 3.5)

Some of the text runs outside the plotting area. Expanding the axis limits fixes this.

plot +
  geom_node_text(aes(x = x * 1.05,
                     y = y * 1.15,
                     label = node_name), 
                 angle = angle, 
                 hjust = hjust,
                 colour = "black",
                 size = 3.5) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5))

We can take a look at what the original axes look like.

ggraph(graph = graph_data, layout = "linear", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_bw()

Setting the theme to theme_bw() makes the original coordinate system clearly visible.

The order of the legends is a bit messy. It can be set in the guides() function.

ggraph(graph = graph_data, layout = "linear", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  # edge_colour is a separate aesthetic from the node colour; naming both
  # `colour` triggers a "Duplicated aesthetics" warning
  guides(colour = guide_legend(order = 1),
         size = guide_legend(order = 2),
         edge_colour = guide_edge_colourbar(order = 3)) +
  theme_void()

When text is added, labels can overlap each other and overlap the nodes, as the following plot shows:

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void() +
  geom_node_text(aes(x = x,
                     y = y,
                     label = node_name,
                     colour = node.colour),
                 size = 3.5)
## Using `stress` as default layout

In ggplot2 this problem is solved with the ggrepel package. Here, we can simply set repel to TRUE.

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void() +
  geom_node_text(aes(label = node_name,
                     colour = node.colour),
                 size = 3.5, repel = TRUE)
## Using `stress` as default layout

Of course, geom_node_label() can also be used for the labels.

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void() +
  geom_node_label(aes(label = node_name,
                      colour = node.colour),
                  size = 3.5, repel = TRUE)
## Using `stress` as default layout

3.3 Using different layouts

A network can be drawn with different layouts. The layout can be chosen through the layout argument of ggraph(), or by attaching a layout to the graph directly before plotting.

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void()
## Using `stress` as default layout

ggraph(graph = graph_data, layout = "linear", circular = FALSE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  geom_node_text(aes(colour = node.colour),
                 hjust = 1,
                 angle = 65,
                 nudge_y = -0.3,
                 label = node_name,
                 size = 3.5) +
  expand_limits(x = c(-1.5, 1.5), y = c(-1.5, 1.5)) +
  theme_void()

ggraph(graph = graph_data, layout = "eigen", circular = FALSE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void() +
    geom_node_text(aes(x = x,
                     y = y,
                     label = node_name,
                     colour = node.colour), 
                 size = 3.5) 
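
The examples in this section all pass the layout by name to ggraph(). The other route, computing the layout first and handing the result to ggraph(), might look roughly like the sketch below. It uses a made-up toy graph (a ring built with tidygraph's create_ring()) rather than graph_data, and toy_graph and toy_layout are illustrative names.

```r
library(tidygraph)
library(ggraph)

# Toy graph for illustration: a ring of 6 nodes
toy_graph <- create_ring(6)

# Compute the layout once, up front
toy_layout <- create_layout(toy_graph, layout = "linear", circular = TRUE)

# The layout is a data frame holding x/y coordinates for every node
head(toy_layout[, c("x", "y")])

# Pass the precomputed layout to ggraph() in place of the graph
toy_plot <- ggraph(toy_layout) +
  geom_edge_link() +
  geom_node_point()
toy_plot
```

This can be handy when the node coordinates need to be inspected or tweaked before drawing.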

3.4 Different edge types

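In the examples above, the nodes are connected by curves (geom_edge_arc). Other connection styles are available, straight lines for example, through the other geom_edge_xxx() functions.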

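Straight lines can be drawn with geom_edge_link(), which comes in three variants: geom_edge_link(), geom_edge_link2() and geom_edge_link0(). I have not looked closely at the differences yet; for details see ?geom_edge_link and ?get_edges.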

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_link(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void()
## Using `stress` as default layout

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_link2(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void()
## Using `stress` as default layout

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_link0(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void()
## Using `stress` as default layout
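
With the data used here the three variants look alike. As far as I understand the ggraph documentation, the practical difference is what can be mapped: the plain version draws each edge as a path of many points, the 2 version can interpolate node attributes between the two end points (referenced with a node. prefix), and the 0 version draws a simple segment. A small sketch on a toy graph, where create_notable("bull") and the class column are illustrative and not part of the data above:

```r
library(tidygraph)
library(ggraph)

# Toy graph (5 nodes, 5 edges) with a made-up `class` column on nodes and edges
toy_graph <- create_notable("bull") %>%
  mutate(class = c("a", "b", "c", "a", "b")) %>%
  activate(edges) %>%
  mutate(class = c("a", "a", "b", "b", "c"))

# link: the default variant, each edge drawn as a path
p_link <- ggraph(toy_graph, layout = "stress") +
  geom_edge_link()

# link2: colour interpolated between the classes of the two end nodes
p_link2 <- ggraph(toy_graph, layout = "stress") +
  geom_edge_link2(aes(colour = node.class))

# link0: plain straight segments, coloured by the edge's own class
p_link0 <- ggraph(toy_graph, layout = "stress") +
  geom_edge_link0(aes(colour = class))

p_link2
```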

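Edges with some curvature can be drawn with geom_edge_fan(), which fans out parallel edges between the same pair of nodes so that they do not overlap.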

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_fan(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  theme_void()
## Using `stress` as default layout

3.5 Using different themes

A dark theme can make the figure look a bit cooler. The example below uses the ggdark package.

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white", high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2)) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  ggdark::dark_theme_void() +
  geom_node_text(aes(label = node_name,
                     colour = node.colour),
                 size = 3.5, repel = TRUE)
## Using `stress` as default layout
## Inverted geom defaults of fill and color/colour.
## To change them back, use invert_geom_defaults().

Some of the legend keys are still drawn in black because the theme does not change them, so they have to be adjusted manually.

ggraph(graph = graph_data, layout = "auto", circular = TRUE) +
  geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
  scale_edge_colour_gradient2(low = "#155F83FF", mid = "white",
                              high = "#800000FF") +
  scale_edge_width_continuous(range = c(0.2,2), 
                              guide = guide_legend(override.aes = list(colour = "white", alpha =1))) +
  guides(colour = guide_legend(override.aes = list(size = 5))) +
  geom_node_point(aes(colour = node.colour, size = node.size)) +
  scale_size_continuous(range = c(5,10)) +
  scale_colour_manual(values = c("Class A" = "#8A9045FF", "Class B" = "#155F83FF")) +
  ggdark::dark_theme_void() +
  geom_node_text(aes(label = node_name,
                     colour = node.colour),
                 size = 3.5, repel = TRUE)
## Using `stress` as default layout

Dr. Xiaotao Shen (申小涛)
Postdoctoral researcher

Metabolomics, multi-omics, bioinformatics, health
