1 KEGG Pathway数据库
KEGG pathway的数据是使用KGML
(KEGG Markup Language)格式存储下来的.详细的信息可以阅读官方文档.
https://www.kegg.jp/kegg/xml/docs/
1.1 背景
KEGG pathway有两个类型.
reference pathways: 手动画的.
organism-specific pathways: 使用计算方法自动从reference pathway得到的.
在KGML
文件中,entry
指的是node,而relation
和reaction
指的是整个网络的边.在KEGG pathway中,矩形表示基因(也可以认为是gene的产物蛋白质),而circle代表的是metabolite.
如果网络中只有entry
和relation
,则该网络称为protein network
.如果只有entry
和reaction
,则称为chemical network
.
所以一般来说,可以用两种方式来表示metabolic pathway:
metabolic pathways viewed as both protein networks and chemical networks and
regulatory pathways viewed as protein networks only.
1.2 KGML
的overview
这幅图是整个KGML
的总览.
1.3 如何获得某个pathwayKGML
文件
使用R包KEGGlincs
可以下载pathway的KGML
文件.
library(tidyverse)
library(KEGGREST)
library(KEGGlincs)
###
#获得所有has的pathway ID
path_id <- KEGGREST::keggList(database = "pathway",
organism = "hsa")
head(path_id)
## path:hsa00010
## "Glycolysis / Gluconeogenesis - Homo sapiens (human)"
## path:hsa00020
## "Citrate cycle (TCA cycle) - Homo sapiens (human)"
## path:hsa00030
## "Pentose phosphate pathway - Homo sapiens (human)"
## path:hsa00040
## "Pentose and glucuronate interconversions - Homo sapiens (human)"
## path:hsa00051
## "Fructose and mannose metabolism - Homo sapiens (human)"
## path:hsa00052
## "Galactose metabolism - Homo sapiens (human)"
path_id <-
path_id %>%
names() %>%
unique() %>%
stringr::str_replace("path:", "")
head(path_id)
## [1] "hsa00010" "hsa00020" "hsa00030" "hsa00040" "hsa00051" "hsa00052"
##下载kgml文件并进行解析
temp_kgml <-
KEGGlincs::get_KGML(pathwayid = path_id[1])
class(temp_kgml)
## [1] "KEGGPathway"
## attr(,"package")
## [1] "KEGGgraph"
temp_kgml
## KEGG Pathway
## [ Title ]: Glycolysis / Gluconeogenesis
## [ Name ]: path:hsa00010
## [ Organism ]: hsa
## [ Number ] :00010
## [ Image ] :https://www.kegg.jp/kegg/pathway/hsa/hsa00010.png
## [ Link ] :https://www.kegg.jp/kegg-bin/show_pathway?hsa00010
## ------------------------------------------------------------
## Statistics:
## 102 node(s)
## 89 edge(s)
## 35 reaction(s)
## ------------------------------------------------------------
get_KGML()
函数其实分别使用了KEGGREST
包中的keggGet()
函数来下载的,然后使用KEGGgraph
包中的paraseKGML()
函数来解析.
temp_kgml2 <-
KEGGREST::keggGet(dbentries = path_id[1], option = "kgml")
## No encoding supplied: defaulting to UTF-8.
class(temp_kgml2)
## [1] "character"
temp_kgml2 <-
KEGGgraph::parseKGML(file = temp_kgml2)
temp_kgml2
## KEGG Pathway
## [ Title ]: Glycolysis / Gluconeogenesis
## [ Name ]: path:hsa00010
## [ Organism ]: hsa
## [ Number ] :00010
## [ Image ] :https://www.kegg.jp/kegg/pathway/hsa/hsa00010.png
## [ Link ] :https://www.kegg.jp/kegg-bin/show_pathway?hsa00010
## ------------------------------------------------------------
## Statistics:
## 102 node(s)
## 89 edge(s)
## 35 reaction(s)
## ------------------------------------------------------------
1.4 Pathway
1.4.1 Pathway element
一共有六个属性.
Attribute.name | Explanation |
---|---|
name | Pathway ID |
org | ko/ec/org ID |
number | pathway ID numebr |
title | pathway title |
image | pathway figure link |
link | pathway link |
我们可以下载pathway的图片.
image_link <- slot(slot(temp_kgml, "pathwayInfo"), "image")
image_link
## [1] "https://www.kegg.jp/kegg/pathway/hsa/hsa00010.png"
download.file(url = image_link,
destfile = basename(image_link),
mode = "wb")
1.5 Entry(条目)
1.5.1 Entry element
包括了网络中的node的属性.属性包括以下内容.
Attribute.name | Explanation |
---|---|
id | ID of the entry |
name | KEGG ID of the entry |
type | Type of the entry |
link | Link of the entry |
reaction | The KEGG ID of corresponding reaction |
#整体信息
temp_kgml@pathwayInfo
## [ Title ]: Glycolysis / Gluconeogenesis
## [ Name ]: path:hsa00010
## [ Organism ]: hsa
## [ Number ] :00010
## [ Image ] :https://www.kegg.jp/kegg/pathway/hsa/hsa00010.png
## [ Link ] :https://www.kegg.jp/kegg-bin/show_pathway?hsa00010
#分别的属性
temp_kgml@pathwayInfo@name
## [1] "path:hsa00010"
temp_kgml@pathwayInfo@org
## [1] "hsa"
temp_kgml@pathwayInfo@number
## [1] "00010"
temp_kgml@pathwayInfo@title
## [1] "Glycolysis / Gluconeogenesis"
temp_kgml@pathwayInfo@image
## [1] "https://www.kegg.jp/kegg/pathway/hsa/hsa00010.png"
temp_kgml@pathwayInfo@link
## [1] "https://www.kegg.jp/kegg-bin/show_pathway?hsa00010"
图片下载链接:
pathway详细信息链接:
我们下面看看pathwa中node的信息.
temp_kgml
## KEGG Pathway
## [ Title ]: Glycolysis / Gluconeogenesis
## [ Name ]: path:hsa00010
## [ Organism ]: hsa
## [ Number ] :00010
## [ Image ] :https://www.kegg.jp/kegg/pathway/hsa/hsa00010.png
## [ Link ] :https://www.kegg.jp/kegg-bin/show_pathway?hsa00010
## ------------------------------------------------------------
## Statistics:
## 102 node(s)
## 89 edge(s)
## 35 reaction(s)
## ------------------------------------------------------------
可以看到这个pathway含有node,edge还有reaction.
# node
temp_kgml@nodes$`18`
## KEGG Node (Entry '18'):
## ------------------------------------------------------------
## [ displayName ]: ALDOA, ALDA, GSD12, HEL-S-87p...
## [ Name ]: hsa:226,hsa:229,hsa:230
## [ Type ]: gene
## [ Link ]: https://www.kegg.jp/dbget-bin/www_bget?hsa:226+hsa:229+hsa:230
## ------------------------------------------------------------
#edge
temp_kgml@edges[[1]]
## KEGG Edge (Type: ECrel):
## ------------------------------------------------------------
## [ Entry 1 ID ]: 73
## [ Entry 2 ID ]: 75
## [ Subtype ]:
## [ Subtype name ]: compound
## [ Subtype value ]: 90
## ------------------------------------------------------------
#reaction
temp_kgml@reactions[[1]]
## KEGG Reaction(rn:R00710)
## ------------------------------------------------------------
## [ Name ]: rn:R00710
## [ Type ]: reversible
## [ Substrate Name ]: cpd:C00084
## [ Product Name ]: cpd:C00033
1.5.1.1 Type属性的详细解释
在pathway中,如果一个pathway链接到另外一个pathway,那么pathway也是会被作为一个node展示出来.
Attribute.name | Explanation |
---|---|
ortholog | the node is a KO (ortholog group) |
enzyme | the node is an enzyme |
reaction | the node is a reaction |
gene | the node is a gene product (mostly a protein) |
group | the node is a complex of gene products (mostly a protein complex) |
compound | the node is a chemical compound (including a glycan) |
map | the node is a linked pathway map |
brite | the node is a linked brite hierarchy |
other | the node is an unclassified type |
1.6 Relation
1.6.1 relation元素
这里面的relation不包括我们所知道的酶催化代谢物的联系.只包括两个proteins(gene products)或者两个KOs(ortholog groups)或者protein和compound
Attribute.name | Explanation |
---|---|
entry1 | the first (from) entry that defines this relation |
entry2 | the second (to) entry that defines this relation |
type | the type of this relation |
1.6.1.1 Type属性
relation的type一共有5个,分别是
ECrel: enzyme-enzyme relation, indicating two enzymes catalyzing successive reaction steps.代表的是两个酶之间的联系,比如enzyme1-metabolite1-enzyme2-metabolite2.那么enzyme1-enzyme2之间的关系realtion就是
ECrel
.PPrel: protein-protein interaction, such as binding and modification.
GErel: gene expression interaction, indicating relation of transcription factor and target gene product.
PCrel: protein-compound interaction,注意这个关系不是enzyme催化的关系.
maplink: 跟另外一个pathway的联系.
1.6.1.2 subtype element
每个relatio还有一个subtype属性.这个属性用来对relation做进一步的说明.
- name: interation/relation的信息.
包含以下几种信息.
其中compound
是指这个relation其实是两个enzyme.
temp_kgml@edges[[1]]@entry1ID
## [1] "73"
temp_kgml@edges[[1]]@entry2ID
## [1] "75"
temp_kgml@edges[[1]]@type
## [1] "ECrel"
temp_kgml@edges[[1]]@subtype
## $subtype
## [ Subtype name ]: compound
## [ Subtype value ]: 90
这个relation就是两个enzyme连接起来的.
1.7 Reaction反应
reaction和relation很类似.不一样的是type属性.
type属性包括reversible
和irreversible
.分别表示可逆反应和不可逆反映.
temp_kgml@reactions[[1]]@name
## [1] "rn:R00710"
temp_kgml@reactions[[1]]@type
## [1] "reversible"
temp_kgml@reactions[[1]]@substrateName
## [1] "cpd:C00084"
temp_kgml@reactions[[1]]@productName
## [1] "cpd:C00033"