Biological pathway annotation

Pathways are biological networks that define how biomolecules cooperate to accomplish cellular tasks under different conditions. Pathways are assembled from physically interacting molecules such as proteins, so annotating the proteins in each pathway is particularly important.

Thirty academic pathway databases are summarized in this document.


1.3.1 KEGG

Introduction: KEGG PATHWAY is a collection of manually drawn pathway maps representing current knowledge of molecular interaction, reaction and relation networks for: (1) Metabolism, (2) Genetic Information Processing, (3) Environmental Information Processing, (4) Cellular Processes, (5) Organismal Systems, (6) Human Diseases and (7) Drug Development.

Type Number
species 8504 (813 eukaryotes, 7291 bacteria, 400 archaea)
pathway 559 (979,624 entries)
orthology 25,479 entries
genome 20,348 entries
genes 44,226,835 entries
compound 19,007 entries
glycan 11,104 entries
reaction 11,841 entries
rclass 3,180 entries
enzyme 8,012 entries
network 1,427 entries
variant 738 entries
disease 2,599 entries
drug 11,993 entries
Database information
Release 07, Nov 22

Web server: KEGG.


1.3.2 Pathway ontology

Introduction: The goal of the Pathway Ontology is to cover all types of biological pathways, including altered and disease pathways (2643 pathways), and to capture the relationships between them within the hierarchical structure of a Directed Acyclic Graph (DAG). The five nodes of the ontology are: classic metabolic, regulatory, signaling, drug and disease pathways.
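The DAG structure described above can be sketched in a few lines of Python. The five top-level node names are taken from the text; the root label "pathway" and the two child terms are hypothetical placeholders used only to show that a term in a DAG may have more than one parent.

```python
# Minimal sketch of an ontology stored as a DAG (child -> list of parents).
# The root label and the two "example ..." terms are hypothetical, not real
# Pathway Ontology terms.
PARENTS = {
    "classic metabolic pathway": ["pathway"],
    "regulatory pathway": ["pathway"],
    "signaling pathway": ["pathway"],
    "drug pathway": ["pathway"],
    "disease pathway": ["pathway"],
    # Unlike a tree, a DAG lets one term sit under several parents.
    "example altered signaling pathway": ["signaling pathway", "disease pathway"],
    "example drug metabolism pathway": ["drug pathway", "classic metabolic pathway"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("example altered signaling pathway")))
```

Storing parent links rather than child links keeps "which broader categories does this pathway belong to" a simple upward traversal.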

Web server: Pathway Ontology.


1.3.3 BioCarta

Introduction: BioCarta is a database of gene interaction models. The database contains high-quality images of several cellular signaling and interaction pathways, and each diagram is fully hyperlinked to products and information pages about individual genes. Users can access product sales pages for selected elements of each pathway. Statistics: 1396 genes, 254 pathways and 4417 gene-pathway associations.

Web server: BioCarta.


1.3.4 Reactome

Introduction: Reactome is an open-source, open-access, manually curated and peer-reviewed pathway database. The cornerstone of Reactome is a freely available, open-source relational database of signaling and metabolic molecules and their relations, organized into biological pathways and processes. The core unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes, vaccines, anti-cancer therapeutics and small molecules) participating in reactions form a network of biological interactions and are grouped into pathways. Examples of biological pathways in Reactome include classical intermediary metabolism, signaling, transcriptional regulation, apoptosis and disease.

The Reactome curation process for a pathway is similar to the editing of a scientific review. An external domain expert provides his or her expertise, a curator formalizes it into the database structure, and an external domain expert reviews the representation. A system of evidence tracking ensures that all assertions are backed up by the primary literature. Reactome pathway, reaction and molecule pages cross-reference over 100 different online bioinformatics resources, including the NCBI Gene, Ensembl and UniProt databases, the UCSC Genome Browser, the ChEBI small molecule database, and the PubMed literature database.
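To make the reaction-centred data model concrete, here is a minimal Python sketch. The class and field names (Entity, Reaction, Pathway, participants) are illustrative assumptions and do not reproduce Reactome's actual schema; the toy pathway holds a single well-known step, the phosphorylation of glucose by ATP.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A participant such as a protein, complex or small molecule."""
    name: str
    kind: str


@dataclass
class Reaction:
    """The core unit: converts input entities into output entities."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass
class Pathway:
    """A named grouping of reactions (hypothetical structure)."""
    name: str
    reactions: list = field(default_factory=list)

    def participants(self):
        """All entities taking part in any reaction of this pathway."""
        found = set()
        for reaction in self.reactions:
            found.update(reaction.inputs)
            found.update(reaction.outputs)
        return found


glucose = Entity("glucose", "small molecule")
atp = Entity("ATP", "small molecule")
g6p = Entity("glucose 6-phosphate", "small molecule")
adp = Entity("ADP", "small molecule")

glucose_phosphorylation = Reaction(
    "glucose phosphorylation", inputs=(glucose, atp), outputs=(g6p, adp)
)
toy_pathway = Pathway("toy glycolysis fragment", [glucose_phosphorylation])

if __name__ == "__main__":
    print(sorted(entity.name for entity in toy_pathway.participants()))
```

Because entities are shared objects, two reactions that use the same molecule are linked automatically, which is how a set of reactions becomes a network.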

SPECIES PROTEINS COMPLEXES REACTIONS PATHWAYS
S. pombe 1690 1805 1486 819
S. cerevisiae 1913 1827 1566 812
D. rerio 8633 8452 7383 1676
X. tropicalis 7046 7321 6159 1580
G. gallus 7296 7931 6859 1706
S. scrofa 8407 8825 7548 1660
B. taurus 8841 9182 8048 1696
C. familiaris 8162 8725 7455 1657
R. norvegicus 8808 9505 8356 1702
M. musculus 9537 10620 9456 1715
H. sapiens 11097 14084 14398 2601
D. melanogaster 4755 5402 4596 1477
C. elegans 4468 4403 3700 1304
D. discoideum 2681 2502 2313 982
P. falciparum 1051 1007 861 599
Database information

Web server: Reactome.


1.3.5 PANTHER

Introduction: The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. The core of PANTHER is a comprehensive, annotated “library” of gene family phylogenetic trees. All nodes in the tree have persistent identifiers that are maintained between versions of PANTHER, providing a stable substrate for annotations of protein properties like subfamily and function. Each phylogenetic tree is used to annotate each protein member of the family by its:

  • Family and Protein Class (supergrouping of protein families)
  • Subfamily (subgroup within the family phylogenetic tree)
  • Orthologs (genes in other organisms that derive from the same gene in the most recent common ancestor, MRCA)
  • Paralogs (genes in the same organism that are related by gene duplication)
  • Function (using GO terms annotated on the trees by the GO Phylogenetic Annotation Project)
  • Pathways (curated by PANTHER and by Reactome)

Type Number
species 143
pathways 177
Ontologies 3361 terms (2267 biological process, 544 cellular component, 550 molecular function)
Database information

Web server: PANTHER.


1.3.6 BioCyc

Introduction: The BioCyc collection of Pathway/Genome Databases (PGDBs) provides electronic reference sources on the pathways and genomes of many organisms. BioCyc databases describe organisms with sequenced genomes. BioCyc is primarily microbial. In addition, BioCyc contains databases for humans; for important model organisms such as yeast, fly, and mouse; and for other eukaryotes whose PGDBs have been curated. BioCyc is a collection of 20,025 Pathway/Genome Databases (PGDBs) for model eukaryotes and for thousands of microbes, plus software tools for exploring them. BioCyc is an encyclopedic reference that contains curated data from 130,000 publications.

Curated Pathway/Genome Databases for many organisms have been created by a variety of institutions using the Pathway Tools software and are available from the following web sites.

Database Web site
EcoCyc EcoCyc.org
MetaCyc MetaCyc.org
HumanCyc HumanCyc.org
PlantCyc PlantCyc.org
GutCyc GutCyc.org
MouseCyc mousecyc
AraCyc aracyc
YeastCyc yeast.biocyc.org
LeishCyc leishcyc

Web server: BioCyc.


1.3.7 INOH

Introduction: INOH is a pathway database providing 857 pathways from model organisms including Drosophila, Homo sapiens, Mus musculus, and Rattus norvegicus. In INOH, the term pathway refers to higher order functional knowledge such as relationships among multiple bio-molecules that constitute signal transduction pathways or biological events in general.

Web server: INOH.


1.3.8 EHMN

Introduction: EHMN (Edinburgh Human Metabolic Network) presents a high-quality human metabolic network, manually reconstructed by integrating genome annotation information from different databases with metabolic reaction information from the literature. The network contains nearly 3000 metabolic reactions, which were reorganized into about 70 human-specific metabolic pathways according to their functional relationships. Analysis of the functional connectivity of the metabolites in the network reconfirms the bow-tie structure previously found by structural analysis.

Web server: EHMN.


1.3.9 WikiPathways

Introduction: WikiPathways is a database of biological pathways maintained by and for the scientific community. WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways thus presents a new model for pathway databases that enhances and complements ongoing efforts, such as KEGG, Reactome and Pathway Commons. Building on the same MediaWiki software that powers Wikipedia, a custom graphical pathway editing tool and integrated databases were added covering major gene, protein, and small-molecule systems.

Type Number
species 33
pathways 3091
Data Source Statistics

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("rWikiPathways")

Other client libraries:

  • Java: Java API client with code examples on GitHub.
  • Perl: Perl API client with code examples on GitHub.
  • PHP: Example scripts on GitHub.
  • Python: Python API client with code examples on GitHub.
  • Cytoscape: The WikiPathways app offers a Cytoscape command for scripting purposes.

Web server: WikiPathways.


1.3.10 PID

Introduction: The Pathway Interaction Database (PID) is a free biomedical database of human cellular signaling pathways. The database contains information about the molecular interactions and reactions that take place in cells, with a particular focus on processes that might be relevant to cancer research and treatment.

Type Number
pathways 254
Data Source Statistics

Web server: PID.


1.3.11 Pathway Commons

Introduction: Pathway Commons provides biologists with (i) tools to search this comprehensive resource, (ii) a download site offering integrated bulk sets of pathway data (e.g. tables of interactions and gene sets), (iii) reusable software libraries for working with pathway information in several programming languages (Java, R, Python and JavaScript) and (iv) a web service for programmatically querying the entire dataset. Visualization of pathways is supported using the Systems Biology Graphical Notation (SBGN). Pathway Commons currently contains data from 22 databases with 4794 detailed human biochemical processes (i.e. pathways) and ∼2.3 million interactions.

Web server: Pathway Commons.


1.3.12 SMPDB

Introduction: SMPDB (The Small Molecule Pathway Database) is an interactive, visual database containing more than 30 000 small molecule pathways found in humans only. The majority of these pathways are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology. It does so, in part, by providing detailed, fully searchable, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways. All SMPDB pathways include information on the relevant organs, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the HMDB or DrugBank, and each protein or enzyme complex is hyperlinked to UniProt. All SMPDB pathways are accompanied by detailed descriptions and references, providing an overview of the pathway, condition or processes depicted in each diagram. The database is easily browsed and supports full-text, sequence and chemical structure searching. Users may query SMPDB with lists of metabolite names, drug names, gene/protein names, SwissProt IDs, GenBank IDs, Affymetrix IDs or Agilent microarray IDs. These queries produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein concentration data can also be visualized through SMPDB’s mapping interface. All of SMPDB’s images, image maps, descriptions and tables are downloadable.

Type Number
Total Pathways 48690
Normal Metabolic Pathways 27876
Drug Action Pathways 404
Drug Metabolism Pathways 64
Disease Pathways 20251
Signaling Pathways 24
Protein Pathways 63
Physiological Pathways 8
Drugs 696
Metabolites 55700
Proteins 1451
Enzymes 791
Transporters 137
Reactions 57402
Transportations 294
Reaction-Coupled Transportations 73
Interactions 691
Data Source Statistics
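As a quick consistency check on the table above, the seven pathway categories can be summed and compared with the reported total. This is only a sketch: it assumes the categories are mutually exclusive, which the table does not state explicitly.

```python
# Counts copied from the SMPDB statistics table above.
pathway_counts = {
    "Normal Metabolic Pathways": 27876,
    "Drug Action Pathways": 404,
    "Drug Metabolism Pathways": 64,
    "Disease Pathways": 20251,
    "Signaling Pathways": 24,
    "Protein Pathways": 63,
    "Physiological Pathways": 8,
}
reported_total = 48690

# Assumption: each pathway is counted in exactly one category.
computed_total = sum(pathway_counts.values())

if __name__ == "__main__":
    print(computed_total, computed_total == reported_total)  # 48690 True
    for category, count in pathway_counts.items():
        print(f"{category}: {100 * count / reported_total:.2f}% of all pathways")
```

The categories add up exactly to the "Total Pathways" row, so the table is internally consistent.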

Web server: SMPDB.


1.3.14 NetPath

Introduction: NetPath is a manually curated resource of human signal transduction pathways. It is a joint effort between the Pandey Lab at Johns Hopkins University and the Institute of Bioinformatics (IOB), Bangalore, India, with contributions from other groups. NetPath hosts 45 signaling pathways, including 10 pathways with a major role in the regulation of the immune system and 10 pathways relevant to the regulation of cancer.

Web server: NetPath.


1.3.15 iPAVS

Introduction: The Integrated Pathway Resources, Analysis and Visualization System (iPAVS) provides a collection of highly structured, manually curated human pathway data. It also integrates biological pathway information from several public databases and provides tools to manipulate, filter, browse, search, analyze, visualize and compare the integrated pathway resources.

Web server: iPAVS.


1.3.16 PharmGKB

Introduction: PharmGKB pathways are evidence-based diagrams depicting the pharmacokinetics (PK) and/or pharmacodynamics (PD) of a drug with relevant (or potential) pharmacogenetic (PGx) associations. Drugs featured in PharmGKB pathways are chosen through extensive review of a variety of sources, including, but not limited to, the U.S. Food and Drug Administration (FDA) biomarker list and Clinical Pharmacogenetics Implementation Consortium (CPIC) nominations.

Web server: PharmGKB.


1.3.17 PathCards

Introduction: PathCards is an integrated database of human biological pathways and their annotations. Human pathways were clustered into SuperPaths based on gene content similarity. Each PathCard provides information on one SuperPath which represents one or more human pathways. It includes 1570 SuperPath entries, consolidated from 11 sources.
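The idea of clustering pathways by gene content similarity can be illustrated with a small Python sketch. It uses the Jaccard coefficient and a simple connected-components grouping; the 0.5 threshold, the function names and the toy gene sets (placeholder gene labels G1, G2, ...) are all assumptions for illustration and are not taken from the PathCards method.

```python
def jaccard(genes_a, genes_b):
    """Shared genes divided by all genes found in either pathway."""
    genes_a, genes_b = set(genes_a), set(genes_b)
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 0.0


def cluster_by_gene_content(pathways, threshold=0.5):
    """Group pathways whose gene sets overlap at or above the threshold.

    Two pathways end up in the same group when they are linked directly
    or through a chain of sufficiently similar pathways.
    """
    names = list(pathways)
    cluster_of = {name: {name} for name in names}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(pathways[first], pathways[second]) >= threshold:
                merged = cluster_of[first] | cluster_of[second]
                for member in merged:
                    cluster_of[member] = merged
    clusters = []
    for cluster in cluster_of.values():
        if cluster not in clusters:
            clusters.append(cluster)
    return clusters


# Hypothetical pathways from three sources, with placeholder gene labels.
toy_pathways = {
    "source A: pathway X": {"G1", "G2", "G3", "G4"},
    "source B: pathway X": {"G1", "G2", "G3", "G5"},
    "source C: pathway Y": {"G7", "G8", "G9"},
}

if __name__ == "__main__":
    for cluster in cluster_by_gene_content(toy_pathways):
        print(sorted(cluster))
```

In this toy input the two versions of pathway X share three of five genes and merge into one group, while pathway Y stays on its own.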

Web server: PathCards.


1.3.18 ACSN 2.0

Introduction: ACSN is a resource of cancer signalling knowledge: a comprehensive map of molecular interactions in cancer based on the latest scientific literature.

Feature Content
Maps of biological processes 5
Functional modules 52
Chemical species 5975
Reactions 4826
Proteins 2371
Metabolites 595
Genes 159
References 2919
Data Source Statistics

Web server: ACSN.


1.3.19 NDEx

Introduction: The NDEx Project provides an open-source framework where scientists and organizations can store, share, manipulate, and publish biological network knowledge. One of the goals of the project is to create a home for models that are currently available only as figures, tables, or supplementary information, such as networks produced via systematic mining and integration of large-scale molecular data. The NDEx project does not compete with existing pathway and interaction databases, such as Pathway Commons, KEGG, or Reactome; instead, NDEx provides a novel, common distribution channel for these efforts, preserving their identity and attribution rather than subsuming them.

Type Number
Pathways 972 (WikiPathways:675, signor_database:90 and NCIsysbio’s PID v2.0:207)
IndraSysBio-assembled GO term networks 6295
Gene sets 32263
Data Source Statistics

Installation: The NDEx app offers a Cytoscape command for scripting purposes.

Web server: NDEx.


1.3.20 SIGNOR

Introduction: SIGNOR is a resource that annotates experimental evidence about causal interactions between proteins and other entities of biological relevance, such as stimuli, phenotypes, enzyme inhibitors, complexes and protein families. Each entry points to the experimental evidence supporting the interaction and is enriched with additional metadata, such as the effect of the interaction on the activity of the target entity and the molecular mechanism underlying this effect. The curated data can be displayed as signed directed graphs by a graph drawing tool.
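A signed directed graph can be represented very simply, as in the Python sketch below. The entity names A to D are hypothetical, and combining signs along a path by multiplication is a common convention assumed here, not something the text attributes to SIGNOR.

```python
# Each edge maps (source, target) to a sign:
# +1 for an activating effect, -1 for an inhibiting effect.
# The entities A, B, C, D are hypothetical placeholders.
EDGES = {
    ("A", "B"): +1,
    ("B", "C"): -1,
    ("C", "D"): -1,
}


def path_sign(path, edges=EDGES):
    """Multiply edge signs along a path to get the overall causal effect."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= edges[(source, target)]
    return sign


def targets_of(source, edges=EDGES):
    """List direct targets of an entity with the sign of each effect."""
    return {target: sign for (src, target), sign in edges.items() if src == source}


if __name__ == "__main__":
    # A activates B, B inhibits C, C inhibits D: two inhibitions cancel out.
    print(path_sign(["A", "B", "C", "D"]))
    print(targets_of("B"))
```

Keeping direction and sign on every edge is what allows questions such as "does A ultimately promote or suppress D" to be answered from curated pairwise evidence.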

Web server: SIGNOR.


1.3.21 Plant Reactome

Introduction: The Plant Reactome knowledgebase, a conceptual plant pathway network, is built by biocuration, integrating (bio)chemical entities, gene products and macromolecular interactions. It provides manually curated pathways for the reference species Oryza sativa (rice) and gene orthology-based projections that extend pathway knowledge to 120 plant species. Currently, it hosts 30,000 reference pathways for plant metabolism, hormone signaling, transport, genetic regulation, plant organ development and differentiation, and biotic and abiotic stress responses. In addition to pathway browsing and search functions, the Plant Reactome provides analysis tools for pathway comparison between reference and projected species, pathway enrichment in gene expression data, and overlay of gene–gene interaction data on pathways.

Web server: Plant Reactome.


1.3.22 HAMdb

Introduction: The Human Autophagy Modulator Database (HAMdb, http://hamdb.scbdd.com) aims to provide researchers with as much related pathway and disease information as possible. HAMdb contains 796 proteins, 841 chemicals and 132 microRNAs. Their specific effects on autophagy, physicochemical information, biological information and disease information were manually collected and compiled. In addition, many external links are provided, giving access to more extensive biomedical knowledge.

Web server: HAMdb.


1.3.23 ComPath

Introduction: ComPath is an ecosystem that supports curation of pathway mappings between databases and fosters the exploration of pathway knowledge through several novel visualizations. With these mappings, ComPath can generate new biological insights by identifying pathway modules, clusters and cross-talks. Resources from KEGG, Reactome, WikiPathways and MSigDB have been used to build this package and application.

Resource Pathways Genes
WikiPathways 438 6015
Reactome 2195 10633
KEGG 330 7425
Data Source Statistics

Installation: ComPath offers Python and Docker commands for scripting purposes.

Web server: ComPath.


1.3.24 BIOPYDB

Introduction: The BIOchemical PathwaY DataBase (BIOPYDB) is a manually curated, readily updatable, dynamic resource of human cell-specific pathway information with an integrated computational platform for various pathway analyses. At present, it comprises 46 pathways, 3189 molecules, 5742 reactions and 6897 different types of diseases linked with pathway proteins, supported by 520 literature references and 17 other pathway databases.

Web server: BIOPYDB.


1.3.25 PCxN

Introduction: The Pathway Coexpression Network (PCxN) provides a unifying interpretation of functional interactions between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB). Correlations between canonical pathways that are valid in a broad context were estimated from a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations, so that it estimates significant correlations between pathways with related functions rather than with similar annotations. PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways and by identifying additional highly correlated pathways.
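The reason shared genes matter can be shown with a simplified Python sketch. It is not the PCxN method: it swaps in a cruder technique, dropping the shared genes altogether and computing a plain Pearson correlation between per-sample mean expression of the remaining genes. All gene names and expression values are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pathway_summary(expression, genes):
    """Mean expression per sample over the given genes."""
    genes = sorted(genes)
    n_samples = len(expression[genes[0]])
    return [sum(expression[g][s] for g in genes) / len(genes) for s in range(n_samples)]


def coexpression_without_shared_genes(expression, pathway_a, pathway_b):
    """Correlate two pathways using only genes unique to each one.

    Simplification: shared genes are removed so that overlapping
    annotation alone cannot produce a high correlation.
    """
    shared = set(pathway_a) & set(pathway_b)
    only_a = set(pathway_a) - shared
    only_b = set(pathway_b) - shared
    if not only_a or not only_b:
        raise ValueError("each pathway needs at least one unshared gene")
    return pearson(pathway_summary(expression, only_a), pathway_summary(expression, only_b))


# Hypothetical expression values: gene -> one value per sample.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [2.0, 3.0, 4.0, 5.0],
    "g4": [1.0, 3.0, 5.0, 7.0],
    "shared": [5.0, 1.0, 4.0, 2.0],
}
toy_pathway_a = {"g1", "g2", "shared"}
toy_pathway_b = {"g3", "g4", "shared"}

if __name__ == "__main__":
    print(coexpression_without_shared_genes(toy_expression, toy_pathway_a, toy_pathway_b))
```

In the toy data the unshared genes of both pathways rise together across samples, so the correlation is high even though the shared gene is ignored.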

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("pcxn")

Web server: PCxN.


1.3.26 PathMe

Introduction: PathMe is a Python package that transforms pathway knowledge from three major pathway databases (KEGG, Reactome and WikiPathways) into a unified abstraction, using the Biological Expression Language as the pivotal, integrative schema. Like ComPath, PathMe can generate new biological insights by identifying pathway modules and cross-talks.

Installation: PathMe offers a Python command for scripting purposes.

Web server: PathMe.


1.3.27 pathDIP

Introduction: pathDIP is an extended pathway annotation and enrichment analysis resource for human, model organisms and domesticated species. It is an annotated database of signaling cascades in human and non-human organisms, comprising core pathways from major curated pathway databases together with pathways predicted from orthology and from physical protein interactions. Data integration and predictions increase coverage of pathway annotations for the human proteome to 92%. pathDIP annotates 122,131 unique proteins in 6,401 pathways in 17 organisms (including 18,454 human proteins), annotates 36,216 pathway orphans (including 5,366 human proteins), and provides multiple query, analysis and output options.

Web server: pathDIP.


1.3.28 PathBank

Introduction: PathBank is a comprehensive pathway database for model organisms. It is an interactive, visual database containing more than 100 000 machine-readable pathways found in model organisms such as humans, mice, E. coli, yeast, and Arabidopsis thaliana. The majority of these pathways are not found in any other pathway database. All PathBank pathways include information on the relevant organelles, subcellular compartments, protein complex cofactors, protein complex locations, metabolite locations, chemical structures, and protein complex quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the HMDB or DrugBank, and each protein complex or enzyme complex is hyperlinked to UniProt. All PathBank pathways are accompanied by detailed descriptions and references, providing an overview of the pathway, condition, or processes depicted in each diagram.

Web server: PathBank.


1.3.29 AOP

Introduction: An Adverse Outcome Pathway (AOP) is a model that identifies a sequence of molecular and cellular events that may lead to adverse health effects in individuals and populations. An AOP maps out a sequence of biological events following an exposure that may result in illness or injury. By understanding the individual key biological events in the organism, researchers can gain a better understanding of stressor-induced health outcomes. The Adverse Outcome Pathway Database (AOP-DB) is an online database that combines different data types (AOP, gene, chemical, disease, and pathway) to identify the impacts of chemicals on human health and the environment. EPA developed the AOP-DB to better characterize adverse outcomes of toxicological interest that are relevant to human health and the environment.

Biological Category      Data Source
Gene                     NCBI Gene, STRING
Taxonomy & Orthology     NCBI Taxonomy, HomoloGene, KEGG Orthology, MetaPhOrs
AOP                      AOP-wiki
Chemical                 CTD, AOP-wiki, ToxCast
Pathway                  KEGG Pathway, Reactome, ConsensusPathDB
Disease                  DisGeNET
Ontology                 NCBI Gene
Tissues                  HumanBase
Haplotypes               1000 Genomes, GTEx, Ensembl, GWAS Catalog
Data Source Statistics

Web server: AOP.


1.3.30 PathBIX

Introduction: PathBIX is a web application for network-based pathway analysis, based on the recently published ANUBIX algorithm, which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species and uses prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.

Web server: PathBIX.