Biological pathway annotation

Pathways are biological networks that define how biomolecules cooperate to accomplish cellular tasks under different conditions. Because pathways are assembled from physically interacting molecules such as proteins, pathway membership is particularly important for annotating those proteins.

Thirty pathway databases are summarized in this section.

1.3.1 KEGG

Introduction: KEGG PATHWAY is a collection of manually drawn pathway maps representing the current knowledge of the molecular interaction, reaction and relation networks for: (1) Metabolism, (2) Genetic Information Processing, (3) Environmental Information Processing, (4) Cellular Processes, (5) Organismal Systems, (6) Human Diseases and (7) Drug Development.

Type	Number
species	8504 (813 eukaryotes, 7291 bacteria, 400 archaea)
pathway	559 (979,624 entries)
orthology	25,479 entries
genome	20,348 entries
genes	44,226,835 entries
compound	19,007 entries
glycan	11,104 entries
reaction	11,841 entries
rclass	3,180 entries
enzyme	8,012 entries
network	1,427 entries
variant	738 entries
disease	2,599 entries
drug	11,993 entries

Database information Release 07, Nov 22

Web server: KEGG.

1.3.2 Pathway ontology

Introduction: The goal of the Pathway Ontology is to cover all types of biological pathways, including altered and disease pathways (2643 pathways), and to capture the relationships between them within the hierarchical structure of a Directed Acyclic Graph (DAG). The five nodes of the ontology are: classic metabolic, regulatory, signaling, drug and disease pathways.
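The DAG organization described above can be illustrated with a minimal sketch. Only the five top-level node names come from the text; the two child terms and their parent links are hypothetical placeholders, not actual Pathway Ontology entries. The point is that a DAG, unlike a tree, lets an altered pathway sit under both its normal counterpart and the disease branch.

```python
from collections import deque

# Hypothetical is_a links (child -> parents). Only the five top-level
# names are taken from the text; the "example ..." terms are illustrative.
PARENTS = {
    "classic metabolic pathway": [],
    "regulatory pathway": [],
    "signaling pathway": [],
    "drug pathway": [],
    "disease pathway": [],
    "example insulin signaling pathway": ["signaling pathway"],
    "example altered insulin signaling pathway": [
        "example insulin signaling pathway",
        "disease pathway",
    ],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following is_a links upward."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def is_acyclic(parents=PARENTS):
    """Check that no term is its own ancestor (the 'acyclic' in DAG)."""
    return all(term not in ancestors(term, parents) for term in parents)
```

Calling `ancestors("example altered insulin signaling pathway")` returns terms from both the signaling and the disease branches, which is the kind of relationship a strict tree could not express.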

Web server: Pathway Ontology.

1.3.3 BioCarta

Introduction: BioCarta is a database of gene interaction models. The database contains high-quality images of several cellular signaling and interaction pathways, and each diagram is fully hyperlinked to products and information pages about individual genes. Users can access product sales pages for selected elements of each pathway. Statistics: 1396 genes, 254 pathways and 4417 gene-pathway associations.

Web server: BioCarta.

1.3.4 Reactome

Introduction: Reactome is an open-source, open-access, manually curated and peer-reviewed pathway database. The cornerstone of Reactome is a freely available, open-source relational database of signaling and metabolic molecules and their relations, organized into biological pathways and processes. The core unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes, vaccines, anti-cancer therapeutics and small molecules) participating in reactions form a network of biological interactions and are grouped into pathways. Examples of biological pathways in Reactome include classical intermediary metabolism, signaling, transcriptional regulation, apoptosis and disease. The Reactome curation process for a pathway is similar to the editing of a scientific review: an external domain expert provides his or her expertise, a curator formalizes it into the database structure, and an external domain expert reviews the representation. A system of evidence tracking ensures that all assertions are backed up by the primary literature. Reactome pathway, reaction and molecule pages extensively cross-reference over 100 different online bioinformatics resources, including the NCBI Gene, Ensembl and UniProt databases, the UCSC Genome Browser, the ChEBI small molecule database, and the PubMed literature database.
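The reaction-centred data model described above can be sketched in a few lines. This is an illustration under stated assumptions, not Reactome's actual schema: the class names, fields and the `unsupported_reactions` check are hypothetical, and the literature entry is a placeholder rather than a real citation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A participant such as a protein, complex or small molecule."""
    name: str
    kind: str


@dataclass
class Reaction:
    """The core unit: inputs are converted to outputs, with evidence."""
    name: str
    inputs: list
    outputs: list
    literature: list = field(default_factory=list)


@dataclass
class Pathway:
    """A named group of reactions."""
    name: str
    reactions: list = field(default_factory=list)

    def participants(self):
        """All entities that take part in any reaction of the pathway."""
        found = set()
        for reaction in self.reactions:
            found.update(reaction.inputs)
            found.update(reaction.outputs)
        return found

    def unsupported_reactions(self):
        """Names of reactions that carry no literature evidence."""
        return [r.name for r in self.reactions if not r.literature]


# Toy example: the hexokinase step of glycolysis.
glucose = Entity("glucose", "small molecule")
atp = Entity("ATP", "small molecule")
g6p = Entity("glucose 6-phosphate", "small molecule")
adp = Entity("ADP", "small molecule")
step = Reaction(
    "glucose phosphorylation",
    inputs=[glucose, atp],
    outputs=[g6p, adp],
    literature=["placeholder reference"],  # not a real citation
)
toy = Pathway("toy glycolysis fragment", [step])
```

Keeping evidence on the reaction rather than on the pathway mirrors the idea that every assertion is traceable to the literature; `toy.unsupported_reactions()` would flag any reaction added without a reference.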

SPECIES	PROTEINS	COMPLEXES	REACTIONS	PATHWAYS
S. pombe	1690	1805	1486	819
S. cerevisiae	1913	1827	1566	812
D. rerio	8633	8452	7383	1676
X. tropicalis	7046	7321	6159	1580
G. gallus	7296	7931	6859	1706
S. scrofa	8407	8825	7548	1660
B. taurus	8841	9182	8048	1696
C. familiaris	8162	8725	7455	1657
R. norvegicus	8808	9505	8356	1702
M. musculus	9537	10620	9456	1715
H. sapiens	11097	14084	14398	2601
D. melanogaster	4755	5402	4596	1477
C. elegans	4468	4403	3700	1304
D. discoideum	2681	2502	2313	982
P. falciparum	1051	1007	861	599

Database information

Web server: Reactome.

1.3.5 PANTHER

Introduction: The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. The core of PANTHER is a comprehensive, annotated “library” of gene family phylogenetic trees. All nodes in the tree have persistent identifiers that are maintained between versions of PANTHER, providing a stable substrate for annotations of protein properties like subfamily and function. Each phylogenetic tree is used to annotate each protein member of the family by its:

Family and Protein Class: supergrouping of protein families.
Subfamily: subgroup within the family phylogenetic tree.
Orthologs: genes in other organisms that derive from the same gene in the most recent common ancestor (MRCA).
Paralogs: genes in the same organism that are related by gene duplication.
Function: GO terms annotated on the trees by the GO Phylogenetic Annotation Project.
Pathways: curated by PANTHER and by Reactome.
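The ortholog/paralog annotation described above depends on what happened at the most recent common ancestor of two genes in the family tree. The sketch below is a toy illustration, not PANTHER's implementation: the tree, the node names and the event labels are hypothetical, and it assumes the common event-based convention (speciation at the MRCA gives orthologs, duplication gives paralogs).

```python
# Toy gene tree stored as child -> parent links. Internal nodes carry the
# evolutionary event inferred at that node. All names are hypothetical.
PARENT = {
    "human_A": "spec1",
    "mouse_A": "spec1",
    "human_B": "dup1",
    "spec1": "dup1",
}
EVENT = {"spec1": "speciation", "dup1": "duplication"}


def path_to_root(node):
    """List the node and all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def relation(gene_a, gene_b):
    """Classify two distinct leaf genes by the event at their MRCA."""
    ancestors_b = set(path_to_root(gene_b))
    mrca = next(n for n in path_to_root(gene_a) if n in ancestors_b)
    return "ortholog" if EVENT[mrca] == "speciation" else "paralog"
```

Here `relation("human_A", "mouse_A")` is resolved at the speciation node, while `relation("human_A", "human_B")` is resolved at the duplication node. Under this convention, genes from different species that are separated by a duplication are also reported as paralogs, which is broader than the same-organism definition given in the text.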

Type	Number
species	143
pathways	177
Ontologies	3361 terms
	2267 biological process terms
	544 cellular component terms
	550 molecular function terms

Database information

Web server: PANTHER.

1.3.6 BioCyc

Introduction: The BioCyc collection of Pathway/Genome Databases (PGDBs) provides electronic reference sources on the pathways and genomes of many organisms with sequenced genomes. BioCyc is primarily microbial, but it also contains databases for humans; for important model organisms such as yeast, fly, and mouse; and for other eukaryotes whose PGDBs have been curated. In total, BioCyc comprises 20,025 PGDBs for model eukaryotes and for thousands of microbes, plus software tools for exploring them. It is an encyclopedic reference that contains curated data from 130,000 publications.

Curated Pathway/Genome Databases for many organisms have been created by a variety of institutions using the Pathway Tools software and are available from the following Web sites.

Database	Web site
EcoCyc	EcoCyc.org
MetaCyc	MetaCyc.org
HumanCyc	HumanCyc.org
PlantCyc	PlantCyc.org
GutCyc	GutCyc.org
MouseCyc	mousecyc
AraCyc	aracyc
YeastCyc	yeast.biocyc.org
LeishCyc	leishcyc

Web server: BioCyc.

1.3.7 INOH

Introduction: INOH is a pathway database providing 857 pathways from model organisms including Drosophila, Homo sapiens, Mus musculus, and Rattus norvegicus. In INOH, the term pathway refers to higher order functional knowledge such as relationships among multiple bio-molecules that constitute signal transduction pathways or biological events in general.

Web server: INOH.

1.3.8 EHMN

Introduction: EHMN (Edinburgh Human Metabolic Network) presents a high-quality human metabolic network manually reconstructed by integrating genome annotation information from different databases and metabolic reaction information from the literature. The network contains nearly 3000 metabolic reactions, which were reorganized into about 70 human-specific metabolic pathways according to their functional relationships. Analysis of the functional connectivity of the metabolites in the network reconfirms the bow-tie structure that was found previously by structure analysis.

Web server: EHMN.

1.3.9 WikiPathways

Introduction: WikiPathways is a database of biological pathways maintained by and for the scientific community. WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways thus presents a new model for pathway databases that enhances and complements ongoing efforts, such as KEGG, Reactome and Pathway Commons. Building on the same MediaWiki software that powers Wikipedia, a custom graphical pathway editing tool and integrated databases were added covering major gene, protein, and small-molecule systems.

Type	Number
species	33
pathways	3091

Data Source Statistics

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("rWikiPathways")

Other client libraries:

Java: Java API client with code examples at GitHub.
Perl: Perl API client with code examples at GitHub.
PHP: Example scripts at GitHub.
Python: Python API client with code examples at GitHub.
Cytoscape: The WikiPathways app offers a Cytoscape command for scripting purposes.

Web server: WikiPathways.

1.3.10 PID

Introduction: The Pathway Interaction Database (PID) is a free biomedical database of human cellular signaling pathways. The database contains information about the molecular interactions and reactions that take place in cells, with a particular focus on processes that might be relevant to cancer research and treatment.

Type	Number
pathways	254

Data Source Statistics

Web server: PID.

1.3.11 Pathway Commons

Introduction: Pathway Commons provides biologists with (i) tools to search this comprehensive resource, (ii) a download site offering integrated bulk sets of pathway data (e.g. tables of interactions and gene sets), (iii) reusable software libraries for working with pathway information in several programming languages (Java, R, Python and JavaScript) and (iv) a web service for programmatically querying the entire dataset. Visualization of pathways is supported using the Systems Biology Graphical Notation (SBGN). Pathway Commons currently contains data from 22 databases with 4794 detailed human biochemical processes (i.e. pathways) and ∼2.3 million interactions.
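A bulk table of interactions, as mentioned above, is typically easy to process with a few lines of code. The sketch below assumes a tab-separated, three-column layout (source, interaction type, target) similar to a simple interaction format export; the column layout and the sample rows are assumptions for illustration, not a specification of the actual download files.

```python
from collections import defaultdict

# Illustrative rows only; the real files are far larger and may carry
# additional columns, which this sketch ignores.
SAMPLE = (
    "MDM2\tcontrols-state-change-of\tTP53\n"
    "TP53\tcontrols-expression-of\tCDKN1A\n"
    "MDM2\tin-complex-with\tTP53\n"
)


def parse_interactions(text):
    """Parse tab-separated lines into (source, type, target) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source, kind, target = line.split("\t")[:3]
        rows.append((source, kind, target))
    return rows


def neighbours(rows):
    """Build an undirected adjacency map from the parsed interactions."""
    graph = defaultdict(set)
    for source, _kind, target in rows:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)
```

For example, `neighbours(parse_interactions(SAMPLE))["TP53"]` collects both partners of TP53 in the sample, regardless of interaction direction or type.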

Web server: Pathway Commons.

1.3.12 SMPDB

Introduction: SMPDB (The Small Molecule Pathway Database) is an interactive, visual database containing more than 30 000 small molecule pathways found in humans only. The majority of these pathways are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology. It is able to do so, in part, by providing detailed, fully searchable, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways. All SMPDB pathways include information on the relevant organs, subcellular compartments, protein complex cofactors, protein complex locations, metabolite locations, chemical structures and protein complex quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the HMDB or DrugBank, and each protein complex or enzyme complex is hyperlinked to UniProt. All SMPDB pathways are accompanied by detailed descriptions and references, providing an overview of the pathway, condition or processes depicted in each diagram. The database is easily browsed and supports full-text, sequence and chemical structure searching. Users may query SMPDB with lists of metabolite names, drug names, gene/protein complex names, SwissProt IDs, GenBank IDs, Affymetrix IDs or Agilent microarray IDs. These queries produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein complex concentration data can also be visualized through SMPDB’s mapping interface. All of SMPDB’s images, image maps, descriptions and tables are downloadable.

Type	Number
Total Pathways	48690
Normal Metabolic Pathways	27876
Drug Action Pathways	404
Drug Metabolism Pathways	64
Disease Pathways	20251
Signaling Pathways	24
Protein Pathways	63
Physiological Pathways	8
Drugs	696
Metabolites	55700
Proteins	1451
Enzymes	791
Transporters	137
Reactions	57402
Transportations	294
Reaction-Coupled Transportations	73
Interactions	691

Data Source Statistics

Web server: SMPDB.

1.3.13 SignaLink

Introduction: SignaLink is an integrated resource to analyze signaling pathway cross-talks, transcription factors, miRNAs and regulatory enzymes.

Main features:

A signaling network resource with known and predicted information for human and model organisms.
Manually curated dataset of major signaling pathways, including curated data from resources such as ACSN, InnateDB, Reactome and SIGNOR.
Pathways are extended with integrated regulatory resources to contain pathway-specific transcription factors, miRNAs, scaffolds and post-translational modifying enzymes.
Proteins are classified by pathway position (core/non-core) and function (ligand, receptor, mediator, etc.).
Signaling interactions are directed and labeled with PubMed IDs of the publications of experimental evidence.
A multi-layered network structure allows the selection of user-specific details.
Filtering based on tissue or subcellular localization is supported.
Downloads are available in CSV or PSI-MI TAB formats.

Web server: SignaLink.

1.3.14 NetPath

Introduction: NetPath is a manually curated resource of human signal transduction pathways. It is a joint effort between the Pandey Lab at the Johns Hopkins University and the Institute of Bioinformatics (IOB), Bangalore, India, with contributions from other parties. NetPath hosts 45 signaling pathways, including 10 pathways with a major role in the regulation of the immune system and 10 pathways relevant to the regulation of cancer.

Web server: NetPath.

1.3.15 iPAVS

Introduction: The Integrated Pathway Resources, Analysis and Visualization System (iPAVS) provides a collection of highly structured, manually curated human pathway data. It also integrates biological pathway information from several public databases and provides tools to manipulate, filter, browse, search, analyze, visualize and compare the integrated pathway resources.

Web server: iPAVS.

1.3.16 PharmGKB

Introduction: PharmGKB pathways are evidence-based diagrams depicting the pharmacokinetics (PK) and/or pharmacodynamics (PD) of a drug with relevant (or potential) pharmacogenetic (PGx) associations. Drugs featured in PharmGKB pathways are chosen through extensive review of a variety of sources, including, but not limited to, the U.S. Food and Drug Administration (FDA) biomarker list and Clinical Pharmacogenetics Implementation Consortium (CPIC) nominations.

Web server: PharmGKB.

1.3.17 PathCards

Introduction: PathCards is an integrated database of human biological pathways and their annotations. Human pathways were clustered into SuperPaths based on gene content similarity. Each PathCard provides information on one SuperPath which represents one or more human pathways. It includes 1570 SuperPath entries, consolidated from 11 sources.
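Clustering pathways by gene content similarity, as described above, can be sketched with a Jaccard coefficient and a simple single-linkage grouping. This is an illustration only: the similarity measure, the 0.5 threshold and the toy pathways are assumptions, and the actual SuperPath construction used by PathCards may differ.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes in either pathway."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(pathways, threshold=0.5):
    """Group pathways whose gene sets overlap at or above the threshold.

    Uses union-find, so groups are formed by single linkage.
    """
    parent = {name: name for name in pathways}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(pathways, 2):
        if jaccard(pathways[a], pathways[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in pathways:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pathways from two hypothetical sources.
TOY = {
    "db1:apoptosis": {"G1", "G2", "G3", "G4"},
    "db2:programmed cell death": {"G1", "G2", "G3"},
    "db1:glycolysis": {"G7", "G8"},
}
```

With the toy data, the two cell-death pathways share three of four genes and fall into one group, while the glycolysis pathway stays on its own. Single linkage is the simplest choice here; it can chain loosely related pathways together, so a stricter linkage may be preferable on real data.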

Web server: PathCards.

1.3.18 ACSN 2.0

Introduction: ACSN is a resource of cancer signalling knowledge, providing a comprehensive map of molecular interactions in cancer based on the latest scientific literature.

Feature	Content
Maps of biological processes	5
Functional modules	52
Chemical species	5975
Reactions	4826
Proteins	2371
Metabolites	595
Genes	159
References	2919

Data Source Statistics

Web server: ACSN.

1.3.19 NDEx

Introduction: The NDEx Project provides an open-source framework where scientists and organizations can store, share, manipulate, and publish biological network knowledge. One of the goals of the project is to create a home for models that are currently available only as figures, tables, or supplementary information, such as networks produced via systematic mining and integration of large-scale molecular data. The NDEx project does not compete with existing pathway and interaction databases, such as Pathway Commons, KEGG, or Reactome; instead, NDEx provides a novel, common distribution channel for these efforts, preserving their identity and attribution rather than subsuming them.

Type	Number
Pathways	972 (WikiPathways:675, signor_database:90 and NCIsysbio’s PID v2.0:207)
IndraSysBio-assembled GO term networks	6295
Gene sets	32263

Data Source Statistics

Installation: The NDEx app offers a Cytoscape command for scripting purposes.

Web server: NDEx.

1.3.20 SIGNOR

Introduction: SIGNOR is a resource that annotates experimental evidence about causal interactions between proteins and other entities of biological relevance: stimuli, phenotypes, enzyme inhibitors, complexes, protein families, etc. Each entry points to the experimental evidence supporting the interaction and is enriched by additional relevant metadata, such as the effect of the interaction on the activity of the target entity and the molecular mechanism underlying this effect. The curated data can be displayed as signed directed graphs by a graph drawing tool.
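A signed directed graph, as mentioned above, lets the net effect along a causal path be read off by multiplying edge signs. The sketch below is a minimal illustration with hypothetical entity names; it is not SIGNOR's data format or tooling.

```python
# (source, target) -> effect on the target: +1 activation, -1 inhibition.
# Entity names are placeholders, not curated interactions.
EDGES = {
    ("stimulus", "kinase_A"): +1,
    ("kinase_A", "phosphatase_B"): -1,
    ("phosphatase_B", "factor_C"): -1,
}


def path_sign(path, edges=EDGES):
    """Net effect of the first node on the last along a directed path."""
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= edges[(source, target)]  # KeyError if the edge is absent
    return sign
```

In the toy graph, the stimulus activates a kinase that inhibits an inhibitor, so `path_sign(["stimulus", "kinase_A", "phosphatase_B", "factor_C"])` is positive: two inhibitions in sequence amount to a net activation.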

Web server: SIGNOR.

1.3.21 Plant Reactome

Introduction: The Plant Reactome knowledgebase, a conceptual plant pathway network, is built by biocuration and by integrating (bio)chemical entities, gene products, and macromolecular interactions. It provides manually curated pathways for the reference species Oryza sativa (rice) and gene orthology-based projections that extend pathway knowledge to 120 plant species. Currently, it hosts 30,000 reference pathways for plant metabolism, hormone signaling, transport, genetic regulation, plant organ development and differentiation, and biotic and abiotic stress responses. In addition to the pathway browsing and search functions, the Plant Reactome provides analysis tools for pathway comparison between reference and projected species, pathway enrichment in gene expression data, and overlay of gene–gene interaction data on pathways.

Web server: Plant Reactome.

1.3.22 HAMdb

Introduction: The Human Autophagy Modulator Database (HAMdb, http://hamdb.scbdd.com) aims to provide researchers with as much related pathway and disease information as possible. HAMdb contains 796 proteins, 841 chemicals and 132 microRNAs. Their specific effects on autophagy, along with physicochemical, biological and disease information, were manually collected and compiled. In addition, many external links are provided for further information covering extensive biomedical knowledge.

Web server: HAMdb.

1.3.23 ComPath

Introduction: ComPath is an ecosystem that supports curation of pathway mappings between databases and fosters the exploration of pathway knowledge through several novel visualizations. With these mappings, ComPath can generate new biological insights by identifying pathway modules, clusters, and cross-talks. Resources from KEGG, Reactome, WikiPathways, and MSigDB have been used to build this package and application.

Resource	Pathways	Genes
WikiPathways	438	6015
Reactome	2195	10633
KEGG	330	7425

Data Source Statistics

Installation: The ComPath app offers Python and Docker commands for scripting purposes.

Web server: ComPath.

1.3.24 BIOPYDB

Introduction: BIOPYDB (BIOchemical PathwaY DataBase) is a manually curated, readily updatable, dynamic resource of human cell-specific pathway information with an integrated computational platform for performing various pathway analyses. At present, it comprises 46 pathways, 3189 molecules, 5742 reactions and 6897 different types of diseases linked with pathway proteins, supported by 520 literature references and 17 other pathway databases.

Web server: BIOPYDB.

1.3.25 PCxN

Introduction: The Pathway Coexpression Network (PCxN) provides a unifying interpretation of functional interactions between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB). Correlations between canonical pathways were estimated from a curated collection of 3,207 microarrays from 72 normal human tissues, so that they are valid in a broad context. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways.
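The idea of accounting for shared genes can be illustrated with a simplified sketch: summarize each pathway by the mean expression of the genes that are unique to it, then correlate the two summaries across samples. This is a hedged simplification, not the PCxN estimator; the summary statistic, the plain Pearson correlation and the toy expression values are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined for constant input


def pathway_summary(expression, genes):
    """Per-sample mean expression over the given genes."""
    genes = sorted(genes)
    n_samples = len(expression[genes[0]])
    return [
        sum(expression[g][i] for g in genes) / len(genes)
        for i in range(n_samples)
    ]


def pathway_correlation(expression, genes_a, genes_b):
    """Correlate two pathways after removing the genes they share."""
    shared = set(genes_a) & set(genes_b)
    only_a = set(genes_a) - shared
    only_b = set(genes_b) - shared
    if not only_a or not only_b:
        raise ValueError("a pathway has no genes left after removing shared genes")
    return pearson(
        pathway_summary(expression, only_a),
        pathway_summary(expression, only_b),
    )


# Hypothetical expression values: gene -> one value per sample.
EXPRESSION = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "shared": [5.0, 1.0, 4.0, 2.0],
}
```

Dropping the shared gene before correlating means two pathways cannot appear related merely because their annotations overlap, which is the distinction the text draws between related functions and similar annotations.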

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("pcxn")

Web server: PCxN.

1.3.26 PathMe

Introduction: PathMe is a Python package that transforms pathway knowledge from three major pathway databases (KEGG, Reactome and WikiPathways) into a unified abstraction using Biological Expression Language (BEL) as the pivotal, integrative schema. Like ComPath, PathMe can generate new biological insights by identifying pathway modules and cross-talks.

Installation: The PathMe package offers a Python command for scripting purposes.

Web server: PathMe.

1.3.27 pathDIP

Introduction: pathDIP is an extended pathway annotation and enrichment analysis resource for human, model organisms and domesticated species. It is an annotated database of signaling cascades in human and non-human organisms, comprising core pathways from major curated pathway databases, and pathways predicted based on orthology and on physical protein interactions. Data integration and predictions increase coverage of pathway annotations for the human proteome to 92%. pathDIP annotates 122,131 unique proteins in 6,401 pathways in 17 organisms (including 18,454 human proteins), annotates 36,216 pathway orphans (including 5,366 human proteins), and provides multiple query, analysis and output options.

Data Source Statistics

Web server: pathDIP.

1.3.28 PathBank

Introduction: PathBank is a comprehensive pathway database for model organisms. It is an interactive, visual database containing more than 100 000 machine-readable pathways found in model organisms such as humans, mice, E. coli, yeast, and Arabidopsis thaliana. The majority of these pathways are not found in any other pathway database. All PathBank pathways include information on the relevant organelles, subcellular compartments, protein complex cofactors, protein complex locations, metabolite locations, chemical structures, and protein complex quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the HMDB or DrugBank, and each protein complex or enzyme complex is hyperlinked to UniProt. All PathBank pathways are accompanied by detailed descriptions and references, providing an overview of the pathway, condition, or processes depicted in each diagram.

Data Source Statistics

Web server: PathBank.

1.3.29 AOP

Introduction: An Adverse Outcome Pathway (AOP) is a model that identifies a sequence of molecular and cellular events that may lead to adverse health effects in individuals and populations. An AOP maps out a sequence of biological events following an exposure that may result in illness or injury. By understanding the individual key biological events in the organism, researchers can gain a better understanding of stressor-induced health outcomes. The Adverse Outcome Pathway Database (AOP-DB) is an online database that combines different data types (AOP, gene, chemical, disease, and pathway) to identify the impacts of chemicals on human health and the environment. The EPA developed the AOP-DB to better characterize adverse outcomes of toxicological interest that are relevant to human health and the environment.

Biological Category	Data Source
Gene	NCBI Gene
	STRING
Taxonomy & Orthology	NCBI Taxonomy
	Homologene
	KEGG Orthology
	metaPhOrs
AOP	AOP-wiki
Chemical	CTD
	AOP-wiki
	ToxCast
Pathway	KEGG Pathway
	Reactome
	ConsensusPathDB
Disease	DisGeNET
Ontology	NCBI Gene
Tissues	HumanBase
Haplotypes	1000 Genomes
	GTEx
	Ensembl
	GWAS Catalog

Data Source Statistics

Web server: AOP.

1.3.30 PathBIX

Introduction: PathBIX is a web application for network-based pathway analysis, based on the recently published ANUBIX algorithm, which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species, and utilizes prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.
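Network-based pathway analysis generally starts from the number of network links that connect a query gene set to a pathway's genes. The sketch below shows only that counting step on a toy network; the link list and gene names are hypothetical, and the statistical test that ANUBIX applies to such counts is not reproduced here.

```python
def crosstalk(links, query_genes, pathway_genes):
    """Count undirected links with one end in each gene set."""
    query, pathway = set(query_genes), set(pathway_genes)
    count = 0
    for a, b in links:
        if (a in query and b in pathway) or (b in query and a in pathway):
            count += 1
    return count


# Hypothetical functional-association network as a list of gene pairs.
LINKS = [
    ("q1", "p1"),
    ("q2", "p1"),
    ("q2", "x"),
    ("p2", "q1"),
    ("x", "p2"),
]
```

With the toy network, `crosstalk(LINKS, {"q1", "q2"}, {"p1", "p2"})` counts the three links that bridge the two sets and ignores links to the unrelated gene `x`. A real analysis would then compare this count with what is expected by chance given the connectivity of the genes involved.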

Web server: PathBIX.