Prediction of potential drug (Pharmocogenomics)

Drug repurposing (alternatively called “new uses for old drugs”) is a strategy for identifying new uses for approved or investigational drugs that are outside the scope of the original medical indication. Methods based on transcriptomics data have emerged as very promising approaches for drug repurposing, and the leading idea is that a given drug induces a specific gene expression signature that are over- or under-expressed, that reflect the effect of the drug in the cell.

Breaking: The popular R/python packages and web services tools based on transcriptomics are summarized in this document.


5.1.1 Clue

Introduction: CLUE is a cloud-based software platform for the analysis of perturbational datasets generated using gene expression (L1000) and proteomic (P100 and GCP) assays. The CLUE platform provides integrated access to datasets, results from the processing and analysis of these data, and software tools that the community can leverage to advance their research.

Data Source Statistics
Type Number
Projects 11
Datasets 14
Signatures 12900

Installation: In addition to apps, such as Touchstone and Query, that can facilitate analysis of these perturbational datasets, our team also offers easy access to open source software packages available in Python, R, Matlab, and Java. We will continue to expand the availability of code to further enable command line access of CLUE tools and data.

1.CMap R code: R modules integrated with tidyverse for analysis and io of a variety of file formats (including GCTx and GCT files).In R version 4.0 or newer:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cmapR")
  1. A docker container with a slightly earlier version of cmapR can be obtained here: https://github.com/cmap/cmapR

Application: The cmapR tutorial can be found here. The other tutorials can be found here.

Web server: Clue. Clue


5.1.2 CREEDS

Introduction: CREEDS is a knowledgebase for associations between drugs, genes and diseases based on the data from a crowdsourcing microtask project implemented to annotate and reanalyze gene expression profiles from GEO. Signatures of these data are unique and are manually validated for quality. This database visualizes all of the signatures in a packed circles layout in which similar signatures are closer to each other. Furthermore, it has interactive heatmaps of hierarchically clustered matrices of all signatures.

Web server: CREEDS. CREEDS


5.1.3 L1000CDS²

Introduction: L1000CDS² is a LINCS L1000 characteristic direction signature search engine. It enables users to find consensus L1000 small molecule signatures that match user input signatures. The underlying dataset for the search engine is a portion of the LINCS L1000 small molecule expression profiles generated at the Broad Institute by the Connectivity Map team. The differentially expressed (DE) genes of these profiles were calculated using the characteristic direction method. Depending on the user’s input, L1000CDS² uses either a gene-set method or cosine distance method to compare the input signatures to the L1000 signatures to perform the search. When up/down gene lists are submitted to L1000CDS², the search engine compares the input gene lists to the DE genes computed from the LINCS L1000 data and descriptive information of the top 50 matched signatures is returned. When a signature is submitted to L1000CDS² in the format of “gene symbol, expression value”, the search engine calculates a cosine distance between the input signature and every characteristic direction signature in the underlying dataset, and the top 50 signatures of either the largest (reverse mode) or the smallest (mimic mode) cosine distances are returned. L1000CDS² leverages the efficiency of matrix operations to perform the search. The search finishes a query against more than 20,000 signatures in less than a decisecond using the gene-set method or less than 4 seconds using the cosine distance method.

L1000CDS² emained with 33,197 significant signatures served by the default search.These signatures cover 62 unique cell-lines and 3,924 different small molecues. The top six cell-lines with the most number of small molecule perturbations are:

Cell-line Small Molecule Count
MCF7 2,052
VCAP 1,893
PC3 1,803
HA1E 1,505
A375 1,499
A549 1,479

Application: The SigCom LINCS tutorial can be found at https://maayanlab.cloud/L1000CDS2/help/.

Web server: L1000CDS². L1000CDS2


5.1.4 PharmacoGx

Introduction: Contains a set of functions to perform large-scale analysis of pharmaco-genomic data. These include the PharmacoSet object for storing the results of pharmacogenomic experiments, as well as a number of functions for computing common summaries of drug-dose response and correlating them with the molecular features in a cancer cell-line.

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("PharmacoGx")

Or install package from github

Application: The PharmacoGx vignette can be found at https://www.bioconductor.org/packages/release/bioc/vignettes/PharmacoGx/inst/doc/PharmacoGx.html.


5.1.5 DrugComboRanker

Introduction:DrugComboRanker is a computational tool that prioritizes synergistic drug combinations and uncovers their mechanisms of action. It can identify drug combinations by searching for drugs whose targets are enriched in the complementary signaling modules of the disease signaling network to predict top ranked drug combinations and map the drug targets on the disease signaling network to highlight the mechanisms of action of the drug combinations.

Installation: DrugComboRanker is a Java-based application. If Java is not installed on your computer, please download and install Java SE7 or higher. The JRE package is available from: http://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html. Install R languuage and dependent packages, DrugComboRanker uses igraph R package. To install this package, open RGui and run the command below.

install.packages("ligraph")

The Python code download from github

Application: The DrugComboRanker vignette can be found at https://github.com/methodistsmab/DrugComboRanker/blob/master/DrugComboRanker_user_guide_v1.pdf. DrugComboRanker


5.1.6 SEP-L1000

Introduction:A web portal for browsing and searching predictive small-molecule/ADR connections. A machine learning classifier was presented to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data is from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset that measured changes in GE before and after treatment of human cells with over 20,000 small molecule compounds including most of the FDA-approved drugs.

Application: The SEP-L1000 tutorial can be found at http://maayanlab.net/SEP-L1000/#.

Web server: SEP-L1000. SEP


5.1.7 L1000FWD

Introduction: L1000 fireworks display (L1000FWD) is a web application that provides interactive visualization of over 16,000 drug and small-molecule induced gene expression signatures. L1000FWD enables coloring of signatures by different attributes such as cell type, time point, concentration, as well as drug attributes such as MOA and clinical phase. Signature similarity search is implemented to enable the search for mimicking or opposing signatures given as input of up and down gene sets. Each point on the L1000FWD interactive map is linked to a signature landing page, which provides multifaceted knowledge from various sources about the signature and the drug. Notably such information includes most frequent diagnoses, co-prescribed drugs and age distribution of prescriptions as extracted from the Mount Sinai Health System electronic medical records (EMR). Overall, L1000FWD serves as a platform for identifying functions for novel small molecules using unsupervised clustering, as well as for exploring drug MOA. L1000FWD

Application: The L1000FWD tutorial can be found at https://maayanlab.cloud/L1000FWD/api_page.

Web server: L1000FWD.


5.1.8 Mantra

Introduction: Mode of Action by NeTwoRk Analysis (MANTRA) is a computational tool for the analysis of the Mode of Action (MoA) of novel drugs and the identification of known and approved candidates for “drug repositioning”. It is based on network theory and non-parametric statistics on gene expression data. In order to study a novel drug users have to give to MANTRA one or more genome-wide ranked list of genes sorted according to their differential expression in a treatment with the drug. On the basis of this input, MANTRA automatically integrates this novel drug in a huge network of compounds in which the topology reveals similarities and differences in MoA. To make novel hypothesis on known and FDA approved drugs, hence to find “repositionable drugs”, users have just to explore this drug network.

Application: The Mantra tutorial can be found at https://mantra.tigem.it/About/MantraTutorial.pdf.

Web server: Mantra. Mantra


5.1.9 DvD

Introduction: Drug versus Disease (DvD) provides a pipeline, available through R or Cytoscape, for the comparison of drug and disease gene expression profiles from public microarray repositories. Dynamic access to both Array Express and GEO is provided, with automatic annotation and generation of differential expression profiles. These profiles can be compared to a reference set of profiles from the Connectivity Map or to a users own set of data. Correlations between profiles are calculated using Gene Set Enrichment Analysis. Resulting matches between drug and disease profiles can be viewed in a network representation within the Cytoscape plug-in. Negatively correlated (enriched) profiles can be used to generate hypotheses of drug-repurposing whilst positively correlated (enriched) profiles may be used to infer side-effects of drugs.

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DvDdata")

Installation instructions for Cytoscape plug-in: Make sure you have installed DvD, DvDdata and dependencies as above. Install Rserve. It is a R package that enables communication between Java and R. Enter the following commands in R console:

install.packages("Rserve") # This command only needs to be typed once
library(Rserve)
Rserve(args="--vanilla")

Make sure you have Cytoscape installed. You can download it from http://www.cytoscape.org/download.html.

Application: The DvD vignette can be found at https://saezlab.github.io/DrugVsDisease/.


5.1.10 DSEA

Introduction: The Drug Set Enrichment Analysis (DSEA) works on the same principles as GSEA, but using an inverse preparation and interpretation of the data. A set of drugs of interest is tested against a database of pathways. Each pathway in the database is stored as a ranked list of drugs (Perturbagen Expression Profile), sorted from the one most up-regulating the pathway (or gene) to the one most down-regulating it. Current data is based on the Cmap 2.0. Pathways analyzed by the DSEA are defined as gene sets collected by various sources, as summarized in the following table: DSEA

Application: The DSEA tutorial can be found at https://dsea.tigem.it/about.php.

Web server: DSEA. DSEA1


5.1.11 cogena

Introduction: cogena is a workflow for co-expressed gene-set enrichment analysis. It aims to discovery smaller scale, but highly correlated cellular events that may be of great biological relevance. A novel pipeline for drug discovery and drug repositioning based on the cogena workflow is proposed. Particularly, candidate drugs can be predicted based on the gene expression of disease-related data, or other similar drugs can be identified based on the gene expression of drug-related data. Moreover, the drug mode of action can be disclosed by the associated pathway analysis. In summary, cogena is a flexible workflow for various gene set enrichment analysis for co-expressed genes, with a focus on pathway/GO analysis and drug repositioning.

Installation: To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cogena")

Application: The cogena vignette can be found at https://www.bioconductor.org/packages/release/bioc/vignettes/cogena/inst/doc/cogena-vignette_html.html.


5.1.12 gene2drug

Introduction: Gene2drug ranks small molecules according to their ability to dysregulate an input set of pathways. Sets of pathways can be defined starting from a gene and exploiting its pathway annotations from a number of publicly available databases.The list of 10 databases supported is: Gene Ontology BP, MF, CC (through BiomaRt); CP, KEGG, BIOCARTA, REACTOME, CGP, TFT (through MSigDB); MIPS Corum; Single-gene Sets.

The following data are stored in tab-separated plain text files: 1.List of 12012 genes considered by Gene2drug when computing Enrichment Scores for any gene set. 2.List of 1309 small molecules.
3.10 lists of pathways, one for each of the 10 databases (gene set contents omitted, they can be obtained through their respective public sources). 4.Pathway-wise converted microarray data in 20 matrices (10 for the Enrichment Scores and 10 for the corresponding P-values) of size Nx1309 each, where N is the number of pathways in the database. For row and column mapping use (3) and (2) respectively. 5.List of 22261 selectable pathways. 6.gep2pep data (this section is under construction).

Application: The gene2drug tutorial can be found at https://gene2drug.tigem.it/about.php.

Web server: gene2drug. gene2drug


5.1.13 DeSigN

Introduction: DeSigN (Differentially Expressed Gene Signatures Inhibitors) is a CMap-inspired bioinformatics pipeline that enables gene expression patterns from experimental data to be linked to gene expression patterns associated with drug response in a cancer cell line database.

Web server: DeSigN. DeSigN


5.1.14 ksRepo

Introduction: This package enables investigators to mix and match various types of case/control gene lists with any gene::drug interaction database to predict repositioning oportunities.

Installation: In order to use ksRepo, you’ll need to first install and load it into your R environment. You can either do so by downloading/forking the GitHub Repository and installing it manually (for experienced users) or by using the Devtools package:

# Install devtools if necessary
install.packages('devtools')
# Load the package
require(devtools)
# Download and install ksRepo from GitHub directly
install_github('adam-sam-brown/ksRepo')
library(ksRepo)

Application: The cogena vignette can be found here.


5.1.15 iLINCS

Introduction: iLINCS (Integrative LINCS) is an integrative web platform for analysis of LINCS data and signatures. The portal provides biologists-friendly user interfaces for analyzing transcriptomics and proteomics LINCS datasets. The portal integrates R analytical engine via several R tools for web-computing (rserve, opencpu, Shiny, rgl) and DCIC developed web tools and applications (FTreeView, Enrichr) into a coherent web platform for LINCS data analysis. Users can follow several workflows which allow them to identify differentially expressed genes, proteins and phosphoproteins in LINCS datasets and use them in analysis of other LINCS and non-LINCS dataset (eg TCGA and GEO transcriptomic datasets), and in the analysis of LINCS L1000 signatures. In this way, the platform facilitates integrative analysis of LINCS data and signatures. The mechanistic interpretation of LINCS transcriptomic and proteomics signatures is facilitated by the enrichment analysis via Enrichr and DAVID, and by pathways analysis using the R implementation of the SPIA algorithm.

Installation: There is an extensive workflow in iLINCS API repository https://github.com/uc-bd2k/ilincsAPI to show how to embed iLINCS functionalities in R. This repository consists of set of examples to showcase the ways of interacting with iLINCS APIs.

install.packages(c("knitr", "tinytex", "httr", "jsonlite", "htmltools", "data.table"),repos = "http://cran.us.r-project.org")
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager",repos = "http://cran.us.r-project.org");
    
BiocManager::install("Biobase")

Application: The iLINCS tutorial can be found at http://www.ilincs.org/help/what-is-ilincs.

Web server: iLINCS. iLINCS


5.1.16 Gene Expression Signature

Introduction: This package gives the implementations of the gene expression signature and its distance to each. Gene expression signature is represented as a list of genes whose expression is correlated with a biological state of interest. And its distance is defined using a nonparametric, rank-based pattern-matching strategy based on the Kolmogorov-Smirnov statistic. Gene expression signature and its distance can be used to detect similarities among the signatures of drugs, diseases, and biological states of interest.

Installation:To install this package, start R (version “4.2”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GeneExpressionSignature")

Or install the development version from github:

if (!requireNamespace("remotes", quietly=TRUE))
    install.packages("remotes")
remotes::install_github("yiluheihei/GeneExpressionSignature")

Application: The GeneExpressionSignature vignette can be found here.


5.1.17 PDOD

Introduction: This web page presents implementation of our PDOD approach. PDOD (Prediction of Drugs having Opposite effects on Diseases) can discover drugs likely to restore altered states of disease genes. You can submit a gene list with their Entrez gene IDs and altered states in diseases. Scoring function of PDOD gives a high score to a drug satisfying two conditions: (i) target proteins of the drug are close to disease genes; (ii) the drug activates down-regulated disease genes and inhibits up-regulated genes through drug-target interactions (DTIs) and molecular pathway. DTIs in our PDOD were obtained from DrugBank and backbone network was constructed from relations of KEGG human pathways including activation, inhibition, expression, and repression. Then, you will get top 10 high scored drugs likely to have reverse effects on your genes if more than one gene existed in our backbone network.

Web server: PDOD.


5.1.18 SigCom LINCS

Introduction: Millions of transcriptomics samples were generated by the Library of Integrated Network-Based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx), and Gene Expression Omnibus, connections between drugs, genes, pathways, and diseases can be illuminated. SigCom LINCS is a web-based search engine that serves over 1.5 million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built from the Signature Commons framework, a cloud-agnostic generic platform that can be used to stand up Data Commons with a focus on searchable signatures. SigCom LINCS provides rapid signature similarity search for mimickers and reversers given sets of up and down genes. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures, and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) compliant with metadata linked to standard ontologies and vocabularies while all data and signatures within SigCom LINCS are available for download and via a well-documented API. SigComLINCS

Installation:The SigCom LINCS app provide Python code that users can use as a template for more complex queries. To start, make sure you have the request library installed via pip. More technical information about the SigCom LINCS API is available from here: https://maayanlab.cloud/sigcom-lincs/#/API.

Application: The SigCom LINCS tutorial can be found at https://youtu.be/mupAIqTXceE and https://maayanlab.cloud/sigcom-lincs/#/Help.

Web server: SigCom LINCS. SigComLINCS2


5.1.19 DTX

Introduction: Calculate a drug-target-likeness score of a protein of interest for a given disease according to molecular features of nodes/edges in the shortest paths between them by a machine learning approach.

Database relations: DTX2

Web server: DTX. DTX


5.1.20 Phosprof

Introduction: Phosprof (phosphorylation profiling database) is a database to present cellular response to representative drugs as the significant pathways. It based on the analysis of collected experimental data of phosphorylation activities using protein arrays. Phosphorylation activity measurement allows evaluation of signal transduction activity directly and provides direct estimation of responsible pathway of a cellular event. Comparison between various approved drugs, whose target phen-type is already known, can be helpful for drug development.

Application: The Phosprof tutorial can be found at https://phosprof.medals.jp/static/guide/Phosprof_UserGuide_ver1.1.pdf

Web server: Phosprof. Phosprof