Gene-phenotype association analysis provides valuable insights into the genetic basis of phenotypic traits and diseases, offering opportunities for biomarker discovery, personalized medicine, drug target identification, and elucidation of disease mechanisms.
This section introduces several user-friendly gene-phenotype association analysis tools.
Introduction: PrediXcan is a command-line tool that predicts gene expression from genotype data and performs gene-based association tests, allowing researchers to prioritize genes that are likely to be causal for a phenotype. PrediXcan consists of two steps: 1) predict gene expression (or whatever biology the models predict) in a cohort with available genotypes; 2) test the predicted values for association with a trait measured in the cohort.
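The two steps can be illustrated with a minimal sketch. This is not PrediXcan's code: the function names, the toy dosage matrix, the SNP weights, and the trait values are all made up for illustration, and the association step is assumed here to be a simple linear regression of the trait on the predicted expression of one gene.

```python
from math import sqrt

def predict_expression(dosages, weights):
    """Step 1: predicted expression is the weighted sum of SNP dosages per individual."""
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]

def simple_association(x, y):
    """Step 2: least-squares slope of trait y on predicted expression x,
    returned with its standard error and t statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

# Hypothetical toy data: 5 individuals x 3 SNPs (allele dosages 0, 1 or 2).
dosages = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
    [1, 0, 0],
]
weights = [0.5, -0.2, 0.1]           # hypothetical prediction weights for one gene
trait = [0.2, 0.3, 1.3, -0.1, 0.4]   # hypothetical trait values

expression = predict_expression(dosages, weights)
slope, se, t_stat = simple_association(expression, trait)
print(f"slope={slope:.3f} se={se:.3f} t={t_stat:.2f}")
```

In a real analysis the weights come from a trained prediction model and the regression would typically include covariates; the sketch only shows how the two steps connect.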
Installation: To run PrediXcan you will need:
Application: The PrediXcan vignette can be found at https://github.com/hakyimlab/PrediXcan.
Introduction: MetaXcan is a set of tools to integrate genomic information of biological mechanisms with complex traits. Almost all of the software is command-line based. The software has recently been migrated to Python 3 because Python 2 has been sunset.
Installation: The software is developed and tested in Linux and Mac OS environments. The main S-PrediXcan script is also supported in Windows.
To run S-PrediXcan, you need Python 3.5 or higher with the following libraries: numpy (>=1.11.1), scipy (>=0.18.1), and pandas (>=0.18.1). sqlalchemy is needed for some unit tests.
To run PrediXcan Associations and MulTiXcan, you also need: patsy (>=0.5.0), statsmodels (>=0.8.0), h5py (>=2.7.1), and h5py-cache (>=1.0.0; now folded into h5py).
To run prediction of biological mechanisms on individual-level data, you will also need: bgen_reader (>=3.0.3) and cyvcf2 (>=0.8.0). R with ggplot and dplyr is needed for some optional statistics and charts.
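Before running the tools, it can help to confirm that the installed libraries meet these minimums. The following is a small sketch, not part of MetaXcan: the helper names are hypothetical, it assumes Python 3.8 or later (for importlib.metadata), and it compares only the leading numeric components of each version string.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements listed above.
MINIMUM_VERSIONS = {
    "numpy": "1.11.1",
    "scipy": "0.18.1",
    "pandas": "0.18.1",
    "patsy": "0.5.0",
    "statsmodels": "0.8.0",
    "h5py": "2.7.1",
}

def version_tuple(version):
    """Turn '1.11.1' into (1, 11, 1); stop at the first component without leading digits."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def check_requirements(minimums):
    """Return a dict mapping each package name to 'ok', 'missing', or a 'too old' note."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report

if __name__ == "__main__":
    for package, status in check_requirements(MINIMUM_VERSIONS).items():
        print(f"{package}: {status}")
```

Comparing integer tuples avoids the pitfall of string comparison, where "1.2.0" would sort after "1.11.1".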
Application: The MetaXcan vignette can be found at https://github.com/hakyimlab/MetaXcan.
Introduction: FUSION is a suite of tools for performing transcriptome-wide and regulome-wide association studies (TWAS and RWAS). FUSION builds predictive models of the genetic component of a functional/molecular phenotype and predicts and tests that component for association with disease using GWAS summary statistics. The goal is to identify associations between a GWAS phenotype and a functional phenotype that was only measured in reference data. We provide precomputed predictive models from multiple studies to facilitate this analysis.
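As a rough illustration of how a summary-statistic test of this kind works, the TWAS statistic for one gene is commonly written as z_TWAS = w'z / sqrt(w'Dw), where w holds the SNP weights of the predictive model, z the GWAS z-scores of the same SNPs, and D their LD (correlation) matrix. The sketch below implements that formula under stated assumptions: the function name and the toy numbers are hypothetical, and it is not FUSION's own code.

```python
from math import sqrt

def twas_z(weights, gwas_z, ld):
    """TWAS z-score from SNP weights, GWAS z-scores, and an LD correlation matrix.

    Computes w'z / sqrt(w' D w), with all inputs in the same SNP order.
    """
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    n = len(weights)
    variance = sum(
        weights[i] * ld[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return numerator / sqrt(variance)

# Hypothetical toy inputs for a gene with two SNPs in moderate LD.
weights = [0.6, 0.8]
gwas_z = [3.0, 4.0]
ld = [
    [1.0, 0.3],
    [0.3, 1.0],
]

print(round(twas_z(weights, gwas_z, ld), 3))
```

The denominator accounts for correlation between SNPs, so that two SNPs in strong LD are not counted as independent evidence.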
Installation:
wget https://github.com/gusevlab/fusion_twas/archive/master.zip
unzip master.zip
cd fusion_twas-master
wget https://data.broadinstitute.org/alkesgroup/FUSION/LDREF.tar.bz2
tar xjvf LDREF.tar.bz2
wget https://github.com/gabraham/plink2R/archive/master.zip
unzip master.zip
Then launch R and install the required libraries:
install.packages(c('optparse','RColorBrewer'))
install.packages('plink2R-master/plink2R/',repos = NULL)
If computing your own weights, the following additional steps are required:
- Add the bundled GCTA binary gcta_nr_robust to path (coded by Po-Ru Loh for robust non-linear optimization)
- Download and install PLINK2, add plink to path
- Launch R and install the following required libraries:
install.packages(c('glmnet','methods'))
- Create a symbolic link by running ln -s ./ output in the directory where you will run FUSION.weights.R (this is a workaround because GEMMA requires results to go into an output subdirectory).
Application: The FUSION vignette can be found at http://gusevlab.org/projects/fusion/.
Introduction: The SMR software tool was originally developed to implement the SMR & HEIDI methods to test for pleiotropic association between the expression level of a gene and a complex trait of interest using summary-level data from GWAS and expression quantitative trait loci (eQTL) studies (Zhu et al. 2016 Nature Genetics). The SMR & HEIDI methodology can be interpreted as an analysis to test if the effect size of a SNP on the phenotype is mediated by gene expression. This tool can therefore be used to prioritize genes underlying GWAS hits for follow-up functional studies. The methods are applicable to all kinds of molecular QTL (xQTL) data, including DNA methylation QTL (mQTL) and protein abundance QTL (pQTL).
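The core SMR calculation can be sketched from summary statistics for a single SNP. The estimated effect of expression on the trait is b_xy = b_zy / b_zx, and the test statistic T_SMR = (z_zy^2 * z_zx^2) / (z_zy^2 + z_zx^2) is compared with a chi-square distribution with 1 degree of freedom. The code below is an illustrative sketch only: the function name and input values are hypothetical, it does not implement the HEIDI test, and it is not the SMR software's own code.

```python
from math import erfc, sqrt

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Approximate SMR estimate and test for one SNP.

    b_zx, se_zx: SNP effect on gene expression (eQTL) and its standard error.
    b_zy, se_zy: SNP effect on the trait (GWAS) and its standard error.
    Returns (b_xy, t_smr, p_value).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(t_smr / 2))
    return b_xy, t_smr, p_value

# Hypothetical summary statistics for a top cis-eQTL.
b_xy, t_smr, p_value = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
print(f"b_xy={b_xy:.3f} T_SMR={t_smr:.2f} p={p_value:.2e}")
```

Because T_SMR is bounded above by the smaller of the two squared z-scores, a weak eQTL or a weak GWAS signal limits the strength of the SMR test.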
Installation: The source code is released under GPL v2.
Source code: smr_v1.3.1_src.tar.gz, available at https://yanglab.westlake.edu.cn/software/smr/download/smr_v1.3.1_src.tar.gz
Application: The SMR vignette can be found at https://yanglab.westlake.edu.cn/software/smr/#Overview.
Introduction: Probabilistic transcriptome-wide association study (PTWAS) is a computational framework for investigating causal relationships between gene expression and complex traits. PTWAS applies established principles from instrumental variables analysis and takes advantage of probabilistic eQTL annotations to delineate and tackle the unique challenges arising in TWAS. PTWAS not only confers higher power than existing methods but also provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific gene-to-trait effects.
Installation: Software requirements
The bare minimum set of software packages for running PTWAS scan, validation, and estimation procedures is:
To compute PTWAS composite instrumental variables/eQTL weights for scan analysis from users’ own eQTL data, additional software packages/libraries are required
Application: The PTWAS vignette can be found at https://xqwen.github.io/ptwas/.