1
Xu T, Wang S, Ma T, Dong Y, Ashby CR, Hao GF. The identification of essential cellular genes is critical for validating drug targets. Drug Discov Today 2024; 29:104215. [PMID: 39428084; DOI: 10.1016/j.drudis.2024.104215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 10/06/2024] [Accepted: 10/15/2024] [Indexed: 10/22/2024]
Abstract
Accurately identifying biological targets is crucial for advancing treatment options. Essential genes, vital for cell or organism survival, hold promise as potential drug targets in disease treatment. Although many studies have sought to identify essential genes as therapeutic targets in medicine and bioinformatics, systematic reviews on their relationship with drug targets are relatively rare. This work presents a comprehensive analysis to aid in identifying essential genes as potential targets for drug discovery, encompassing their relevance, identification methods, successful case studies, and challenges. This work will facilitate the identification of essential genes as therapeutic targets, thereby boosting new drug development.
Affiliation(s)
- Ting Xu
- School of Pharmaceutical Sciences, Guizhou Engineering Laboratory for Synthetic Drugs, Guizhou University, Guiyang 550025, China
- Shuang Wang
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Tingting Ma
- School of Pharmaceutical Sciences, Guizhou Engineering Laboratory for Synthetic Drugs, Guizhou University, Guiyang 550025, China
- Yawen Dong
- School of Pharmaceutical Sciences, Guizhou Engineering Laboratory for Synthetic Drugs, Guizhou University, Guiyang 550025, China
- Charles R Ashby
- Department of Pharmaceutical Sciences, St. John's University, New York, NY, USA
- Ge-Fei Hao
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, China
2
Wang R, Su Y, O'Donnell K, Caron J, Meaney M, Meng X, Li Y. Differential interactions between gene expressions and stressors across the lifespan in major depressive disorder. J Affect Disord 2024; 362:688-697. [PMID: 39029669; DOI: 10.1016/j.jad.2024.07.069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 07/05/2024] [Accepted: 07/14/2024] [Indexed: 07/21/2024]
Abstract
BACKGROUND Both genetic predispositions and exposure to stressors collectively contribute to the development of major depressive disorder (MDD). To examine their roles in MDD in depth, this study aimed to identify which susceptibility-related gene expression interacts with various dimensions of stressors to influence MDD risk in a large population cohort. METHODS The data analyzed came from a longitudinal community-based cohort in Southwest Montreal, Canada (N = 1083). Latent profile models were used to identify distinct patterns of stressors in the study cohort. A transcriptome-wide association study (TWAS) method was used to examine the interactive effects of three dimensions of stressors (threat, deprivation, and cumulative lifetime stress) and gene expression on MDD risk across a total of 48 tissues from GTEx. Additional colocalization and fine-mapping analyses, as well as TWAS-based enrichment analyses, were conducted to further explore and specify these associations. RESULTS We identified 3321 genes linked to MDD at a nominal p-value < 0.05 and found that different patterns of stressors can amplify genetic susceptibility to MDD. We also observed specific genes and pathways that interacted with deprivation and cumulative lifetime stressors, particularly in specific brain tissues, including the basal ganglia, prefrontal cortex, amygdala, cerebellum, and cortex, as well as in whole blood. Colocalization analysis also indicated that these genes have a high probability of sharing causal variants with MDD. LIMITATIONS The study cohort was composed exclusively of Caucasian individuals, which restricts the generalizability of the findings to other ethnic groups.
CONCLUSIONS The findings revealed significant interactions between potentially tissue-specific gene expression and stressors in MDD risk and shed light on the intricate etiological contributions of gene expression and specific stressors across the lifespan in MDD. These genetic and environmental contributions to MDD support the vulnerability-stress theory and suggest that future stress research should examine genetic predisposition more closely and draw on omics studies to specify the intricate relationships between genes and stressful environments.
Affiliation(s)
- Ruiyang Wang
- Department of Financial and Risk Engineering, New York University, New York, NY, USA; Department of Psychiatry, McGill University, Montreal, QC, Canada; Douglas Research Centre, Montreal, QC, Canada
- Yingying Su
- School of Public Health and Emergency Management, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Kieran O'Donnell
- Department of Psychiatry, McGill University, Montreal, QC, Canada; Douglas Research Centre, Montreal, QC, Canada; Yale Child Study Center, Department of Obstetrics, Gynecology & Reproductive Sciences, Yale School of Medicine, Yale University, New Haven, CT, USA; Child & Brain Development Program, CIFAR, Toronto, ON, Canada
- Jean Caron
- Department of Psychiatry, McGill University, Montreal, QC, Canada; Douglas Research Centre, Montreal, QC, Canada
- Michael Meaney
- Department of Psychiatry, McGill University, Montreal, QC, Canada; Douglas Research Centre, Montreal, QC, Canada
- Xiangfei Meng
- Department of Psychiatry, McGill University, Montreal, QC, Canada; Douglas Research Centre, Montreal, QC, Canada
- Yue Li
- School of Computer Science, McGill University, Montreal, QC, Canada
3
Lu Y, Xu K, Maydanchik N, Kang B, Pierce BL, Yang F, Chen LS. An integrative multi-context Mendelian randomization method for identifying risk genes across human tissues. Am J Hum Genet 2024; 111:1736-1749. [PMID: 39053459; PMCID: PMC11339623; DOI: 10.1016/j.ajhg.2024.06.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 06/11/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024]
Abstract
Mendelian randomization (MR) provides valuable assessments of the causal effect of an exposure on an outcome, yet applying conventional MR methods to map risk genes raises new challenges. One issue is the limited availability of expression quantitative trait loci (eQTLs) as instrumental variables (IVs), which hampers the estimation of sparse causal effects. Additionally, eQTL effects are often context- or tissue-specific, which challenges the MR assumption of consistent IV effects across eQTL and GWAS data. To address these challenges, we propose a multi-context multivariable integrative MR framework, mintMR, for mapping expression and molecular traits as joint exposures. It models the effects of molecular exposures across multiple tissues in each gene region, while simultaneously estimating effects across multiple gene regions. It uses eQTLs with consistent effects across more than one tissue type as IVs, improving IV consistency. A major innovation of mintMR is its use of multi-view learning methods to collectively model latent indicators of disease relevance across multiple tissues, molecular traits, and gene regions. The multi-view learning captures the major patterns of disease relevance and uses these patterns to update the estimated tissue relevance probabilities. The proposed mintMR iterates between performing a multi-tissue MR for each gene region and jointly learning the disease-relevant tissue probabilities across gene regions, improving the estimation of sparse effects across genes. We apply mintMR to evaluate the causal effects of gene expression and DNA methylation on 35 complex traits using multi-tissue QTLs as IVs. The proposed mintMR controls genome-wide inflation and offers insights into disease mechanisms.
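As a point of reference for the MR framing above, a minimal Python sketch of the conventional fixed-effect inverse-variance weighted (IVW) estimator follows. It is the standard single-exposure, single-context estimator that combines per-instrument Wald ratios; it is not mintMR, which additionally models multiple tissues, molecular traits, and gene regions jointly. The function name and toy summary statistics are hypothetical, and the sketch assumes independent instruments.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_exposure: per-instrument effects on the exposure (e.g., eQTL betas)
    beta_outcome:  per-instrument effects on the outcome (GWAS betas)
    se_outcome:    standard errors of the outcome effects
    Assumes independent instruments (no LD) and first-order weights.
    """
    if not len(beta_exposure) == len(beta_outcome) == len(se_outcome):
        raise ValueError("all inputs must have the same length")
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / se ** 2  # inverse-variance weight of the outcome effect
        num += w * bx * by
        den += w * bx ** 2
    estimate = num / den            # weighted regression of by on bx through the origin
    std_err = math.sqrt(1.0 / den)  # first-order (fixed-effect) standard error
    return estimate, std_err


# Toy summary statistics (hypothetical): outcome effects are 0.5 x exposure effects.
est, err = ivw_estimate([0.2, 0.4, 0.3], [0.1, 0.2, 0.15], [0.05, 0.05, 0.05])
print(round(est, 6))  # → 0.5
```

Because the toy outcome effects are exactly half the exposure effects, every per-instrument Wald ratio equals 0.5, and so does their weighted average.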
Affiliation(s)
- Yihao Lu
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Ke Xu
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA
- Nathaniel Maydanchik
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Bowei Kang
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Brandon L Pierce
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Fan Yang
- Yau Mathematical Sciences Center, Tsinghua University, Beijing, China; Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China
- Lin S Chen
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
4
Parrish RL, Buchman AS, Tasaki S, Wang Y, Avey D, Xu J, De Jager PL, Bennett DA, Epstein MP, Yang J. SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning. Nat Commun 2024; 15:6646. [PMID: 39103319; DOI: 10.1038/s41467-024-50983-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 07/26/2024] [Indexed: 08/07/2024]
Abstract
Multiple reference panels for a given tissue, or for multiple tissues, often exist, and multiple regression methods can be used to train gene expression imputation models for transcriptome-wide association studies (TWAS). To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we develop a Stacked Regression based TWAS (SR-TWAS) tool that obtains optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies show that SR-TWAS improves power, owing to increased training sample sizes and strength borrowed across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our real studies identify 6 independent significant risk genes for Alzheimer's disease (AD) dementia in supplementary motor area tissue and 9 independent significant risk genes for Parkinson's disease (PD) in substantia nigra tissue. Relevant biological interpretations were found for these significant risk genes.
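To make the idea of an "optimal linear combination of base models" concrete, the sketch below stacks just two hypothetical base models by choosing the weight w in [0, 1] that minimizes squared error against observed expression on validation data. SR-TWAS combines many base models across panels, methods, and tissues; restricting the example to two models and to a convex combination is an illustrative assumption, not a description of the SR-TWAS implementation.

```python
def stack_two_models(pred_a, pred_b, observed):
    """Weight w in [0, 1] for the combination w*pred_a + (1 - w)*pred_b.

    Minimizes the squared error against observed expression on a
    validation set; the objective is a convex quadratic in w, so clipping
    the unconstrained minimizer to [0, 1] gives the constrained optimum.
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    resid = [y - b for y, b in zip(observed, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical base predictions: every w fits equally well
    w = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, w))


def stacked_prediction(w, pred_a, pred_b):
    """Apply the learned weight to new base-model predictions."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical validation data generated as 0.7 * model A + 0.3 * model B.
pred_a = [1.0, 2.0, 3.0, 4.0]
pred_b = [2.0, 1.0, 0.0, 1.0]
observed = [1.3, 1.7, 2.1, 3.1]
w = stack_two_models(pred_a, pred_b, observed)
print(round(w, 6))  # → 0.7
```

The second helper applies the learned weight to new predictions; in a TWAS setting, the combined predictions would play the role of imputed expression.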
Affiliation(s)
- Randy L Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics, Emory University School of Public Health, Atlanta, GA, 30322, USA
- Aron S Buchman
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Yanling Wang
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Denis Avey
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Jishu Xu
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, 10032, USA
- David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
5
Zhu X, Wang B, Yu H, Li C, Zhao Y, Zhong Y, Tang W, Zhou Y, Huang X, Zhu H, Wu Y, Yang K, Wei Y, Gao Z, Dong J. Icariin attenuates asthmatic airway inflammation via modulating alveolar macrophage activation based on network pharmacology and in vivo experiments. J Gene Med 2024; 26:e3718. [PMID: 38979822; DOI: 10.1002/jgm.3718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 03/23/2024] [Accepted: 06/19/2024] [Indexed: 07/10/2024]
Abstract
BACKGROUND Icariin (ICA) inhibits the inflammatory response in various diseases, but the mechanism by which ICA treats airway inflammation in asthma requires further investigation. We aimed to predict and validate the potential targets of ICA against asthma-associated airway inflammation using network pharmacology and experiments. METHODS An ovalbumin-induced mouse model of asthma-associated airway inflammation was established. The effects of ICA were evaluated by behavioral assessment, airway hyperresponsiveness, lung pathological changes, and inflammatory cell and cytokine counts. Next, the corresponding targets of ICA were mined from the SEA, CTD, HERB, PharmMapper, and SymMap databases and the literature. The PubMed-Gene and GeneCards databases were used to screen asthma- and airway inflammation-related targets. The overlapping targets were used to build an interaction network and to perform Gene Ontology and pathway enrichment analyses. Subsequently, flow cytometry, quantitative real-time PCR, and western blotting were employed for validation. RESULTS ICA alleviated asthmatic airway inflammation. In total, 402 targets of ICA, 5136 targets of asthma, and 4531 targets of airway inflammation were screened; 216 overlapping targets were identified, predicting that ICA could modulate asthmatic airway inflammation through macrophage activation/polarization. Additionally, ICA decreased M1 polarization but increased M2 polarization. Potential targets that were disrupted by asthmatic inflammation were restored by ICA treatment. CONCLUSIONS ICA alleviates airway inflammation in asthma by inhibiting the M1 polarization of alveolar macrophages, which is related to metabolic reprogramming. Jun, Jak2, Syk, Tnf, Aldh2, Aldh9a1, Nos1, Nos2, and Nos3 represent potential targets for therapeutic intervention. The present study enhances understanding of the anti-airway inflammation effects of ICA, especially in asthma.
Affiliation(s)
- Xiaofei Zhu
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Bin Wang
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Hang Yu
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Congcong Li
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Yuhang Zhao
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Yuanyuan Zhong
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Weifeng Tang
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Yaolong Zhou
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Xi Huang
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Huahe Zhu
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Yueren Wu
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Kai Yang
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Ying Wei
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Zhen Gao
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
- Jingcheng Dong
- Department of Integrative Medicine, Huashan Hospital, Fudan University, Shanghai, China
- Institute of Integrative Medicine, Fudan University, Shanghai, China
6
Zhang Y, Wang M, Li Z, Yang X, Li K, Xie A, Dong F, Wang S, Yan J, Liu J. An overview of detecting gene-trait associations by integrating GWAS summary statistics and eQTLs. Sci China Life Sci 2024; 67:1133-1154. [PMID: 38568343; DOI: 10.1007/s11427-023-2522-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/29/2024] [Indexed: 06/07/2024]
Abstract
Detecting genes that affect specific traits (such as human diseases and crop yields) is important for treating complex diseases and improving crop quality. Genome-wide association studies (GWAS) provide new insights and directions for understanding complex traits by identifying important single nucleotide polymorphisms. Summary statistics from GWAS of many complex traits have recently been collected. Studies have shown that GWAS risk loci and expression quantitative trait loci (eQTLs) often overlap substantially, so gene expression has become an important intermediary for revealing the regulatory mechanisms underlying GWAS signals. In this review, we cover three types of methods for detecting gene-trait associations by integrating GWAS summary statistics and eQTL data: colocalization methods, transcriptome-wide association study-oriented approaches, and Mendelian randomization-related methods. At the theoretical level, we discuss the differences, relationships, advantages, and disadvantages of the algorithms in these three categories. To further assess their performance, we summarize the significant gene sets influencing high-density lipoprotein, low-density lipoprotein, total cholesterol, and triglycerides reported in 16 studies, and we compare the algorithms using the datasets for these four lipid traits. Based on the experimental results, we analyze the advantages and limitations of the algorithms and suggest directions for follow-up studies on detecting gene-trait associations.
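The TWAS-oriented approaches reviewed here typically combine eQTL-derived expression weights with GWAS z-scores and an LD reference. A minimal Python sketch of the commonly used FUSION-style summary-statistic test, z = w'z_GWAS / sqrt(w'Rw), follows; the weights, z-scores, and LD matrices are hypothetical, and real analyses also require allele harmonization and an LD reference panel matched to the GWAS population.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score for one gene: w'z / sqrt(w' R w).

    weights: expression-prediction weights for the gene's cis-SNPs
    gwas_z:  GWAS z-scores for the same SNPs (same order and effect allele)
    ld:      SNP-by-SNP LD correlation matrix R, as a list of lists
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the weighted sum of z-scores under the null, given LD.
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


# Hypothetical gene with two cis-SNPs, each with GWAS z = 2.
print(twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]))  # independent SNPs
print(twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]]))  # → 2.0 (perfect LD)
```

The two calls differ only in LD: with independent SNPs the evidence accumulates (about 2.83), whereas with perfect LD the two SNPs carry the same information and the statistic stays at 2.0.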
Affiliation(s)
- Yang Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Mengyao Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Zhenguo Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Xuan Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Keqin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ao Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Fang Dong
- College of Life Sciences, Nankai University, Tianjin, 300071, China
- Shihan Wang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
7
Lu Y, Oliva M, Pierce BL, Liu J, Chen LS. Integrative cross-omics and cross-context analysis elucidates molecular links underlying genetic effects on complex traits. Nat Commun 2024; 15:2383. [PMID: 38493154; PMCID: PMC10944527; DOI: 10.1038/s41467-024-46675-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 03/06/2024] [Indexed: 03/18/2024]
Abstract
Genetic effects on functionally related 'omic' traits often co-occur in relevant cellular contexts, such as tissues. Motivated by multi-tissue analyses of methylation quantitative trait loci (mQTLs) and expression QTLs (eQTLs), we propose X-ING (Cross-INtegrative Genomics) for cross-omics and cross-context integrative analysis. X-ING takes as input multiple matrices of association statistics, each obtained from a different omics data type across multiple cellular contexts. It models the latent binary association status of each statistic, captures the major association patterns among omics data types and contexts, and outputs the posterior mean and probability for each input statistic. X-ING enables the integration of effects from different omics data with varying effect distributions. In the multi-tissue cis-association analysis, X-ING shows improved detection and replication of mQTLs by integrating eQTL maps. In the trans-association analysis, X-ING reveals an enrichment of trans-associations in many disease/trait-relevant tissues.
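X-ING's outputs are, per statistic, a posterior probability of a true association and a posterior mean effect. The sketch below shows what those two quantities mean in the simplest possible setting: a single z-statistic under a two-component Gaussian mixture with fixed, hypothetical values for the prior association probability and the effect variance. X-ING itself learns association patterns jointly across omics types and contexts, which this sketch does not attempt.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_association(z, prior_prob=0.1, effect_var=4.0):
    """Posterior P(associated) and posterior mean effect for one z-statistic.

    Simplified model (hypothetical parameters): z = mu + e with e ~ N(0, 1);
    mu = 0 with probability 1 - prior_prob (latent status 0),
    otherwise mu ~ N(0, effect_var) (latent status 1).
    """
    alt = prior_prob * normal_pdf(z, 1.0 + effect_var)  # marginal of z if associated
    null = (1.0 - prior_prob) * normal_pdf(z, 1.0)      # marginal of z if not
    prob = alt / (alt + null)
    shrink = effect_var / (1.0 + effect_var)            # E[mu | z, associated] = shrink * z
    return prob, prob * shrink * z


for z in (0.0, 2.0, 5.0):
    prob, mean = posterior_association(z)
    print(z, round(prob, 3), round(mean, 3))
```

A statistic near zero receives a posterior probability below the prior and a posterior mean of about zero, while a large statistic approaches probability one and is shrunk only by the factor effect_var / (1 + effect_var).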
Affiliation(s)
- Yihao Lu
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Meritxell Oliva
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Genomics Research Center, AbbVie, North Chicago, IL, USA
- Brandon L Pierce
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
- Jin Liu
- School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
- Lin S Chen
- Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
8
Lu Y, Xu K, Kang B, Pierce BL, Yang F, Chen LS. An integrative multi-context Mendelian randomization method for identifying risk genes across human tissues. medRxiv 2024:2024.03.04.24303731. [PMID: 38496462; PMCID: PMC10942526; DOI: 10.1101/2024.03.04.24303731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Mendelian randomization (MR) provides valuable assessments of the causal effect of an exposure on an outcome, yet applying conventional MR methods to map risk genes raises new challenges. One issue is the limited availability of expression quantitative trait loci (eQTLs) as instrumental variables (IVs), which hampers the estimation of sparse causal effects. Additionally, eQTL effects are often context- or tissue-specific, which challenges the MR assumption of consistent IV effects across eQTL and GWAS data. To address these challenges, we propose a multi-context multivariable integrative MR framework, mintMR, for mapping expression and molecular traits as joint exposures. It models the effects of molecular exposures across multiple tissues in each gene region, while simultaneously estimating effects across multiple gene regions. It uses eQTLs with consistent effects across more than one tissue type as IVs, improving IV consistency. A major innovation of mintMR is its use of multi-view learning methods to collectively model latent indicators of disease relevance across multiple tissues, molecular traits, and gene regions. The multi-view learning captures the major patterns of disease relevance and uses these patterns to update the estimated tissue relevance probabilities. The proposed mintMR iterates between performing a multi-tissue MR for each gene region and jointly learning the disease-relevant tissue probabilities across gene regions, improving the estimation of sparse effects across genes. We apply mintMR to evaluate the causal effects of gene expression and DNA methylation on 35 complex traits using multi-tissue QTLs as IVs. The proposed mintMR controls genome-wide inflation and offers new insights into disease mechanisms.
9
Wang T, Yan Z, Zhang Y, Lou Z, Zheng X, Mai D, Wang Y, Shang X, Xiao B, Peng J, Chen J. postGWAS: A web server for deciphering the causality post the genome-wide association studies. Comput Biol Med 2024; 171:108108. [PMID: 38359659; DOI: 10.1016/j.compbiomed.2024.108108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/23/2024] [Accepted: 02/04/2024] [Indexed: 02/17/2024]
Abstract
While genome-wide association studies (GWAS) have unequivocally identified numerous disease susceptibility variants, most of them lie in non-coding regions and are in high linkage disequilibrium (LD). To pave the way for translating GWAS signals into clinical drug targets, it is essential to identify the underlying causal variants and, in turn, the causal genes. To this end, a myriad of post-GWAS methods have been devised, each grounded in distinct principles, including fine-mapping, colocalization, and transcriptome-wide association study (TWAS) techniques. Yet no platform currently exists that seamlessly integrates these diverse post-GWAS methodologies. In this work, we present a user-friendly web server for post-GWAS analysis that integrates 9 distinct methods with 12 models, categorized as fine-mapping, colocalization, and TWAS. The server mainly helps users decipher the causality obscured by complex GWAS signals, including causal variants and causal genes, without requiring computational skills or complex environment configuration. It provides a convenient platform for post-GWAS analysis and result visualization, facilitating the understanding and interpretation of genome-wide association studies. The postGWAS server is available at http://g2g.biographml.com/.
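Colocalization, one of the post-GWAS categories the server integrates, is commonly formulated as in the coloc R package: assuming at most one causal variant per trait in a region, per-SNP approximate Bayes factors are combined into posterior probabilities for five hypotheses (H0: no association; H1 or H2: only one trait associated; H3: both traits, distinct causal variants; H4: both traits, one shared causal variant). The Python sketch below follows that formulation. The prior values mirror coloc's commonly used defaults, prior_var = 0.04 is an assumption, the input data are hypothetical, and nothing here describes postGWAS's own implementation.

```python
import math


def log_abf(beta, se, prior_var):
    """Wakefield log approximate Bayes factor (association vs. none) for one SNP."""
    r = prior_var / (prior_var + se ** 2)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))); assumes at least one finite value."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def coloc_posteriors(trait1, trait2, p1=1e-4, p2=1e-4, p12=1e-5, prior_var=0.04):
    """Posterior probabilities of H0..H4 for two traits over the same SNPs.

    trait1, trait2: lists of (beta, se) pairs in the same SNP order.
    Assumes at most one causal variant per trait in the region.
    """
    l1 = [log_abf(b, s, prior_var) for b, s in trait1]
    l2 = [log_abf(b, s, prior_var) for b, s in trait2]
    s1 = log_sum_exp(l1)                                # log sum_j BF1_j
    s2 = log_sum_exp(l2)                                # log sum_j BF2_j
    s12 = log_sum_exp([a + b for a, b in zip(l1, l2)])  # log sum_j BF1_j * BF2_j
    # H3 sums over pairs of distinct SNPs: sum(BF1) * sum(BF2) - sum(BF1 * BF2).
    ratio = math.exp(s12 - s1 - s2)
    log_h3 = s1 + s2 + math.log1p(-ratio) if ratio < 1.0 else float("-inf")
    log_h = [
        0.0,                         # H0: neither trait associated
        math.log(p1) + s1,           # H1: trait 1 only
        math.log(p2) + s2,           # H2: trait 2 only
        math.log(p1 * p2) + log_h3,  # H3: both, distinct causal variants
        math.log(p12) + s12,         # H4: both, one shared causal variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(v - total) for v in log_h]


# Hypothetical region of three SNPs with a strong signal for both traits at SNP 2.
trait1 = [(0.02, 0.05), (0.50, 0.05), (0.01, 0.05)]
trait2 = [(0.01, 0.05), (0.40, 0.05), (0.02, 0.05)]
pp = coloc_posteriors(trait1, trait2)
print([round(p, 3) for p in pp])
```

Working in log space keeps the Bayes factors from overflowing when z-scores are large, which is common at genome-wide significant loci.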
Affiliation(s)
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Zhihao Yan
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yiming Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhuofei Lou
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Xiaozhu Zheng
- Department of Anesthesiology, The People's Hospital of Yubei District, Chongqing, 401120, China
- DuoDuo Mai
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Bing Xiao
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jing Chen
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, China
10
Cai M, Wang Z, Xiao J, Hu X, Chen G, Yang C. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nat Commun 2023; 14:6870. [PMID: 37898663 PMCID: PMC10613261 DOI: 10.1038/s41467-023-42614-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/04/2023] [Accepted: 10/17/2023] [Indexed: 10/30/2023]
Abstract
Fine-mapping prioritizes risk variants identified by genome-wide association studies (GWASs), serving as a critical step to uncover biological mechanisms underlying complex traits. However, existing fine-mapping methods still face several major challenges. First, strong linkage disequilibrium among variants can limit the statistical power and resolution of fine-mapping. Second, simultaneously searching for multiple causal variants is computationally expensive. Third, confounding bias hidden in GWAS summary statistics can produce spurious signals. To address these challenges, we develop a statistical method for cross-population fine-mapping (XMAP) that leverages genetic diversity and accounts for confounding bias. Using cross-population GWAS summary statistics from global biobanks and genomic consortia, we show that XMAP can achieve greater statistical power, better control of the false positive rate, and substantially higher computational efficiency for identifying multiple causal signals than existing methods. Importantly, we show that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.
Affiliation(s)
- Mingxuan Cai
- Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China
- Zhiwei Wang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jiashun Xiao
- Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
- Xianghong Hu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- WeGene, Shenzhen Zaozhidao Technology Co., Ltd, Shenzhen, 518040, China
- Graduate Affairs, Faculty of Medicine, Chulalongkorn University, 10330, Bangkok, Thailand
- Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
11
Yu X, Xiao J, Cai M, Jiao Y, Wan X, Liu J, Yang C. PALM: a powerful and adaptive latent model for prioritizing risk variants with functional annotations. Bioinformatics 2023; 39:7028484. [PMID: 36744920 PMCID: PMC9950853 DOI: 10.1093/bioinformatics/btad068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 01/12/2023] [Accepted: 02/03/2023] [Indexed: 02/07/2023]
Abstract
MOTIVATION: The findings from genome-wide association studies (GWASs) have greatly helped us understand the genetic basis of human complex traits and diseases. Despite this tremendous progress, much effort is still needed to address several major challenges arising in GWAS. First, most GWAS hits are located in non-coding regions of the human genome, so their biological functions remain largely unknown. Second, due to the polygenicity of human complex traits and diseases, many genetic risk variants with weak or moderate effects have not yet been identified. RESULTS: To address these challenges, we propose a powerful and adaptive latent model (PALM) that integrates cell-type/tissue-specific functional annotations with GWAS summary statistics. Unlike existing methods, which are mainly based on linear models, PALM leverages a tree ensemble to adaptively characterize the non-linear relationship between functional annotations and the association status of genetic variants. To make PALM scalable to millions of variants and hundreds of functional annotations, we develop a functional gradient-based expectation-maximization algorithm to fit the tree-based non-linear model in a stable manner. Through comprehensive simulation studies, we show that PALM not only controls the false discovery rate well but also improves the statistical power to identify risk variants. We also apply PALM to integrate summary statistics of 30 GWASs with 127 cell-type/tissue-specific functional annotations. The results indicate that PALM can identify more risk variants and rank the importance of functional annotations, yielding better interpretation of GWAS results. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/YangLabHKUST/PALM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xinyi Yu
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jiashun Xiao
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingxuan Cai
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China; Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China
- Yuling Jiao
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
- Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Jin Liu
- Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore; School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen 518172, China
- Can Yang
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
12
Gedik H, Peterson RE, Riley BP, Vladimirov VI, Bacanu SA. Integrative Post-Genome-Wide Association Study Analyses Relevant to Psychiatric Disorders: Imputing Transcriptome and Proteome Signals. Complex Psychiatry 2023; 9:130-144. [PMID: 37588130 PMCID: PMC10425719 DOI: 10.1159/000530223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 03/09/2023] [Indexed: 08/18/2023]
Abstract
Background: The genome-wide association study (GWAS) is a common tool to identify genetic variants associated with complex traits, including psychiatric disorders (PDs). However, post-GWAS analyses are needed to extend statistical inference to biologically relevant entities, e.g., genes, proteins, and pathways. To achieve this goal, researchers have developed methods that incorporate biologically relevant intermediate molecular phenotypes, such as gene expression and protein abundance, which are posited to mediate the variant-trait association. Transcriptome-wide association studies (TWAS) and proteome-wide association studies (PWAS) are commonly used to test the association between these molecular mediators and the trait. Summary: In this review, we discuss the most recent developments in TWAS and PWAS. These methods integrate existing "omic" information with the GWAS summary statistics for the trait(s) of interest. Specifically, they impute transcript/protein data and test the association between imputed gene expression/protein level and the phenotype of interest by using (i) GWAS summary statistics and (ii) reference transcriptomic/proteomic/genomic datasets. TWAS and PWAS are suitable analysis tools for (i) primary association scans and (ii) fine-mapping to identify potentially causal genes for PDs. Key Messages: As post-GWAS analyses, TWAS and PWAS have the potential to highlight causal genes for PDs. These prioritized genes could indicate targets for the development of novel drug therapies. For researchers attempting such analyses, we recommend Mendelian randomization tools that use GWAS statistics for both the trait and the reference datasets, e.g., summary-data-based Mendelian randomization (SMR).
We base our recommendation on (i) being able to use the same tool for both TWAS and PWAS, (ii) not requiring pre-computed weights (which makes it easier to update for larger reference datasets), and (iii) most of the larger transcriptome reference datasets being publicly available and easy to transform into a format compatible with SMR analysis.
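The SMR recommendation above rests on a simple statistic that combines a molecular (eQTL or pQTL) association and a trait association at the same variant. The following is a minimal Python sketch of the approximate SMR test statistic for a single gene, T ≈ z_QTL² · z_GWAS² / (z_QTL² + z_GWAS²), referred to a chi-squared distribution with one degree of freedom, assuming both z-scores refer to the same top cis variant. It is an illustration, not the SMR software: the function names and z-scores are hypothetical, and the HEIDI heterogeneity test that SMR runs alongside this statistic is not shown.

```python
import math
from typing import Tuple


def smr_statistic(z_qtl: float, z_gwas: float) -> float:
    """Approximate SMR statistic for one gene at its top cis variant.

    z_qtl:  z-score of the variant's effect on expression (or protein level)
    z_gwas: z-score of the same variant's effect on the trait
    """
    zx2, zy2 = z_qtl ** 2, z_gwas ** 2
    if zx2 + zy2 == 0.0:
        raise ValueError("both z-scores are zero; the statistic is undefined")
    return zx2 * zy2 / (zx2 + zy2)


def chi2_1df_upper_tail(t: float) -> float:
    """P(X > t) for X ~ chi-squared(1), i.e. the two-sided normal tail at sqrt(t)."""
    return math.erfc(math.sqrt(t / 2.0))


def smr_test(z_qtl: float, z_gwas: float) -> Tuple[float, float]:
    """Return (T_SMR, p-value) for one gene."""
    t = smr_statistic(z_qtl, z_gwas)
    return t, chi2_1df_upper_tail(t)


if __name__ == "__main__":
    # Hypothetical z-scores for a single gene, not taken from any study.
    t_smr, p = smr_test(z_qtl=3.0, z_gwas=4.0)
    print(f"T_SMR = {t_smr:.2f}, p = {p:.4f}")
```

Because T = 1 / (1/z_QTL² + 1/z_GWAS²), the statistic can never exceed the smaller of the two squared z-scores, so a gene scores highly only when both the molecular association and the trait association are strong.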
Affiliation(s)
- Huseyin Gedik
- Integrative Life Sciences, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
- Roseann E. Peterson
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Brien P. Riley
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Vladimir I. Vladimirov
- Department of Psychiatry, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, USA
- Silviu-Alin Bacanu
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
13
Xiao J, Cai M, Yu X, Hu X, Chen G, Wan X, Yang C. Leveraging the local genetic structure for trans-ancestry association mapping. Am J Hum Genet 2022; 109:1317-1337. [PMID: 35714612 PMCID: PMC9300880 DOI: 10.1016/j.ajhg.2022.05.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/30/2022] [Accepted: 05/23/2022] [Indexed: 01/09/2023]
Abstract
Over the past two decades, genome-wide association studies (GWASs) have successfully advanced our understanding of the genetic basis of complex traits. Despite these fruitful discoveries, most GWAS samples are collected from European populations, and GWASs are often criticized for their lack of ancestral diversity. Trans-ancestry association mapping (TRAM) offers an exciting opportunity to narrow the disparities in genetic studies between non-Europeans and Europeans. Here, we propose a statistical method, LOG-TRAM, that leverages the local genetic architecture for TRAM. Using biobank-scale datasets, we show that LOG-TRAM can greatly improve the statistical power to identify risk variants in under-represented populations while producing well-calibrated p values. We applied LOG-TRAM to GWAS summary statistics of various complex traits/diseases from BioBank Japan, UK Biobank, and African populations, obtaining substantial gains in power and effective correction of confounding biases in TRAM. Finally, we show that LOG-TRAM can be successfully applied to identify ancestry-specific loci and that its output can be further used to construct more accurate polygenic risk scores in under-represented populations.
Affiliation(s)
- Jiashun Xiao
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingxuan Cai
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xinyi Yu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xianghong Hu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China; Pazhou Lab, Guangzhou 510330, China
- Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
14
Ji Y, Wei Q, Chen R, Wang Q, Tao R, Li B. Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery. PLoS Genet 2022; 18:e1009814. [PMID: 35771864 PMCID: PMC9278751 DOI: 10.1371/journal.pgen.1009814] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/09/2021] [Revised: 07/13/2022] [Accepted: 05/26/2022] [Indexed: 12/30/2022]
Abstract
A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have successfully identified trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which can carry as much disease risk as expression. Compared with expression data, one challenge in detecting associations using splicing data is the large multiple-testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: (1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and (2) we test for association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG's applications to Alzheimer's disease, low-density lipoprotein cholesterol, and schizophrenia, and find that the majority of MSG-identified genes would have been missed by expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power to identify gene-trait associations and help better understand the genetic risk of complex traits.
Affiliation(s)
- Ying Ji
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Qiang Wei
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Rui Chen
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Quan Wang
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Bingshan Li
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
15
Cao C, Kossinna P, Kwok D, Li Q, He J, Su L, Guo X, Zhang Q, Long Q. Disentangling genetic feature selection and aggregation in transcriptome-wide association studies. Genetics 2021; 220:6444993. [PMID: 34849857 DOI: 10.1093/genetics/iyab216] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/01/2021] [Accepted: 11/04/2021] [Indexed: 12/14/2022]
Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research on improving the predictive accuracy of their core component, Genetically Regulated eXpression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it is both the outcome of the genotype-based predictive models (for predicting expression) and the linear combination of genotypes (as the predicted expression) used in association tests. From a machine learning perspective (treating SNPs as features), these are actually two separable steps, feature selection and feature aggregation, which can be conducted independently. In this work, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. Through simulations and real data analysis, we demonstrate that disentangled protocols adopting straightforward approaches for feature selection (e.g., a simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
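The two separable steps described in this abstract can be illustrated with summary-level data. The following minimal Python sketch assumes, for one gene, marginal eQTL z-scores for its cis-SNPs, GWAS z-scores for the same SNPs, and an LD (SNP-SNP correlation) matrix R. Step 1 (feature selection) keeps SNPs that pass a simple marginal threshold; step 2 (feature aggregation) forms the linear combination wᵀz_GWAS and standardizes it as wᵀz_GWAS / sqrt(wᵀRw), the usual summary-statistic form of a GReX-style linear test. The threshold, the choice of the selected eQTL z-scores as weights, and all function names are hypothetical choices for illustration, not the authors' protocol; kernel-machine aggregation is not shown.

```python
import numpy as np


def select_snps(eqtl_z: np.ndarray, threshold: float) -> np.ndarray:
    """Feature selection: indices of SNPs with marginal eQTL |z| >= threshold."""
    return np.flatnonzero(np.abs(eqtl_z) >= threshold)


def aggregate_z(weights: np.ndarray, gwas_z: np.ndarray, ld: np.ndarray) -> float:
    """Feature aggregation: standardized linear combination w'z / sqrt(w'Rw)."""
    variance = float(weights @ ld @ weights)
    if variance <= 0.0:
        raise ValueError("w'Rw must be positive")
    return float(weights @ gwas_z / np.sqrt(variance))


def two_step_gene_test(eqtl_z, gwas_z, ld, threshold: float = 2.0):
    """Select SNPs by a marginal test, then aggregate them linearly.

    Returns None when no SNP passes selection (the gene is not tested).
    """
    eqtl_z = np.asarray(eqtl_z, dtype=float)
    gwas_z = np.asarray(gwas_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    keep = select_snps(eqtl_z, threshold)
    if keep.size == 0:
        return None
    weights = eqtl_z[keep]  # hypothetical weighting: reuse the selected eQTL z-scores
    return aggregate_z(weights, gwas_z[keep], ld[np.ix_(keep, keep)])


if __name__ == "__main__":
    # Toy inputs for three cis-SNPs of one gene (not real data).
    eqtl_z = [3.0, 0.5, -2.5]
    gwas_z = [2.0, 1.0, -1.0]
    ld = [[1.0, 0.2, 0.1],
          [0.2, 1.0, 0.3],
          [0.1, 0.3, 1.0]]
    print(two_step_gene_test(eqtl_z, gwas_z, ld))
```

Keeping selection and aggregation as separate functions mirrors the paper's point: either step can be swapped (for example, a different selection rule or a non-linear aggregator) without touching the other.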
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Pathum Kossinna
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Devin Kwok
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Qing Li
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Jingni He
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Liya Su
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Qingrun Zhang
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada; Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada; Department of Medical Genetics, University of Calgary, Calgary, AB T2N 4N1, Canada; Hotchkiss Brain Institute, O'Brien Institute for Public Health, University of Calgary, Calgary, AB T2N 4N1, Canada
16
Cao C, Wang J, Kwok D, Cui F, Zhang Z, Zhao D, Li MJ, Zou Q. webTWAS: a resource for disease candidate susceptibility genes identified by transcriptome-wide association study. Nucleic Acids Res 2021; 50:D1123-D1130. [PMID: 34669946 PMCID: PMC8728162 DOI: 10.1093/nar/gkab957] [Citation(s) in RCA: 128] [Impact Index Per Article: 42.7] [Received: 08/15/2021] [Revised: 09/24/2021] [Accepted: 10/05/2021] [Indexed: 12/20/2022]
Abstract
The development of transcriptome-wide association studies (TWAS) has enabled researchers to better identify and interpret causal genes in many diseases. However, there are currently no resources that comprehensively list the gene-disease associations discovered by TWAS from published GWAS summary statistics. TWAS analyses are also difficult to conduct because TWAS software pipelines are complex. To address these issues, we introduce a new resource, webTWAS, which integrates a database of the most comprehensive disease GWAS datasets currently available with credible sets of potential causal genes identified by multiple TWAS software packages. Specifically, a total of 235,064 gene-disease associations for a wide range of human diseases are prioritized from 1298 high-quality downloadable European GWAS summary statistics. Associations are calculated with seven different statistical models based on three popular and representative TWAS software packages. Users can explore associations at the gene or disease level and easily search for related studies or diseases using the MeSH disease tree. Because disease effects are highly tissue-specific, webTWAS applies tissue-specific enrichment analysis to identify significant tissues. A user-friendly web server is also available to run custom TWAS analyses on user-provided GWAS summary statistics. webTWAS is freely available at http://www.webtwas.net.
Affiliation(s)
- Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Jianhua Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Devin Kwok
- School of Computer Science, McGill University, Montreal, Canada
- Feifei Cui
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Zilong Zhang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Da Zhao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
17
Cai M, Xiao J, Zhang S, Wan X, Zhao H, Chen G, Yang C. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am J Hum Genet 2021; 108:632-655. [PMID: 33770506 PMCID: PMC8059341 DOI: 10.1016/j.ajhg.2021.03.002] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 10/25/2020] [Accepted: 03/01/2021] [Indexed: 12/29/2022]
Abstract
The development of polygenic risk scores (PRSs) has proved useful for stratifying the general European population into different risk groups. However, PRSs are less accurate in non-European populations owing to genetic differences across populations. To improve prediction accuracy in non-European populations, we propose a cross-population analysis framework for PRS construction with both individual-level (XPA) and summary-level (XPASS) GWAS data. By leveraging trans-ancestry genetic correlation, our methods can borrow information from biobank-scale European population data to improve risk prediction in non-European populations. Our framework can also incorporate population-specific effects to further improve PRS construction. With innovations in data structure and algorithm design, our methods provide substantial savings in computational time and memory usage. Through comprehensive simulation studies, we show that our framework provides accurate, efficient, and robust PRS construction across a range of genetic architectures. In a Chinese cohort, our methods achieved 7.3%-198.0% accuracy gains for height and 19.5%-313.3% accuracy gains for body mass index (BMI) in terms of predictive R² compared with existing PRS approaches. We also show that XPA and XPASS can substantially improve the construction of height PRSs in the African population, suggesting that our framework generalizes across global populations.
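As a point of reference for the predictive R² figures in this abstract: a PRS is a weighted sum of an individual's allele counts, and predictive R² is commonly computed as the squared correlation between the score and the observed trait in individuals not used to estimate the weights. The minimal NumPy sketch below shows only that generic computation on simulated toy data. It does not implement XPA or XPASS; the noisy `beta_hat` is an assumed stand-in for whatever effect estimates a PRS method would produce, and the function names are hypothetical.

```python
import numpy as np


def polygenic_score(genotypes: np.ndarray, effect_sizes: np.ndarray) -> np.ndarray:
    """PRS_i = sum_j x_ij * beta_j for an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    return genotypes @ effect_sizes


def predictive_r2(prs: np.ndarray, phenotype: np.ndarray) -> float:
    """Squared Pearson correlation between the score and the observed trait."""
    r = np.corrcoef(prs, phenotype)[0, 1]
    return float(r ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=1)
    n_people, n_snps = 2000, 500
    allele_freq = rng.uniform(0.05, 0.5, size=n_snps)
    # Simulated genotypes: allele counts drawn per SNP from Binomial(2, freq).
    X = rng.binomial(2, allele_freq, size=(n_people, n_snps)).astype(float)
    beta_true = rng.normal(0.0, 0.05, size=n_snps)
    y = X @ beta_true + rng.normal(0.0, 1.0, size=n_people)  # simulated trait
    beta_hat = beta_true + rng.normal(0.0, 0.05, size=n_snps)  # stand-in for estimates
    print(f"predictive R^2 = {predictive_r2(polygenic_score(X, beta_hat), y):.3f}")
```

Any PRS method, cross-population or not, changes only how the effect sizes are estimated; the scoring and evaluation steps stay the same.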
Affiliation(s)
- Mingxuan Cai
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jiashun Xiao
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Shunkang Zhang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Hongyu Zhao
- SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 201111, China; Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
- Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China; Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
18
Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet 2021; 30:939-951. [PMID: 33615361 DOI: 10.1093/hmg/ddab056] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/23/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/11/2022]
Abstract
Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS is constructing an expression prediction model for each gene in turn, using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction, and each model makes a distinct modeling assumption that is often suited to a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial for detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean P-value combination strategy. Because each expression prediction model is suited to capture a particular genetic architecture, aggregating TWAS associations across prediction models as in HMAT improves expression prediction accuracy and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated P-values after aggregation. Through numerical simulations, we illustrate the advantage of HMAT over commonly used TWAS methods as well as ad hoc P-value combination rules such as Fisher's method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful than the next best method, detecting many new disease-associated genes that were not identified by existing TWAS approaches. In conclusion, HMAT is a flexible and powerful TWAS method with robust performance across a range of genetic architectures underlying gene expression.
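The harmonic mean P-value idea that HMAT builds on can be sketched in a few lines. The minimal Python example below computes a weighted harmonic mean of per-model p-values (equal weights by default) and, for contrast, Fisher's method, whose chi-squared reference distribution assumes independent tests. It is an illustration under those assumptions only: the raw harmonic mean is not itself an exactly calibrated p-value, HMAT's adjustment for correlated TWAS statistics is not reproduced, and the function names and p-values are hypothetical.

```python
import math
from typing import Optional, Sequence


def _check(pvalues: Sequence[float]) -> None:
    if len(pvalues) == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")


def harmonic_mean_p(pvalues: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted harmonic mean of p-values; weights are normalized to sum to 1."""
    _check(pvalues)
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues) or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per p-value")
    total = sum(weights)
    return 1.0 / sum((w / total) / p for w, p in zip(weights, pvalues))


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: -2*sum(log p) ~ chi-squared(2k) if the k tests are independent."""
    _check(pvalues)
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # half of the chi-squared statistic
    # Closed-form upper tail for even df 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, series = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half_stat / i
        series += term
    return min(1.0, math.exp(-half_stat) * series)


if __name__ == "__main__":
    # Hypothetical p-values for one gene from four expression prediction models.
    per_model = [0.002, 0.04, 0.3, 0.7]
    print(f"harmonic mean p = {harmonic_mean_p(per_model):.4g}")
    print(f"Fisher combined p = {fisher_combined_p(per_model):.4g}")
```

The harmonic mean is driven mainly by the smallest p-values, whereas Fisher's reference distribution is valid only for independent tests, an assumption that TWAS statistics computed from overlapping cis-SNPs of the same gene generally violate.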
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Jing Dai
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Siyi Jin
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA