1
Cao C, Shao M, Wang J, Li Z, Chen H, You T, Li MJ, Ding Y, Zou Q. webTWAS 2.0: update platform for identifying complex disease susceptibility genes through transcriptome-wide association study. Nucleic Acids Res 2025; 53:D1261-D1269. [PMID: 39526380 PMCID: PMC11701649 DOI: 10.1093/nar/gkae1022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 11/16/2024] Open
Abstract
Transcriptome-wide association study (TWAS) has successfully identified numerous complex disease susceptibility genes in the post-genome-wide association study (GWAS) era. Over the past 3 years, the focus of TWAS algorithms has shifted from merely identifying associations to understanding how single nucleotide polymorphisms (SNPs) regulate gene expression, with a growing emphasis on incorporating fine-mapping techniques. Additionally, the rapid increase in GWAS summary statistics, driven largely by the UK Biobank and other consortia, has made it essential to update our webTWAS resource. To address these challenges and meet the growing needs of researchers, we developed webTWAS 2.0, an updated platform for identifying susceptibility genes for human complex diseases using TWAS. Additionally, webTWAS 2.0 provides an online TWAS analysis tool that simplifies conducting TWAS analyses. The updated resource includes 7247 GWAS summary statistics covering 1588 complex human diseases from 192 publications. It also incorporates multiple TWAS methods, such as sTF-TWAS, 3'aTWAS and GIFT, along with an updated interactive visualization tool that allows users to easily explore significant associations across different methods. Other upgrades include a personalized online analysis tool for user-submitted GWAS data and a refined search function that makes it easier to identify relevant associations and meet diverse user needs more efficiently. webTWAS 2.0 is freely accessible at http://www.webtwas.net.
Affiliation(s)
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Ave, Nanjing, Jiangsu 211166, China
- Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Ave, Nanjing, Jiangsu 211166, China
- Jianhua Wang
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania—Perelman School of Medicine, 421 Curie Blvd, Philadelphia, PA 19104, USA
- Zhenghui Li
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Ave, Nanjing, Jiangsu 211166, China
- Haoran Chen
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Ave, Nanjing, Jiangsu 211166, China
- Tianyi You
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, 22 Qixiangtai Road, Tianjin 300203, China
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, 22 Qixiangtai Road, Tianjin 300203, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324003, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324003, China
2
Wang N, Ye Z, Ma T. TIPS: a novel pathway-guided joint model for transcriptome-wide association studies. Brief Bioinform 2024; 25:bbae587. [PMID: 39550224 PMCID: PMC11568880 DOI: 10.1093/bib/bbae587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 10/03/2024] [Accepted: 10/30/2024] [Indexed: 11/18/2024] Open
Abstract
In the past two decades, genome-wide association studies (GWAS) have pinpointed numerous SNPs linked to human diseases and traits, yet many of these SNPs are in non-coding regions and hard to interpret. Transcriptome-wide association studies (TWAS) integrate GWAS and expression reference panels to identify the associations at gene level with tissue specificity, potentially improving the interpretability. However, the list of individual genes identified from univariate TWAS contains little unifying biological theme, leaving the underlying mechanisms largely elusive. In this paper, we propose a novel multivariate TWAS method that Incorporates Pathway or gene Set information, namely TIPS, to identify genes and pathways most associated with complex polygenic traits. We jointly modeled the imputation and association steps in TWAS, incorporated a sparse group lasso penalty in the model to induce selection at both gene and pathway levels and developed an expectation-maximization algorithm to estimate the parameters for the penalized likelihood. We applied our method to three different complex traits: systolic and diastolic blood pressure, as well as a brain aging biomarker white matter brain age gap in UK Biobank and identified critical biologically relevant pathways and genes associated with these traits. These pathways cannot be detected by traditional univariate TWAS + pathway enrichment analysis approach, showing the power of our model. We also conducted comprehensive simulations with varying heritability levels and genetic architectures and showed our method outperformed other established TWAS methods in feature selection, statistical power, and prediction. The R package that implements TIPS is available at https://github.com/nwang123/TIPS.
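As a rough illustration of the sparse group lasso penalty named in the abstract, the sketch below evaluates the textbook form of that penalty: an L1 term that encourages gene-level sparsity plus a group-wise L2 term that encourages pathway-level sparsity. The function name, the `lam`/`alpha` parameterization, and the square-root group-size weights are assumptions taken from the standard formulation, not from the TIPS package, whose exact penalty may differ.

```python
import math


def sparse_group_lasso_penalty(groups, lam, alpha):
    """Standard sparse group lasso penalty (illustrative, not the TIPS code).

    groups: list of coefficient lists, one list per pathway (hypothetical layout)
    lam:    overall penalty strength
    alpha:  mixing weight in [0, 1]; 1 gives a pure lasso, 0 a pure group lasso
    """
    # Gene-level term: sum of absolute coefficients over all genes.
    l1_term = sum(abs(b) for g in groups for b in g)
    # Pathway-level term: Euclidean norm of each group, scaled by sqrt(group size).
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(b * b for b in g)) for g in groups
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Two hypothetical pathways; the second has all coefficients shrunk to zero
# and therefore contributes nothing to either term.
pathways = [[3.0, 4.0], [0.0, 0.0]]
penalty = sparse_group_lasso_penalty(pathways, lam=1.0, alpha=0.5)
```

Because the group term is not differentiable at zero, a whole pathway can be dropped at once, while the L1 term can still zero out individual genes inside a retained pathway.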
Affiliation(s)
- Neng Wang
- Department of Mathematics, University of Maryland, College Park, MD 20742, United States
- Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, United States
- Zhenyao Ye
- Department of Epidemiology and Public Health, University of Maryland, Baltimore, MD 21201, United States
- Tianzhou Ma
- Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, United States
3
Parrish RL, Buchman AS, Tasaki S, Wang Y, Avey D, Xu J, De Jager PL, Bennett DA, Epstein MP, Yang J. SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning. Nat Commun 2024; 15:6646. [PMID: 39103319 DOI: 10.1038/s41467-024-50983-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 07/26/2024] [Indexed: 08/07/2024] Open
Abstract
Multiple reference panels of a given tissue or multiple tissues often exist, and multiple regression methods could be used for training gene expression imputation models for transcriptome-wide association studies (TWAS). To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we develop a Stacked Regression based TWAS (SR-TWAS) tool which can obtain optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies show that SR-TWAS improves power, due to increased training sample sizes and borrowed strength across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our real studies identify 6 independent significant risk genes for Alzheimer's disease (AD) dementia for supplementary motor area tissue and 9 independent significant risk genes for Parkinson's disease (PD) for substantia nigra tissue. Relevant biological interpretations are found for these significant risk genes.
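The idea of finding an optimal linear combination of base models against a validation dataset can be sketched for the simplest case of two base models. The code below is a minimal illustration, assuming a squared-error objective and weights constrained to be non-negative and sum to one; the function name and the toy numbers are hypothetical, and the SR-TWAS tool itself may use a different objective and handles more than two base models.

```python
def stack_two_models(y, pred_a, pred_b):
    """Weight w in [0, 1] minimizing sum((y - (w * a + (1 - w) * b)) ** 2).

    y:      observed expression values in the validation data
    pred_a: predictions from base model A for the same samples
    pred_b: predictions from base model B for the same samples
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical predictions, so any weight is equally good
    # Unconstrained least-squares solution for w, then clipped to [0, 1].
    num = sum((yi - b) * d for yi, b, d in zip(y, pred_b, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical validation data and two sets of base-model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 1.9, 3.1, 3.8]
model_b = [0.5, 2.5, 2.5, 4.5]
w = stack_two_models(observed, model_a, model_b)
stacked = [w * a + (1.0 - w) * b for a, b in zip(model_a, model_b)]
```

Clipping is valid here because the objective is a one-dimensional convex quadratic in `w`; with more than two base models the same problem becomes a constrained least-squares fit over the weight simplex.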
Affiliation(s)
- Randy L Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics, Emory University School of Public Health, Atlanta, GA, 30322, USA
- Aron S Buchman
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Yanling Wang
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Denis Avey
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Jishu Xu
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, 10032, USA
- David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
- Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
4
Zhang Y, Wang M, Li Z, Yang X, Li K, Xie A, Dong F, Wang S, Yan J, Liu J. An overview of detecting gene-trait associations by integrating GWAS summary statistics and eQTLs. Science China. Life Sciences 2024; 67:1133-1154. [PMID: 38568343 DOI: 10.1007/s11427-023-2522-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/29/2024] [Indexed: 06/07/2024]
Abstract
Detecting genes that affect specific traits (such as human diseases and crop yields) is important for treating complex diseases and improving crop quality. A genome-wide association study (GWAS) provides new insights and directions for understanding complex traits by identifying important single nucleotide polymorphisms. Many GWAS summary statistics data related to various complex traits have been gathered recently. Studies have shown that GWAS risk loci and expression quantitative trait loci (eQTLs) often have a lot of overlaps, which makes gene expression gradually become an important intermediary to reveal the regulatory role of GWAS. In this review, we review three types of gene-trait association detection methods of integrating GWAS summary statistics and eQTLs data, namely colocalization methods, transcriptome-wide association study-oriented approaches, and Mendelian randomization-related methods. At the theoretical level, we discussed the differences, relationships, advantages, and disadvantages of various algorithms in the three kinds of gene-trait association detection methods. To further discuss the performance of various methods, we summarize the significant gene sets that influence high-density lipoprotein, low-density lipoprotein, total cholesterol, and triglyceride reported in 16 studies. We discuss the performance of various algorithms using the datasets of the four lipid traits. The advantages and limitations of various algorithms are analyzed based on experimental results, and we suggest directions for follow-up studies on detecting gene-trait associations.
Affiliation(s)
- Yang Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Mengyao Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Zhenguo Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Xuan Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Keqin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ao Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Fang Dong
- College of Life Sciences, Nankai University, Tianjin, 300071, China
- Shihan Wang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
5
Wei K, Lu Y, Ma X, Duan A, Lu X, Abdel-Shafy H, Deng T. Transcriptome-Wide Association Study Reveals Potentially Candidate Genes Responsible for Milk Production Traits in Buffalo. Int J Mol Sci 2024; 25:2626. [PMID: 38473873 DOI: 10.3390/ijms25052626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 02/17/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024] Open
Abstract
Identifying key causal genes is critical for unraveling the genetic basis of complex economic traits, yet it remains a formidable challenge. The advent of large-scale sequencing data and computational algorithms, such as transcriptome-wide association studies (TWASs), offers a promising avenue for identifying potential causal genes. In this study, we harnessed the power of TWAS to identify genes potentially responsible for milk production traits, including daily milk yield (MY), fat percentage (FP), and protein percentage (PP), within a cohort of 100 buffaloes. Our approach began by generating the genotype and expression profiles for these 100 buffaloes through whole-genome resequencing and RNA sequencing, respectively. Through comprehensive genome-wide association studies (GWAS), we pinpointed a total of seven and four single nucleotide polymorphisms (SNPs) significantly associated with MY and FP traits, respectively. By using TWAS, we identified 55, 71, and 101 genes as significant signals for MY, FP, and PP traits, respectively. To delve deeper, we conducted protein-protein interaction (PPI) analysis, revealing the categorization of these genes into distinct PPI networks. Interestingly, several TWAS-identified genes within the PPI network played a vital role in milk performance. These findings open new avenues for identifying potentially causal genes underlying important traits, thereby offering invaluable insights for genomics and breeding in buffalo populations.
Affiliation(s)
- Kelong Wei
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Ying Lu
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Xiaoya Ma
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Anqian Duan
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Xingrong Lu
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Hamdy Abdel-Shafy
- Department of Animal Production, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Tingxian Deng
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
6
Zhu Z, Chen X, Zhang S, Yu R, Qi C, Cheng L, Zhang X. Leveraging molecular quantitative trait loci to comprehend complex diseases/traits from the omics perspective. Hum Genet 2023; 142:1543-1560. [PMID: 37755483 DOI: 10.1007/s00439-023-02602-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/15/2023] [Accepted: 09/14/2023] [Indexed: 09/28/2023]
Abstract
Comprehending the molecular basis of quantitative genetic variation is a principal goal for complex diseases or traits. Molecular quantitative trait loci (molQTLs) have made it possible to investigate the effects of genetic variants hiding behind large-scale omics data. A deeper understanding of molQTL is urgently required in light of the multi-dimensionalization of omics data to more fully elucidate the pertinent biological mechanisms. Herein, we reviewed molQTLs with the corresponding resource from the omics perspective and further discussed the integrative strategy of GWAS-molQTL to infer their causal effects. Subsequently, we described the opportunities and challenges encountered by molQTL. The case studies showed that molQTL is essential for complex diseases and traits, whether single- or multi-omics QTLs. Overall, we highlighted the functional significance of genetic variants to employ the discovery of molQTL in complex diseases and traits.
Affiliation(s)
- Zijun Zhu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xinyu Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Sainan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Rui Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Changlu Qi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China.
- NHC Key Laboratory of Molecular Probe and Targeted Diagnosis and Therapy, Harbin Medical University, Harbin, 150028, Heilongjiang, China.
- Xue Zhang
- NHC Key Laboratory of Molecular Probe and Targeted Diagnosis and Therapy, Harbin Medical University, Harbin, 150028, Heilongjiang, China
- McKusick-Zhang Center for Genetic Medicine, State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China
7
Cai M, Wang Z, Xiao J, Hu X, Chen G, Yang C. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nat Commun 2023; 14:6870. [PMID: 37898663 PMCID: PMC10613261 DOI: 10.1038/s41467-023-42614-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2023] [Accepted: 10/17/2023] [Indexed: 10/30/2023] Open
Abstract
Fine-mapping prioritizes risk variants identified by genome-wide association studies (GWASs), serving as a critical step to uncover biological mechanisms underlying complex traits. However, several major challenges still remain for existing fine-mapping methods. First, the strong linkage disequilibrium among variants can limit the statistical power and resolution of fine-mapping. Second, it is computationally expensive to simultaneously search for multiple causal variants. Third, the confounding bias hidden in GWAS summary statistics can produce spurious signals. To address these challenges, we develop a statistical method for cross-population fine-mapping (XMAP) by leveraging genetic diversity and accounting for confounding bias. By using cross-population GWAS summary statistics from global biobanks and genomic consortia, we show that XMAP can achieve greater statistical power, better control of false positive rate, and substantially higher computational efficiency for identifying multiple causal signals, compared to existing methods. Importantly, we show that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.
Affiliation(s)
- Mingxuan Cai
- Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China.
- Zhiwei Wang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jiashun Xiao
- Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
- Xianghong Hu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- WeGene, Shenzhen Zaozhidao Technology Co., Ltd, Shenzhen, 518040, China
- Graduate Affairs, Faculty of Medicine, Chulalongkorn University, 10330, Bangkok, Thailand
- Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China.
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
8
Lu H, Zhang S, Jiang Z, Zeng P. Leveraging trans-ethnic genetic risk scores to improve association power for complex traits in underrepresented populations. Brief Bioinform 2023:bbad232. [PMID: 37332016 DOI: 10.1093/bib/bbad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 05/06/2023] [Accepted: 06/04/2023] [Indexed: 06/20/2023] Open
Abstract
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations can be reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis is less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
9
Dai Q, Zhou G, Zhao H, Võsa U, Franke L, Battle A, Teumer A, Lehtimäki T, Raitakari OT, Esko T, Epstein MP, Yang J. OTTERS: a powerful TWAS framework leveraging summary-level reference data. Nat Commun 2023; 14:1271. [PMID: 36882394 PMCID: PMC9992663 DOI: 10.1038/s41467-023-36862-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/07/2022] [Accepted: 02/20/2023] [Indexed: 03/09/2023] Open
Abstract
Most existing TWAS tools require individual-level eQTL reference data and thus are not applicable to summary-level reference eQTL datasets. The development of TWAS methods that can harness summary-level reference data is valuable to enable TWAS in broader settings and enhance power due to increased reference sample size. Thus, we develop a TWAS framework called OTTERS (Omnibus Transcriptome Test using Expression Reference Summary data) that adapts multiple polygenic risk score (PRS) methods to estimate eQTL weights from summary-level eQTL reference data and conducts an omnibus TWAS. We show that OTTERS is a practical and powerful TWAS tool by both simulations and application studies.
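An omnibus test of this kind merges the gene-level p-values obtained from several weight-estimation methods into a single p-value. The abstract does not say how that merge is done; the sketch below uses the Cauchy combination rule, one common choice for combining correlated p-values, purely as an assumption. The function name and the example p-values are hypothetical.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule (illustrative sketch).

    p_values: per-method p-values for one gene, each strictly between 0 and 1
    weights:  optional non-negative weights; equal weights are used if omitted
    """
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate and take the weighted mean.
    stat = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total
    # Convert the statistic back to a p-value using the Cauchy tail probability.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical p-values for one gene from three weight-estimation methods.
omnibus_p = cauchy_combination([0.001, 0.20, 0.04])
```

A rule of this type is attractive for an omnibus TWAS because the per-method tests are computed on the same GWAS data and are therefore correlated, and the Cauchy combination does not require estimating that correlation.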
Affiliation(s)
- Qile Dai
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA, 30322, USA
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Urmo Võsa
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 50090, Tartu, Estonia
- Lude Franke
- Department of Genetics, University of Groningen, University Medical Center Groningen, 9700 RB, Groningen, The Netherlands
- Oncode Institute, 3521 AL, Utrecht, The Netherlands
- Alexis Battle
- Department of Computer Science, and Departments of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Alexander Teumer
- Institute for Community Medicine, University Medicine Greifswald, 17489, Greifswald, Germany
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Centre for Cardiovascular Disease Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, 33520, Finland
- Olli T Raitakari
- Centre for Population Health Research, University of Turku and Turku University Hospital, 20520, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, 20520, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, 20521, Turku, Finland
- Tõnu Esko
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 50090, Tartu, Estonia
- Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
10
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. Plants (Basel, Switzerland) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover causal variants in GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for controlling confounding factors, including population structure, resulting in increased computational efficiency and statistical power in GWAS studies. Recently, more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we survey the LMM-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. We then describe the advantages and weaknesses of LMMs in GWAS. Finally, we discuss future perspectives and conclude. This review should help researchers quickly select appropriate LMM models and methods for GWAS data analysis and should benefit the scientific community.
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | | | - Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
|
11
|
Li Z, Zhao W, Shang L, Mosley TH, Kardia SLR, Smith JA, Zhou X. METRO: Multi-ancestry transcriptome-wide association studies for powerful gene-trait association detection. Am J Hum Genet 2022; 109:783-801. [PMID: 35334221 PMCID: PMC9118130 DOI: 10.1016/j.ajhg.2022.03.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2021] [Accepted: 03/01/2022] [Indexed: 12/23/2022] Open
Abstract
Integrative analysis of genome-wide association studies (GWASs) and gene expression studies in the form of a transcriptome-wide association study (TWAS) has the potential to better elucidate the molecular mechanisms underlying disease etiology. Here we present a method, METRO, that can leverage gene expression data collected from multiple genetic ancestries to enhance TWASs. METRO incorporates expression prediction models constructed in different genetic ancestries through a likelihood-based inference framework, producing calibrated p values with substantially improved TWAS power. We illustrate the benefits of METRO in both simulations and applications to seven complex traits and diseases obtained from four GWASs. These GWASs include two of primarily European ancestry (n = 188,577 and 339,226) and two of primarily African ancestry (n = 42,752 and 23,827). In the real data applications, we leverage gene expression data measured on 1,032 African Americans and 801 European Americans from the Genetic Epidemiology Network of Arteriopathy (GENOA) study to identify a substantially larger number of gene-trait associations as compared to existing TWAS approaches. The benefits of METRO are most prominent in applications to GWASs of African ancestry where the sample size is much smaller than GWASs of European ancestry and where a more powerful TWAS method is crucial. Among the identified associations are high-density lipoprotein-associated genes including PLTP and PPARG that are critical for maintaining lipid homeostasis and the type II diabetes-associated gene MAPT that supports microtubule-associated protein tau as a key component underlying impaired insulin secretion.
Affiliation(s)
- Zheng Li
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | - Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | - Lulu Shang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | - Thomas H Mosley
- Memory Impairment and Neurodegenerative Dementia (MIND) Center, University of Mississippi Medical Center, Jackson, MS 39216, USA
| | - Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
|
12
|
Wang A, Liu W, Liu Z. A two-sample robust Bayesian Mendelian Randomization method accounting for linkage disequilibrium and idiosyncratic pleiotropy with applications to the COVID-19 outcomes. Genet Epidemiol 2022; 46:159-169. [PMID: 35192729 PMCID: PMC9648496 DOI: 10.1002/gepi.22445] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2021] [Revised: 11/03/2021] [Accepted: 01/20/2022] [Indexed: 01/02/2023]
Abstract
Mendelian randomization (MR) is a statistical method that exploits genetic variants as instrumental variables to estimate the causal effect of modifiable risk factors on an outcome of interest. Although various two-sample MR methods based on genome-wide association study summary-level data are widely used, they can suffer from power loss and/or biased inference when the chosen genetic variants are in linkage disequilibrium (LD) and also have relatively large direct effects on the outcome whose distribution might be heavy-tailed, a phenomenon commonly referred to as idiosyncratic pleiotropy. To resolve these two issues, we propose a novel Robust Bayesian Mendelian Randomization (RBMR) model that uses the more robust multivariate generalized $t$-distribution to model such direct effects in a probabilistic framework that can also incorporate the LD structure explicitly. The generalized $t$-distribution can be represented as a Gaussian scale mixture, so our model parameters can be estimated by expectation-maximization (EM)-type algorithms. We compute the standard errors by calibrating the evidence lower bound using the likelihood ratio test. Through extensive simulation studies, we show that our RBMR has robust performance compared with other competing methods. We further apply our RBMR method to two benchmark data sets and find that RBMR has smaller bias and standard errors. Using our proposed RBMR method, we find that coronary artery disease is associated with increased risk of critically ill coronavirus disease 2019. We also develop a user-friendly R package RBMR (https://github.com/AnqiWang2021/RBMR) for public use.
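As a hedged illustration of the Gaussian scale-mixture representation mentioned in this abstract, the univariate case can be written as below. The Gamma mixing distribution and the symbols $\mu$, $\sigma^2$, $\tau$, $a$ and $b$ are assumptions chosen for illustration; the abstract does not state the parametrization actually used in RBMR.

```latex
x \mid \tau \sim \mathcal{N}\!\left(\mu,\ \sigma^{2}/\tau\right), \qquad
\tau \sim \mathrm{Gamma}(a, b) \ \ \text{(shape } a\text{, rate } b\text{)}
\;\Longrightarrow\;
p(x) = \frac{\Gamma\!\left(a + \tfrac{1}{2}\right)}{\Gamma(a)\,\sqrt{2\pi b \sigma^{2}}}
\left(1 + \frac{(x-\mu)^{2}}{2 b \sigma^{2}}\right)^{-\left(a + \frac{1}{2}\right)}
```

Setting $a = b = \nu/2$ recovers the Student $t$-distribution with $\nu$ degrees of freedom. Because the model is Gaussian conditional on the latent scale $\tau$, EM-type updates of the kind the abstract refers to become available.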
Affiliation(s)
- Anqi Wang
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
| | - Wei Liu
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
| | - Zhonghua Liu
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
|
13
|
Towards the Genetic Architecture of Complex Gene Expression Traits: Challenges and Prospects for eQTL Mapping in Humans. Genes (Basel) 2022; 13:genes13020235. [PMID: 35205280 PMCID: PMC8871770 DOI: 10.3390/genes13020235] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/20/2021] [Revised: 01/21/2022] [Accepted: 01/25/2022] [Indexed: 12/10/2022] Open
Abstract
The discovery of expression quantitative trait loci (eQTLs) and their target genes (eGenes) has not only compensated for the limitations of genome-wide association studies for complex phenotypes but has also provided a basis for predicting gene expression. Efforts have been made to develop analytical methods in statistical genetics, a key discipline in eQTL analysis. In particular, mixed model– and deep learning–based analytical methods have been extremely beneficial in mapping eQTLs and predicting gene expression. Nevertheless, we still face many challenges associated with eQTL discovery. Here, we discuss two key aspects of these challenges: (1) the complexity of eTraits, with various factors such as polygenicity and epistasis, and (2) the voluminous work required for various types of eQTL profiles. The properties and prospects of statistical methods, including the mixed model method, Bayesian inference, the deep learning method, and the integration method, are presented as future directions for eQTL discovery. This review will help expedite the design and use of efficient methods for eQTL discovery and eTrait prediction.
|
14
|
Yang Y, Yeung KF, Liu J. CoMM-S4: A Collaborative Mixed Model Using Summary-Level eQTL and GWAS Datasets in Transcriptome-Wide Association Studies. Front Genet 2021; 12:704538. [PMID: 34616426 PMCID: PMC8488198 DOI: 10.3389/fgene.2021.704538] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/03/2021] [Accepted: 09/03/2021] [Indexed: 11/13/2022] Open
Abstract
Motivation: Genome-wide association studies (GWAS) have achieved remarkable success in identifying SNP-trait associations in the last decade. However, it is challenging to identify the mechanisms that connect the genetic variants with complex traits as the majority of GWAS associations are in non-coding regions. Methods that integrate genomic and transcriptomic data allow us to investigate how genetic variants may affect a trait through their effect on gene expression. These include CoMM and CoMM-S2, likelihood-ratio-based methods that integrate GWAS and eQTL studies to assess expression-trait association. However, their reliance on individual-level eQTL data renders them inapplicable when only summary-level eQTL results, such as those from large-scale eQTL analyses, are available. Result: We develop an efficient probabilistic model, CoMM-S4, to explore the expression-trait association using summary-level eQTL and GWAS datasets. Compared with CoMM-S2, which uses individual-level eQTL data, CoMM-S4 requires only summary-level eQTL data. To test expression-trait association, an efficient variational Bayesian EM algorithm and a likelihood ratio test were constructed. We applied CoMM-S4 to both simulated and real data. The simulation results demonstrate that CoMM-S4 can perform as well as CoMM-S2 and S-PrediXcan, and analyses using GWAS summary statistics from Biobank Japan and eQTL summary statistics from eQTLGen and GTEx suggest novel susceptibility loci for cardiovascular diseases and osteoporosis. Availability and implementation: The developed R package is available at https://github.com/gordonliu810822/CoMM.
Affiliation(s)
- Yi Yang
- Centre for Quantitative Medicine, Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
| | - Kar-Fu Yeung
- Centre for Quantitative Medicine, Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
| | - Jin Liu
- Centre for Quantitative Medicine, Program in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
|
15
|
Xue H, Shen X, Pan W. Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Am J Hum Genet 2021; 108:1251-1269. [PMID: 34214446 PMCID: PMC8322939 DOI: 10.1016/j.ajhg.2021.05.014] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 01/10/2021] [Accepted: 05/25/2021] [Indexed: 12/23/2022] Open
Abstract
With the increasing availability of large-scale GWAS summary data on various complex traits and diseases, there has been tremendous interest in applications of Mendelian randomization (MR) to investigate causal relationships between pairs of traits using SNPs as instrumental variables (IVs) based on observational data. In spite of the potential significance of such applications, the validity of their causal conclusions critically depends on some strong modeling assumptions required by MR, which may be violated due to widespread (horizontal) pleiotropy. Although many MR methods have been proposed recently to relax the assumptions by mainly dealing with uncorrelated pleiotropy, only a few can handle correlated pleiotropy, in which some SNPs/IVs may be associated with hidden confounders, such as some heritable factors shared by both traits. Here we propose a simple and effective approach based on constrained maximum likelihood and model averaging, called cML-MA, applicable to GWAS summary data. To deal with more challenging situations with many invalid IVs with only weak pleiotropic effects, we modify and improve it with data perturbation. Extensive simulations demonstrated that the proposed methods could control the type I error rate better while achieving higher power than other competitors. Applications to 48 risk factor-disease pairs based on large-scale GWAS summary data of 3 cardio-metabolic diseases (coronary artery disease, stroke, and type 2 diabetes), asthma, and 12 risk factors confirmed its superior performance.
Affiliation(s)
- Haoran Xue
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA; Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
|
16
|
Knutson KA, Pan W. Integrating brain imaging endophenotypes with GWAS for Alzheimer's disease. Quantitative Biology 2021; 9:185-200. [PMID: 35399757 PMCID: PMC8993183 DOI: 10.1007/s40484-020-0202-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/27/2019] [Revised: 02/11/2020] [Accepted: 02/28/2020] [Indexed: 01/09/2023]
Abstract
Background: Genome-wide association studies (GWAS) have identified many genetic variants associated with increased risk of Alzheimer's disease (AD). These susceptibility loci may affect AD indirectly through a combination of physiological brain changes. Many of these neuropathologic features are detectable via magnetic resonance imaging (MRI). Methods: In this study, we examine the effects of such brain imaging derived phenotypes (IDPs) with genetic etiology on AD, using and comparing the following methods: two-sample Mendelian randomization (2SMR), generalized summary statistics based Mendelian randomization (GSMR), transcriptome-wide association studies (TWAS) and the adaptive sum of powered score (aSPU) test. These methods do not require individual-level genotypic and phenotypic data but instead can rely only on an external reference panel and GWAS summary statistics. Results: Using publicly available GWAS datasets from the International Genomics of Alzheimer's Project (IGAP) and UK Biobank's (UKBB) brain imaging initiatives, we identify 35 IDPs possibly associated with AD, many of which have well established or biologically plausible links to the characteristic cognitive impairments of this neurodegenerative disease. Conclusions: Our results highlight the increased power for detecting genetic associations achieved by multiple correlated SNP-based methods, i.e., aSPU, GSMR and TWAS, over MR methods based on independent SNPs (as instrumental variables).
Affiliation(s)
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
|
17
|
Tang S, Buchman AS, De Jager PL, Bennett DA, Epstein MP, Yang J. Novel Variance-Component TWAS method for studying complex human diseases with applications to Alzheimer's dementia. PLoS Genet 2021; 17:e1009482. [PMID: 33798195 PMCID: PMC8046351 DOI: 10.1371/journal.pgen.1009482] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/26/2020] [Revised: 04/14/2021] [Accepted: 03/15/2021] [Indexed: 02/07/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) have been widely used to integrate transcriptomic and genetic data to study complex human diseases. Within a test dataset lacking transcriptomic data, traditional two-stage TWAS methods first impute gene expression by creating a weighted sum that aggregates SNPs with their corresponding cis-eQTL effects on reference transcriptome. Traditional TWAS methods then employ a linear regression model to assess the association between imputed gene expression and test phenotype, thereby assuming the effect of a cis-eQTL SNP on test phenotype is a linear function of the eQTL's estimated effect on reference transcriptome. To increase TWAS robustness to this assumption, we propose a novel Variance-Component TWAS procedure (VC-TWAS) that assumes the effects of cis-eQTL SNPs on phenotype are random (with variance proportional to corresponding reference cis-eQTL effects) rather than fixed. VC-TWAS is applicable to both continuous and dichotomous phenotypes, as well as individual-level and summary-level GWAS data. Using simulated data, we show VC-TWAS is more powerful than traditional TWAS methods based on a two-stage Burden test, especially when eQTL genetic effects on test phenotype are no longer a linear function of their eQTL genetic effects on reference transcriptome. We further applied VC-TWAS to both individual-level (N = ~3.4K) and summary-level (N = ~54K) GWAS data to study Alzheimer's dementia (AD). With the individual-level data, we detected 13 significant risk genes including 6 known GWAS risk genes such as TOMM40 that were missed by traditional TWAS methods. With the summary-level data, we detected 57 significant risk genes considering only cis-SNPs and 71 significant genes considering both cis- and trans- SNPs, which also validated our findings with the individual-level GWAS data. Our VC-TWAS method is implemented in the TIGAR tool for public use.
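The traditional two-stage procedure described in this abstract (impute expression as a weighted sum of genotypes, then regress phenotype on imputed expression) can be sketched as follows. This is a minimal toy illustration, not the VC-TWAS or TIGAR implementation; the function names, genotype dosages, eQTL weights and phenotype values are all made up for the example.

```python
from typing import List, Sequence


def impute_expression(genotypes: Sequence[Sequence[float]],
                      eqtl_weights: Sequence[float]) -> List[float]:
    """Stage 1: imputed expression = weighted sum of SNP dosages,
    using cis-eQTL effect sizes estimated in a reference panel."""
    return [sum(g * w for g, w in zip(row, eqtl_weights)) for row in genotypes]


def regression_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Stage 2: ordinary least-squares slope of phenotype on imputed expression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("imputed expression has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data: 4 individuals x 3 cis-SNPs (dosages 0/1/2).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1]]
eqtl_weights = [0.5, -0.2, 0.1]        # assumed reference eQTL effects
phenotype = [1.0, 1.6, 3.2, 0.4]       # constructed as 2 * expression + 1

imputed = impute_expression(genotypes, eqtl_weights)
slope = regression_slope(imputed, phenotype)
print(imputed, slope)
```

Because the phenotype depends on the SNPs only through the single imputed-expression score, this burden-style test implicitly assumes each SNP's effect on phenotype is proportional to its eQTL weight, which is the assumption the abstract says VC-TWAS relaxes.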
Affiliation(s)
- Shizhen Tang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, Georgia, United States of America
| | - Aron S. Buchman
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
| | - Philip L. De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
| | - David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
| | - Michael P. Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America
|
18
|
Using Collaborative Mixed Models to Account for Imputation Uncertainty in Transcriptome-Wide Association Studies. Methods Mol Biol 2021. [PMID: 33733352 DOI: 10.1007/978-1-0716-0947-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/18/2024]
Abstract
Transcriptome-wide association studies (TWASs) integrate expression quantitative trait loci (eQTLs) studies with genome-wide association studies (GWASs) to prioritize candidate target genes for complex traits. TWASs have become increasingly popular. They have been used to analyze many complex traits with expression profiles from different tissues, successfully enhancing the discovery of genetic risk loci for complex traits. Though conceptually straightforward, some steps are required to perform the TWAS properly. Here we provide a step-by-step guide to integrate eQTL data with both GWAS individual-level data and GWAS summary statistics from complex traits.
|
19
|
Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet 2021; 30:939-951. [PMID: 33615361 DOI: 10.1093/hmg/ddab056] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/23/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/11/2022] Open
Abstract
Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS involves the construction of expression prediction models for every gene in turn using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction, and each such model makes a distinct modeling assumption that is often suitable for a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial in detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean P-value combination strategy. Because each expression prediction model is suited to capture a particular genetic architecture, aggregating TWAS associations across prediction models as in HMAT improves accurate expression prediction and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated P-values after aggregation. Through numerical simulations, we illustrated the advantage of HMAT over commonly used TWAS methods as well as ad hoc P-value combination rules such as Fisher's method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful compared to the next best method, detecting many new disease-associated genes that were otherwise not identified by existing TWAS approaches. In conclusion, HMAT represents a flexible and powerful TWAS method that enjoys robust performance across a range of genetic architectures underlying gene expression.
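The harmonic mean P-value combination that this abstract names can be sketched in a few lines. This is only the raw weighted harmonic mean under assumed equal weights; the calibration step that the abstract says HMAT uses to produce calibrated P-values is not shown, and the function name and inputs are hypothetical.

```python
from typing import Optional, Sequence


def harmonic_mean_p(p_values: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted harmonic mean of P-values: sum(w) / sum(w / p).

    Note: this raw statistic is not itself a calibrated P-value.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    if weights is None:
        # Assumption for this sketch: equal weight per prediction model.
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Hypothetical P-values for one gene from three expression prediction models.
combined = harmonic_mean_p([0.001, 0.5, 0.8])
print(combined)
```

The harmonic mean is dominated by the smallest P-values, so a gene that is well captured by only one prediction model can still yield a small combined value.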
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Jing Dai
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Siyi Jin
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
20
|
Xie Y, Shan N, Zhao H, Hou L. Transcriptome wide association studies: general framework and methods. Quantitative Biology 2021. [DOI: 10.15302/j-qb-020-0228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
|
21
|
Shi X, Chai X, Yang Y, Cheng Q, Jiao Y, Chen H, Huang J, Yang C, Liu J. A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies. Nucleic Acids Res 2020; 48:e109. [PMID: 32978944 PMCID: PMC7641735 DOI: 10.1093/nar/gkaa767] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/03/2020] [Revised: 08/14/2020] [Accepted: 09/03/2020] [Indexed: 12/13/2022] Open
Abstract
Transcriptome-wide association studies (TWASs) integrate expression quantitative trait loci (eQTLs) studies with genome-wide association studies (GWASs) to prioritize candidate target genes for complex traits. Several statistical methods have been recently proposed to improve the performance of TWASs in gene prioritization by integrating the expression regulatory information imputed from multiple tissues, and made significant achievements in improving the ability to detect gene-trait associations. Unfortunately, most existing multi-tissue methods focus on prioritization of candidate genes, and cannot directly infer the specific functional effects of candidate genes across different tissues. Here, we propose a tissue-specific collaborative mixed model (TisCoMM) for TWASs, leveraging the co-regulation of genetic variations across different tissues explicitly via a unified probabilistic model. TisCoMM not only performs hypothesis testing to prioritize gene-trait associations, but also detects the tissue-specific role of candidate target genes in complex traits. To make full use of widely available GWASs summary statistics, we extend TisCoMM to use summary-level data, namely, TisCoMM-S2. Using extensive simulation studies, we show that type I error is controlled at the nominal level, the statistical power of identifying associated genes is greatly improved, and the false-positive rate (FPR) for non-causal tissues is well controlled at decent levels. We further illustrate the benefits of our methods in applications to summary-level GWASs data of 33 complex traits. Notably, apart from better identifying potential trait-associated genes, we can elucidate the tissue-specific role of candidate target genes. The follow-up pathway analysis from tissue-specific genes for asthma shows that the immune system plays an essential function for asthma development in both thyroid and lung tissues.
Affiliation(s)
- Xingjie Shi
- Department of Statistics, Nanjing University of Finance and Economics, Nanjing, China
- Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore
| | - Xiaoran Chai
- Beijing Advanced Innovation Center for Genomics (ICG) & Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- School of Medicine, National University of Singapore, Singapore
| | - Yi Yang
- Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore
| | - Qing Cheng
- Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore
| | - Yuling Jiao
- School of Mathematics and Statistics, and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, China
| | - Haoyue Chen
- School of International Studies, Zhejiang University, Hangzhou, China
| | - Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, USA
| | - Can Yang
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China
| | - Jin Liu
- Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore
|
22
|
Liu W, Li M, Zhang W, Zhou G, Wu X, Wang J, Lu Q, Zhao H. Leveraging functional annotation to identify genes associated with complex diseases. PLoS Comput Biol 2020; 16:e1008315. [PMID: 33137096 PMCID: PMC7660930 DOI: 10.1371/journal.pcbi.1008315] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/25/2020] [Revised: 11/12/2020] [Accepted: 09/05/2020] [Indexed: 02/06/2023] Open
Abstract
To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) to identify disease-associated genes leveraging epigenetic information. Through prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we were able to identify more trait-associated genes (ranging from 7.7% to 102%) than those from existing methods. Among the identified genes associated with these traits, T-GEN can better identify genes with high (>0.99) pLI scores compared to other methods. When T-GEN was applied to late-onset Alzheimer's disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.
Affiliation(s)
- Wei Liu
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Mo Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Wenfeng Zhang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Geyu Zhou
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Xing Wu
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States of America
- Jiawei Wang
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, WI, United States of America
  - Department of Statistics, University of Wisconsin-Madison, WI, United States of America
  - Center for Demography of Health and Aging, University of Wisconsin-Madison, WI, United States of America
- Hongyu Zhao
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
  - Department of Genetics, Yale School of Medicine, New Haven, CT, United States of America
|
23
|
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Shen H, Gong P, Zhang C, Deng HW. A Review of Integrative Imputation for Multi-Omics Datasets. Front Genet 2020; 11:570255. [PMID: 33193667 PMCID: PMC7594632 DOI: 10.3389/fgene.2020.570255] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 06/07/2020] [Accepted: 09/16/2020] [Indexed: 01/05/2023] Open
Abstract
Multi-omics studies, which explore the interactions between multiple types of biological factors, have significant advantages over single-omics analysis for their ability to provide a more holistic view of biological processes, uncover the causal and functional mechanisms for complex diseases, and facilitate new discoveries in precision medicine. However, omics datasets often contain missing values, and in multi-omics study designs it is common for individuals to be represented for some omics layers but not all. Since most statistical analyses cannot be applied directly to the incomplete datasets, imputation is typically performed to infer the missing values. Integrative imputation techniques which make use of the correlations and shared information among multi-omics datasets are expected to outperform approaches that rely on single-omics information alone, resulting in more accurate results for the subsequent downstream analyses. In this review, we provide an overview of the currently available imputation methods for handling missing values in bioinformatics data with an emphasis on multi-omics imputation. In addition, we also provide a perspective on how deep learning methods might be developed for the integrative imputation of multi-omics datasets.
Affiliation(s)
- Meng Song
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
  - College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, FL, United States
- Hui Shen
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Ping Gong
  - Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Chaoyang Zhang
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Hong-Wen Deng
  - Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
|
24
|
The statistical practice of the GTEx Project: from single to multiple tissues. QUANTITATIVE BIOLOGY 2020. [DOI: 10.1007/s40484-020-0210-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
25
|
Transcriptome-wide association studies: a view from Mendelian randomization. QUANTITATIVE BIOLOGY 2020; 9:107-121. [DOI: 10.1007/s40484-020-0207-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/24/2022]
|
26
|
Cheng Q, Yang Y, Shi X, Yeung KF, Yang C, Peng H, Liu J. MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy. NAR Genom Bioinform 2020; 2:lqaa028. [PMID: 33575584 PMCID: PMC7671398 DOI: 10.1093/nargab/lqaa028] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/17/2019] [Revised: 02/27/2020] [Accepted: 04/14/2020] [Indexed: 12/12/2022] Open
Abstract
The proliferation of genome-wide association studies (GWAS) has prompted the use of two-sample Mendelian randomization (MR) with genetic variants as instrumental variables (IVs) for drawing reliable causal relationships between health risk factors and disease outcomes. However, the unique features of GWAS demand that MR methods account for both linkage disequilibrium (LD) and ubiquitously existing horizontal pleiotropy among complex traits, which is the phenomenon wherein a variant affects the outcome through mechanisms other than exclusively through the exposure. Therefore, statistical methods that fail to consider LD and horizontal pleiotropy can lead to biased estimates and false-positive causal relationships. To overcome these limitations, we proposed a probabilistic model for MR analysis (MR-LDP) that identifies the causal effects between risk factors and disease outcomes using GWAS summary statistics in the presence of LD and properly accounts for horizontal pleiotropy among genetic variants, and we developed a computationally efficient algorithm to make the causal inference. We then conducted comprehensive simulation studies to demonstrate the advantages of MR-LDP over the existing methods. Moreover, we used two real exposure-outcome pairs to validate the results from MR-LDP compared with alternative methods, showing that our method is more efficient in using all-instrumental variants in LD. By further applying MR-LDP to lipid traits and body mass index (BMI) as risk factors for complex diseases, we identified multiple pairs of significant causal relationships, including a protective effect of high-density lipoprotein cholesterol on peripheral vascular disease and a positive causal effect of BMI on hemorrhoids.
Affiliation(s)
- Qing Cheng
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Yi Yang
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Xingjie Shi
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
  - Department of Statistics, Nanjing University of Finance and Economics, Nanjing, 210023, China
- Kar-Fu Yeung
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Can Yang
  - Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Heng Peng
  - Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong
- Jin Liu
  - Centre for Quantitative Medicine, Health Services & Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
|