1
King A, Wu C. Integrative Multi-Omics Approach for Improving Causal Gene Identification. Genet Epidemiol 2025; 49:e22601. [PMID: 39444114 DOI: 10.1002/gepi.22601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 10/01/2024] [Accepted: 10/04/2024] [Indexed: 10/25/2024]
Abstract
Transcriptome-wide association studies (TWAS) have been widely used to identify thousands of likely causal genes for diseases and complex traits using predicted expression models. However, most existing TWAS methods rely on gene expression alone and overlook other regulatory mechanisms of gene expression, including DNA methylation and splicing, that contribute to the genetic basis of these complex traits and diseases. Here we introduce a multi-omics method that integrates gene expression, DNA methylation, and splicing data to improve the identification of genes associated with traits of interest. Through simulations and by analyzing genome-wide association study (GWAS) summary statistics for 24 complex traits, we show that our integrated method, which leverages these complementary omics biomarkers, achieves higher statistical power and improves the accuracy of likely causal gene identification in blood tissues over individual omics methods. Finally, we apply our integrated model to a lung cancer GWAS data set, demonstrating the integrated model's improved identification of prioritized genes for lung cancer risk.
Affiliation(s)
- Austin King
- Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
2
Shao M, Chen K, Zhang S, Tian M, Shen Y, Cao C, Gu N. Multiome-wide Association Studies: Novel Approaches for Understanding Diseases. Genomics, Proteomics & Bioinformatics 2024; 22:qzae077. [PMID: 39471467 PMCID: PMC11630051 DOI: 10.1093/gpbjnl/qzae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 10/06/2024] [Accepted: 10/23/2024] [Indexed: 11/01/2024]
Abstract
The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods has opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene-disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the most frequently studied diseases by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.
Affiliation(s)
- Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Kaiyang Chen
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Shuting Zhang
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Min Tian
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Yan Shen
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Ning Gu
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Nanjing Key Laboratory for Cardiovascular Information and Health Engineering Medicine, Institute of Clinical Medicine, Nanjing Drum Tower Hospital, Medical School, Nanjing University, Nanjing 210093, China
3
Meng X, Liu D, Cao M, Wang W, Wang Y. Potentially causal association between immunoglobulin G N-glycans and cardiometabolic diseases: Bidirectional two-sample Mendelian randomization study. Int J Biol Macromol 2024; 279:135125. [PMID: 39208880 DOI: 10.1016/j.ijbiomac.2024.135125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 08/26/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
BACKGROUND: Observational studies indicate that altered immunoglobulin G (IgG) N-glycosylation and inflammatory factors are associated with cardiometabolic diseases (CMDs); nevertheless, the causality between them remains unclear. METHODS: Two-sample Mendelian randomization (MR) analyses were conducted to systematically investigate the bidirectional causality between IgG N-glycans and nine CMDs in both East Asians and Europeans. RESULTS: In the forward MR analysis, the univariable MR analysis presented suggestive causality of 14 and eight genetically instrumented IgG N-glycans with CMDs in East Asians and Europeans, respectively; the multivariable MR analysis identified ten and 11 glycan-CMD associations in East Asians and Europeans, respectively. In the reverse MR analysis, the univariable MR analysis presented suggestive causality of seven and 12 genetically instrumented CMDs with IgG N-glycans in East Asians and Europeans, respectively; the multivariable MR analysis identified six and five causal CMD-glycan associations in East Asians and Europeans, respectively. CONCLUSIONS: The comprehensive MR analyses provide suggestive evidence of bidirectional causality between IgG N-glycans and CMDs. This work helps to clarify the molecular mechanisms of the occurrence and progression of CMDs, to optimize existing and develop new strategies for preventing CMDs, and to identify high-risk groups for CMDs early.
Affiliation(s)
- Xiaoni Meng
- Department of Clinical Epidemiology, Beijing Institute of Respiratory Medicine and Beijing Chao-Yang Hospital, Capital Medical University, Beijing 100020, China; Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, Beijing 100069, China
- Di Liu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Meiling Cao
- Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, Beijing 100069, China
- Wei Wang
- Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, Beijing 100069, China; Centre for Precision Health, Edith Cowan University, Perth, WA 6027, Australia
- Youxin Wang
- Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, Beijing 100069, China; School of Public Health, North China University of Science and Technology, Tangshan 063210, China.
4
Shu M, Yates TB, John C, Harman-Ware AE, Happs RM, Bryant N, Jawdy SS, Ragauskas AJ, Tuskan GA, Muchero W, Chen JG. Providing biological context for GWAS results using eQTL regulatory and co-expression networks in Populus. The New Phytologist 2024; 244:603-617. [PMID: 39169686 DOI: 10.1111/nph.20026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 07/16/2024] [Indexed: 08/23/2024]
Abstract
Our study utilized genome-wide association studies (GWAS) to link nucleotide variants to traits in Populus trichocarpa, a species with rapid linkage disequilibrium decay. The aim was to overcome the challenge of interpreting statistical associations at individual loci without sufficient biological context, which often leads to reliance solely on gene annotations from unrelated model organisms. We employed an integrative approach that included GWAS targeting multiple traits using three individual techniques for lignocellulose phenotyping, expression quantitative trait loci (eQTL) analysis to construct transcriptional regulatory networks around each candidate locus and co-expression analysis to provide biological context for these networks, using lignocellulose biosynthesis in Populus trichocarpa as a case study. The research identified three candidate genes potentially involved in lignocellulose formation, including one previously recognized gene (Potri.005G116800/VND1, a critical regulator of secondary cell wall formation) and two genes (Potri.012G130000/AtSAP9 and Potri.004G202900/BIC1) with newly identified putative roles in lignocellulose biosynthesis. Our integrative approach offers a framework for providing biological context to loci associated with trait variation, facilitating the discovery of new genes and regulatory networks.
Affiliation(s)
- Mengjun Shu
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Timothy B Yates
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Cai John
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, 37996, TN, USA
- Anne E Harman-Ware
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, 80401, CO, USA
- Renee M Happs
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, 80401, CO, USA
- Nathan Bryant
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, 37996, TN, USA
- Sara S Jawdy
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Arthur J Ragauskas
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Department of Chemical and Biomolecular Engineering, University of Tennessee, Knoxville, 37996, TN, USA
- Gerald A Tuskan
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Wellington Muchero
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Jin-Gui Chen
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, 37831, TN, USA
5
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024]
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
6
Zhang Y, Wang M, Li Z, Yang X, Li K, Xie A, Dong F, Wang S, Yan J, Liu J. An overview of detecting gene-trait associations by integrating GWAS summary statistics and eQTLs. Science China. Life Sciences 2024; 67:1133-1154. [PMID: 38568343 DOI: 10.1007/s11427-023-2522-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/29/2024] [Indexed: 06/07/2024]
Abstract
Detecting genes that affect specific traits (such as human diseases and crop yields) is important for treating complex diseases and improving crop quality. A genome-wide association study (GWAS) provides new insights and directions for understanding complex traits by identifying important single nucleotide polymorphisms. Many GWAS summary statistics datasets related to various complex traits have been gathered recently. Studies have shown that GWAS risk loci and expression quantitative trait loci (eQTLs) often overlap substantially, which has made gene expression an important intermediary for revealing the regulatory role of GWAS risk loci. In this review, we survey three types of gene-trait association detection methods that integrate GWAS summary statistics and eQTL data, namely colocalization methods, transcriptome-wide association study-oriented approaches, and Mendelian randomization-related methods. At the theoretical level, we discuss the differences, relationships, advantages, and disadvantages of the algorithms in these three kinds of gene-trait association detection methods. To further compare the performance of the methods, we summarize the significant gene sets that influence high-density lipoprotein, low-density lipoprotein, total cholesterol, and triglyceride reported in 16 studies, and we evaluate the algorithms using the datasets of these four lipid traits. The advantages and limitations of the algorithms are analyzed based on the experimental results, and we suggest directions for follow-up studies on detecting gene-trait associations.
Affiliation(s)
- Yang Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Mengyao Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Zhenguo Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Xuan Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Keqin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ao Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Fang Dong
- College of Life Sciences, Nankai University, Tianjin, 300071, China
- Shihan Wang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
7
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations. Hum Mol Genet 2024; 33:624-635. [PMID: 38129112 PMCID: PMC10954367 DOI: 10.1093/hmg/ddad205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying "silver standard" genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.
Affiliation(s)
- Hunter J Melton
- Department of Statistics, Florida State University, 214 Rogers Building, 117 N. Woodward Avenue, Tallahassee, FL 32306, United States
- Zichen Zhang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
- Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
8
Wei K, Lu Y, Ma X, Duan A, Lu X, Abdel-Shafy H, Deng T. Transcriptome-Wide Association Study Reveals Potentially Candidate Genes Responsible for Milk Production Traits in Buffalo. Int J Mol Sci 2024; 25:2626. [PMID: 38473873 DOI: 10.3390/ijms25052626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 02/17/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024]
Abstract
Identifying key causal genes is critical for unraveling the genetic basis of complex economic traits, yet it remains a formidable challenge. The advent of large-scale sequencing data and computational algorithms, such as transcriptome-wide association studies (TWASs), offers a promising avenue for identifying potential causal genes. In this study, we harnessed the power of TWAS to identify genes potentially responsible for milk production traits, including daily milk yield (MY), fat percentage (FP), and protein percentage (PP), within a cohort of 100 buffaloes. Our approach began by generating the genotype and expression profiles for these 100 buffaloes through whole-genome resequencing and RNA sequencing, respectively. Through comprehensive genome-wide association studies (GWAS), we pinpointed a total of seven and four single nucleotide polymorphisms (SNPs) significantly associated with MY and FP traits, respectively. By using TWAS, we identified 55, 71, and 101 genes as significant signals for MY, FP, and PP traits, respectively. To delve deeper, we conducted protein-protein interaction (PPI) analysis, revealing the categorization of these genes into distinct PPI networks. Interestingly, several TWAS-identified genes within the PPI network played a vital role in milk performance. These findings open new avenues for identifying potentially causal genes underlying important traits, thereby offering invaluable insights for genomics and breeding in buffalo populations.
Affiliation(s)
- Kelong Wei
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Ying Lu
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Xiaoya Ma
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Anqian Duan
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Xingrong Lu
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
- Hamdy Abdel-Shafy
- Department of Animal Production, Faculty of Agriculture, Cairo University, Giza 12613, Egypt
- Tingxian Deng
- Guangxi Provincial Key Laboratory of Buffalo Genetics, Breeding and Reproduction Technology, Buffalo Research Institute, Chinese Academy of Agricultural Sciences, Nanning 530001, China
9
Mishra M, Nahlawi L, Zhong Y, De T, Yang G, Alarcon C, Perera MA. LA-GEM: imputation of gene expression with incorporation of Local Ancestry. Pacific Symposium on Biocomputing 2024; 29:341-358. [PMID: 38160291 PMCID: PMC10764069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
Gene imputation and TWAS have become a staple in the genomic medicine discovery space, helping to identify genes whose regulatory effects may contribute to disease susceptibility. However, the cohorts on which these methods are built are overwhelmingly of European ancestry. This means that the unique regulatory variation that exists in non-European populations, specifically African ancestry populations, may not be included in the current models. Moreover, African Americans are an admixed population, with a mix of European and African segments within their genome. No gene imputation model thus far has incorporated the effect of local ancestry (LA) on gene expression imputation. As such, we created LA-GEM, which was trained and tested on a cohort of 60 African American hepatocyte primary cultures. Uniquely, LA-GEM includes local ancestry inference in its prediction of gene expression. We compared the performance of LA-GEM to PrediXcan trained on the same dataset (with no inclusion of local ancestry). We were able to reliably predict the expression of 2559 genes (1326 in LA-GEM and 1236 in PrediXcan). Of these, 546 genes were unique to LA-GEM, including the CYP3A5 gene, which is critical to drug metabolism. We conducted TWAS analysis on two African American clinical cohorts with pharmacogenomic phenotypic information to identify novel gene associations. In our IWPC warfarin cohort, we identified 17 transcriptome-wide significant hits. No gene reached our prespecified significance level in the clopidogrel cohort. We did see a suggestive association of RAS3A with P2RY12 Reactivity Units (PRU), a clinical measure of response to anti-platelet therapy. This method demonstrates the need to incorporate LA into studies of admixed populations.
Affiliation(s)
- Mrinal Mishra
- Department of Pharmacology, Center for Pharmacogenomics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA. †Contributed equally to the work
10
Chen Z, Liang H, Wei P. Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers. Genet Epidemiol 2023; 47:617-636. [PMID: 37822029 DOI: 10.1002/gepi.22537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 07/22/2023] [Accepted: 09/18/2023] [Indexed: 10/13/2023]
Abstract
Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple tests with low statistical power, such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests based on the score statistic for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, adaptive sum of powered score (aSPU) and data-adaptive pathway-based (aSPUpath) tests, increase the power in genome-wide association studies (GWASs) with a single disease trait in a case-control study. We extend aSPU and aSPUpath to multi-traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both SNP and gene levels. $p$-values from different parameters assuming varying genetic architecture are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, in comparison with some commonly used methods, our data-adaptive somatic mutations/germline variations tests can be applied to multiple germline SNPs/genes/pathways, and generally have much higher statistical power while maintaining the appropriate type I error. The proposed tests are applied to a large-scale real-world International Cancer Genome Consortium whole genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations compared with the other existing methods on both gene and pathway levels.
Our study has systematically identified the associations between various germline variations and somatic mutations across different cancer types, which potentially provides valuable utility for cancer risk prediction, prognosis, and therapeutics.
Affiliation(s)
- Zhongyuan Chen
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Han Liang
- Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, Texas, USA
- Peng Wei
- Department of Biostatistics, MD Anderson Cancer Center, Houston, Texas, USA
11
de Leeuw C, Werme J, Savage JE, Peyrot WJ, Posthuma D. On the interpretation of transcriptome-wide association studies. PLoS Genet 2023; 19:e1010921. [PMID: 37676898 PMCID: PMC10508613 DOI: 10.1371/journal.pgen.1010921] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Revised: 09/19/2023] [Accepted: 08/15/2023] [Indexed: 09/09/2023]
Abstract
Transcriptome-wide association studies (TWAS) aim to detect relationships between gene expression and a phenotype, and are commonly used for secondary analysis of genome-wide association study (GWAS) results. Results from TWAS analyses are often interpreted as indicating a genetic relationship between gene expression and a phenotype, but this interpretation is not consistent with the null hypothesis that is evaluated in the traditional TWAS framework. In this study we provide a mathematical outline of this TWAS framework, and elucidate what interpretations are warranted given the null hypothesis it actually tests. We then use both simulations and real data analysis to assess the implications of misinterpreting TWAS results as indicative of a genetic relationship between gene expression and the phenotype. Our simulation results show considerably inflated type 1 error rates for TWAS when interpreted this way, with 41% of significant TWAS associations detected in the real data analysis found to have insufficient statistical evidence to infer such a relationship. This demonstrates that in current implementations, TWAS cannot reliably be used to investigate genetic relationships between gene expression and a phenotype, but that local genetic correlation analysis can serve as a potential alternative.
Affiliation(s)
- Christiaan de Leeuw
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
| | - Josefin Werme
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
| | - Jeanne E. Savage
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
| | - Wouter J. Peyrot
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Department of Psychiatry, Amsterdam UMC, location VUmc, Amsterdam, the Netherlands
| | - Danielle Posthuma
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychology and Psychiatry, section Complex Trait Genetics, Amsterdam Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
| |
|
12
|
Xue H, Shen X, Pan W. Causal Inference in Transcriptome-Wide Association Studies with Invalid Instruments and GWAS Summary Data. J Am Stat Assoc 2023; 118:1525-1537. [PMID: 37808547 PMCID: PMC10557939 DOI: 10.1080/01621459.2023.2183127] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/22/2021] [Accepted: 02/14/2023] [Indexed: 02/24/2023]
Abstract
Transcriptome-wide association studies (TWAS) have recently emerged as a popular tool to discover (putative) causal genes by integrating an outcome GWAS dataset with another gene expression/transcriptome GWAS (called eQTL) dataset. In our motivating and target application, we'd like to identify causal genes for low-density lipoprotein cholesterol (LDL), which is crucial for developing new treatments for hyperlipidemia and cardiovascular diseases. The statistical principle underlying TWAS is (two-sample) two-stage least squares (2SLS) using multiple correlated SNPs as instrumental variables (IVs); it is closely related to typical (two-sample) Mendelian randomization (MR) using independent SNPs as IVs, which is expected to be impractical and lower-powered for TWAS (and some other) applications. However, often some of the SNPs used may not be valid IVs, e.g. due to the widespread pleiotropy of their direct effects on the outcome not mediated through the gene of interest, leading to false conclusions by TWAS (or MR). Building on recent advances in sparse regression, we propose a robust and efficient inferential method to account for both hidden confounding and some invalid IVs via two-stage constrained maximum likelihood (2ScML), an extension of 2SLS. We first develop the proposed method with individual-level data, then extend it both theoretically and computationally to GWAS summary data for the most popular two-sample TWAS design, to which almost all existing robust IV regression methods are however not applicable. We show that the proposed method achieves asymptotically valid statistical inference on causal effects, demonstrating its wider applicability and superior finite-sample performance over the standard 2SLS/TWAS (and MR). We apply the methods to identify putative causal genes for LDL by integrating large-scale lipid GWAS summary data with eQTL data.
Affiliation(s)
- Haoran Xue
- School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455
| |
|
13
|
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: A new resource for improved transcriptome imputation using functional annotations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.02.23285208. [PMID: 36798253 PMCID: PMC9934719 DOI: 10.1101/2023.02.02.23285208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), that improves the accuracy of gene expression prediction by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models using SUMMIT-FA with a comprehensive functional database MACIE and the eQTL summary-level data from the eQTLGen consortium. By applying the resulting models to GWASs for 24 complex traits and exploring it through a simulation study, we show that SUMMIT-FA improves the accuracy of gene expression prediction models in whole blood, identifies significantly more gene-trait associations, and improves predictive power for identifying "silver standard" genes compared to several benchmark methods.
Affiliation(s)
- Hunter J. Melton
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Zichen Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
14
|
Zhang Z, Bae YE, Bradley JR, Wu L, Wu C. SUMMIT: An integrative approach for better transcriptomic data imputation improves causal gene identification. Nat Commun 2022; 13:6336. [PMID: 36284135 PMCID: PMC9593997 DOI: 10.1038/s41467-022-34016-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Accepted: 10/11/2022] [Indexed: 12/25/2022] Open
Abstract
Genes with moderate to low expression heritability may explain a large proportion of complex trait etiology, but such genes cannot be sufficiently captured in conventional transcriptome-wide association studies (TWASs), partly due to the relatively small available reference datasets for developing expression genetic prediction models to capture the moderate to low genetically regulated components of gene expression. Here, we introduce a method, the Summary-level Unified Method for Modeling Integrated Transcriptome (SUMMIT), to improve the expression prediction model accuracy and the power of TWAS by using a large expression quantitative trait loci (eQTL) summary-level dataset. We apply SUMMIT to the eQTL summary-level data provided by the eQTLGen consortium. Through simulation studies and analyses of genome-wide association study summary statistics for 24 complex traits, we show that SUMMIT improves the accuracy of expression prediction in blood, successfully builds expression prediction models for genes with low expression heritability, and achieves higher statistical power than several benchmark methods. Finally, we conduct a case study of COVID-19 severity with SUMMIT and identify 11 likely causal genes associated with COVID-19 severity.
Affiliation(s)
- Zichen Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Ye Eun Bae
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Jonathan R Bradley
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
| |
|
15
|
Yan S, Sha Q, Zhang S. Gene-Based Association Tests Using New Polygenic Risk Scores and Incorporating Gene Expression Data. Genes (Basel) 2022; 13:genes13071120. [PMID: 35885903 PMCID: PMC9318573 DOI: 10.3390/genes13071120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 06/14/2022] [Accepted: 06/21/2022] [Indexed: 12/10/2022] Open
Abstract
Recently, gene-based association studies have shown that integrating genome-wide association studies (GWAS) with expression quantitative trait locus (eQTL) data can boost statistical power and that the genetic liability of traits can be captured by polygenic risk scores (PRSs). In this paper, we propose a new gene-based statistical method that leverages gene-expression measurements and new PRSs to identify genes that are associated with phenotypes of interest. We used a generalized linear model to associate phenotypes with gene expression and PRSs and used a score-test statistic to test the association between phenotypes and genes. Our simulation studies show that the newly developed method has correct type I error rates and can boost statistical power compared with other methods that use either gene expression or PRS in association tests. A real data analysis figure based on UK Biobank data for asthma shows that the proposed method is applicable to GWAS.
|
16
|
Liu D, Dong J, Zhang J, Xu X, Tian Q, Meng X, Wu L, Zheng D, Chu X, Wang W, Meng Q, Wang Y. Genome-Wide Mapping of Plasma IgG N-Glycan Quantitative Trait Loci Identifies a Potentially Causal Association between IgG N-Glycans and Rheumatoid Arthritis. JOURNAL OF IMMUNOLOGY (BALTIMORE, MD. : 1950) 2022; 208:2508-2514. [PMID: 35545292 DOI: 10.4049/jimmunol.2100080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Accepted: 03/30/2022] [Indexed: 01/03/2023]
Abstract
Observational studies highlight associations of IgG N-glycosylation with rheumatoid arthritis (RA); however, the causality between these conditions remains to be determined. Standard and multivariable two-sample Mendelian randomization (MR) analyses integrating a summary genome-wide association study for RA and IgG N-glycan quantitative trait loci (IgG N-glycan-QTL) data were performed to explore the potentially causal associations of IgG N-glycosylation with RA. After correcting for multiple testing (p < 2 × 10⁻³), the standard MR analysis based on the inverse-variance weighted method showed a significant association of genetically instrumented IgG N-glycan (GP4) with RA (odds ratio for GP4 = 0.906, 95% confidence interval = 0.857-0.958, p = 5.246 × 10⁻⁴). In addition, we identified seven significant associations of genetically instrumented IgG N-glycans with RA by multivariable MR analysis (p < 2 × 10⁻³). Results were broadly consistent in sensitivity analyses using MR_Lasso, MR_weighted median, MR_Egger regression, and leave-one-out analysis with different instruments (all p values < 0.05). There was limited evidence of pleiotropy bias (all p values > 0.05). In conclusion, our MR analysis incorporating genome-wide association studies and IgG N-glycan-QTL data revealed that IgG N-glycans were potentially causally associated with RA. Our findings shed light on the role of IgG N-glycosylation in the development of RA. Future studies are needed to validate our findings and to explore the underlying physiological mechanisms in the etiology of RA.
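The inverse-variance weighted (IVW) estimator named in this abstract can be illustrated with a minimal sketch. This is the generic fixed-effect IVW formula for two-sample MR with independent instruments, not the authors' analysis pipeline; the function name `ivw_estimate` and all numbers are hypothetical toy values chosen for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each SNP j gives a ratio estimate beta_outcome[j] / beta_exposure[j];
    the ratios are averaged with weights beta_exposure[j]**2 / se_outcome[j]**2.
    Assumes independent SNPs (an assumption of this sketch).
    """
    if not beta_exposure or not (
        len(beta_exposure) == len(beta_outcome) == len(se_outcome)
    ):
        raise ValueError("inputs must be non-empty and of equal length")
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)
    )
    if denominator == 0:
        raise ValueError("all SNP-exposure effects are zero")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Toy summary statistics (made up): SNP-exposure and SNP-outcome effects.
bx = [0.10, 0.20, 0.15]
by = [-0.010, -0.020, -0.015]
se = [0.005, 0.008, 0.006]

log_or, se_log_or = ivw_estimate(bx, by, se)
odds_ratio = math.exp(log_or)  # outcome effects are on the log-odds scale
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)
```

For a binary outcome such as RA, exponentiating the estimate and its confidence limits yields an odds ratio per unit of the genetically instrumented exposure, which is the form in which the abstract reports its result.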
Affiliation(s)
- Di Liu
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China; Center for Biomedical Information Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
| | - Jing Dong
- Health Management Center, Xuanwu Hospital, Capital Medical University, Beijing, China
| | - Jie Zhang
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
| | - Xizhu Xu
- School of Public Health, Shandong First Medical University and Shandong Academy of Medical Sciences, Tai'an, China
| | - Qiuyue Tian
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
| | - Xiaoni Meng
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
| | - Lijuan Wu
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
| | - Deqiang Zheng
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
| | - Xi Chu
- Health Management Center, Xuanwu Hospital, Capital Medical University, Beijing, China
| | - Wei Wang
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China; School of Public Health, Shandong First Medical University and Shandong Academy of Medical Sciences, Tai'an, China; Centre for Precision Health, ECU Strategic Research Centre, Edith Cowan University, Perth, Western Australia, Australia
| | - Qun Meng
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
| | - Youxin Wang
- Department of Epidemiology and Health Statistics, School of Public Health, Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China; Centre for Precision Health, ECU Strategic Research Centre, Edith Cowan University, Perth, Western Australia, Australia
| |
|
17
|
Cao X, Wang X, Zhang S, Sha Q. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep 2022; 12:3553. [PMID: 35241742 PMCID: PMC8894384 DOI: 10.1038/s41598-022-07465-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Accepted: 02/11/2022] [Indexed: 01/29/2023] Open
Abstract
Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and identified many genetic variants underlying complex diseases via single marker tests, there is still a considerable heritability of complex diseases that could not be explained by GWAS. One alternative approach to overcome the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the interpretability in biological mechanisms of identified trait associated genes. In this study, we propose a powerful and computationally efficient gene-based association test, called Overall. Using extended Simes procedure, Overall aggregates information from three types of traditional gene-based association tests and also incorporates expression quantitative trait locus (eQTL) information into a gene-based association test using GWAS summary statistics. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the p values of Overall can be calculated analytically. Simulation studies show that Overall can control type I error rates very well and has higher power than the tests that we compared with. We also apply Overall to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that this newly developed method can identify more significant genes than other methods we compared with.
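As a rough illustration of the p-value combination idea mentioned in this abstract, the sketch below implements the standard Simes procedure. It is not the extended Simes procedure used by Overall, which additionally accounts for correlation among the combined tests; the function name `simes_pvalue` and the input values are hypothetical.

```python
def simes_pvalue(pvalues):
    """Standard Simes combined p-value: min over i of k * p_(i) / i,
    where p_(1) <= ... <= p_(k) are the ordered input p-values."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 <= p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    ordered = sorted(pvalues)
    return min(k * p / rank for rank, p in enumerate(ordered, start=1))


# Toy p-values (made up) from three gene-based tests of the same gene.
combined = simes_pvalue([0.01, 0.04, 0.03])
```

The combined value is never larger than the Bonferroni-adjusted minimum p-value, which is one reason Simes-type procedures are attractive for aggregating several tests of the same gene.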
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, TX, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA.
| |
|
18
|
Ngwa JS, Yanek LR, Kammers K, Kanchan K, Taub MA, Scharpf RB, Faraday N, Becker LC, Mathias RA, Ruczinski I. Secondary analyses for genome-wide association studies using expression quantitative trait loci. Genet Epidemiol 2022; 46:170-181. [PMID: 35312098 PMCID: PMC9086181 DOI: 10.1002/gepi.22448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2021] [Revised: 11/19/2021] [Accepted: 01/20/2022] [Indexed: 01/01/2023]
Abstract
Genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with complex traits; however, the identified SNPs account for a fraction of trait heritability, and identifying the functional elements through which genetic variants exert their effects remains a challenge. Recent evidence suggests that SNPs associated with complex traits are more likely to be expression quantitative trait loci (eQTL). Thus, incorporating eQTL information can potentially improve power to detect causal variants missed by traditional GWAS approaches. Using genomic, transcriptomic, and platelet phenotype data from the Genetic Study of Atherosclerosis Risk family-based study, we investigated the potential to detect novel genomic risk loci by incorporating information from eQTL in the relevant target tissues (i.e., platelets and megakaryocytes) using established statistical principles in a novel way. Permutation analyses were performed to obtain family-wise error rates for eQTL associations, substantially lowering the genome-wide significance threshold for SNP-phenotype associations. In addition to confirming the well known association between PEAR1 and platelet aggregation, our eQTL-focused approach identified a novel locus (rs1354034) and gene (ARHGEF3) not previously identified in a GWAS of platelet aggregation phenotypes. A colocalization analysis showed strong evidence for a functional role of this eQTL.
Affiliation(s)
- Julius S. Ngwa
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Lisa R. Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Kai Kammers
- Department of Oncology, Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
| | - Kanika Kanchan
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Margaret A. Taub
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| | - Robert B. Scharpf
- Department of Oncology, Johns Hopkins University, School of Medicine, Baltimore, Maryland, USA
| | - Nauder Faraday
- Department of Anesthesiology and Critical Care Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Lewis C. Becker
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
| |
|
19
|
Tao S, Ye X, Pan L, Fu M, Huang P, Peng Z, Yang S. Construction and Clinical Translation of Causal Pan-Cancer Gene Score Across Cancer Types. Front Genet 2021; 12:784775. [PMID: 35003220 PMCID: PMC8733729 DOI: 10.3389/fgene.2021.784775] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/28/2021] [Accepted: 11/24/2021] [Indexed: 12/17/2022] Open
Abstract
Pan-cancer strategy, an integrative analysis of different cancer types, can be used to explain oncogenesis and identify biomarkers using a larger statistical power and robustness. Fine-mapping defines the causal loci, whereas genome-wide association studies (GWASs) typically identify thousands of cancer-related loci and do not necessarily have a fine-mapping component. In this study, we develop a novel strategy to identify the causal loci using a pan-cancer and fine-mapping assumption, constructing the CAusal Pan-cancER gene (CAPER) score and validating its performance using internal and external validation on 1,287 individuals and 985 cell lines. Summary statistics of 15 cancer types were used to define 54 causal loci in 15 potential genes. Using the Cancer Genome Atlas (TCGA) training set, we constructed the CAPER score and divided cancer patients into two groups. Using the three validation sets, we found that 19 cancer-related variables were statistically significant between the two CAPER score groups and that 81 drugs had significantly different drug sensitivity between the two CAPER score groups. We hope that our strategies for selecting causal genes and for constructing CAPER score would provide valuable clues for guiding the management of different types of cancers.
Affiliation(s)
- Shiyue Tao
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Xiangyu Ye
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Lulu Pan
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Minghan Fu
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Peng Huang
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Zhihang Peng
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Sheng Yang
- Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, China
| |
|
20
|
Bae YE, Wu L, Wu C. InTACT: An adaptive and powerful framework for joint-tissue transcriptome-wide association studies. Genet Epidemiol 2021; 45:848-859. [PMID: 34255882 PMCID: PMC8604767 DOI: 10.1002/gepi.22425] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/24/2021] [Indexed: 11/05/2022]
Abstract
Transcriptome-wide association studies (TWAS) that integrate transcriptomic reference data and genome-wide association studies (GWAS) have successfully enhanced the discovery of candidate genes for many complex traits. However, existing methods may suffer from substantial power loss because they fail to effectively consider that expression of many genes tends to be consistent across tissues. Here we propose a computationally efficient testing method, referred to as Integrative Test for Associations via Cauchy Transformation (InTACT), that effectively combines information across multiple tissues and thus improves the power of identifying associated genes. Through simulation studies, we show that InTACT maintains high power while properly controlling Type 1 error rates. We applied InTACT to the largest GWAS of Alzheimer's disease (AD) to date and identified 227 genome-wide significant genes, of which 130 were not identified by benchmark methods, TWAS and MultiXcan. Importantly, InTACT identified five novel loci for AD. We implemented InTACT in publicly available software, "InTACT."
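The Cauchy transformation referred to in this abstract can be sketched with the generic Cauchy combination rule for p-values. This is a minimal illustration of that general technique, not the InTACT software or its exact test; the function name `cauchy_combine`, the equal default weights, and the per-tissue p-values are assumptions made for the example.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p-value is mapped to a standard Cauchy variate via
    tan((0.5 - p) * pi); the weighted average of these variates is
    converted back to a p-value with the Cauchy tail probability.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights (assumed default)
    if len(weights) != len(pvalues) or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p-value")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    return 0.5 - math.atan(statistic) / math.pi


# Toy per-tissue p-values (made up) for a single gene.
combined_p = cauchy_combine([0.001, 0.20, 0.65])
```

A practical appeal of this rule is that the combined p-value has a closed form and remains approximately valid under dependence among the inputs, so no resampling is needed when tissues are correlated.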
Affiliation(s)
- Ye Eun Bae
- Department of Statistics, Florida State University
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa
| | - Chong Wu
- Department of Statistics, Florida State University
| |
|
21
|
Yang T, Wei P, Pan W. Integrative analysis of multi-omics data for discovering low-frequency variants associated with low-density lipoprotein cholesterol levels. Bioinformatics 2021; 36:5223-5228. [PMID: 33070182 DOI: 10.1093/bioinformatics/btaa898] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/30/2020] [Revised: 09/26/2020] [Accepted: 10/06/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The abundance of omics data has facilitated integrative analyses of single and multiple molecular layers with genome-wide association studies focusing on common variants. Built on its successes, we propose a general analysis framework to leverage multi-omics data with sequencing data to improve the statistical power of discovering new associations and understanding of the disease susceptibility due to low-frequency variants. The proposed test features its robustness to model misspecification, high power across a wide range of scenarios and the potential of offering insights into the underlying genetic architecture and disease mechanisms. RESULTS: Using the Framingham Heart Study data, we show that low-frequency variants are predictive of DNA methylation, even after conditioning on the nearby common variants. In addition, DNA methylation and gene expression provide complementary information to functional genomics. In the Avon Longitudinal Study of Parents and Children with a sample size of 1497, one gene CLPTM1 is identified to be associated with low-density lipoprotein cholesterol levels by the proposed powerful adaptive gene-based test integrating information from gene expression, methylation and enhancer-promoter interactions. It is further replicated in the TwinsUK study with 1706 samples. The signal is driven by both low-frequency and common variants. AVAILABILITY AND IMPLEMENTATION: Models are available at https://github.com/ytzhong/DNAm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tianzhong Yang
- Department of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Wei Pan
- Department of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| |
|
22
|
Wu C, Bradley J, Li Y, Wu L, Deng HW. A gene-level methylome-wide association analysis identifies novel Alzheimer's disease genes. Bioinformatics 2021; 37:1933-1940. [PMID: 33523132 PMCID: PMC8337007 DOI: 10.1093/bioinformatics/btab045] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/13/2020] [Revised: 12/31/2020] [Accepted: 01/20/2021] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION: Transcriptome-wide association studies (TWAS) have successfully facilitated the discovery of novel genetic risk loci for many complex traits, including late-onset Alzheimer's disease (AD). However, most existing TWAS methods rely only on gene expression and ignore epigenetic modification (i.e., DNA methylation) and functional regulatory information (i.e., enhancer-promoter interactions), both of which contribute significantly to the genetic basis of AD. RESULTS: We develop a novel gene-level association testing method that integrates genetically regulated DNA methylation and enhancer-target gene pairs with genome-wide association study (GWAS) summary results. Through simulations, we show that our approach, referred to as the CMO (cross methylome omnibus) test, yielded well controlled type I error rates and achieved much higher statistical power than competing methods under a wide range of scenarios. Furthermore, compared with TWAS, CMO identified an average of 124% more associations when analyzing several brain imaging-related GWAS results. By analyzing to date the largest AD GWAS of 71,880 cases and 383,378 controls, CMO identified six novel loci for AD, which have been ignored by competing methods. AVAILABILITY: Software: https://github.com/ChongWuLab/CMO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University
| | | | - Yanming Li
- Department of Biostatistics & Data Science, University of Kansas Medical Center
| | - Lang Wu
- Population Sciences in the Pacific Program, University of Hawaii Cancer center
| | - Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine
| |
|
23
|
Deng Y, Pan W. Model checking via testing for direct effects in Mendelian Randomization and transcriptome-wide association studies. PLoS Comput Biol 2021; 17:e1009266. [PMID: 34339418 PMCID: PMC8360571 DOI: 10.1371/journal.pcbi.1009266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/08/2021] [Revised: 08/12/2021] [Accepted: 07/12/2021] [Indexed: 11/25/2022] Open
Abstract
It is of great interest and potential to discover causal relationships between pairs of exposures and outcomes using genetic variants as instrumental variables (IVs) to deal with hidden confounding in observational studies. The two most popular approaches are Mendelian randomization (MR), which usually uses independent genetic variants/SNPs across the genome, and transcriptome-wide association studies (TWAS) (or their generalizations) using cis-SNPs local to a gene (or some genome-wide and likely dependent SNPs), as IVs. In spite of their many promising applications, both approaches face a major challenge: the validity of their causal conclusions depends on three critical assumptions on valid IVs, and more generally on other modeling assumptions, which however may not hold in practice. The most likely as well as challenging situation is due to the wide-spread horizontal pleiotropy, leading to two of the three IV assumptions being violated and thus to biased statistical inference. More generally, we'd like to conduct a goodness-of-fit (GOF) test to check the model being used. Although some methods have been proposed as being robust to various degrees to the violation of some modeling assumptions, they often give different and even conflicting results due to their own modeling assumptions and possibly lower statistical efficiency, imposing difficulties to the practitioner in choosing and interpreting varying results across different methods. Hence, it would help to directly test whether any assumption is violated or not. In particular, there is a lack of such tests for TWAS. We propose a new and general GOF test, called TEDE (TEsting Direct Effects), applicable to both correlated and independent SNPs/IVs (as commonly used in TWAS and MR respectively). Through simulation studies and real data examples, we demonstrate high statistical power and advantages of our new method, while confirming the frequent violation of modeling (including valid IV) assumptions in practice and thus the importance of model checking by applying such a test in MR/TWAS analysis.
Affiliation(s)
- Yangqing Deng
- Department of Mathematics, University of North Texas, Denton, Texas, United States of America
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
24
|
Cao C, Kwok D, Edie S, Li Q, Ding B, Kossinna P, Campbell S, Wu J, Greenberg M, Long Q. kTWAS: integrating kernel machine with transcriptome-wide association studies improves statistical power and reveals novel genes. Brief Bioinform 2021; 22:5985285. [PMID: 33200776 DOI: 10.1093/bib/bbaa270] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/01/2020] [Revised: 09/17/2020] [Accepted: 09/18/2020] [Indexed: 12/31/2022] Open
Abstract
The power of genotype-phenotype association mapping studies increases greatly when contributions from multiple variants in a focal region are meaningfully aggregated. Currently, there are two popular categories of variant aggregation methods. Transcriptome-wide association studies (TWAS) represent a set of emerging methods that select variants based on their effect on gene expression, providing pretrained linear combinations of variants for downstream association mapping. In contrast, kernel methods such as the sequence kernel association test (SKAT) model genotypic and phenotypic variance using various kernel functions that capture genetic similarity between subjects, allowing nonlinear effects to be included. From the perspective of machine learning, these two methods cover two complementary aspects of feature engineering: feature selection/pruning and feature aggregation. Thus far, no thorough comparison has been made between these categories, and no methods exist which incorporate the advantages of TWAS- and kernel-based methods. In this work, we developed a novel method called kernel-based TWAS (kTWAS) that applies TWAS-like feature selection to a SKAT-like kernel association test, combining the strengths of both approaches. Through extensive simulations, we demonstrate that kTWAS has higher power than TWAS and multiple SKAT-based protocols, and we identify novel disease-associated genes in Wellcome Trust Case Control Consortium genotyping array data and MSSNG (Autism) sequence data. The source code for kTWAS and our simulations are available in our GitHub repository (https://github.com/theLongLab/kTWAS).
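The combination described in this abstract (variant weights feeding a kernel-style test) can be illustrated with the score statistic of a weighted linear kernel, Q = Σ_j w_j² (Σ_i g_ij r_i)², where r_i is the phenotype residual under an intercept-only null model. The sketch below is a minimal illustration under that assumption and is not the kTWAS implementation: the function name, the toy genotypes, and the weights are hypothetical, and the p-value step (a mixture of chi-square distributions) is omitted.

```python
from typing import Sequence


def weighted_kernel_score(
    genotypes: Sequence[Sequence[float]],  # rows = subjects, columns = variants
    phenotype: Sequence[float],
    weights: Sequence[float],              # hypothetical per-variant weights
) -> float:
    """Return Q = sum_j w_j^2 * (sum_i g_ij * r_i)^2 for a weighted linear kernel."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]  # intercept-only null model
    q = 0.0
    for j, w in enumerate(weights):
        # Per-variant score: genotype column j against the residuals.
        score_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += (w * score_j) ** 2
    return q


# Toy data: 4 subjects, 2 variants; a zero weight removes the second variant.
G = [[0, 1], [1, 0], [2, 1], [1, 2]]
y = [0.0, 1.0, 2.0, 1.0]
print(weighted_kernel_score(G, y, weights=[1.0, 0.0]))
```

Setting a weight to zero drops that variant from the statistic, which is how a selection step can be expressed inside the same aggregation formula.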
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, University of Calgary
| | - Devin Kwok
- Department of Mathematics & Statistics, University of Calgary
| | | | - Qing Li
- Department of Biochemistry & Molecular Biology, University of Calgary
| | - Bowei Ding
- Department of Mathematics & Statistics, University of Calgary
| | - Pathum Kossinna
- Department of Biochemistry & Molecular Biology, University of Calgary
| | | | - Jingjing Wu
- Department of Mathematics & Statistics, University of Calgary
| | | | - Quan Long
- Departments of Biochemistry & Molecular Biology, Medical Genetics and Mathematics & Statistics
| |
|
25
|
Xue H, Shen X, Pan W. Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Am J Hum Genet 2021; 108:1251-1269. [PMID: 34214446 PMCID: PMC8322939 DOI: 10.1016/j.ajhg.2021.05.014] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 01/10/2021] [Accepted: 05/25/2021] [Indexed: 12/23/2022] Open
Abstract
With the increasing availability of large-scale GWAS summary data on various complex traits and diseases, there has been tremendous interest in applications of Mendelian randomization (MR) to investigate causal relationships between pairs of traits using SNPs as instrumental variables (IVs) based on observational data. In spite of the potential significance of such applications, the validity of their causal conclusions critically depends on some strong modeling assumptions required by MR, which may be violated due to widespread (horizontal) pleiotropy. Although many MR methods have been proposed recently to relax the assumptions, mainly by dealing with uncorrelated pleiotropy, only a few can handle correlated pleiotropy, in which some SNPs/IVs may be associated with hidden confounders, such as some heritable factors shared by both traits. Here we propose a simple and effective approach based on constrained maximum likelihood and model averaging, called cML-MA, applicable to GWAS summary data. To deal with more challenging situations with many invalid IVs with only weak pleiotropic effects, we modify and improve it with data perturbation. Extensive simulations demonstrated that the proposed methods could control the type I error rate better while achieving higher power than other competitors. Applications to 48 risk factor-disease pairs based on large-scale GWAS summary data of 3 cardio-metabolic diseases (coronary artery disease, stroke, and type 2 diabetes), asthma, and 12 risk factors confirmed its superior performance.
Affiliation(s)
- Haoran Xue
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA; Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
| |
|
26
|
Knutson KA, Pan W. Integrating brain imaging endophenotypes with GWAS for Alzheimer's disease. QUANTITATIVE BIOLOGY 2021; 9:185-200. [PMID: 35399757 PMCID: PMC8993183 DOI: 10.1007/s40484-020-0202-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/27/2019] [Revised: 02/11/2020] [Accepted: 02/28/2020] [Indexed: 01/09/2023]
Abstract
Background: Genome-wide association studies (GWAS) have identified many genetic variants associated with increased risk of Alzheimer's disease (AD). These susceptibility loci may affect AD indirectly through a combination of physiological brain changes. Many of these neuropathologic features are detectable via magnetic resonance imaging (MRI). Methods: In this study, we examine the effects of such brain imaging-derived phenotypes (IDPs) with genetic etiology on AD, using and comparing the following methods: two-sample Mendelian randomization (2SMR), generalized summary statistics-based Mendelian randomization (GSMR), transcriptome-wide association studies (TWAS) and the adaptive sum of powered score (aSPU) test. These methods do not require individual-level genotypic and phenotypic data but instead can rely only on an external reference panel and GWAS summary statistics. Results: Using publicly available GWAS datasets from the International Genomics of Alzheimer's Project (IGAP) and UK Biobank's (UKBB) brain imaging initiatives, we identify 35 IDPs possibly associated with AD, many of which have well-established or biologically plausible links to the characteristic cognitive impairments of this neurodegenerative disease. Conclusions: Our results highlight the increased power for detecting genetic associations achieved by multiple correlated SNP-based methods, i.e., aSPU, GSMR and TWAS, over MR methods based on independent SNPs (as instrumental variables).
Affiliation(s)
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
| |
|
27
|
Zhao B, Shan Y, Yang Y, Yu Z, Li T, Wang X, Luo T, Zhu Z, Sullivan P, Zhao H, Li Y, Zhu H. Transcriptome-wide association analysis of brain structures yields insights into pleiotropy with complex neuropsychiatric traits. Nat Commun 2021; 12:2878. [PMID: 34001886 PMCID: PMC8128893 DOI: 10.1038/s41467-021-23130-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/09/2020] [Accepted: 04/16/2021] [Indexed: 02/03/2023] Open
Abstract
Structural variations of the human brain are heritable and highly polygenic traits, with hundreds of associated genes identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) can both prioritize these GWAS findings and also identify additional gene-trait associations. Here we perform cross-tissue TWAS analysis of 211 structural neuroimaging traits and discover 278 associated genes exceeding the Bonferroni significance threshold of 1.04 × 10⁻⁸. The TWAS-significant genes for brain structures have been linked to a wide range of complex traits in different domains. Through TWAS gene-based polygenic risk scores (PRS) prediction, we find that TWAS PRS gains substantial power in association analysis compared to conventional variant-based GWAS PRS, and up to 6.97% of phenotypic variance (p-value = 7.56 × 10⁻³¹) can be explained in independent testing data sets. In conclusion, our study illustrates that TWAS can be a powerful supplement to traditional GWAS in imaging genetics studies for gene discovery-validation, genetic co-architecture analysis, and polygenic risk prediction.
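The Bonferroni threshold quoted in this abstract follows the usual rule of dividing the family-wise error rate by the number of tests. The sketch below shows that arithmetic and its inverse. The family-wise rate of 0.05 is an assumption (the abstract does not state it), and the function names are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> float:
    """Number of tests implied by a reported Bonferroni threshold."""
    return alpha / threshold


# Assumption: alpha = 0.05; 1.04e-8 is the threshold reported in the abstract.
print(round(implied_number_of_tests(0.05, 1.04e-8)))
```

Under that assumed rate, the reported threshold corresponds to roughly 4.8 million tests.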
Affiliation(s)
- Bingxin Zhao
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yue Shan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yue Yang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Zhaolong Yu
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Tengfei Li
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Biomedical Research Imaging Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Xifeng Wang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Tianyou Luo
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Ziliang Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Patrick Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Hongyu Zhao
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Biostatistics, Yale University, New Haven, CT, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Biomedical Research Imaging Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
| |
|
28
|
Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open
Abstract
Background: Clinical and epidemiological studies have suggested that systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbidities and that common genetic etiologies can partly explain such coexistence. However, the shared genetic determinants underlying the two diseases remain largely unknown. Methods: Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted the pleiotropy-informed association using conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE. Results: We found a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) between RA and SLE via LDSC. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively dependent, with an average correlation of 0.579. Functionally, these genes were implicated in multiple autoimmune-relevant pathways such as inositol phosphate metabolic process, membrane and glucagon signaling pathway. Conclusion: This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding the molecular mechanisms underlying the comorbidity of the two diseases.
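The harmonic mean P-value combination mentioned in the Methods can be sketched as a weighted harmonic mean of the per-tissue P-values, 1 / Σ_i (w_i / p_i) with weights summing to one. The example below computes only this raw harmonic mean. It does not reproduce any further calibration step the authors may have applied, and the function name and example P-values are hypothetical.

```python
from typing import Optional, Sequence


def harmonic_mean_p(
    p_values: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted harmonic mean of P-values: 1 / sum_i (w_i / p_i), with sum_i w_i = 1."""
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one P-value is required")
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    return 1.0 / sum(w / p for w, p in zip(weights, p_values))


# Hypothetical per-tissue P-values for one gene.
print(harmonic_mean_p([0.5, 0.1]))
```

Because small P-values dominate the sum of reciprocals, a single strongly associated tissue pulls the combined value down even when other tissues show no signal.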
Affiliation(s)
- Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Zhou Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Meng Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
|
29
|
Xie Y, Shan N, Zhao H, Hou L. Transcriptome wide association studies: general framework and methods. QUANTITATIVE BIOLOGY 2021. [DOI: 10.15302/j-qb-020-0228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
|
30
|
Zhao Y, Sun L. On set‐based association tests: Insights from a regression using summary statistics. CAN J STAT 2020. [DOI: 10.1002/cjs.11584] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Affiliation(s)
- Yanyan Zhao
- Department of Statistical Sciences University of Toronto Toronto M5S 3G3 Ontario Canada
| | - Lei Sun
- Department of Statistical Sciences University of Toronto Toronto M5S 3G3 Ontario Canada
- Division of Biostatistics, Dalla Lana School of Public Health University of Toronto Toronto M5T 3M7 Ontario Canada
| |
|
31
|
Liu W, Li M, Zhang W, Zhou G, Wu X, Wang J, Lu Q, Zhao H. Leveraging functional annotation to identify genes associated with complex diseases. PLoS Comput Biol 2020; 16:e1008315. [PMID: 33137096 PMCID: PMC7660930 DOI: 10.1371/journal.pcbi.1008315] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/25/2020] [Revised: 11/12/2020] [Accepted: 09/05/2020] [Indexed: 02/06/2023] Open
Abstract
To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed using gene expression as a mediating trait linking genetic variations and diseases. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on diseases by associating phenotypes with predicted expression levels. The success of these methods critically depends on the identification of eQTLs, which may not be functional in the corresponding tissue, due to linkage disequilibrium (LD) and the correlation of gene expression between tissues. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) to identify disease-associated genes leveraging epigenetic information. Through prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM and more are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we were able to identify more trait-associated genes (ranging from 7.7% to 102%) than those from existing methods. Among the identified genes associated with these traits, T-GEN can better identify genes with high (>0.99) pLI scores compared to other methods. When T-GEN was applied to late-onset Alzheimer's disease, we identified 96 genes located at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 genes in an independent GWAS, including one of the two novel loci.
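The two-step TWAS procedure summarized at the start of this abstract (predict expression from eQTL weights, then associate the prediction with the phenotype) can be sketched on individual-level toy data as follows. This is a generic illustration and not T-GEN itself: the weights, genotypes, and function names are hypothetical, and a plain Pearson correlation stands in for whatever association test a given method uses.

```python
import math
from typing import Sequence


def predict_expression(
    genotypes: Sequence[Sequence[float]],  # rows = subjects, columns = SNPs
    eqtl_weights: Sequence[float],         # hypothetical pretrained SNP weights
) -> list:
    """Step 1: predicted expression is a weighted sum of each subject's genotypes."""
    return [sum(w * g for w, g in zip(eqtl_weights, row)) for row in genotypes]


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Step 2: association between predicted expression and the phenotype."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data: 4 subjects, 2 SNPs; only the first SNP carries a nonzero weight.
G = [[0, 2], [1, 0], [2, 1], [1, 1]]
weights = [0.5, 0.0]
phenotype = [1.0, 2.0, 3.0, 2.0]

expression = predict_expression(G, weights)
print(expression, pearson_correlation(expression, phenotype))
```

The choice of SNP weights drives everything downstream, which is why the abstract emphasizes selecting eQTLs that are functional in the relevant tissue.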
Affiliation(s)
- Wei Liu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
| | - Mo Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
| | - Wenfeng Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
| | - Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
| | - Xing Wu
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States of America
| | - Jiawei Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
| | - Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, WI, United States of America
- Department of Statistics, University of Wisconsin-Madison, WI, United States of America
- Center for Demography of Health and Aging, University of Wisconsin-Madison, WI, United States of America
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States of America
- Department of Genetics, Yale School of Medicine, New Haven, CT, United States of America
| |
|
32
|
Xue H, Pan W. Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data. PLoS Genet 2020; 16:e1009105. [PMID: 33137120 PMCID: PMC7660933 DOI: 10.1371/journal.pgen.1009105] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/18/2020] [Revised: 11/12/2020] [Accepted: 09/08/2020] [Indexed: 01/14/2023] Open
Abstract
Orienting the causal relationship between pairs of traits is a fundamental task in scientific research with significant implications in practice, such as in prioritizing molecular targets and modifiable risk factors for developing therapeutic and interventional strategies for complex diseases. A recent method, called Steiger's method, using a single SNP as an instrumental variable (IV) in the framework of Mendelian randomization (MR), has since been widely applied. We report the following new contributions. First, we propose a single SNP-based alternative, overcoming a severe limitation of Steiger's method in simply assuming, instead of inferring, the existence of a causal relationship. We also clarify a condition necessary for the validity of the methods in the presence of hidden confounding. Second, to improve statistical power, we propose combining the results from multiple, and possibly correlated, SNPs as multiple instruments. Third, we develop three goodness-of-fit tests to check modeling assumptions, including those required for valid IVs. Fourth, by relaxing one of the three IV assumptions in MR, we propose several methods, including an Egger regression-like approach and its multivariable version (analogous to multivariable MR), to account for horizontal pleiotropy of the SNPs/IVs, which is often unavoidable in practice. All our methods can simultaneously infer both the existence and (if so) the direction of a causal relationship, largely expanding their applicability over that of Steiger's method. Although we focus on uni-directional causal relationships, we also briefly discuss an extension to bi-directional relationships. Through extensive simulations and an application to infer the causal directions between low density lipoprotein (LDL) cholesterol, or high density lipoprotein (HDL) cholesterol, and coronary artery disease (CAD), we demonstrate the superior performance and advantage of our proposed methods over Steiger's method and bi-directional MR. In particular, after accounting for horizontal pleiotropy, our method confirmed the well-known causal direction from LDL to CAD, while other methods, including bi-directional MR, might fail.
Affiliation(s)
- Haoran Xue
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
33
|
Yang T, Tang H, Risch HA, Olson SH, Petersen G, Bracci PM, Gallinger S, Hung R, Neale RE, Scelo G, Duell EJ, Kurtz RC, Khaw KT, Severi G, Sund M, Wareham N, Amos CI, Li D, Wei P. Incorporating multiple sets of eQTL weights into gene-by-environment interaction analysis identifies novel susceptibility loci for pancreatic cancer. Genet Epidemiol 2020; 44:880-892. [PMID: 32779232 PMCID: PMC7657998 DOI: 10.1002/gepi.22348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2020] [Revised: 07/14/2020] [Accepted: 07/30/2020] [Indexed: 11/11/2022]
Abstract
It is of great scientific interest to identify interactions between genetic variants and environmental exposures that may modify the risk of complex diseases. However, larger sample sizes are usually required to detect gene-by-environment interaction (G × E) than required to detect genetic main association effects. To boost the statistical power and improve the understanding of the underlying molecular mechanisms, we incorporate functional genomics information, specifically, expression quantitative trait loci (eQTLs), into a data-adaptive G × E test, called aGEw. This test adaptively chooses the best eQTL weights from multiple tissues and provides an extra layer of weighting at the genetic variant level. Extensive simulations show that the aGEw test can control the Type 1 error rate, and the power is resilient to the inclusion of neutral variants and noninformative external weights. We applied the proposed aGEw test to the Pancreatic Cancer Case-Control Consortium (discovery cohort of 3,585 cases and 3,482 controls) and the PanScan II genome-wide association study data (replication cohort of 2,021 cases and 2,105 controls) with smoking as the exposure of interest. Two novel putative smoking-related pancreatic cancer susceptibility genes, TRIP10 and KDM3A, were identified. The aGEw test is implemented in an R package aGE.
Affiliation(s)
- Tianzhong Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
| | - Hongwei Tang
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | | | - Sara H. Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Paige M. Bracci
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
| | - Rayjean Hung
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
| | - Rachel E. Neale
- Cancer Aetiology and Prevention Group, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
| | | | - Eric J. Duell
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program Catalan Institute of Oncology - Bellvitge Biomedical Research Institute (ICO-IDIBELL) Avda. Gran Via 199-203 08908 L’Hospitalet de Llobregat, Barcelona, Spain
| | - Robert C. Kurtz
- Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Kay-Tee Khaw
- Department of Public Health and Primary Care, University of Cambridge, UK
| | - Gianluca Severi
- Gustave Roussy, F-94805, Villejuif, France
- CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, 94805, Villejuif, France
| | - Malin Sund
- Department of Surgical and Perioperative Sciences, Umeå University, Sweden
| | - Nick Wareham
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
| | - Christopher I Amos
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Donghui Li
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
|
34
|
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Shen H, Gong P, Zhang C, Deng HW. A Review of Integrative Imputation for Multi-Omics Datasets. Front Genet 2020; 11:570255. [PMID: 33193667 PMCID: PMC7594632 DOI: 10.3389/fgene.2020.570255] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 06/07/2020] [Accepted: 09/16/2020] [Indexed: 01/05/2023] Open
Abstract
Multi-omics studies, which explore the interactions between multiple types of biological factors, have significant advantages over single-omics analysis for their ability to provide a more holistic view of biological processes, uncover the causal and functional mechanisms for complex diseases, and facilitate new discoveries in precision medicine. However, omics datasets often contain missing values, and in multi-omics study designs it is common for individuals to be represented for some omics layers but not all. Since most statistical analyses cannot be applied directly to the incomplete datasets, imputation is typically performed to infer the missing values. Integrative imputation techniques which make use of the correlations and shared information among multi-omics datasets are expected to outperform approaches that rely on single-omics information alone, resulting in more accurate results for the subsequent downstream analyses. In this review, we provide an overview of the currently available imputation methods for handling missing values in bioinformatics data with an emphasis on multi-omics imputation. In addition, we also provide a perspective on how deep learning methods might be developed for the integrative imputation of multi-omics datasets.
Affiliation(s)
- Meng Song
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
| | - Jonathan Greenbaum
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
| | - Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
| | - Weihua Zhou
- College of Computing, Michigan Technological University, Houghton, MI, United States
| | - Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, United States
| | - Hui Shen
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
| | - Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
| | - Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
| |
|
35
|
Xue H, Wu C, Pan W. Leveraging existing GWAS summary data of genetically correlated and uncorrelated traits to improve power for a new GWAS. Genet Epidemiol 2020; 44:717-732. [PMID: 32677173 PMCID: PMC7722071 DOI: 10.1002/gepi.22333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2019] [Revised: 06/09/2020] [Accepted: 06/18/2020] [Indexed: 11/08/2022]
Abstract
In spite of the tremendous success of genome-wide association studies (GWAS) in identifying genetic variants associated with complex traits and common diseases, many more are yet to be discovered. Hence, it is always desirable to improve the statistical power of GWAS. In parallel with the intensive efforts to integrate GWAS with functional annotations or other omic data, we propose leveraging other published GWAS summary data to boost statistical power for a new/focus GWAS; the traits of the published GWAS may or may not be genetically correlated with the target trait of the new GWAS. Building on weighted hypothesis testing with a solid theoretical foundation, we develop a novel and effective method to construct single-nucleotide polymorphism (SNP)-specific weights based on 22 published GWAS data sets with various traits, detecting sometimes dramatically increased numbers of significant SNPs and independent loci as compared to the standard/unweighted analysis. For example, by integrating a schizophrenia GWAS summary data set with 19 other GWAS summary data sets of nonschizophrenia traits, our new method identified 1,585 genome-wide significant SNPs mapping to 15 linkage disequilibrium-independent loci, largely exceeding the 818 significant SNPs in 13 independent loci identified by the standard/unweighted analysis; furthermore, using a later and larger schizophrenia GWAS summary data set as the validation data, 1,423 (out of 1,585) significant SNPs identified by the weighted analysis, compared to 705 (out of 818) by the unweighted analysis, were confirmed, while all 15 and 13 independent loci were also confirmed. Similar conclusions were reached with lipids and Alzheimer's disease (AD) traits. We conclude that the proposed approach is a simple and cost-effective way to improve GWAS power.
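Weighted hypothesis testing, on which this abstract builds, can be illustrated with the standard weighted Bonferroni rule: with m tests and non-negative weights averaging one, hypothesis i is rejected when p_i ≤ α·w_i/m. The sketch below shows only that generic rule. How the authors construct SNP-specific weights from other GWAS is not shown, and the weights, P-values, and function name here are hypothetical.

```python
from typing import List, Sequence


def weighted_bonferroni(
    p_values: Sequence[float],
    weights: Sequence[float],
    alpha: float = 0.05,
) -> List[bool]:
    """Reject hypothesis i when p_i <= alpha * w_i / m, after rescaling weights to mean 1."""
    m = len(p_values)
    if m == 0 or len(weights) != m:
        raise ValueError("p_values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_w = sum(weights) / m
    if mean_w <= 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w / mean_w for w in weights]  # enforce an average weight of 1
    return [p <= alpha * w / m for p, w in zip(p_values, normalized)]


# Hypothetical example: up-weighting the first SNP lets it pass, while the
# unweighted analysis (all weights equal) rejects nothing.
p = [0.02, 0.02, 0.5, 0.5]
print(weighted_bonferroni(p, [3.0, 0.5, 0.25, 0.25]))
print(weighted_bonferroni(p, [1.0, 1.0, 1.0, 1.0]))
```

Keeping the average weight at one preserves family-wise error control as long as the weights are chosen independently of the P-values being tested, which is why external data sets are a natural source for them.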
Affiliation(s)
- Haoran Xue
- School of Statistics, University of Minnesota, Minneapolis, Minnesota
| | - Chong Wu
- Department of Statistics, Florida State University, Tallahassee, Florida
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
| |
|
36
|
Evans P, Cox NJ, Gamazon ER. The regulatory genome constrains protein sequence evolution: implications for the search for disease-associated genes. PeerJ 2020; 8:e9554. [PMID: 32765967 PMCID: PMC7380284 DOI: 10.7717/peerj.9554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2020] [Accepted: 06/24/2020] [Indexed: 11/20/2022] Open
Abstract
The development of explanatory models of protein sequence evolution has broad implications for our understanding of cellular biology, population history, and disease etiology. Here we analyze the GTEx transcriptome resource to quantify the effect of the transcriptome on protein sequence evolution in a multi-tissue framework. We find substantial variation among the central nervous system tissues in the effect of expression variance on evolutionary rate, with highly variable genes in the cortex showing significantly greater purifying selection than highly variable genes in subcortical regions (Mann-Whitney U p = 1.4 × 10^-4). The remaining tissues cluster in observed expression correlation with evolutionary rate, enabling evolutionary analysis of genes in diverse physiological systems, including digestive, reproductive, and immune systems. Importantly, the tissue in which a gene attains its maximum expression variance significantly varies (p = 5.55 × 10^-284) with evolutionary rate, suggesting a tissue-anchored model of protein sequence evolution. Using a large-scale reference resource, we show that the tissue-anchored model provides a transcriptome-based approach to predicting the primary affected tissue of developmental disorders. Using gradient boosted regression trees to model evolutionary rate under a range of model parameters, selected features explain up to 62% of the variation in evolutionary rate and provide additional support for the tissue model. Finally, we investigate several methodological implications, including the importance of evolutionary-rate-aware gene expression imputation models using genetic data for improved search for disease-associated genes in transcriptome-wide association studies. Collectively, this study presents a comprehensive transcriptome-based analysis of a range of factors that may constrain molecular evolution and proposes a novel framework for the study of gene function and disease mechanism.
Affiliation(s)
- Patrick Evans
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
| | - Nancy J Cox
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America
| | - Eric R Gamazon
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, United States of America.,Clare Hall, University of Cambridge, Cambridge, United Kingdom.,MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom.,Data Science Institute, Vanderbilt University, Nashville, TN, United States of America
| |
|
37
|
Werren EA, Garcia O, Bigham AW. Identifying adaptive alleles in the human genome: from selection mapping to functional validation. Hum Genet 2020; 140:241-276. [PMID: 32728809 DOI: 10.1007/s00439-020-02206-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2020] [Accepted: 07/07/2020] [Indexed: 12/19/2022]
Abstract
The suite of phenotypic diversity across geographically distributed human populations is the outcome of genetic drift, gene flow, and natural selection throughout human evolution. Human genetic variation underlying local biological adaptations to selective pressures is incompletely characterized. With the emergence of population genetics modeling of large-scale genomic data derived from diverse populations, scientists are able to map signatures of natural selection in the genome in a process known as selection mapping. Inferred selection signals further can be used to identify candidate functional alleles that underlie putative adaptive phenotypes. Phenotypic association, fine mapping, and functional experiments facilitate the identification of candidate adaptive alleles. Functional investigation of candidate adaptive variation using novel techniques in molecular biology is slowly beginning to unravel how selection signals translate to changes in biology that underlie the phenotypic spectrum of our species. In addition to informing evolutionary hypotheses of adaptation, the discovery and functional annotation of adaptive alleles also may be of clinical significance. While selection mapping efforts in non-European populations are growing, there remains a stark under-representation of diverse human populations in current public genomic databases, of both clinical and non-clinical cohorts. This lack of inclusion limits the study of human biological variation. Identifying and functionally validating candidate adaptive alleles in more global populations is necessary for understanding basic human biology and human disease.
Affiliation(s)
- Elizabeth A Werren
- Department of Human Genetics, The University of Michigan, Ann Arbor, MI, USA
- Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
| | - Obed Garcia
- Department of Anthropology, The University of Michigan, Ann Arbor, MI, USA
| | - Abigail W Bigham
- Department of Anthropology, University of California Los Angeles, 341 Haines Hall, Los Angeles, CA, 90095, USA.
| |
|
38
|
Wu C, Xu G, Shen X, Pan W. A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2020; 21:128. [PMID: 32802002 PMCID: PMC7425805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Via simulations, its superior finite-sample performance is demonstrated over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. We also put R package "aispu" implementing the proposed test on GitHub.
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, FL, USA
| | - Gongjun Xu
- Department of Statistics, University of Michigan, MI, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, MN, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, MN, USA
| |
|
39
|
Abstract
Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Ning Sun
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
| |
|
40
|
Rajarajan P, Akbarian S. Use of the epigenetic toolbox to contextualize common variants associated with schizophrenia risk. DIALOGUES IN CLINICAL NEUROSCIENCE 2020; 21:407-416. [PMID: 31949408 PMCID: PMC6952750 DOI: 10.31887/dcns.2019.21.4/sakbarian] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Schizophrenia is a debilitating psychiatric disorder with a complex genetic architecture and limited understanding of its neuropathology, reflected by the lack of diagnostic measures and effective pharmacological treatments. Geneticists have recently identified more than 145 risk loci comprising hundreds of common variants of small effect sizes, most of which lie in noncoding genomic regions. This review will discuss how the epigenetic toolbox can be applied to contextualize genetic findings in schizophrenia. Progress in next-generation sequencing, along with increasing methodological complexity, has led to the compilation of genome-wide maps of DNA methylation, histone modifications, RNA expression, and more. Integration of chromatin conformation datasets is one of the latest efforts in deciphering schizophrenia risk, allowing the identification of genes in contact with regulatory variants across 100s of kilobases. Large-scale multiomics studies will facilitate the prioritization of putative causal risk variants and gene networks that contribute to schizophrenia etiology, informing clinical diagnostics and treatment downstream.
Affiliation(s)
- Prashanth Rajarajan
- Graduate School of Biomedical Sciences; Department of Psychiatry; Friedman Brain Institute; Icahn School of Medicine at Mount Sinai, New York, NY, US
| | - Schahram Akbarian
- Department of Psychiatry; Friedman Brain Institute; Icahn School of Medicine at Mount Sinai, New York, NY, US
| |
|
41
|
Wu C, Pan W. Integration of methylation QTL and enhancer-target gene maps with schizophrenia GWAS summary results identifies novel genes. Bioinformatics 2020; 35:3576-3583. [PMID: 30850848 DOI: 10.1093/bioinformatics/btz161] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2018] [Revised: 02/04/2019] [Accepted: 03/05/2019] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION Most trait-associated genetic variants identified in genome-wide association studies (GWASs) are located in non-coding regions of the genome and thought to act through their regulatory roles. RESULTS To account for enriched association signals in DNA regulatory elements, we propose a novel and general gene-based association testing strategy that integrates enhancer-target gene pairs and methylation quantitative trait locus data with GWAS summary results; it aims to both boost statistical power for new discoveries and enhance mechanistic interpretability of any new discovery. By reanalyzing two large-scale schizophrenia GWAS summary datasets, we demonstrate that the proposed method could identify some significant and novel genes (containing no genome-wide significant SNPs nearby) that would have been missed by other competing approaches, including the standard and some integrative gene-based association methods, such as one incorporating enhancer-target gene pairs and one integrating expression quantitative trait loci. AVAILABILITY AND IMPLEMENTATION Software: wuchong.org/egmethyl.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
| |
|
42
|
Zhang J, Xie S, Gonzales S, Liu J, Wang X. A fast and powerful eQTL weighted method to detect genes associated with complex trait using GWAS summary data. Genet Epidemiol 2020; 44:550-563. [PMID: 32350919 DOI: 10.1002/gepi.22297] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Revised: 04/13/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023]
Abstract
Although genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits, a large fraction of heritability still remains unexplained. Integrative analysis that incorporates additional information, such as expression quantitative trait locus (eQTL) data into sequencing studies (denoted as transcriptome-wide association study [TWAS]), can aid the discovery of trait-associated genetic variants. However, general TWAS methods only incorporate one eQTL-derived weight (e.g., cis-effect), and thus can suffer a substantial loss of power when the single estimated cis-effect is not predictive for the effect size of a genetic variant or when there are estimation errors in the estimated cis-effect, or if the data are not consistent with the model assumption. In this study, we propose an omnibus test (OT) which utilizes a Cauchy association test to integrate association evidence demonstrated by three different traditional tests (burden test, quadratic test, and adaptive test) using GWAS summary data with multiple eQTL-derived weights. The p value of the proposed test can be calculated analytically, and thus it is fast and efficient. We applied our proposed test to two schizophrenia (SCZ) GWAS summary data sets and two lipids trait (HDL) GWAS summary data sets. Compared with the three traditional tests, our proposed OT can identify more trait-associated genes.
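The analytic p value mentioned in this abstract comes from the general Cauchy combination idea, which can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function name `cauchy_combination`, the default equal weights, and the three example p values are assumptions, and the abstract does not state how the burden, quadratic, and adaptive test results are weighted.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p values via the Cauchy combination statistic.

    Each p value is mapped to tan((0.5 - p) * pi), a standard Cauchy variate
    under the null; the weighted average is again standard Cauchy, so the
    combined p value has a closed form.
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)  # equal weights: an assumption here
    total = sum(weights)
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values)) / total
    # Upper tail probability of a standard Cauchy distribution at `stat`.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical p values from three component tests for one gene.
combined = cauchy_combination([0.01, 0.2, 0.6])
```

Because no resampling is needed, the combined p value costs a handful of arithmetic operations per gene, which is consistent with the abstract's point about speed.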
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, Texas
| | - Sicong Xie
- Beijing National Day School, Beijing, China
| | - Samantha Gonzales
- Department of Computer Science and Engineering, University of North Texas, Denton, Texas
| | - Jianguo Liu
- Department of Mathematics, University of North Texas, Denton, Texas
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, Texas
| |
|
43
|
Xue H, Pan W. Some statistical consideration in transcriptome-wide association studies. Genet Epidemiol 2020; 44:221-232. [PMID: 31821608 PMCID: PMC7064426 DOI: 10.1002/gepi.22274] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Revised: 10/01/2019] [Accepted: 11/25/2019] [Indexed: 11/08/2022]
Abstract
The methodology of transcriptome-wide association studies (TWAS) has become popular in integrating a reference expression quantitative trait (eQTL) data set with an independent main GWAS data set to identify (putatively) causal genes, shedding mechanistic insights to biological pathways from genetic variants to a GWAS trait mediated by gene expression. Statistically TWAS is a (two-sample) 2-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene's expression for the main GWAS data, then in Stage 2 it tests for association between the imputed gene expression and the GWAS trait; if an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a nonlinear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g., for a binary GWAS trait), it is known that using only imputed gene expression, as in standard TWAS, in general does not lead to a consistent (i.e., asymptotically unbiased) estimate for the causal effect; accordingly, a variation of 2SLS, called two-stage residual inclusion (2SRI), has been proposed to yield better estimates (e.g., being consistent under suitable conditions). Our main goal is to investigate whether it is necessary or even better to apply 2SRI, instead of the standard 2SLS. In addition, due to the use of imputed gene expression (i.e., with measurement errors), it is known that in general some correction to the standard error estimate of the causal effect estimate has to be applied, while in the standard TWAS no correction is applied. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e., the standard TWAS). We used the Alzheimer's Disease Neuroimaging Initiative (ADNI) data and simulated data mimicking the ADNI data to address the above questions. At the end, we conclude that, in practice with the large sample sizes and small effect sizes of genetic variants, the standard TWAS performs well and is recommended.
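The two-stage structure described in this abstract can be sketched with a single SNP and ordinary least squares. This is a toy illustration under strong assumptions: one instrument, continuous trait, linear models in both stages, and no standard-error correction. The function names and the data are hypothetical and are not taken from the paper or from ADNI.

```python
def ols_slope_intercept(x, y):
    """Simple linear regression of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage_twas(ref_genotype, ref_expression, gwas_genotype, gwas_trait):
    """Two-sample 2SLS with one SNP (a deliberately simplified sketch)."""
    # Stage 1: learn the SNP-to-expression weight in the reference eQTL data.
    weight, intercept = ols_slope_intercept(ref_genotype, ref_expression)
    # Impute expression for the GWAS individuals from their genotypes.
    imputed = [intercept + weight * g for g in gwas_genotype]
    # Stage 2: regress the GWAS trait on the imputed expression.
    effect, _ = ols_slope_intercept(imputed, gwas_trait)
    return weight, effect


# Toy data: expression = 1 + genotype, trait = 0.5 * expression (noise-free).
ref_genotype = [0, 1, 2, 0, 1, 2]
ref_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
gwas_genotype = [0, 1, 2, 2, 1, 0]
gwas_trait = [0.5, 1.0, 1.5, 1.5, 1.0, 0.5]
weight, effect = two_stage_twas(ref_genotype, ref_expression,
                                gwas_genotype, gwas_trait)
```

The 2SRI variant discussed in the abstract would additionally carry the Stage 1 residual into Stage 2, which requires observed expression in the same sample and is therefore not shown in this two-sample sketch.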
Affiliation(s)
- Haoran Xue
- School of Statistics, University of Minnesota, Minneapolis, Minnesota
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
| |
|
44
|
Deng Y, Pan W. A powerful and versatile colocalization test. PLoS Comput Biol 2020; 16:e1007778. [PMID: 32275709 PMCID: PMC7176287 DOI: 10.1371/journal.pcbi.1007778] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2019] [Revised: 04/22/2020] [Accepted: 03/08/2020] [Indexed: 12/17/2022] Open
Abstract
Transcriptome-wide association studies (TWAS and PrediXcan) have been increasingly applied to detect associations between genetically predicted gene expressions and GWAS traits, which may suggest, however do not completely determine, causal genes for GWAS traits, due to the likely violation of their imposed strong assumptions for causal inference. Testing colocalization moves it closer to establishing causal relationships: if a GWAS trait and a gene's expression share the same associated SNP, it may suggest a regulatory (and thus putative causal) role of the SNP mediated through the gene on the GWAS trait. Accordingly, it is of interest to develop and apply various colocalization testing approaches. The existing approaches may each have some severe limitations. For instance, some methods test the null hypothesis that there is colocalization, which is not ideal because often the null hypothesis cannot be rejected simply due to limited statistical power (with too small sample sizes). Some other methods arbitrarily restrict the maximum number of causal SNPs in a locus, which may lead to loss of power in the presence of wide-spread allelic heterogeneity. Importantly, most methods cannot be applied to either GWAS/eQTL summary statistics or cases with more than two possibly correlated traits. Here we present a simple and general approach based on conditional analysis of a locus on multiple traits, overcoming the above and other shortcomings of the existing methods. We demonstrate that, compared with other methods, our new method can be applied to a wider range of scenarios and often perform better. We showcase its applications to both simulated and real data, including a large-scale Alzheimer's disease GWAS summary dataset and a gene expression dataset, and a large-scale blood lipid GWAS summary association dataset. An R package "jointsum" implementing the proposed method is publicly available at github.
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
45
|
Wu C, Pan W. A powerful fine-mapping method for transcriptome-wide association studies. Hum Genet 2020; 139:199-213. [PMID: 31844974 PMCID: PMC6983348 DOI: 10.1007/s00439-019-02098-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Accepted: 12/07/2019] [Indexed: 01/14/2023]
Abstract
Transcriptome-wide association studies (TWAS) have been recently applied to successfully identify many novel genes associated with complex traits. While appealing, TWAS tend to identify multiple significant genes per locus, and many of them may not be causal due to confounding through linkage disequilibrium (LD) among SNPs. Here we introduce a powerful fine-mapping method that prioritizes putative causal genes by accounting for local LD. We apply a weighted adaptive test with eQTL-derived weights to maintain high power across various scenarios. Through simulations, we show that our new approach yielded a well-controlled Type I error rate while achieving higher power and AUC than competing methods. We applied our approach to a schizophrenia GWAS summary dataset and successfully prioritized some well-known schizophrenia-related genes, such as C4A. Importantly, our approach identified some putative causal genes (e.g., B3GAT1 and RGS6) that were missed by competing methods and TWAS. Our results suggest that our approach is a useful tool to prioritize putative causal genes, gaining insights into the mechanisms of complex traits.
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA.
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA.
| |
|
46
|
Yang T, Wu C, Wei P, Pan W. Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits. Hum Mol Genet 2020; 29:515-526. [PMID: 31919517 PMCID: PMC7015848 DOI: 10.1093/hmg/ddz314] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 12/11/2019] [Accepted: 12/16/2019] [Indexed: 12/13/2022] Open
Abstract
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and transcriptomic data to showcase their improved statistical power of identifying gene-trait associations while, importantly, offering further biological insights. TWAS have thus far focused on common variants as available from GWAS. Compared with common variants, the findings for or even applications to low-frequency variants are limited and their underlying role in regulating gene expression is less clear. To fill this gap, we extend TWAS to integrating whole genome sequencing data with transcriptomic data for low-frequency variants. Using the data from the Framingham Heart Study, we demonstrate that low-frequency variants play an important and universal role in predicting gene expression, which is not completely due to linkage disequilibrium with the nearby common variants. By including low-frequency variants, in addition to common variants, we increase the predictivity of gene expression for 79% of the examined genes. Incorporating this piece of functional genomic information, we perform association testing for five lipid traits in two UK10K whole genome sequencing cohorts, hypothesizing that cis-expression quantitative trait loci, including low-frequency variants, are more likely to be trait-associated. We discover that two genes, LDLR and TTC22, are genome-wide significantly associated with low-density lipoprotein cholesterol based on 3203 subjects and that the association signals are largely independent of common variants. We further demonstrate that a joint analysis of both common and low-frequency variants identifies association signals that would be missed by testing on either common variants or low-frequency variants alone.
Affiliation(s)
- Tianzhong Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| |
|
47
|
Keel BN, Snelling WM, Lindholm-Perry AK, Oliver WT, Kuehn LA, Rohrer GA. Using SNP Weights Derived From Gene Expression Modules to Improve GWAS Power for Feed Efficiency in Pigs. Front Genet 2020; 10:1339. [PMID: 32038708 PMCID: PMC6985563 DOI: 10.3389/fgene.2019.01339] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 12/09/2019] [Indexed: 01/24/2023] Open
Abstract
The "large p small n" problem has posed a significant challenge in the analysis and interpretation of genome-wide association studies (GWAS). The use of prior information to rank genomic regions and perform SNP selection could increase the power of GWAS. In this study, we propose the use of gene expression data from RNA-Seq of multiple tissues as prior information to assign weights to SNP, select SNP based on a weight threshold, and utilize weighted hypothesis testing to conduct a GWAS. RNA-Seq libraries from hypothalamus, duodenum, ileum, and jejunum tissue of 30 pigs with divergent feed efficiency phenotypes were sequenced, and a three-way gene x individual x tissue clustering analysis was performed, using constrained tensor decomposition, to obtain a total of 10 gene expression modules. Loading values from each gene module were used to assign weights to 49,691 commercial SNP markers, and SNP were selected using a weight threshold, resulting in 10 SNP sets ranging in size from 101 to 955 markers. Weighted GWAS for feed intake in 4,200 pigs was performed separately for each of the 10 SNP sets. A total of 36 unique significant SNP associations were identified across the ten gene modules (SNP sets). For comparison, a standard unweighted GWAS using all 49,691 SNP was performed, and only 2 SNP were significant. None of the SNP from the unweighted analysis resided in known QTL related to swine feed efficiency (feed intake, average daily gain, and feed conversion ratio) compared to 29 (80.6%) in the weighted analyses, with 9 SNP residing in feed intake QTL. These results suggest that the heritability of feed intake is driven by many SNP that individually do not attain genome-wide significance in GWAS. Hence, the proposed procedure for prioritizing SNP based on gene expression data across multiple tissues provides a promising approach for improving the power of GWAS.
Affiliation(s)
- Brittney N. Keel
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, United States
| | | | | | | | | | | |
|
48
|
Yang T, Kim J, Wu C, Ma Y, Wei P, Pan W. An adaptive test for meta-analysis of rare variant association studies. Genet Epidemiol 2020; 44:104-116. [PMID: 31830326 PMCID: PMC6980317 DOI: 10.1002/gepi.22273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2019] [Revised: 11/12/2019] [Accepted: 11/25/2019] [Indexed: 01/02/2023]
Abstract
Single genome-wide studies may be underpowered to detect trait-associated rare variants with moderate or weak effect sizes. As a viable alternative, meta-analysis is widely used to increase power by combining different studies. The power of meta-analysis critically depends on the underlying association patterns and heterogeneity levels, which are unknown and vary from locus to locus. However, existing methods mainly focus on one or only a few combinations of the association pattern and heterogeneity level, thus may lose power in many situations. To address this issue, we propose a general and unified framework by combining a class of tests including and beyond some existing ones, leading to high power across a wide range of scenarios. We demonstrate that the proposed test is more powerful than some existing methods in simulation studies, then show their performance with the NHLBI Exome-Sequencing Project (ESP) data. One gene (B4GALNT2) was found by our proposed test, but not by others, to be statistically significantly associated with plasma triglyceride. The signal was driven by African-ancestry subjects but it was previously reported to be associated with coronary artery disease among European-ancestry subjects. We implemented our method in an R package aSPUmeta, publicly available at https://github.com/ytzhong/metaRV and will be on CRAN soon.
Affiliation(s)
- Tianzhong Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Junghi Kim
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| | - Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Yiding Ma
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
| |
|
49
|
Pattee J, Zhan X, Xiao G, Pan W. Integrating germline and somatic genetics to identify genes associated with lung cancer. Genet Epidemiol 2019; 44:233-247. [PMID: 31821614 DOI: 10.1002/gepi.22275] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 10/31/2019] [Accepted: 11/25/2019] [Indexed: 12/22/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified many genetic variants associated with complex traits. However, GWAS experience power issues, resulting in the failure to detect certain associated variants. Additionally, GWAS are often unable to parse the biological mechanisms of driving associations. An existing gene-based association test framework, Transcriptome-Wide Association Studies (TWAS), leverages expression quantitative trait loci data to increase the power of association tests and illuminate the biological mechanisms by which genetic variants modulate complex traits. We extend the TWAS methodology to incorporate somatic information from tumors. By integrating germline and somatic data we are able to leverage information from the nuanced somatic landscape of tumors. Thus we can augment the power of TWAS-type tests to detect germline genetic variants associated with cancer phenotypes. We use somatic and germline data on lung adenocarcinomas from The Cancer Genome Atlas in conjunction with a meta-analyzed lung cancer GWAS to identify novel genes associated with lung cancer.
Affiliation(s)
- Jack Pattee
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
| | - Xiaowei Zhan
- Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas
| | - Guanghua Xiao
- Quantitative Biomedical Research Center, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
| |
|
50
|
Gusev A, Lawrenson K, Lin X, Lyra PC, Kar S, Vavra KC, Segato F, Fonseca MA, Lee JM, Pejovic T, Liu G, Karlan BY, Freedman ML, Noushmehr H, Monteiro AN, Pharoah PD, Pasaniuc B, Gayther SA. A transcriptome-wide association study of high-grade serous epithelial ovarian cancer identifies new susceptibility genes and splice variants. Nat Genet 2019; 51:815-823. [PMID: 31043753 PMCID: PMC6548545 DOI: 10.1038/s41588-019-0395-x] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 04/26/2018] [Accepted: 03/15/2019] [Indexed: 12/31/2022]
Abstract
We sought to identify susceptibility genes for high-grade serous ovarian cancer (HGSOC) by performing a transcriptome-wide association study of gene expression and splice junction usage in HGSOC-relevant tissue types (N = 2,169) and the largest genome-wide association study available for HGSOC (N = 13,037 cases and 40,941 controls). We identified 25 transcriptome-wide association study significant genes, 7 at the junction level only, including LRRC46 at 19q21.32 (P = 1 × 10^-9), CHMP4C at 8q21 (P = 2 × 10^-11) and a PRC1 junction at 15q26 (P = 7 × 10^-9). In vitro assays for CHMP4C showed that the associated variant induces allele-specific exon inclusion (P = 0.0024). Functional screens in HGSOC cell lines found evidence of essentiality for three of the new genes we identified: HAUS6, KANSL1 and PRC1, with the latter comparable to MYC. Our study implicates at least one target gene for 6 out of 13 distinct genome-wide association study regions, identifying 23 new candidate susceptibility genes for HGSOC.
Affiliation(s)
- Alexander Gusev
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Kate Lawrenson
- Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite 290W, Los Angeles, CA, USA
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Xianzhi Lin
- Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite 290W, Los Angeles, CA, USA
| | - Paulo C. Lyra
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Siddhartha Kar
- CR-UK Department of Oncology, University of Cambridge, Strangeways Research Laboratory, Cambridge, UK
| | - Kevin C. Vavra
- Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite 290W, Los Angeles, CA, USA
| | - Felipe Segato
- Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, 14049-900, Brazil
| | - Marcos A.S. Fonseca
- Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite 290W, Los Angeles, CA, USA
| | - Janet M Lee
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Tanya Pejovic
- Department of Obstetrics and Gynecology, Oregon Health and Science University, Portland, OR, USA
- Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
| | - Gang Liu
- Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite 290W, Los Angeles, CA, USA
| | - Beth Y. Karlan
- Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Suite 290W, Los Angeles, CA, USA
| | - Matthew L. Freedman
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Houtan Noushmehr
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, USA
| | - Alvaro N. Monteiro
- Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Paul D.P. Pharoah
- CR-UK Department of Oncology, University of Cambridge, Strangeways Research Laboratory, Cambridge, UK
| | - Bogdan Pasaniuc
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Simon A. Gayther
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| |
|