51. Opportunities and challenges for transcriptome-wide association studies. Nat Genet 2019; 51:592-599. [PMID: 30926968 DOI: 10.1038/s41588-019-0385-z] [Citation(s) in RCA: 495] [Impact Index Per Article: 82.5] [Received: 05/25/2018] [Accepted: 02/13/2019] [Indexed: 11/08/2022]
Abstract
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene-trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.
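The Perspective evaluates TWAS as a gene-prioritization tool without restating the underlying test. Below is a minimal sketch of the commonly used summary-statistic TWAS z-score, z_TWAS = wᵀz / √(wᵀRw). It assumes three inputs: precomputed eQTL weights w for one gene, GWAS z-scores z for the same SNPs (allele-aligned), and an LD correlation matrix R from a reference panel. The function names are hypothetical, and this is not code from the cited paper.

```python
import math

def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score for one gene: w^T z / sqrt(w^T R w).

    weights: eQTL weights for the gene's cis-SNPs (assumed already trained)
    gwas_z:  GWAS z-scores for the same SNPs, aligned to the same alleles
    ld:      SNP-by-SNP LD correlation matrix R from a reference panel
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Null variance of the weighted sum, accounting for LD: w^T R w
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("w^T R w must be positive")
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided standard-normal p-value, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

w = [0.5, 0.5]
z = [3.0, 3.0]
independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.8], [0.8, 1.0]]
print(twas_z(w, z, independent))  # 3 / sqrt(0.5), about 4.24
print(twas_z(w, z, correlated))   # 3 / sqrt(0.9), about 3.16
```

With identical weights and GWAS z-scores, positive LD between the two SNPs raises wᵀRw from 0.5 to 0.9 and therefore lowers the statistic.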
52. Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Gu J, Muchnik S, Shi Y, Kunkle BW, Mukherjee S, Natarajan P, Naj A, Kuzma A, Zhao Y, Crane PK, Lu H, Zhao H. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet 2019; 51:568-576. [PMID: 30804563 PMCID: PMC6788740 DOI: 10.1038/s41588-019-0345-7] [Citation(s) in RCA: 220] [Impact Index Per Article: 36.7] [Received: 03/20/2018] [Accepted: 01/09/2019] [Indexed: 12/12/2022]
Abstract
Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to impute gene expression levels from genotypes by using samples with matched genotypes and gene expression data in a given tissue. However, it is challenging to develop robust and accurate imputation models with a limited sample size for any single tissue. Here, we first introduce a multi-task learning method to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average of 39% improvement in imputation accuracy and generated effective imputation models for an average of 120% more genes. We describe a summary-statistic-based testing framework that combines multiple single-tissue associations into a powerful metric to quantify the overall gene-trait association. We applied our method, called UTMOST (unified test for molecular signatures), to multiple genome-wide-association results and demonstrate its advantages over single-tissue strategies.
Affiliation(s)
- Yiming Hu
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Mo Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Haoyi Weng
  - Division of Biostatistics, The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
- Jiawei Wang
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Seyedeh M Zekavat
  - Yale School of Medicine, New Haven, CT, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhaolong Yu
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Boyang Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Jianlei Gu
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
- Sydney Muchnik
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Yu Shi
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Brian W Kunkle
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Pradeep Natarajan
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Adam Naj
  - Center for Clinical Epidemiology and Biostatistics, and the Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Amanda Kuzma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yi Zhao
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Paul K Crane
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Hui Lu
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
53. Ferreira MAR, Vonk JM, Baurecht H, Marenholz I, Tian C, Hoffman JD, Helmer Q, Tillander A, Ullemar V, Lu Y, Rüschendorf F, Hinds DA, Hübner N, Weidinger S, Magnusson PKE, Jorgenson E, Lee YA, Boomsma DI, Karlsson R, Almqvist C, Koppelman GH, Paternoster L. Eleven loci with new reproducible genetic associations with allergic disease risk. J Allergy Clin Immunol 2019; 143:691-699. [PMID: 29679657 PMCID: PMC7189804 DOI: 10.1016/j.jaci.2018.03.012] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 11/24/2017] [Revised: 02/01/2018] [Accepted: 03/19/2018] [Indexed: 12/18/2022]
Abstract
BACKGROUND: A recent genome-wide association study (GWAS) identified 99 loci that contain genetic risk variants shared between asthma, hay fever, and eczema. Many more risk loci shared between these common allergic diseases remain to be discovered, which could point to new therapeutic opportunities.
OBJECTIVE: We sought to identify novel risk loci shared between asthma, hay fever, and eczema by applying a gene-based test of association to results from a published GWAS that included data from 360,838 subjects.
METHODS: We used approximate conditional analysis to adjust the results from the published GWAS for the effects of the top risk variants identified in that study. We then analyzed the adjusted GWAS results with the EUGENE gene-based approach, which combines evidence for association with disease risk across regulatory variants identified in different tissues. Novel gene-based associations were followed up in an independent sample of 233,898 subjects from the UK Biobank study.
RESULTS: Of the 19,432 genes tested, 30 had a significant gene-based association at a Bonferroni-corrected threshold of P < 2.5 × 10^-6. Of these, 20 were also significantly associated (P < .05/30 = .0016) with disease risk in the replication sample, including 19 that were located in 11 loci not reported to contain allergy risk variants in previous GWASs. Among these were 9 genes with a known function that is directly relevant to allergic disease: FOSL2, VPRBP, IPCEF1, PRR5L, NCF4, APOBR, IL27, ATXN2L, and LAT. For 4 genes (eg, ATXN2L), a genetically determined decrease in gene expression was associated with decreased allergy risk, and therefore drugs that inhibit gene expression or function are predicted to ameliorate disease symptoms. The opposite directional effect was observed for 14 genes, including IL27, a cytokine known to suppress TH2 responses.
CONCLUSION: Using a gene-based approach, we identified 11 risk loci for allergic disease that were not reported in previous GWASs. Functional studies that investigate the contribution of the 19 associated genes to the pathophysiology of allergic disease and assess their therapeutic potential are warranted.
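Both significance thresholds in this abstract follow from the Bonferroni correction, α/m for m tests. The snippet recomputes them from the reported gene counts. It assumes a family-wise α of 0.05 in both stages, which the replication threshold .05/30 states explicitly.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

discovery = bonferroni_threshold(0.05, 19432)  # genes tested in the discovery stage
replication = bonferroni_threshold(0.05, 30)   # genes followed up in UK Biobank
print(f"{discovery:.2e}")    # 2.57e-06
print(f"{replication:.4f}")  # 0.0017
```

The exact values are about 2.573 × 10^-6 and 0.001667. The abstract reports them in truncated form as 2.5 × 10^-6 and .0016.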
Affiliation(s)
- Manuel A R Ferreira
  - Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Judith M Vonk
  - Epidemiology, University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, The Netherlands
- Hansjörg Baurecht
  - Department of Dermatology, Allergology and Venereology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Ingo Marenholz
  - Max Delbrück Center (MDC) for Molecular Medicine, Berlin, Germany
  - Clinic for Pediatric Allergy, Experimental and Clinical Research Center of Charité Universitätsmedizin Berlin and Max Delbrück Center, Berlin, Germany
- Joshua D Hoffman
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, Calif
- Quinta Helmer
  - Department of Biological Psychology, Netherlands Twin Register, Vrije University, Amsterdam, The Netherlands
- Annika Tillander
  - Department of Medical Epidemiology and Biostatistics and the Swedish Twin Registry, Karolinska Institutet, Stockholm, Sweden
- Vilhelmina Ullemar
  - Department of Medical Epidemiology and Biostatistics and the Swedish Twin Registry, Karolinska Institutet, Stockholm, Sweden
- Yi Lu
  - Department of Medical Epidemiology and Biostatistics and the Swedish Twin Registry, Karolinska Institutet, Stockholm, Sweden
- Norbert Hübner
  - Max Delbrück Center (MDC) for Molecular Medicine, Berlin, Germany
- Stephan Weidinger
  - Department of Dermatology, Allergology and Venereology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Patrik K E Magnusson
  - Department of Medical Epidemiology and Biostatistics and the Swedish Twin Registry, Karolinska Institutet, Stockholm, Sweden
- Eric Jorgenson
  - Division of Research, Kaiser Permanente Northern California, Oakland, Calif
- Young-Ae Lee
  - Max Delbrück Center (MDC) for Molecular Medicine, Berlin, Germany
  - Clinic for Pediatric Allergy, Experimental and Clinical Research Center of Charité Universitätsmedizin Berlin and Max Delbrück Center, Berlin, Germany
- Dorret I Boomsma
  - Department of Biological Psychology, Netherlands Twin Register, Vrije University, Amsterdam, The Netherlands
- Robert Karlsson
  - Department of Medical Epidemiology and Biostatistics and the Swedish Twin Registry, Karolinska Institutet, Stockholm, Sweden
- Catarina Almqvist
  - Department of Medical Epidemiology and Biostatistics and the Swedish Twin Registry, Karolinska Institutet, Stockholm, Sweden
  - Pediatric Allergy and Pulmonology Unit at Astrid Lindgren Children's Hospital, Karolinska University Hospital, Stockholm, Sweden
- Gerard H Koppelman
  - University of Groningen, University Medical Center Groningen, Beatrix Children's Hospital, Pediatric Pulmonology and Pediatric Allergology
  - University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, The Netherlands
- Lavinia Paternoster
  - MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom
54. Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet 2019; 15:e1007889. [PMID: 30668570 PMCID: PMC6358100 DOI: 10.1371/journal.pgen.1007889] [Citation(s) in RCA: 188] [Impact Index Per Article: 31.3] [Received: 04/13/2018] [Revised: 02/01/2019] [Accepted: 12/12/2018] [Indexed: 11/19/2022]
Abstract
Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits, and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and absence of relevant developmental and disease context restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally takes into account the correlation structure. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than by using each panel separately. To improve applicability, we developed a summary result-based extension called S-MultiXcan, which we show yields highly concordant results with the individual-level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual-level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis as well as software and necessary resources to apply our method are publicly available.
We develop a new method, MultiXcan, to test the mediating role of gene expression variation on complex traits, integrating information available across multiple tissue studies. We show this approach has higher power than traditional single-tissue methods. We extend this method to use only summary statistics from public GWAS. We apply these methods to 222 complex traits available in the UK Biobank cohort, and 109 complex traits from public GWAS and discuss the findings.
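MultiXcan combines per-tissue predicted expression for a gene while accounting for correlation between tissues. A summary-statistic analogue of such a joint test is the quadratic form T = zᵀC⁻¹z. Here z holds the gene's k single-tissue z-scores and C is their correlation across tissues. Under the null hypothesis, T is approximately χ² with k degrees of freedom.

The sketch below makes several simplifying assumptions:
- C is known and well-conditioned, so a plain inverse can be used. The published S-MultiXcan handles near-collinear tissues by discarding small singular values, and that step is not reproduced here.
- The p-value helper uses a closed form that is valid only for even degrees of freedom.
- All names are hypothetical.

```python
import math

def solve_linear(matrix, rhs):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("tissue correlation matrix is (near-)singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x

def joint_tissue_stat(tissue_z, tissue_corr):
    """Quadratic form z^T C^{-1} z over k single-tissue z-scores."""
    x = solve_linear(tissue_corr, tissue_z)
    return sum(z * xi for z, xi in zip(tissue_z, x))

def chi2_sf_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df < 2 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

z = [2.0, 2.0]
print(joint_tissue_stat(z, [[1.0, 0.0], [0.0, 1.0]]))  # 8.0
print(joint_tissue_stat(z, [[1.0, 0.9], [0.9, 1.0]]))  # 8 / 1.9, about 4.21
```

Raising the between-tissue correlation from 0 to 0.9 roughly halves the statistic. Two strongly correlated tissue-level signals add little independent evidence.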
55.
56.
Abstract
Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Affiliation(s)
- Marylyn D. Ritchie
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
57. Wu C, Pan W. Integration of Enhancer-Promoter Interactions with GWAS Summary Results Identifies Novel Schizophrenia-Associated Genes and Pathways. Genetics 2018; 209:699-709. [PMID: 29728367 PMCID: PMC6028261 DOI: 10.1534/genetics.118.300805] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/08/2018] [Accepted: 05/02/2018] [Indexed: 12/28/2022]
Abstract
It remains challenging to boost the statistical power of genome-wide association studies (GWASs) to identify more risk variants or loci that can account for "missing heritability." Furthermore, since most identified variants are not in gene-coding regions, a biological interpretation of their function is largely lacking. On the other hand, recent biotechnological advances have made it feasible to experimentally measure the three-dimensional organization of the genome, including enhancer-promoter interactions, at high resolution. Due to the well-known critical roles of enhancer-promoter interactions in regulating gene expression programs, such data have been applied to link GWAS risk variants to their putative target genes, gaining insights into underlying biological mechanisms. However, their direct use in GWAS association testing is yet to be exploited. Here we propose integrating enhancer-promoter interactions into GWAS association analysis to both boost statistical power and enhance interpretability. Through an application to two large-scale schizophrenia (SCZ) GWAS summary data sets, we demonstrate that the proposed method could identify some novel SCZ-associated genes and pathways (containing no significant SNPs). For example, after the Bonferroni correction, for the larger SCZ data set with 36,989 cases and 113,075 controls, our method applied to the gene body and enhancer regions identified 27 novel genes and 11 novel KEGG pathways to be significant, all missed by the transcriptome-wide association study (TWAS) approach. We conclude that our proposed method is potentially useful and is complementary to TWAS and other standard gene- and pathway-based methods.
Affiliation(s)
- Chong Wu
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
58. Deng Y, Pan W. Improved Use of Small Reference Panels for Conditional and Joint Analysis with GWAS Summary Statistics. Genetics 2018; 209:401-408. [PMID: 29674520 PMCID: PMC5972416 DOI: 10.1534/genetics.118.300813] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/09/2018] [Accepted: 04/04/2018] [Indexed: 02/08/2023]
Abstract
Due to issues of practicality and confidentiality in sharing genomic data on a large scale, typically only meta- or mega-analyzed genome-wide association study (GWAS) summary data, not individual-level data, are publicly available. Reanalyses of such GWAS summary data for a wide range of applications have become increasingly common and useful. They often require an external reference panel with individual-level genotypic data to infer linkage disequilibrium (LD) among genetic variants. However, with a sample size of only a few hundred, as in the most popular 1000 Genomes Project European sample, estimation errors for LD are not negligible and often lead to dramatically increased numbers of false positives in subsequent analyses of GWAS summary data. To alleviate the problem in the context of association testing for a group of SNPs, we propose an alternative estimator of the covariance matrix with an idea similar to multiple imputation. We use numerical examples based on both simulated and real data to demonstrate the severe problem with the use of the 1000 Genomes Project reference panels, and the improved performance of our new approach.
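The abstract's concern is that LD estimated from a few hundred reference genotypes is noisy. The sketch shows where that estimate comes from: sample Pearson correlations between SNP dosage columns of a reference panel.

For contrast only, it also applies simple linear shrinkage toward the identity, (1 − λ)R + λI. This is a generic regularizer; it is not the multiple-imputation-like estimator proposed by Deng and Pan. The function names, λ and the toy genotypes are hypothetical.

```python
import math

def ld_correlation(genotypes):
    """Pairwise Pearson correlation (LD r) between SNP columns.

    genotypes: list of rows, one per reference individual, each a list of
    0/1/2 allele dosages for the same SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(m)]
    means = [sum(c) / n for c in cols]
    centered = [[v - mu for v in c] for c, mu in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    if any(norm == 0 for norm in norms):
        raise ValueError("monomorphic SNP in the reference panel")
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(m)] for i in range(m)]

def shrink_to_identity(r, lam):
    """Linear shrinkage (1 - lam) * R + lam * I, a generic regularizer."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    m = len(r)
    return [[(1.0 - lam) * r[i][j] + (lam if i == j else 0.0) for j in range(m)]
            for i in range(m)]

panel = [[0, 0], [1, 1], [2, 2], [1, 0]]  # toy data: 4 individuals x 2 SNPs
r = ld_correlation(panel)
print(round(r[0][1], 4))                           # 0.8528
print(round(shrink_to_identity(r, 0.1)[0][1], 4))  # 0.7675
```

With a panel this small, a single individual's genotype can swing r substantially. This is the kind of estimation error the abstract warns about.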
Affiliation(s)
- Yangqing Deng
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
59. Wu C, Pan W. Integrating eQTL data with GWAS summary statistics in pathway-based analysis with application to schizophrenia. Genet Epidemiol 2018; 42:303-316. [PMID: 29411426 PMCID: PMC5851843 DOI: 10.1002/gepi.22110] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/19/2017] [Revised: 01/04/2018] [Accepted: 01/04/2018] [Indexed: 12/11/2022]
Abstract
Many genetic variants affect complex traits through gene expression, which can be exploited to boost statistical power and enhance interpretation in genome-wide association studies (GWASs) as demonstrated by the transcriptome-wide association study (TWAS) approach. Furthermore, due to polygenic inheritance, a complex trait is often affected by multiple genes with similar functions as annotated in gene pathways. Here, we extend TWAS from gene-based analysis to pathway-based analysis: we integrate public pathway collections, expression quantitative trait locus (eQTL) data and GWAS summary association statistics (or GWAS individual-level data) to identify gene pathways associated with complex traits. The basic idea is to weight the SNPs of the genes in a pathway based on their estimated cis-effects on gene expression, then adaptively test for association of the pathway with a GWAS trait by effectively aggregating possibly weak association signals across the genes in the pathway. The P values can be calculated analytically and thus quickly. We applied our proposed test with the KEGG and GO pathways to two schizophrenia (SCZ) GWAS summary association data sets, denoted by SCZ1 and SCZ2 with about 20,000 and 150,000 subjects, respectively. Most of the significant pathways identified by analyzing the SCZ1 data were reproduced by the SCZ2 data. Importantly, we identified 15 novel pathways associated with SCZ, such as GABA receptor complex (GO:1902710), which could not be uncovered by the standard single SNP-based analysis or gene-based TWAS. The newly identified pathways may help us gain insights into the biological mechanism underlying SCZ. Our results showcase the power of incorporating gene expression information and gene functional annotations into pathway-based association testing for GWAS.
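The abstract describes two steps: weighting each gene's SNPs by their estimated cis-effects, then adaptively aggregating signals across the genes in a pathway. Below is a minimal sketch of one way to do this, under assumptions not confirmed by the abstract.

- Each gene gets an expression-weighted score U_g = wᵀz over its cis-SNPs.
- Gene scores are combined with a sum-of-powered-score family, SPU(γ) = Σ_g U_g^γ, with SPU(∞) = max_g |U_g|. Pan and colleagues frequently build adaptive tests on this family.
- Scores are left unstandardized.
- The analytic p-values and the adaptive selection over γ are omitted.
- All names and numbers are hypothetical.

```python
import math

def gene_score(weights, snp_z):
    """Expression-weighted gene score: sum of eQTL weight times SNP z-score."""
    if len(weights) != len(snp_z):
        raise ValueError("weights and snp_z must have the same length")
    return sum(w * z for w, z in zip(weights, snp_z))

def spu(scores, gamma):
    """Sum-of-powered-score statistic; gamma=math.inf gives the max-|score| version."""
    if gamma == math.inf:
        return max(abs(u) for u in scores)
    return sum(u ** gamma for u in scores)

def pathway_spu(genes, gammas=(1, 2, 4, 8, math.inf)):
    """genes: list of (weights, snp_z) pairs, one per gene in the pathway."""
    scores = [gene_score(w, z) for w, z in genes]
    return {g: spu(scores, g) for g in gammas}

# Three hypothetical genes whose weighted scores are 1.0, -2.0 and 0.5
pathway = [([1.0], [1.0]), ([0.5, 0.5], [-2.0, -2.0]), ([0.25], [2.0])]
stats = pathway_spu(pathway)
print(stats[1], stats[2], stats[math.inf])  # -0.5 5.25 2.0
```

With γ = 1, opposite-signed gene signals partly cancel. Even γ and γ = ∞ ignore sign, and large γ emphasizes the strongest gene. An adaptive test chooses among such γ values by comparing them on their null distributions.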
Affiliation(s)
- Chong Wu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
60. Xu Z, Wu C, Pan W. Imaging-wide association study: Integrating imaging endophenotypes in GWAS. Neuroimage 2017; 159:159-169. [PMID: 28736311 PMCID: PMC5671364 DOI: 10.1016/j.neuroimage.2017.07.036] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 05/02/2017] [Revised: 06/22/2017] [Accepted: 07/18/2017] [Indexed: 10/19/2022]
Abstract
A new and powerful approach, called imaging-wide association study (IWAS), is proposed to integrate imaging endophenotypes with GWAS to boost statistical power and enhance biological interpretation for GWAS discoveries. IWAS extends the promising transcriptome-wide association study (TWAS) from using gene expression endophenotypes to using imaging and other endophenotypes with a much wider range of possible applications. As an illustration, we use gray-matter volumes of several brain regions of interest (ROIs) drawn from the ADNI-1 structural MRI data as imaging endophenotypes, which are then applied to the individual-level GWAS data of ADNI-GO/2 and a large meta-analyzed GWAS summary statistics dataset (based on about 74,000 individuals), uncovering some novel genes significantly associated with Alzheimer's disease (AD). We also compare the performance of IWAS with TWAS, showing much larger numbers of significant AD-associated genes discovered by IWAS, presumably due to the stronger link between brain atrophy and AD than that between gene expression of normal individuals and the risk for AD. The proposed IWAS is general and can be applied to other imaging endophenotypes, and GWAS individual-level or summary association data.
Affiliation(s)
- Zhiyuan Xu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Chong Wu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
Collapse
|