1
Malakhov MM, Dai B, Shen XT, Pan W. A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation. bioRxiv 2023:2023.03.06.531446. [PMID: 36945657 PMCID: PMC10028853 DOI: 10.1101/2023.03.06.531446]
Abstract
Understanding how genetic variation affects gene expression is essential for a complete picture of the functional pathways that give rise to complex traits. Although numerous studies have established that many genes are differentially expressed in distinct human tissues and cell types, no tools exist for identifying the genes whose expression is differentially regulated. Here we introduce DRAB (Differential Regulation Analysis by Bootstrapping), a gene-based method for testing whether patterns of genetic regulation are significantly different between tissues or other biological contexts. DRAB first leverages the elastic net to learn context-specific models of local genetic regulation and then applies a novel bootstrap-based model comparison test to check their equivalency. Unlike previous model comparison tests, our proposed approach can determine whether population-level models have equal predictive performance by accounting for the variability of feature selection and model training. We validated DRAB on mRNA expression data from a variety of human tissues in the Genotype-Tissue Expression (GTEx) Project. DRAB yielded biologically reasonable results and had sufficient power to detect genes with tissue-specific regulatory profiles while effectively controlling false positives. By providing a framework that facilitates the prioritization of differentially regulated genes, our study enables future discoveries on the genetic architecture of molecular phenotypes.
Affiliation(s)
- Ben Dai
- Department of Statistics, The Chinese University of Hong Kong
- Wei Pan
- Division of Biostatistics, University of Minnesota
2
Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations. HGG Adv 2023; 4:100216. [PMID: 37869564 PMCID: PMC10589725 DOI: 10.1016/j.xhgg.2023.100216] [Received: 05/22/2023] [Accepted: 06/27/2023]
Abstract
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. With regard to transcriptome prediction accuracy, MASHR models performed as well as or better than the other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded the most discoveries replicating in both PAGE and PanUKBB among all methods analyzed, including loci previously mapped in GWASs and loci not previously found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.
Affiliation(s)
- Daniel S. Araujo
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Chris Nguyen
- Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
- Xiaowei Hu
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA
- Kristin Ardlie
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Peter Durda
- Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
- Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
- George Papanicolaou
- Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
- Michael H. Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Stephen S. Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- NHLBI TOPMed Consortium
- Hae Kyung Im
- Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
- Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Heather E. Wheeler
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
3
Mai J, Lu M, Gao Q, Zeng J, Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun Biol 2023; 6:899. [PMID: 37658226 PMCID: PMC10474133 DOI: 10.1038/s42003-023-05279-y] [Received: 03/21/2023] [Accepted: 08/24/2023]
Abstract
Genome-wide association studies (GWASs) have identified many variants affecting heritable traits. Nevertheless, identifying the critical genes underlying those significant variants remains a major challenge. The transcriptome-wide association study (TWAS) is an instrumental post-GWAS analysis that detects significant gene-trait associations by modeling transcription-level regulation, and it has made considerable progress in recent years. By leveraging expression quantitative trait loci (eQTL) regulation information, TWAS has advantages in detecting functional genes regulated by disease-associated variants, thus providing insight into the mechanisms of diseases and other phenotypes. Considering its vast potential, this review comprehensively summarizes TWAS, including its methodology, applications and available resources.
Affiliation(s)
- Jialin Mai
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Mingming Lu
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Qianwen Gao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingyao Zeng
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- Jingfa Xiao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
4
Ren J, Lin Z, He R, Shen X, Pan W. Using GWAS summary data to impute traits for genotyped individuals. HGG Adv 2023; 4:100197. [PMID: 37181332 PMCID: PMC10173780 DOI: 10.1016/j.xhgg.2023.100197] [Received: 12/22/2022] [Accepted: 04/07/2023]
Abstract
Genome-wide association study (GWAS) summary data have become extremely useful in routine data analysis, greatly facilitating the development of new methods and new applications. However, a severe limitation of the current use of GWAS summary data is that it is restricted to linear single nucleotide polymorphism (SNP)-trait association analyses. To further expand the use of GWAS summary data, we propose a nonparametric method that, given a large sample of individual-level genotypes, performs large-scale imputation of the genetic component of the trait for those genotypes. The imputed individual-level trait values, together with the individual-level genotypes, make it possible to conduct any analysis that can be performed with individual-level GWAS data, including nonlinear SNP-trait association analysis and prediction. We use the UK Biobank data to highlight the usefulness and effectiveness of the proposed method in three applications that currently cannot be done with only GWAS summary data (for SNP-trait associations): marginal SNP-trait association analysis under non-additive genetic models, detection of SNP-SNP interactions, and genetic prediction of a trait using a nonlinear model of SNPs.
Affiliation(s)
- Jingchen Ren
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Zhaotong Lin
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Ruoyu He
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Corresponding author
5
Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations. bioRxiv 2023:2023.02.09.527747. [PMID: 36798214 PMCID: PMC9934635 DOI: 10.1101/2023.02.09.527747]
Abstract
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions (in this case, across different populations) may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and the Pan-UK Biobank (PanUKBB) with our developed transcriptome prediction models. With regard to transcriptome prediction accuracy, MASHR models performed as well as or better than the other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded the most discoveries replicating in both PAGE and PanUKBB among all methods analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.
Affiliation(s)
- Daniel S. Araujo
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA
- Chris Nguyen
- Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA
- Xiaowei Hu
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
- Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
- Kristin Ardlie
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Peter Durda
- Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT, 05446, USA
- Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- George Papanicolaou
- Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD, 20892, USA
- Michael H. Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, 02115, USA
- Stephen S. Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Hae Kyung Im
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA
- Heather E. Wheeler
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA
- Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA
6
Sathyanarayanan A, Mueller TT, Ali Moni M, Schueler K, Baune BT, Lio P, Mehta D, Baune BT, Dierssen M, Ebert B, Fabbri C, Fusar-Poli P, Gennarelli M, Harmer C, Howes OD, Janzing JGE, Lio P, Maron E, Mehta D, Minelli A, Nonell L, Pisanu C, Potier MC, Rybakowski F, Serretti A, Squassina A, Stacey D, van Westrhenen R, Xicota L. Multi-omics data integration methods and their applications in psychiatric disorders. Eur Neuropsychopharmacol 2023; 69:26-46. [PMID: 36706689 DOI: 10.1016/j.euroneuro.2023.01.001] [Received: 08/03/2022] [Revised: 11/22/2022] [Accepted: 01/02/2023]
Abstract
In the past, researchers studying mental illness and health have often broken down its complexity into individual subsystems (e.g., genomics, transcriptomics, proteomics, clinical data) and explored the components independently. Technological advancements and the decreasing costs of high-throughput sequencing have led to an unprecedented increase in data generation. Furthermore, over the years it has become increasingly clear that these subsystems do not act in isolation but instead interact with each other to drive mental illness and health. Consequently, individual subsystems are now analysed jointly to promote a holistic understanding of the underlying biological complexity of health and disease. Complementing the increasing data availability, current research is geared towards developing novel methods that can efficiently combine information-rich multi-omics data to discover biologically meaningful biomarkers for diagnosis, treatment, and prognosis. However, clinical translation of this research is still challenging. In this review, we summarise conventional and state-of-the-art statistical and machine learning approaches for biomarker discovery, diagnosis, and outcome and treatment response prediction through the integration of multi-omics and clinical data. In addition, we describe the role of biological model systems and in silico multi-omics model designs in the clinical translation of psychiatric research from bench to bedside. Finally, we discuss the current challenges and explore the application of multi-omics integration in future psychiatric research. The review provides a structured overview of, and the latest updates in, the field of multi-omics in psychiatry.
Affiliation(s)
- Anita Sathyanarayanan
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
- Tamara T Mueller
- Institute for Artificial Intelligence and Informatics in Medicine, TU Munich, 80333 Munich, Germany
- Mohammad Ali Moni
- Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Katja Schueler
- Clinic for Psychosomatics, Hospital zum Heiligen Geist, Frankfurt am Main, Germany; Frankfurt Psychoanalytic Institute, Frankfurt am Main, Germany
- Bernhard T Baune
- Department of Psychiatry and Psychotherapy, University of Münster, Germany; Department of Psychiatry, Melbourne Medical School, University of Melbourne, Australia; The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Australia
- Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Divya Mehta
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
- Bernhard T Baune
- Department of Psychiatry and Psychotherapy, University of Münster, Germany; Department of Psychiatry, Melbourne Medical School, University of Melbourne, Australia; The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Australia
- Mara Dierssen
- Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bjarke Ebert
- Medical Strategy & Communication, H. Lundbeck A/S, Valby, Denmark
- Chiara Fabbri
- Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Paolo Fusar-Poli
- Early Psychosis: Intervention and Clinical-detection (EPIC) Lab, Department of Psychosis Studies, King's College London, United Kingdom; Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Massimo Gennarelli
- Department of Molecular and Translational Medicine, University of Brescia; Genetics Unit, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Oliver D Howes
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom; Psychiatric Imaging, Medical Research Council Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London, United Kingdom
- Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Eduard Maron
- Department of Psychiatry, University of Tartu, Tartu, Estonia; Centre for Neuropsychopharmacology, Division of Brain Sciences, Imperial College London, London, United Kingdom; Documental Ltd, Tallinn, Estonia; West Tallinn Central Hospital, Tallinn, Estonia
- Divya Mehta
- Queensland University of Technology, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Faculty of Health, Kelvin Grove, Queensland 4059, Australia
- Alessandra Minelli
- Department of Molecular and Translational Medicine, University of Brescia; Genetics Unit, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Lara Nonell
- MARGenomics, IMIM (Hospital del Mar Research Institute), Barcelona, Spain
- Claudia Pisanu
- Department of Biomedical Sciences, Section of Neuroscience and Clinical Pharmacology, University of Cagliari, Cagliari, Italy
- Filip Rybakowski
- Department of Psychiatry, Poznan University of Medical Sciences, Poznan, Poland
- Alessandro Serretti
- Department of Biomedical and NeuroMotor Sciences, University of Bologna, Bologna, Italy
- Alessio Squassina
- Department of Biomedical Sciences, Section of Neuroscience and Clinical Pharmacology, University of Cagliari, Cagliari, Italy
- David Stacey
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Roos van Westrhenen
- Parnassia Psychiatric Institute, Amsterdam, the Netherlands; Department of Psychiatry and Neuropsychology, Faculty of Health and Sciences, Maastricht University, Maastricht, the Netherlands; Institute of Psychiatry, Psychology & Neuroscience (IoPPN) King's College London, United Kingdom
- Laura Xicota
- Paris Brain Institute ICM, Salpetriere Hospital, Paris, France
7
Fryett JJ, Morris AP, Cordell HJ. Investigating the prediction of CpG methylation levels from SNP genotype data to help elucidate relationships between methylation, gene expression and complex traits. Genet Epidemiol 2022; 46:629-643. [PMID: 35930604 PMCID: PMC9804820 DOI: 10.1002/gepi.22496] [Received: 05/16/2022] [Revised: 06/27/2022] [Accepted: 07/19/2022]
Abstract
Transcriptome-wide association studies (TWAS), popularised by PrediXcan and related methods, impute gene expression from single-nucleotide polymorphism (SNP) genotypes and test it for association with a phenotype; they are a widely used approach for investigating the role of gene expression in complex traits. Like gene expression, DNA methylation is an important biological process and, being under genetic regulation, may be imputable from SNP genotypes. Here, we investigate prediction of CpG methylation levels from SNP genotype data to help elucidate relationships between methylation, gene expression and complex traits. We start by examining how well CpG methylation can be predicted from SNP genotypes, comparing three penalised regression approaches and examining whether changing the window size improves prediction accuracy. Although methylation at most CpG sites cannot be accurately predicted from SNP genotypes, for a subset it can be predicted well. We next apply our methylation prediction models (trained using the optimal method and window size) to carry out a methylome-wide association study (MWAS) of primary biliary cholangitis. We intersect the regions identified via MWAS with those identified via TWAS, providing insight into the interplay between CpG methylation, gene expression and disease status. We conclude that MWAS has the potential to improve understanding of biological mechanisms in complex traits.
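The two TWAS steps named in this abstract (impute a molecular trait from SNP genotypes, then test it against a phenotype) can be illustrated with a minimal Python sketch. It is not PrediXcan or the authors' code. It assumes the imputed value for each individual is a weighted sum of allele dosages, followed by a simple least-squares regression of the phenotype on that value. The weights, genotypes and phenotype values are hypothetical toy numbers. In practice the weights would come from a penalised regression model trained on a reference panel, covariates would be adjusted for, and p-values would be taken from the t statistic.

```python
from math import sqrt
from statistics import fmean


def predict_expression(genotypes, weights):
    """Imputed expression per individual: the weighted sum of SNP dosages.

    genotypes: one list of allele dosages (0, 1 or 2) per individual.
    weights: one hypothetical weight per SNP, standing in for a trained
    prediction model.
    """
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def association_test(x, y):
    """Least-squares regression of phenotype y on imputed expression x.

    Returns (slope, t statistic); the t statistic has n - 2 degrees of
    freedom. No covariates are included in this sketch.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three individuals")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical toy data: 5 individuals x 3 SNPs.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 0, 0]]
weights = [0.5, -0.2, 0.3]
phenotype = [1.0, 0.9, 3.5, 0.2, 2.0]

grex = predict_expression(genotypes, weights)
slope, t_stat = association_test(grex, phenotype)
print(f"imputed expression: {[round(v, 2) for v in grex]}")
print(f"slope = {slope:.3f}, t = {t_stat:.3f}")
```

The same two steps would apply unchanged to imputed CpG methylation levels in an MWAS; only the source of the weights differs.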
Affiliation(s)
- James J. Fryett
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Andrew P. Morris
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
- Heather J. Cordell
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
8
Highland HM, Wojcik GL, Graff M, Nishimura KK, Hodonsky CJ, Baldassari AR, Cote AC, Cheng I, Gignoux CR, Tao R, Li Y, Boerwinkle E, Fornage M, Haessler J, Hindorff LA, Hu Y, Justice AE, Lin BM, Lin D, Stram DO, Haiman CA, Kooperberg C, Le Marchand L, Matise TC, Kenny EE, Carlson CS, Stahl EA, Avery CL, North KE, Ambite JL, Buyske S, Loos RJ, Peters U, Young KL, Bien SA, Huckins LM. Predicted gene expression in ancestrally diverse populations leads to discovery of susceptibility loci for lifestyle and cardiometabolic traits. Am J Hum Genet 2022; 109:669-679. [PMID: 35263625 PMCID: PMC9069067 DOI: 10.1016/j.ajhg.2022.02.013] [Received: 08/24/2021] [Accepted: 02/15/2022]
Abstract
One mechanism by which genetic factors influence complex traits and diseases is by altering gene expression. Direct measurement of gene expression in relevant tissues is rarely tenable; however, genetically regulated gene expression (GReX) can be estimated using prediction models derived from large multi-omic datasets. These approaches have led to the discovery of many gene-trait associations, but whether models derived from predominantly European ancestry (EA) reference panels can map novel associations in ancestrally diverse populations remains unclear. We applied PrediXcan to impute GReX in 51,520 ancestrally diverse Population Architecture using Genomics and Epidemiology (PAGE) participants (35% African American, 45% Hispanic/Latino, 10% Asian, and 7% Hawaiian) across 25 key cardiometabolic traits and relevant tissues, identifying 102 novel associations. We then compared associations in PAGE to those in a random subset of 50,000 White British participants from UK Biobank (UKBB50k) for height and body mass index (BMI). We identified 517 associations across 47 tissues in PAGE but not UKBB50k, demonstrating the importance of diverse samples in identifying trait-associated GReX. We observed that, depending on the specific population, variants used in PrediXcan models were either more or less differentiated across continental-level populations than matched-control variants, reflecting sampling bias. Additionally, variants from identified genes specific to either PAGE or UKBB50k analyses were more ancestrally differentiated than those in genes detected in both analyses, underlining the value of population-specific discoveries. This suggests that while EA-derived transcriptome imputation models can identify new associations in non-EA populations, models derived from closely matched reference panels may yield further insights. Our findings call for more diversity in reference datasets of tissue-specific gene expression.
Affiliation(s)
- Heather M Highland
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA.
| | - Genevieve L Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Mariaelisa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Katherine K Nishimura
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Chani J Hodonsky
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Antoine R Baldassari
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Alanna C Cote
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Iona Cheng
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94143, USA
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yuqing Li
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94143, USA
- Eric Boerwinkle
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX 77030, USA
- Myriam Fornage
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, TX 77030, USA; Brown Foundation Institute for Molecular Medicine, McGovern Medical School, The University of Texas Health Science Center, Houston, TX 77030, USA
- Jeffrey Haessler
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Lucia A Hindorff
- Division of Genomic Medicine, NIH National Human Genome Research Institute, Bethesda, MD 20892, USA
- Yao Hu
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Anne E Justice
- Department of Population Health Sciences, Geisinger Health System, Danville, PA 17822, USA
- Bridget M Lin
- Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Danyu Lin
- Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Daniel O Stram
- Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Christopher A Haiman
- Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Charles Kooperberg
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; School of Public Health, University of Washington, Seattle, WA 98195, USA
- Tara C Matise
- Genetics, Rutgers University, New Brunswick, NJ 08901-8554, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Christopher S Carlson
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eli A Stahl
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Christy L Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Jose Luis Ambite
- Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
- Steven Buyske
- Statistics, Rutgers University, New Brunswick, NJ 08901-8554, USA
- Ruth J Loos
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ulrike Peters
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; School of Public Health, University of Washington, Seattle, WA 98195, USA
- Kristin L Young
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Stephanie A Bien
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Laura M Huckins
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mental Illness Research, Education and Clinical Centers, James J. Peters Department of Veterans Affairs Medical Center, Bronx, NY 10468, USA
9
Schubert R, Geoffroy E, Gregga I, Mulford AJ, Aguet F, Ardlie K, Gerszten R, Clish C, Van Den Berg D, Taylor KD, Durda P, Johnson WC, Cornell E, Guo X, Liu Y, Tracy R, Conomos M, Blackwell T, Papanicolaou G, Lappalainen T, Mikhaylova AV, Thornton TA, Cho MH, Gignoux CR, Lange L, Lange E, Rich SS, Rotter JI, Manichaikul A, Im HK, Wheeler HE. Protein prediction for trait mapping in diverse populations. PLoS One 2022; 17:e0264341. [PMID: 35202437 PMCID: PMC8870552 DOI: 10.1371/journal.pone.0264341] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/17/2021] [Accepted: 02/08/2022] [Indexed: 11/18/2022]
Abstract
Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, in which 1,305 proteins were measured by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. To investigate the transferability of predictive models across ancestries, we built protein prediction models in each of the four TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African ancestry populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestry prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS when using proteome prediction models trained in populations with ancestries similar to PAGE. At current training population sample sizes, baseline and fine-mapped protein prediction models performed similarly in PWAS, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.
Collapse
Affiliation(s)
- Ryan Schubert
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, United States of America
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
- Elyse Geoffroy
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
- Isabelle Gregga
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Ashley J. Mulford
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
- Francois Aguet
- Broad Institute, Cambridge, MA, United States of America
- Kristin Ardlie
- Broad Institute, Cambridge, MA, United States of America
- Robert Gerszten
- Beth Israel Deaconess Medical Center, Boston, MA, United States of America
- Clary Clish
- Broad Institute, Cambridge, MA, United States of America
- David Van Den Berg
- University of Southern California, Los Angeles, CA, United States of America
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
- Peter Durda
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
- W. Craig Johnson
- Collaborative Health Studies Coordinating Center, University of Washington, Seattle, WA, United States of America
- Elaine Cornell
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
- Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC, United States of America
- Russell Tracy
- Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, VT, United States of America
- Matthew Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
- Tom Blackwell
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
- George Papanicolaou
- Epidemiology Branch, National Heart, Lung and Blood Institute, Bethesda, MD, United States of America
- Tuuli Lappalainen
- New York Genome Center and Department of Systems Biology, Columbia University, New York, NY, United States of America
- Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
- Timothy A. Thornton
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
- Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States of America
- Christopher R. Gignoux
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Leslie Lange
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Ethan Lange
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States of America
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, United States of America
- Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States of America
- Hae Kyung Im
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, United States of America
- Heather E. Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, United States of America
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, United States of America
10
Wang F, Panjwani N, Wang C, Sun L, Strug LJ. A flexible summary statistics-based colocalization method with application to the mucin cystic fibrosis lung disease modifier locus. Am J Hum Genet 2022; 109:253-269. [PMID: 35065708 PMCID: PMC8874229 DOI: 10.1016/j.ajhg.2021.12.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2021] [Accepted: 12/15/2021] [Indexed: 12/18/2022]
Abstract
Mucus obstruction is a central feature in the cystic fibrosis (CF) airways. A genome-wide association study (GWAS) of lung disease by the CF Gene Modifier Consortium (CFGMC) identified a significant locus containing two mucin genes, MUC20 and MUC4. Expression quantitative trait locus (eQTL) analysis using human nasal epithelia (HNE) from 94 CF-affected Canadians in the CFGMC demonstrated MUC4 eQTLs that mirrored the lung association pattern in the region, suggesting that MUC4 expression may mediate CF lung disease. Complications arose, however, with colocalization testing using existing methods: the locus is complex, and the associated SNPs span a 0.2 Mb region with high linkage disequilibrium (LD) and evidence of allelic heterogeneity. We previously developed the Simple Sum (SS), a powerful colocalization test in regions with allelic heterogeneity, but SS assumed that eQTLs were present in order to achieve type I error control. Here we propose a two-stage SS (SS2) colocalization test that avoids a priori eQTL assumptions, accounts for multiple hypothesis testing and the composite null hypothesis, and enables meta-analysis. We compare SS2 to published approaches through simulation and demonstrate type I error control in all settings, with the greatest power in the presence of high LD and allelic heterogeneity. Applying SS2 to the MUC20/MUC4 CF lung disease locus with eQTLs from CF HNE revealed significant colocalization with MUC4 (p = 1.31 × 10⁻⁵) rather than with MUC20. SS2 is a powerful method to identify the responsible gene(s) at a locus and guide future functional studies. SS2 has been implemented in the application LocusFocus.
Affiliation(s)
- Fan Wang
- Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1Z5, Canada; Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Naim Panjwani
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Cheng Wang
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Lei Sun
- Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1Z5, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada.
- Lisa J Strug
- Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1Z5, Canada; Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada; The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada.
11
Cross-tissue transcriptome-wide association studies identify susceptibility genes shared between schizophrenia and inflammatory bowel disease. Commun Biol 2022; 5:80. [PMID: 35058554 PMCID: PMC8776955 DOI: 10.1038/s42003-022-03031-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/09/2021] [Accepted: 12/23/2021] [Indexed: 12/11/2022]
Abstract
Genetic correlations and an increased incidence of psychiatric disorders in inflammatory bowel disease have been reported, but shared molecular mechanisms are unknown. We performed cross-tissue and multiple-gene conditioned transcriptome-wide association studies for 23 tissues of the gut-brain axis using genome-wide association study data sets (total 180,592 patients) for Crohn’s disease, ulcerative colitis, primary sclerosing cholangitis, schizophrenia, bipolar disorder, major depressive disorder, and attention-deficit/hyperactivity disorder. We identified NR5A2, SATB2, and PPP3CA (encoding a target for calcineurin inhibitors in refractory ulcerative colitis) as shared susceptibility genes with transcriptome-wide significance for Crohn’s disease, ulcerative colitis, and schizophrenia, largely explaining fine-mapped association signals at nearby genome-wide association study susceptibility loci. Analysis of bulk and single-cell RNA-sequencing data showed that PPP3CA expression was strongest in neurons and in enteroendocrine and Paneth-like cells of the ileum, colon, and rectum, indicating a possible link to the gut-brain axis. PPP3CA, together with three further suggestive loci, can be linked to calcineurin-related signaling pathways such as NFAT activation or Wnt. Florian Uellendahl-Werth et al. conduct cross-tissue transcriptome-wide association studies to explore genetic mechanisms shared across immune-related and psychiatric traits. Their results identify several genes (including PPP3CA) that could mediate the interplay between psychiatric and inflammatory disease.
12
Bae YE, Wu L, Wu C. InTACT: An adaptive and powerful framework for joint-tissue transcriptome-wide association studies. Genet Epidemiol 2021; 45:848-859. [PMID: 34255882 PMCID: PMC8604767 DOI: 10.1002/gepi.22425] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/24/2021] [Indexed: 11/05/2022]
Abstract
Transcriptome-wide association studies (TWAS) that integrate transcriptomic reference data and genome-wide association studies (GWAS) have successfully enhanced the discovery of candidate genes for many complex traits. However, existing methods may suffer from substantial power loss because they fail to effectively account for the fact that the expression of many genes tends to be consistent across tissues. Here we propose a computationally efficient testing method, referred to as Integrative Test for Associations via Cauchy Transformation (InTACT), that effectively combines information across multiple tissues and thus improves the power to identify associated genes. Through simulation studies, we show that InTACT maintains high power while properly controlling the type I error rate. We applied InTACT to the largest GWAS of Alzheimer's disease (AD) to date and identified 227 genome-wide significant genes, of which 130 were not identified by the benchmark methods TWAS and MultiXcan. Importantly, InTACT identified five novel loci for AD. We implemented InTACT in publicly available software, "InTACT."
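InTACT is described above as combining evidence across tissues via a Cauchy transformation. As a hedged illustration of that general idea only, the sketch below implements a generic Cauchy combination of per-tissue p-values, with equal weights by default; InTACT's actual statistic, weighting, and inputs may differ, and the function name `cauchy_combine` and the example p-values are hypothetical.

```python
import math
from typing import Optional, Sequence


def cauchy_combine(pvalues: Sequence[float],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Combine p-values (e.g. one per tissue) with a Cauchy combination statistic.

    Each p-value is mapped to tan((0.5 - p) * pi), which is standard Cauchy
    when p is uniform under the null. The weighted average T (weights
    normalized to sum to 1) is converted back to a p-value with the Cauchy
    upper tail, 0.5 - arctan(T) / pi.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights by default
    if len(weights) != len(pvalues) or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative, match the p-values, and not all be zero")
    total = sum(weights)
    t_stat = sum((w / total) * math.tan((0.5 - p) * math.pi)
                 for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t_stat) / math.pi


if __name__ == "__main__":
    # Hypothetical per-tissue p-values for one gene.
    print(cauchy_combine([1e-6, 0.5, 0.9]))
```

With these hypothetical inputs the single very small p-value dominates and the combined p-value is roughly 3 × 10⁻⁶. The Cauchy combination is often chosen because its tail approximation is reported to remain approximately valid at small significance levels even when the combined tests are correlated, as tissue-level tests for the same gene often are; note that `math.tan` loses precision for p-values extremely close to 0, which a production implementation would need to handle.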
Affiliation(s)
- Ye Eun Bae
- Department of Statistics, Florida State University
- Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa
- Chong Wu
- Department of Statistics, Florida State University
13
Okoro PC, Schubert R, Guo X, Johnson WC, Rotter JI, Hoeschele I, Liu Y, Im HK, Luke A, Dugas LR, Wheeler HE. Transcriptome prediction performance across machine learning models and diverse ancestries. HGG ADVANCES 2021; 2:100019. [PMID: 33937878 PMCID: PMC8087249 DOI: 10.1016/j.xhgg.2020.100019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, such as the elastic net (EN). To potentially further optimize the imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA), comprising individuals of African, Hispanic, and European ancestries, and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiologic Transition Study (METS), comprising individuals of African ancestry. We show that prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
Affiliation(s)
- Paul C Okoro
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
- Ryan Schubert
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, USA
- Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA
- W Craig Johnson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA
- Ina Hoeschele
- Fralin Life Sciences Institute, Virginia Tech, Blacksburg, VA, USA; Department of Statistics, Virginia Tech, Blacksburg, VA, USA; Wake Forest School of Medicine, Winston-Salem, NC, USA
- Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Amy Luke
- Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA
- Lara R Dugas
- Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA; Department of Human Biology, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Heather E Wheeler
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA; Department of Biology, Loyola University Chicago, Chicago, IL, USA; Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
14
Grinberg NF, Wallace C. Multi-tissue transcriptome-wide association studies. Genet Epidemiol 2021; 45:324-337. [PMID: 33369784 PMCID: PMC8048510 DOI: 10.1002/gepi.22374] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/13/2020] [Revised: 11/04/2020] [Accepted: 11/18/2020] [Indexed: 12/20/2022]
Abstract
A transcriptome-wide association study (TWAS) attempts to identify disease-associated genes by imputing gene expression into a genome-wide association study (GWAS) using an expression quantitative trait loci (eQTL) data set and then testing for associations with a trait of interest. Regulatory processes may be shared across related tissues, and one natural extension of TWAS is harnessing cross-tissue correlation in gene expression to improve prediction accuracy. Here, we studied multi-tissue extensions of lasso regression and random forests (RF): joint lasso and RF-MTL (multi-task learning RF), respectively. We found that, on our chosen eQTL data set, multi-tissue methods were generally more accurate than their single-tissue counterparts, with RF-MTL performing the best. Simulations showed that these benefits generally translated into more associated genes identified, although they also highlighted that joint lasso tended to erroneously identify genes in one tissue if there existed an eQTL signal for that gene in another. Applying the four methods to a type 1 diabetes GWAS, we found that multi-tissue methods found more unique associated genes for most of the tissues considered. We conclude that multi-tissue methods are competitive and, for some cell types, superior to single-tissue approaches, and hold much promise for TWAS.
Affiliation(s)
- Nastasiya F. Grinberg
- Department of Medicine, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- Chris Wallace
- Department of Medicine, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
15
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Shen H, Gong P, Zhang C, Deng HW. A Review of Integrative Imputation for Multi-Omics Datasets. Front Genet 2020; 11:570255. [PMID: 33193667 PMCID: PMC7594632 DOI: 10.3389/fgene.2020.570255] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/07/2020] [Accepted: 09/16/2020] [Indexed: 01/05/2023]
Abstract
Multi-omics studies, which explore the interactions between multiple types of biological factors, have significant advantages over single-omics analysis because they can provide a more holistic view of biological processes, uncover the causal and functional mechanisms of complex diseases, and facilitate new discoveries in precision medicine. However, omics datasets often contain missing values, and in multi-omics study designs it is common for individuals to be represented in some omics layers but not all. Since most statistical analyses cannot be applied directly to incomplete datasets, imputation is typically performed to infer the missing values. Integrative imputation techniques that make use of the correlations and shared information among multi-omics datasets are expected to outperform approaches that rely on single-omics information alone, yielding more accurate results in subsequent downstream analyses. In this review, we provide an overview of the currently available imputation methods for handling missing values in bioinformatics data, with an emphasis on multi-omics imputation. We also provide a perspective on how deep learning methods might be developed for the integrative imputation of multi-omics datasets.
Affiliation(s)
- Meng Song
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
- College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, United States
- Hui Shen
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
16
Keys KL, Mak ACY, White MJ, Eckalbar WL, Dahl AW, Mefford J, Mikhaylova AV, Contreras MG, Elhawary JR, Eng C, Hu D, Huntsman S, Oh SS, Salazar S, Lenoir MA, Ye JC, Thornton TA, Zaitlen N, Burchard EG, Gignoux CR. On the cross-population generalizability of gene expression prediction models. PLoS Genet 2020; 16:e1008927. [PMID: 32797036 PMCID: PMC7449671 DOI: 10.1371/journal.pgen.1008927] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/19/2019] [Revised: 08/26/2020] [Accepted: 06/10/2020] [Indexed: 11/21/2022]
Abstract
The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
Affiliation(s)
- Kevin L. Keys
- Department of Medicine, University of California, San Francisco, California, United States of America
- Berkeley Institute for Data Science, University of California, Berkeley, California, United States of America
- Angel C. Y. Mak
- Department of Medicine, University of California, San Francisco, California, United States of America
- Marquitta J. White
- Department of Medicine, University of California, San Francisco, California, United States of America
- Walter L. Eckalbar
- Department of Medicine, University of California, San Francisco, California, United States of America
- Andrew W. Dahl
- Department of Medicine, University of California, San Francisco, California, United States of America
- Joel Mefford
- Department of Medicine, University of California, San Francisco, California, United States of America
- Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- María G. Contreras
- Department of Medicine, University of California, San Francisco, California, United States of America
- San Francisco State University, San Francisco, California, United States of America
- Jennifer R. Elhawary
- Department of Medicine, University of California, San Francisco, California, United States of America
- Celeste Eng
- Department of Medicine, University of California, San Francisco, California, United States of America
- Donglei Hu
- Department of Medicine, University of California, San Francisco, California, United States of America
- Scott Huntsman
- Department of Medicine, University of California, San Francisco, California, United States of America
- Sam S. Oh
- Department of Medicine, University of California, San Francisco, California, United States of America
- Sandra Salazar
- Department of Medicine, University of California, San Francisco, California, United States of America
- Jimmie C. Ye
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Biosciences, University of California, San Francisco, California, United States of America
- Timothy A. Thornton
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Noah Zaitlen
- Department of Neurology, University of California, Los Angeles, California, United States of America
- Esteban G. Burchard
- Department of Medicine, University of California, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Biosciences, University of California, San Francisco, California, United States of America
- Christopher R. Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America