1
Melton HJ, Zhang Z, Deng HW, Wu L, Wu C. MIMOSA: a resource consisting of improved methylome prediction models increases power to identify DNA methylation-phenotype associations. Epigenetics 2024; 19:2370542. [PMID: 38963888 PMCID: PMC11225927 DOI: 10.1080/15592294.2024.2370542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 06/12/2024] [Indexed: 07/06/2024] [Open access]
Abstract
Although DNA methylation (DNAm) has been implicated in the pathogenesis of numerous complex diseases, from cancer to cardiovascular disease to autoimmune disease, the exact methylation sites that play key roles in these processes remain elusive. One strategy to identify putative causal CpG sites and enhance disease etiology understanding is to conduct methylome-wide association studies (MWASs), in which predicted DNA methylation that is associated with complex diseases can be identified. However, current MWAS models are primarily trained using the data from single studies, thereby limiting the methylation prediction accuracy and the power of subsequent association studies. Here, we introduce a new resource, MWAS Imputing Methylome Obliging Summary-level mQTLs and Associated LD matrices (MIMOSA), a set of models that substantially improve the prediction accuracy of DNA methylation and subsequent MWAS power through the use of a large summary-level mQTL dataset provided by the Genetics of DNA Methylation Consortium (GoDMC). Through the analyses of GWAS (genome-wide association study) summary statistics for 28 complex traits and diseases, we demonstrate that MIMOSA considerably increases the accuracy of DNA methylation prediction in whole blood, crafts fruitful prediction models for low heritability CpG sites, and determines markedly more CpG site-phenotype associations than preceding methods. Finally, we use MIMOSA to conduct a case study on high cholesterol, pinpointing 146 putatively causal CpG sites.
Affiliation(s)
- Hunter J. Melton
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Zichen Zhang
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Hong-Wen Deng
  - Center of Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Lang Wu
  - Center of Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Chong Wu
  - Cancer Epidemiology Division, University of Hawaii Cancer Center, Honolulu, HI, USA
  - Institute for Data Science in Oncology, The UT MD Anderson Cancer Center
2
Wang L, Khunsriraksakul C, Markus H, Chen D, Zhang F, Chen F, Zhan X, Carrel L, Liu DJ, Jiang B. Integrating single cell expression quantitative trait loci summary statistics to understand complex trait risk genes. Nat Commun 2024; 15:4260. [PMID: 38769300 DOI: 10.1038/s41467-024-48143-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 04/22/2024] [Indexed: 05/22/2024] [Open access]
Abstract
Transcriptome-wide association study (TWAS) is a popular approach to dissect the functional consequence of disease associated non-coding variants. Most existing TWAS use bulk tissues and may not have the resolution to reveal cell-type specific target genes. Single-cell expression quantitative trait loci (sc-eQTL) datasets are emerging. The largest bulk- and sc-eQTL datasets are most conveniently available as summary statistics, but have not been broadly utilized in TWAS. Here, we present a new method EXPRESSO (EXpression PREdiction with Summary Statistics Only), to analyze sc-eQTL summary statistics, which also integrates 3D genomic data and epigenomic annotation to prioritize causal variants. EXPRESSO substantially improves existing methods. We apply EXPRESSO to analyze multi-ancestry GWAS datasets for 14 autoimmune diseases. EXPRESSO uniquely identifies 958 novel gene x trait associations, which is 26% more than the second-best method. Among them, 492 are unique to cell type level analysis and missed by TWAS using whole blood. We also develop a cell type aware drug repurposing pipeline, which leverages EXPRESSO results to identify drug compounds that can reverse disease gene expressions in relevant cell types. Our results point to multiple drugs with therapeutic potentials, including metformin for type 1 diabetes, and vitamin K for ulcerative colitis.
Affiliation(s)
- Lida Wang
  - Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Chachrit Khunsriraksakul
  - Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Institute for Personalized Medicine; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Havell Markus
  - Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Institute for Personalized Medicine; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Dieyi Chen
  - Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Fan Zhang
  - Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Fang Chen
  - Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Xiaowei Zhan
  - Department of Statistical Science, Southern Methodist University, Dallas, TX, US
  - Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX, US
  - Center for Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, US
- Laura Carrel
  - Department of Biochemistry and Molecular Biology; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Dajiang J Liu
  - Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
  - Department of Statistical Science, Southern Methodist University, Dallas, TX, US
- Bibo Jiang
  - Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
3
Mews MA, Naj AC, Griswold AJ, Below JE, Bush WS. Brain and Blood Transcriptome-Wide Association Studies Identify Five Novel Genes Associated with Alzheimer's Disease. medRxiv 2024:2024.04.17.24305737. [PMID: 38699333 PMCID: PMC11065015 DOI: 10.1101/2024.04.17.24305737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
INTRODUCTION: Transcriptome-wide Association Studies (TWAS) extend genome-wide association studies (GWAS) by integrating genetically-regulated gene expression models. We performed the most powerful AD-TWAS to date, using summary statistics from cis-eQTL meta-analyses and the largest clinically-adjudicated Alzheimer's Disease (AD) GWAS. METHODS: We implemented the OTTERS TWAS pipeline, leveraging cis-eQTL data from cortical brain tissue (MetaBrain; N=2,683) and blood (eQTLGen; N=31,684) to predict gene expression, then applied these models to AD-GWAS data (Cases=21,982; Controls=44,944). RESULTS: We identified and validated five novel gene associations in cortical brain tissue (PRKAG1, C3orf62, LYSMD4, ZNF439, SLC11A2) and six genes proximal to known AD-related GWAS loci (Blood: MYBPC3; Brain: MTCH2, CYB561, MADD, PSMA5, ANXA11). Further, using causal eQTL fine-mapping, we generated sparse models that retained the strength of the AD-TWAS association for MTCH2, MADD, ZNF439, CYB561, and MYBPC3. DISCUSSION: Our comprehensive AD-TWAS discovered new gene associations and provided insights into the functional relevance of previously associated variants.
4
He J, Antonyan L, Zhu H, Ardila K, Li Q, Enoma D, Zhang W, Liu A, Chekouo T, Cao B, MacDonald ME, Arnold PD, Long Q. A statistical method for image-mediated association studies discovers genes and pathways associated with four brain disorders. Am J Hum Genet 2024; 111:48-69. [PMID: 38118447 PMCID: PMC10806749 DOI: 10.1016/j.ajhg.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 11/04/2023] [Accepted: 11/16/2023] [Indexed: 12/22/2023] [Open access]
Abstract
Brain imaging and genomics are critical tools enabling characterization of the genetic basis of brain disorders. However, imaging large cohorts is expensive and may be unavailable for legacy datasets used for genome-wide association studies (GWASs). Using an integrated feature selection/aggregation model, we developed an image-mediated association study (IMAS), which utilizes borrowed imaging/genomics data to conduct association mapping in legacy GWAS cohorts. By leveraging the UK Biobank image-derived phenotypes (IDPs), the IMAS discovered genetic bases underlying four neuropsychiatric disorders and verified them by analyzing annotations, pathways, and expression quantitative trait loci (eQTLs). A cerebellar-mediated mechanism was identified to be common to the four disorders. Simulations show that, if the goal is identifying genetic risk, our IMAS is more powerful than a hypothetical protocol in which the imaging results were available in the GWAS dataset. This implies the feasibility of reanalyzing legacy GWAS datasets without conducting additional imaging, yielding cost savings for integrated analysis of genetics and imaging.
Affiliation(s)
- Jingni He
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Lilit Antonyan
  - Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Harold Zhu
  - Department of Biological Sciences, Faculty of Science, University of Calgary, Calgary, AB, Canada
- Karen Ardila
  - Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada
- Qing Li
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- David Enoma
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Andy Liu
  - Sir Winston Churchill High School, Calgary, AB, Canada; College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Thierry Chekouo
  - Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, AB, Canada; Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Bo Cao
  - Department of Psychiatry, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- M Ethan MacDonald
  - The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Electrical and Software Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Paul D Arnold
  - Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Psychiatry, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Quan Long
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, AB, Canada
5
Zhu Z, Chen X, Zhang S, Yu R, Qi C, Cheng L, Zhang X. Leveraging molecular quantitative trait loci to comprehend complex diseases/traits from the omics perspective. Hum Genet 2023; 142:1543-1560. [PMID: 37755483 DOI: 10.1007/s00439-023-02602-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2023] [Accepted: 09/14/2023] [Indexed: 09/28/2023]
Abstract
Comprehending the molecular basis of quantitative genetic variation is a principal goal for complex diseases or traits. Molecular quantitative trait loci (molQTLs) have made it possible to investigate the effects of genetic variants hiding behind large-scale omics data. A deeper understanding of molQTL is urgently required in light of the multi-dimensionalization of omics data to more fully elucidate the pertinent biological mechanisms. Herein, we reviewed molQTLs with the corresponding resource from the omics perspective and further discussed the integrative strategy of GWAS-molQTL to infer their causal effects. Subsequently, we described the opportunities and challenges encountered by molQTL. The case studies showed that molQTL is essential for complex diseases and traits, whether single- or multi-omics QTLs. Overall, we highlighted the functional significance of genetic variants to employ the discovery of molQTL in complex diseases and traits.
Affiliation(s)
- Zijun Zhu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xinyu Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Sainan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Rui Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Changlu Qi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
  - NHC Key Laboratory of Molecular Probe and Targeted Diagnosis and Therapy, Harbin Medical University, Harbin, 150028, Heilongjiang, China
- Xue Zhang
  - NHC Key Laboratory of Molecular Probe and Targeted Diagnosis and Therapy, Harbin Medical University, Harbin, 150028, Heilongjiang, China
  - McKusick-Zhang Center for Genetic Medicine, State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China
6
Mai J, Lu M, Gao Q, Zeng J, Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun Biol 2023; 6:899. [PMID: 37658226 PMCID: PMC10474133 DOI: 10.1038/s42003-023-05279-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/21/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023] [Open access]
Abstract
Genome-wide association study has identified fruitful variants impacting heritable traits. Nevertheless, identifying critical genes underlying those significant variants has been a great task. Transcriptome-wide association study (TWAS) is an instrumental post-analysis to detect significant gene-trait associations focusing on modeling transcription-level regulations, which has made numerous progresses in recent years. Leveraging from expression quantitative loci (eQTL) regulation information, TWAS has advantages in detecting functioning genes regulated by disease-associated variants, thus providing insight into mechanisms of diseases and other phenotypes. Considering its vast potential, this review article comprehensively summarizes TWAS, including the methodology, applications and available resources.
Affiliation(s)
- Jialin Mai
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Mingming Lu
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Qianwen Gao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingyao Zeng
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- Jingfa Xiao
  - National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China