1
Shen S, Sobczyk MK, Paternoster L, Brown SJ. From GWASs toward Mechanistic Understanding with Case Studies in Dermatogenetics. J Invest Dermatol 2024; 144:1189-1199.e8. [PMID: 38782533 DOI: 10.1016/j.jid.2024.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 02/13/2024] [Accepted: 03/06/2024] [Indexed: 05/25/2024]
Abstract
Many human skin diseases result from the complex interplay of genetic and environmental mechanisms that are largely unknown. GWASs have yielded insight into the genetic aspect of complex disease by highlighting regions of the genome or specific genetic variants associated with disease. Leveraging this information to identify causal genes and cell types will provide insight into fundamental biology, inform diagnostics, and aid drug discovery. However, the etiological mechanisms from genetic variant to disease are still unestablished in most cases. There now exists an unprecedented wealth of data and computational methods for variant interpretation in a functional context. It can be challenging to decide where to start owing to a lack of consensus on the best way to identify causal genetic mechanisms. This article highlights 3 key aspects of genetic variant interpretation: prioritizing causal genes, cell types, and pathways. We provide a practical overview of the main methods and datasets, giving examples from recent atopic dermatitis studies to provide a blueprint for variant interpretation. A collection of resources, including brief description and links to the packages and web tools, is provided for researchers looking to start in silico follow-up genetic analysis of associated genetic variants.
Affiliation(s)
- Silvia Shen
- Centre for Genomic & Experimental Medicine, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom; Institute for Evolution and Ecology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
- Maria K Sobczyk
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Lavinia Paternoster
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Sara J Brown
- Centre for Genomic & Experimental Medicine, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom; Department of Dermatology, NHS Lothian, Edinburgh, United Kingdom
2
Yang Y, Wang Q, Wang C, Buxbaum J, Ionita-Laza I. KnockoffHybrid: A knockoff framework for hybrid analysis of trio and population designs in genome-wide association studies. Am J Hum Genet 2024:S0002-9297(24)00166-6. [PMID: 38821058 DOI: 10.1016/j.ajhg.2024.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 05/02/2024] [Accepted: 05/06/2024] [Indexed: 06/02/2024] Open
Abstract
Both trio and population designs are popular study designs for identifying risk genetic variants in genome-wide association studies (GWASs). The trio design, as a family-based design, is robust to confounding due to population structure, whereas the population design is often more powerful due to larger sample sizes. Here, we propose KnockoffHybrid, a knockoff-based statistical method for hybrid analysis of both the trio and population designs. KnockoffHybrid provides a unified framework that brings together the advantages of both designs and produces powerful hybrid analysis while controlling the false discovery rate (FDR) in the presence of linkage disequilibrium and population structure. Furthermore, KnockoffHybrid has the flexibility to leverage different types of summary statistics for hybrid analyses, including expression quantitative trait loci (eQTL) and GWAS summary statistics. We demonstrate in simulations that KnockoffHybrid offers power gains over non-hybrid methods for the trio and population designs with the same number of cases while controlling the FDR with complex correlation among variants and population structure among subjects. In hybrid analyses of three trio cohorts for autism spectrum disorders (ASDs) from the Autism Speaks MSSNG, Autism Sequencing Consortium, and Autism Genome Project with GWAS summary statistics from the iPSYCH project and eQTL summary statistics from the MetaBrain project, KnockoffHybrid outperforms conventional methods by replicating several known risk genes for ASDs and identifying additional associations with variants in other genes, including the PRAME family genes involved in axon guidance and which may act as common targets for human speech/language evolution and related disorders.
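The FDR control described in this abstract builds on the knockoff filter. As a rough illustration only, the sketch below applies the generic knockoff+ selection threshold to a vector of feature statistics W, where a large positive value suggests that a variant is more strongly associated than its knockoff copy. It is not KnockoffHybrid's own procedure; the function names and the toy statistics are hypothetical.

```python
def knockoff_plus_threshold(w_stats, q):
    """Return the knockoff+ threshold for target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.
    Returns infinity when no threshold satisfies the bound.
    """
    candidates = sorted({abs(w) for w in w_stats if w != 0})
    for t in candidates:
        negatives = sum(1 for w in w_stats if w <= -t)
        positives = sum(1 for w in w_stats if w >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w_stats, q):
    """Return the indices of features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w_stats, q)
    return [j for j, w in enumerate(w_stats) if w >= t]


# Hypothetical feature statistics, for illustration only.
toy_w = [5.0, 4.0, 3.0, 2.5, 2.0, -1.0, 1.5, 0.5, -0.2, 3.5]
selected = knockoff_select(toy_w, q=0.2)
```

Negative statistics act as an estimate of the number of false positives among the positive ones, which is why the ratio in the threshold rule serves as a conservative estimate of the false discovery proportion.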
Affiliation(s)
- Yi Yang
- Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China; School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Qi Wang
- School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Chen Wang
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Joseph Buxbaum
- Departments of Psychiatry, Neuroscience, and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, NY 10032, USA; Department of Statistics, Lund University, Lund, Sweden
3
Wang L, Khunsriraksakul C, Markus H, Chen D, Zhang F, Chen F, Zhan X, Carrel L, Liu DJ, Jiang B. Integrating single cell expression quantitative trait loci summary statistics to understand complex trait risk genes. Nat Commun 2024; 15:4260. [PMID: 38769300 DOI: 10.1038/s41467-024-48143-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 04/22/2024] [Indexed: 05/22/2024] Open
Abstract
Transcriptome-wide association study (TWAS) is a popular approach to dissect the functional consequence of disease associated non-coding variants. Most existing TWAS use bulk tissues and may not have the resolution to reveal cell-type specific target genes. Single-cell expression quantitative trait loci (sc-eQTL) datasets are emerging. The largest bulk- and sc-eQTL datasets are most conveniently available as summary statistics, but have not been broadly utilized in TWAS. Here, we present a new method EXPRESSO (EXpression PREdiction with Summary Statistics Only), to analyze sc-eQTL summary statistics, which also integrates 3D genomic data and epigenomic annotation to prioritize causal variants. EXPRESSO substantially improves existing methods. We apply EXPRESSO to analyze multi-ancestry GWAS datasets for 14 autoimmune diseases. EXPRESSO uniquely identifies 958 novel gene x trait associations, which is 26% more than the second-best method. Among them, 492 are unique to cell type level analysis and missed by TWAS using whole blood. We also develop a cell type aware drug repurposing pipeline, which leverages EXPRESSO results to identify drug compounds that can reverse disease gene expressions in relevant cell types. Our results point to multiple drugs with therapeutic potentials, including metformin for type 1 diabetes, and vitamin K for ulcerative colitis.
Affiliation(s)
- Lida Wang
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Chachrit Khunsriraksakul
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Institute for Personalized Medicine; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Havell Markus
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Institute for Personalized Medicine; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Dieyi Chen
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Fan Zhang
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Fang Chen
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Xiaowei Zhan
- Department of Statistical Science, Southern Methodist University, Dallas, TX, US
- Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX, US
- Center for Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, US
- Laura Carrel
- Department of Biochemistry and Molecular Biology; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Dajiang J Liu
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Department of Statistical Science, Southern Methodist University, Dallas, TX, US
- Bibo Jiang
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
4
Qi T, Song L, Guo Y, Chen C, Yang J. From genetic associations to genes: methods, applications, and challenges. Trends Genet 2024:S0168-9525(24)00095-7. [PMID: 38734482 DOI: 10.1016/j.tig.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 05/13/2024]
Abstract
Genome-wide association studies (GWASs) have identified numerous genetic loci associated with human traits and diseases. However, pinpointing the causal genes remains a challenge, which impedes the translation of GWAS findings into biological insights and medical applications. In this review, we provide an in-depth overview of the methods and technologies used for prioritizing genes from GWAS loci, including gene-based association tests, integrative analysis of GWAS and molecular quantitative trait loci (xQTL) data, linking GWAS variants to target genes through enhancer-gene connection maps, and network-based prioritization. We also outline strategies for generating context-dependent xQTL data and their applications in gene prioritization. We further highlight the potential of gene prioritization in drug repurposing. Lastly, we discuss future challenges and opportunities in this field.
Affiliation(s)
- Ting Qi
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China; School of Life Sciences, Westlake University, Hangzhou 310024, China
- Liyang Song
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China; School of Life Sciences, Westlake University, Hangzhou 310024, China
- Yazhou Guo
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China; School of Life Sciences, Westlake University, Hangzhou 310024, China
- Chang Chen
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China; School of Life Sciences, Westlake University, Hangzhou 310024, China
- Jian Yang
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, China; School of Life Sciences, Westlake University, Hangzhou 310024, China
5
Mews MA, Naj AC, Griswold AJ, Below JE, Bush WS. Brain and Blood Transcriptome-Wide Association Studies Identify Five Novel Genes Associated with Alzheimer's Disease. medRxiv 2024:2024.04.17.24305737. [PMID: 38699333 PMCID: PMC11065015 DOI: 10.1101/2024.04.17.24305737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
INTRODUCTION Transcriptome-wide Association Studies (TWAS) extend genome-wide association studies (GWAS) by integrating genetically-regulated gene expression models. We performed the most powerful AD-TWAS to date, using summary statistics from cis-eQTL meta-analyses and the largest clinically-adjudicated Alzheimer's Disease (AD) GWAS. METHODS We implemented the OTTERS TWAS pipeline, leveraging cis-eQTL data from cortical brain tissue (MetaBrain; N=2,683) and blood (eQTLGen; N=31,684) to predict gene expression, then applied these models to AD-GWAS data (Cases=21,982; Controls=44,944). RESULTS We identified and validated five novel gene associations in cortical brain tissue (PRKAG1, C3orf62, LYSMD4, ZNF439, SLC11A2) and six genes proximal to known AD-related GWAS loci (Blood: MYBPC3; Brain: MTCH2, CYB561, MADD, PSMA5, ANXA11). Further, using causal eQTL fine-mapping, we generated sparse models that retained the strength of the AD-TWAS association for MTCH2, MADD, ZNF439, CYB561, and MYBPC3. DISCUSSION Our comprehensive AD-TWAS discovered new gene associations and provided insights into the functional relevance of previously associated variants.
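Applying expression prediction models to GWAS summary data, as this abstract describes, is commonly done with a weighted z-score test: the gene-level statistic is w'z / sqrt(w'Rw), where w holds the eQTL weights, z the GWAS z-scores of the same variants, and R their LD correlation matrix. The sketch below shows that generic form only. It is not taken from the OTTERS code, and the function name and inputs are hypothetical.

```python
import math


def twas_z_score(weights, gwas_z, ld):
    """Gene-level z-score from eQTL weights, GWAS z-scores, and an LD matrix.

    weights: per-variant eQTL weights for one gene (assumed standardized genotypes)
    gwas_z:  GWAS z-scores for the same variants, in the same order
    ld:      square LD correlation matrix as a list of lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z, and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the weighted sum under the null: w' R w.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w'Rw must be positive")
    return numerator / math.sqrt(variance)


# Toy example with two correlated variants (values are illustrative only).
z_gene = twas_z_score([1.0, 1.0], [2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
```

The LD term in the denominator matters: ignoring correlation between variants would understate the variance of the weighted sum and inflate the gene-level statistic.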
6
Hu T, Dai Q, Epstein MP, Yang J. Proteome-wide association studies using summary proteomic data identified 23 risk genes of Alzheimer's disease. medRxiv 2024:2024.03.28.24305044. [PMID: 38585769 PMCID: PMC10996749 DOI: 10.1101/2024.03.28.24305044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Characterizing the genetic mechanisms underlying Alzheimer's disease (AD) dementia is crucial for developing new therapeutics. Proteome-wide association study (PWAS) integrating proteomics data with genome-wide association study (GWAS) summary data has been shown to be a powerful tool for detecting risk genes. The identified PWAS risk genes can be interpreted as having genetic effects mediated through the genetically regulated protein abundances. Existing PWAS analyses of AD often rely on the availability of individual-level proteomics and genetics data of a reference cohort. Leveraging summary-level protein quantitative trait loci (pQTL) reference data of multiple relevant tissues is expected to improve PWAS findings for studying AD. Here, we applied our recently developed OTTERS tool to conduct PWAS of AD dementia, by leveraging summary-level pQTL data of brain, cerebrospinal fluid (CSF), and plasma tissues, and multiple statistical methods. For each target protein, imputation models of the protein abundance with genetic predictors were trained from summary-level pQTL data, estimating a set of pQTL weights for considered genetic predictors. PWAS p-values were obtained by integrating GWAS summary data of AD dementia with estimated pQTL weights. PWAS p-values from multiple statistical methods were combined by the aggregated Cauchy association test to yield one omnibus PWAS p-value for the target protein. We identified significant PWAS risk genes through omnibus PWAS p-values and analyzed their protein-protein interactions using STRING. Their potential causal effects were assessed by the probabilistic Mendelian randomization (PMR-Egger). As a result, we identified a total of 23 significant PWAS risk genes for AD dementia in brain, CSF, and plasma tissues, including 7 novel findings.
We showed that 15 of these risk genes were interconnected within a protein-protein interaction network involving the well-known AD risk gene of APOE and 5 novel findings, and enriched in immune functions and lipids pathways including positive regulation of immune system process, positive regulation of macrophage proliferation, humoral immune response, and high-density lipoprotein particle clearance. Existing biological evidence was found to relate our novel findings with AD. We validated the mediated causal effects of 14 risk genes (60.8%). In conclusion, we identified both known and novel PWAS risk genes, providing novel insights into the genetic mechanisms in brain, CSF, and plasma tissues, and targeted therapeutics development of AD dementia. Our study also demonstrated the effectiveness of integrating publicly available summary-level pQTL data with GWAS summary data for mapping risk genes of complex human diseases.
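The omnibus p-value step mentioned in this abstract uses the aggregated Cauchy association test (ACAT), which maps each p-value to a Cauchy variate, averages the variates, and converts the result back to a p-value. A minimal sketch follows, assuming equal weights when none are given; the function name is hypothetical, and this is not the authors' code.

```python
import math


def acat_combine(p_values, weights=None):
    """Combine p-values with the aggregated Cauchy association test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), and the combined p-value is
    0.5 - arctan(T / sum_i w_i) / pi. Equal weights are assumed by default.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("weights must match p_values in length")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    statistic = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(statistic / sum(weights)) / math.pi


# Combining one small and one uninformative p-value (illustrative numbers).
omnibus_p = acat_combine([0.01, 0.5])
```

Because the Cauchy transform is dominated by the smallest p-values, the combined result stays close to the strongest signal divided by its weight, which is what makes the test suitable for pooling several methods applied to the same protein.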
Affiliation(s)
- Tingyang Hu
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Qile Dai
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA, 30322, USA
- Michael P. Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
7
Ball RL, Bogue MA, Liang H, Srivastava A, Ashbrook DG, Lamoureux A, Gerring MW, Hatoum AS, Kim MJ, He H, Emerson J, Berger AK, Walton DO, Sheppard K, El Kassaby B, Castellanos F, Kunde-Ramamoorthy G, Lu L, Bluis J, Desai S, Sundberg BA, Peltz G, Fang Z, Churchill GA, Williams RW, Agrawal A, Bult CJ, Philip VM, Chesler EJ. GenomeMUSter mouse genetic variation service enables multitrait, multipopulation data integration and analysis. Genome Res 2024; 34:145-159. [PMID: 38290977 PMCID: PMC10903950 DOI: 10.1101/gr.278157.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 01/10/2024] [Indexed: 02/01/2024]
Abstract
Hundreds of inbred mouse strains and intercross populations have been used to characterize the function of genetic variants that contribute to disease. Thousands of disease-relevant traits have been characterized in mice and made publicly available. New strains and populations including consomics, the collaborative cross, expanded BXD, and inbred wild-derived strains add to existing complex disease mouse models, mapping populations, and sensitized backgrounds for engineered mutations. The genome sequences of inbred strains, along with dense genotypes from others, enable integrated analysis of trait-variant associations across populations, but these analyses are hampered by the sparsity of genotypes available. Moreover, the data are not readily interoperable with other resources. To address these limitations, we created a uniformly dense variant resource by harmonizing multiple data sets. Missing genotypes were imputed using the Viterbi algorithm with a data-driven technique that incorporates local phylogenetic information, an approach that is extendable to other model organisms. The result is a web- and programmatically accessible data service called GenomeMUSter, comprising single-nucleotide variants covering 657 strains at 106.8 million segregating sites. Interoperation with phenotype databases, analytic tools, and other resources enable a wealth of applications, including multitrait, multipopulation meta-analysis. We show this in cross-species comparisons of type 2 diabetes and substance use disorder meta-analyses, leveraging mouse data to characterize the likely role of human variant effects in disease. Other applications include refinement of mapped loci and prioritization of strain backgrounds for disease modeling to further unlock extant mouse diversity for genetic and genomic studies in health and disease.
Affiliation(s)
- Robyn L Ball
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Molly A Bogue
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Anuj Srivastava
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
- David G Ashbrook
- University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Alexander S Hatoum
- Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Artificial Intelligence and the Internet of Things Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Matthew J Kim
- University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Hao He
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Jake Emerson
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Lu Lu
- University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- John Bluis
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Sejal Desai
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
- Gary Peltz
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Zhuoqing Fang
- Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Robert W Williams
- University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Carol J Bult
- The Jackson Laboratory, Bar Harbor, Maine 04609, USA
8
Chen Z, Song W, Shu XO, Wen W, Devall M, Dampier C, Moratalla-Navarro F, Cai Q, Long J, Van Kaer L, Wu L, Huyghe JR, Thomas M, Hsu L, Woods MO, Albanes D, Buchanan DD, Gsur A, Hoffmeister M, Vodicka P, Wolk A, Marchand LL, Wu AH, Phipps AI, Moreno V, Ulrike P, Zheng W, Casey G, Guo X. Novel insights into genetic susceptibility for colorectal cancer from transcriptome-wide association and functional investigation. J Natl Cancer Inst 2024; 116:127-137. [PMID: 37632791 PMCID: PMC10777674 DOI: 10.1093/jnci/djad178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 07/10/2023] [Accepted: 08/19/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND Transcriptome-wide association studies have been successful in identifying candidate susceptibility genes for colorectal cancer (CRC). To strengthen susceptibility gene discovery, we conducted a large transcriptome-wide association study and an alternative splicing transcriptome-wide association study in CRC using improved genetic prediction models and performed in-depth functional investigations. METHODS We analyzed RNA-sequencing data from normal colon tissues and genotype data from 423 European descendants to build genetic prediction models of gene expression and alternative splicing and evaluated model performance using independent RNA-sequencing data from normal colon tissues of the Genotype-Tissue Expression Project. We applied the verified models to genome-wide association studies (GWAS) summary statistics among 58 131 CRC cases and 67 347 controls of European ancestry to evaluate associations of genetically predicted gene expression and alternative splicing with CRC risk. We performed in vitro functional assays for 3 selected genes in multiple CRC cell lines. RESULTS We identified 57 putative CRC susceptibility genes, which included the 48 genes from transcriptome-wide association studies and 15 genes from splicing transcriptome-wide association studies, at a Bonferroni-corrected P value less than .05. Of these, 16 genes were not previously implicated in CRC susceptibility, including a gene PDE7B (6q23.3) at locus previously not reported by CRC GWAS. Gene knockdown experiments confirmed the oncogenic roles for 2 unreported genes, TRPS1 and METRNL, and a recently reported gene, C14orf166. CONCLUSION This study discovered new putative susceptibility genes of CRC and provided novel insights into the biological mechanisms underlying CRC development.
Affiliation(s)
- Zhishan Chen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Wenqiang Song
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Wanqing Wen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Matthew Devall
- Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Christopher Dampier
- Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Ferran Moratalla-Navarro
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona (UB), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Qiuyin Cai
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Jirong Long
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Luc Van Kaer
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Lan Wu
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
- Jeroen R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Minta Thomas
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Michael O Woods
- Memorial University of Newfoundland, Discipline of Genetics, St. John’s, NL, Canada
- Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Daniel D Buchanan
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, VIC, Australia
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, VIC, Australia
- Genetic Medicine and Family Cancer Clinic, The Royal Melbourne Hospital, Parkville, VIC, Australia
- Andrea Gsur
- Center for Cancer Research, Medical University of Vienna, Vienna, Austria
- Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Pavel Vodicka
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, Prague, Czech Republic
- Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, Prague, Czech Republic
- Faculty of Medicine and Biomedical Center in Pilsen, Charles University, Pilsen, Czech Republic
- Alicja Wolk
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
- Anna H Wu
- Preventative Medicine, University of Southern California, Los Angeles, CA, USA
- Amanda I Phipps
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, Spain
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences and Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona (UB), L’Hospitalet de Llobregat, Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain
- Peters Ulrike
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Graham Casey
- Department of Public Health Sciences, Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
9
Freitas FAO, Brito LF, Fanalli SL, Gonçales JL, da Silva BPM, Durval MC, Ciconello FN, de Oliveira CS, Nascimento LE, Gervásio IC, Gomes JD, Moreira GCM, Silva-Vignato B, Coutinho LL, de Almeida VV, Cesar ASM. Identification of eQTLs using different sets of single nucleotide polymorphisms associated with carcass and body composition traits in pigs. BMC Genomics 2024; 25:14. [PMID: 38166730 PMCID: PMC10759680 DOI: 10.1186/s12864-023-09863-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND: Mapping expression quantitative trait loci (eQTLs) in skeletal muscle tissue in pigs is crucial for understanding the relationship between genetic variation and phenotypic expression of carcass traits in meat animals. Therefore, the primary objective of this study was to evaluate the impact of different sets of single nucleotide polymorphisms (SNPs), including scenarios removing SNPs pruned for linkage disequilibrium (LD) and SNPs derived from SNP chip arrays and RNA-seq data from liver, brain, and skeletal muscle tissues, on the identification of eQTLs in the Longissimus lumborum tissue, associated with carcass and body composition traits in Large White pigs. The SNPs identified from muscle mRNA were combined with SNPs identified in the brain and liver tissue transcriptomes, as well as SNPs from the GGP Porcine 50K SNP chip array. Cis- and trans-eQTLs were identified based on the skeletal muscle gene expression level, followed by functional genomic analyses and statistical associations with carcass and body composition traits in Large White pigs. RESULTS: The number of cis- and trans-eQTLs identified across different sets of SNPs (scenarios) ranged from 261 to 2,539 and from 29 to 13,721, respectively. Furthermore, 6,180 genes were modulated by eQTLs in at least one of the scenarios evaluated. The eQTLs identified were not significantly associated with carcass and body composition traits but were significantly enriched for many traits in the "Meat and Carcass" type QTL. The scenarios with the highest number of cis- (n = 304) and trans- (n = 5,993) modulated genes were the unpruned and LD-pruned SNP set scenarios identified from the muscle transcriptome. These genes include 84 transcription factor coding genes. CONCLUSIONS: After LD pruning, the set of SNPs identified based on the transcriptome of the skeletal muscle tissue of pigs resulted in the highest number of genes modulated by eQTLs. Most eQTLs are of the trans type and are associated with genes influencing complex traits in pigs, such as transcription factors and enhancers. Furthermore, the incorporation of SNPs from other genomic regions to the set of SNPs identified in the porcine skeletal muscle transcriptome contributed to the identification of eQTLs that had not been identified based on the porcine skeletal muscle transcriptome alone.
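The LD-pruning step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the pruning rule, threshold, or software used, so the greedy pairwise r^2 filter below, the 0.8 threshold, and the names `r_squared`, `ld_prune`, and the toy genotypes are all assumptions made for illustration only.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return sxy * sxy / (sxx * syy)


def ld_prune(genotypes, threshold=0.8):
    """Greedy pruning: keep a SNP only if its r^2 with every kept SNP is below threshold.

    The threshold is a hypothetical value, not one taken from the study.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        if all(r_squared(calls, genotypes[k]) < threshold for k in kept):
            kept.append(snp_id)
    return kept


# Toy genotypes for six animals (hypothetical data).
genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [0, 1, 2, 1, 0, 2],  # identical to snp1, so in complete LD with it
    "snp3": [2, 0, 1, 2, 1, 0],
}
pruned = ld_prune(genotypes, threshold=0.8)
print(pruned)
```

Production tools usually restrict comparisons to a sliding window along the chromosome; this sketch compares all pairs for brevity.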
Affiliation(s)
- Felipe André Oliveira Freitas
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Luiz F Brito
- Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Simara Larissa Fanalli
- Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Janaína Lustosa Gonçales
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Mariah Castro Durval
- Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Fernanda Nery Ciconello
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Izally Carvalho Gervásio
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Julia Dezen Gomes
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Bárbara Silva-Vignato
- Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Luiz Lehmann Coutinho
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Vivian Vezzoni de Almeida
- College of Veterinary Medicine and Animal Science, Federal University of Goiás, Goiânia, 74001-970, GO, Brazil
- Aline Silva Mello Cesar
- Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
|
10
|
Kim S, Qin Y, Park HJ, Yue M, Xu Z, Forno E, Chen W, Celedón JC. Methyl-TWAS: A powerful method for in silico transcriptome-wide association studies (TWAS) using long-range DNA methylation. bioRxiv 2023:2023.11.10.566586. [PMID: 38014125 PMCID: PMC10680683 DOI: 10.1101/2023.11.10.566586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
In silico transcriptome-wide association studies (TWAS) are commonly used to test whether expression of specific genes is linked to a complex trait. However, genotype-based in silico TWAS methods such as PrediXcan exhibit low prediction accuracy for a majority of genes because genotypic data lack tissue- and disease-specificity and are not affected by the environment. Because methylation is tissue-specific and, like gene expression, can be modified by environment or disease status, methylation should predict gene expression with more accuracy than SNPs. Therefore, we propose Methyl-TWAS, the first approach that utilizes long-range methylation markers to impute gene expression for in silico TWAS through penalized regression. Methyl-TWAS 1) predicts epigenetically regulated/associated expression (eGReX), which incorporates tissue-specific expression and both genetically (GReX) and environmentally regulated expression to identify differentially expressed genes (DEGs) that could not be identified by genotype-based methods; and 2) incorporates both cis- and trans-CpGs, including various regulatory regions, to identify DEGs that would be missed using cis-methylation only. Methyl-TWAS outperforms PrediXcan and two other methods in imputing gene expression in the nasal epithelium, particularly for immunity-related genes and DEGs in atopic asthma. Methyl-TWAS identified 3,681 (85.2%) of the 4,316 DEGs identified in a previous TWAS of atopic asthma using measured expression, while PrediXcan could not identify any gene. Methyl-TWAS also outperforms PrediXcan for expression imputation as well as in silico TWAS in white blood cells. Methyl-TWAS is a valuable tool for in silico TWAS, leveraging a growing body of publicly available genome-wide DNA methylation data for a variety of human tissues.
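The idea of imputing expression from methylation markers through penalized regression can be sketched as follows. The abstract does not specify the penalty or the fitting procedure, so this example assumes an L1 (lasso) penalty fitted by coordinate descent; the function names `soft_threshold`, `lasso_fit`, and `predict`, the penalty value, and the toy data are hypothetical and are not the Methyl-TWAS implementation.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the proximal operator of the L1 penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of samples, each a list of centred CpG methylation values (assumed input).
    y: centred expression of one gene in the same samples.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of CpG j with the partial residual
            norm = 0.0  # squared norm of CpG j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta


def predict(X, beta):
    """Impute expression for new samples from their methylation values."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Toy data: expression depends only on the first of three CpGs (hypothetical).
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_fit(X, y, lam=0.5)
print(beta)
```

The L1 penalty shrinks the informative coefficient and sets the uninformative ones exactly to zero, which is why such models remain usable when CpGs far outnumber samples.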
Affiliation(s)
- Soyeon Kim
- Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Yidi Qin
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Hyun Jung Park
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Molin Yue
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Zhongli Xu
- Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- School of Medicine, Tsinghua University, Beijing, China
- Erick Forno
- Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Wei Chen
- Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Juan C. Celedón
- Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
|
11
|
Ball RL, Bogue MA, Liang H, Srivastava A, Ashbrook DG, Lamoureux A, Gerring MW, Hatoum AS, Kim M, He H, Emerson J, Berger AK, Walton DO, Sheppard K, Kassaby BE, Castellanos F, Kunde-Ramamoorthy G, Lu L, Bluis J, Desai S, Sundberg BA, Peltz G, Fang Z, Churchill GA, Williams RW, Agrawal A, Bult CJ, Philip VM, Chesler EJ. GenomeMUSter mouse genetic variation service enables multi-trait, multi-population data integration and analyses. bioRxiv 2023:2023.08.08.552506. [PMID: 37609331 PMCID: PMC10441370 DOI: 10.1101/2023.08.08.552506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Hundreds of inbred laboratory mouse strains and intercross populations have been used to functionalize genetic variants that contribute to disease. Thousands of disease-relevant traits have been characterized in mice and made publicly available. New strains and populations, including the Collaborative Cross, expanded BXD and inbred wild-derived strains, add to the set of complex disease mouse models, genetic mapping resources and sensitized backgrounds against which to evaluate engineered mutations. The genome sequences of many inbred strains, along with dense genotypes from others, could allow integrated analysis of trait-variant associations across populations, but these analyses are not feasible due to the sparsity of genotypes available. Moreover, the data are not readily interoperable with other resources. To address these limitations, we created a uniformly dense data resource by harmonizing multiple variant datasets. Missing genotypes were imputed using the Viterbi algorithm with a data-driven technique that incorporates local phylogenetic information, an approach that is extensible to other model organism species. The result is a web- and programmatically accessible data service called GenomeMUSter (https://muster.jax.org), comprising allelic data covering 657 strains at 106.8M segregating sites. Interoperation with phenotype databases, analytic tools and other resources enables a wealth of applications, including multi-trait, multi-population meta-analysis. We demonstrate this in a cross-species comparison of the meta-analysis of Type 2 Diabetes and of substance use disorders, resulting in the more specific characterization of the role of human variant effects in light of mouse phenotype data. Other applications include refinement of mapped loci and prioritization of strain backgrounds for disease modeling to further unlock extant mouse diversity for genetic and genomic studies in health and disease.
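Viterbi-based genotype imputation, as named in this abstract, can be illustrated with a generic haplotype-copying hidden Markov model. This is a sketch under stated assumptions, not the GenomeMUSter procedure: the abstract's data-driven use of local phylogenetic information is not modelled here, and the function `viterbi_impute`, the `switch` and `error` parameters, and the toy strains are hypothetical.

```python
import math


def viterbi_impute(target, references, switch=0.01, error=0.01):
    """Fill missing calls (None) in `target` from the most likely path through `references`.

    Hidden state at each site: which reference strain the target copies.
    switch: assumed probability of changing strain between adjacent sites.
    error:  assumed probability that an observed call mismatches the copied strain.
    """
    names = list(references)
    k, n_sites = len(names), len(target)
    log_stay = math.log(1 - switch)
    log_switch = math.log(switch / (k - 1)) if k > 1 else float("-inf")

    def log_emit(state, site):
        obs = target[site]
        if obs is None:
            return 0.0  # a missing call carries no information
        match = references[state][site] == obs
        return math.log(1 - error) if match else math.log(error)

    # Initialise with a uniform prior over strains.
    score = {s: -math.log(k) + log_emit(s, 0) for s in names}
    back = []
    for site in range(1, n_sites):
        new_score, pointers = {}, {}
        for s in names:
            best_prev = max(
                names,
                key=lambda p: score[p] + (log_stay if p == s else log_switch),
            )
            trans = log_stay if best_prev == s else log_switch
            new_score[s] = score[best_prev] + trans + log_emit(s, site)
            pointers[s] = best_prev
        back.append(pointers)
        score = new_score

    # Trace back the best path from the highest-scoring final state.
    state = max(names, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    imputed = [
        references[path[i]][i] if target[i] is None else target[i]
        for i in range(n_sites)
    ]
    return imputed, path


# Toy example: two fully genotyped strains and one sparsely genotyped strain.
references = {"A": [0, 1, 0, 1, 0, 1], "B": [1, 1, 0, 0, 1, 0]}
target = [0, 1, None, 1, None, 1]
imputed, path = viterbi_impute(target, references)
print(imputed, path)
```

In this toy case every observed call agrees with strain A, so the decoded path stays on A and both missing calls are filled from it.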
|