1
|
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Collapse
Affiliation(s)
| | - Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
| |
Collapse
|
2
|
Hu T, Parrish RL, Dai Q, Buchman AS, Tasaki S, Bennett DA, Seyfried NT, Epstein MP, Yang J. Omnibus proteome-wide association study identifies 43 risk genes for Alzheimer disease dementia. Am J Hum Genet 2024; 111:1848-1863. [PMID: 39079537 PMCID: PMC11393696 DOI: 10.1016/j.ajhg.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2024] [Revised: 06/28/2024] [Accepted: 07/02/2024] [Indexed: 09/08/2024] Open
Abstract
Transcriptome-wide association study (TWAS) tools have been applied to conduct proteome-wide association studies (PWASs) by integrating proteomics data with genome-wide association study (GWAS) summary data. The genetic effects of PWAS-identified significant genes are potentially mediated through genetically regulated protein abundance, thus informing the underlying disease mechanisms better than GWAS loci. However, existing TWAS/PWAS tools are limited by considering only one statistical model. We propose an omnibus PWAS pipeline to account for multiple statistical models and demonstrate improved performance by simulation and application studies of Alzheimer disease (AD) dementia. We employ the Aggregated Cauchy Association Test to derive omnibus PWAS (PWAS-O) p values from PWAS p values obtained by three existing tools assuming complementary statistical models-TIGAR, PrediXcan, and FUSION. Our simulation studies demonstrated improved power, with well-calibrated type I error, for PWAS-O over all three individual tools. We applied PWAS-O to studying AD dementia with reference proteomic data profiled from dorsolateral prefrontal cortex of postmortem brains from individuals of European ancestry. We identified 43 risk genes, including 5 not identified by previous studies, which are interconnected through a protein-protein interaction network that includes the well-known AD risk genes TOMM40, APOC1, and APOC2. We also validated causal genetic effects mediated through the proteome for 27 (63%) PWAS-O risk genes, providing insights into the underlying biological mechanisms of AD dementia and highlighting promising targets for therapeutic development. PWAS-O can be easily applied to studying other complex diseases.
Collapse
Affiliation(s)
- Tingyang Hu
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
| | - Randy L Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA
| | - Qile Dai
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA 30322, USA
| | - Aron S Buchman
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
| | - Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
| | - Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA.
| |
Collapse
|
3
|
Hu T, Dai Q, Epstein MP, Yang J. Proteome-wide association studies using summary pQTL data of three tissues identified 30 risk genes of Alzheimer's disease dementia. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.28.24305044. [PMID: 38585769 PMCID: PMC10996749 DOI: 10.1101/2024.03.28.24305044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/09/2024]
Abstract
Background Proteome-wide association study (PWAS) integrating proteomic data with genome-wide association study (GWAS) summary data is a powerful tool for studying Alzheimer's disease (AD) dementia. Existing PWAS analyses of AD often rely on the availability of individual-level proteomic and genetic data of a reference panel. Leveraging summary protein quantitative trait loci (pQTL) reference data of multiple AD-relevant tissues is expected to improve PWAS findings of AD dementia. Methods We conducted PWAS of AD dementia by integrating publicly available summary pQTL data of brain, cerebrospinal fluid (CSF), and plasma tissues, with the latest GWAS summary data of AD dementia. For each target protein per tissue, we employed our recently published OTTERS tool to obtain omnibus PWAS p-value, to test whether the genetically regulated protein abundance in the corresponding tissue is associated with AD dementia. Protein-protein interactions and enriched pathways of identified significant PWAS risk genes were analyzed by STRING. The potential causal effects of these PWAS risk genes were assessed by probabilistic Mendelian randomization analyses. Results We identified 30 unique significant PWAS risk genes for AD dementia, including 11 for brain, 9 for CSF, and 16 for plasma tissues. Four of these were shared by at least two tissues, and gene MAPK3 was found in all three tissues. We found that 11 of these PWAS risk genes were associated with AD or AD pathological hall marks as shown in GWAS Catalog; 18 of these were detected by transcriptome-wide association studies (TWAS); and 25 of these, including 8 out of 9 novel genes, were interconnected within a protein-protein interaction network involving the well-known AD risk gene APOE . Especially, these PWAS risk genes were enriched in immune response, glial cell proliferation, and high-density lipoprotein particle clearance pathways. Mediated causal effects were validated for 13 PWAS risk genes (43.3%). Conclusions Our findings provide novel insights into the genetic mechanisms of AD dementia in brain, CSF, and plasma tissues, and targets for developing therapeutic interventions. We also demonstrated the effectiveness of integrating summary pQTL and GWAS data for mapping risk genes of complex human diseases.
Collapse
Affiliation(s)
- Tingyang Hu
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
| | - Qile Dai
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA, 30322, USA
| | - Michael P. Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| |
Collapse
|
4
|
Parrish RL, Buchman AS, Tasaki S, Wang Y, Avey D, Xu J, De Jager PL, Bennett DA, Epstein MP, Yang J. SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning. Nat Commun 2024; 15:6646. [PMID: 39103319 DOI: 10.1038/s41467-024-50983-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 07/26/2024] [Indexed: 08/07/2024] Open
Abstract
Multiple reference panels of a given tissue or multiple tissues often exist, and multiple regression methods could be used for training gene expression imputation models for transcriptome-wide association studies (TWAS). To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we develop a Stacked Regression based TWAS (SR-TWAS) tool which can obtain optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies show that SR-TWAS improves power, due to increased training sample sizes and borrowed strength across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our real studies identify 6 independent significant risk genes for Alzheimer's disease (AD) dementia for supplementary motor area tissue and 9 independent significant risk genes for Parkinson's disease (PD) for substantia nigra tissue. Relevant biological interpretations are found for these significant risk genes.
Collapse
Affiliation(s)
- Randy L Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics, Emory University School of Public Health, Atlanta, GA, 30322, USA
| | - Aron S Buchman
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
| | - Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
| | - Yanling Wang
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
| | - Denis Avey
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
| | - Jishu Xu
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
| | - Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, 10032, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
| | - Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
| |
Collapse
|
5
|
Wang A, Tian P, Zhang YD. TWAS-GKF: a novel method for causal gene identification in transcriptome-wide association studies with knockoff inference. Bioinformatics 2024; 40:btae502. [PMID: 39189955 PMCID: PMC11361808 DOI: 10.1093/bioinformatics/btae502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Revised: 07/02/2024] [Accepted: 08/24/2024] [Indexed: 08/28/2024] Open
Abstract
MOTIVATION Transcriptome-wide association study (TWAS) aims to identify trait-associated genes regulated by significant variants to explore the underlying biological mechanisms at a tissue-specific level. Despite the advancement of current TWAS methods to cover diverse traits, traditional approaches still face two main challenges: (i) the lack of methods that can guarantee finite-sample false discovery rate (FDR) control in identifying trait-associated genes; and (ii) the requirement for individual-level data, which is often inaccessible. RESULTS To address this challenge, we propose a powerful knockoff inference method termed TWAS-GKF to identify candidate trait-associated genes with a guaranteed finite-sample FDR control. TWAS-GKF introduces the main idea of Ghostknockoff inference to generate knockoff variables using only summary statistics instead of individual-level data. In extensive studies, we demonstrate that TWAS-GKF successfully controls the finite-sample FDR under a pre-specified FDR level across all settings. We further apply TWAS-GKF to identify genes in brain cerebellum tissue from the Genotype-Tissue Expression (GTEx) v8 project associated with schizophrenia (SCZ) from the Psychiatric Genomics Consortium (PGC), and genes in liver tissue related to low-density lipoprotein cholesterol (LDL-C) from the UK Biobank, respectively. The results reveal that the majority of the identified genes are validated by Open Targets Validation Platform. AVAILABILITY AND IMPLEMENTATION The R package TWAS.GKF is publicly available at https://github.com/AnqiWang2021/TWAS.GKF.
Collapse
Affiliation(s)
- Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, 999077, China
| | - Peixin Tian
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, 999077, China
| | - Yan Dora Zhang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, 999077, China
| |
Collapse
|
6
|
Song S, Wang L, Hou L, Liu JS. Partitioning and aggregating cross-tissue and tissue-specific genetic effects to identify gene-trait associations. Nat Commun 2024; 15:5769. [PMID: 38982044 PMCID: PMC11233643 DOI: 10.1038/s41467-024-49924-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Accepted: 06/25/2024] [Indexed: 07/11/2024] Open
Abstract
TWAS have shown great promise in extending GWAS loci to a functional understanding of disease mechanisms. In an effort to fully unleash the TWAS and GWAS information, we propose MTWAS, a statistical framework that partitions and aggregates cross-tissue and tissue-specific genetic effects in identifying gene-trait associations. We introduce a non-parametric imputation strategy to augment the inaccessible tissues, accommodating complex interactions and non-linear expression data structures across various tissues. We further classify eQTLs into cross-tissue eQTLs and tissue-specific eQTLs via a stepwise procedure based on the extended Bayesian information criterion, which is consistent under high-dimensional settings. We show that MTWAS significantly improves the prediction accuracy across all 47 tissues of the GTEx dataset, compared with other single-tissue and multi-tissue methods, such as PrediXcan, TIGAR, and UTMOST. Applying MTWAS to the DICE and OneK1K datasets with bulk and single-cell RNA sequencing data on immune cell types showcases consistent improvements in prediction accuracy. MTWAS also identifies more predictable genes, and the improvement can be replicated with independent studies. We apply MTWAS to 84 UK Biobank GWAS studies, which provides insights into disease etiology.
Collapse
Affiliation(s)
- Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Lijun Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.
| | - Jun S Liu
- Department of Statistics, Harvard University, Cambridge, MA, USA.
| |
Collapse
|
7
|
Head ST, Dezem F, Todor A, Yang J, Plummer J, Gayther S, Kar S, Schildkraut J, Epstein MP. Cis- and trans-eQTL TWASs of breast and ovarian cancer identify more than 100 susceptibility genes in the BCAC and OCAC consortia. Am J Hum Genet 2024; 111:1084-1099. [PMID: 38723630 PMCID: PMC11179407 DOI: 10.1016/j.ajhg.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Revised: 04/11/2024] [Accepted: 04/16/2024] [Indexed: 05/21/2024] Open
Abstract
Transcriptome-wide association studies (TWASs) have investigated the role of genetically regulated transcriptional activity in the etiologies of breast and ovarian cancer. However, methods performed to date have focused on the regulatory effects of risk-associated SNPs thought to act in cis on a nearby target gene. With growing evidence for distal (trans) regulatory effects of variants on gene expression, we performed TWASs of breast and ovarian cancer using a Bayesian genome-wide TWAS method (BGW-TWAS) that considers effects of both cis- and trans-expression quantitative trait loci (eQTLs). We applied BGW-TWAS to whole-genome and RNA sequencing data in breast and ovarian tissues from the Genotype-Tissue Expression project to train expression imputation models. We applied these models to large-scale GWAS summary statistic data from the Breast Cancer and Ovarian Cancer Association Consortia to identify genes associated with risk of overall breast cancer, non-mucinous epithelial ovarian cancer, and 10 cancer subtypes. We identified 101 genes significantly associated with risk with breast cancer phenotypes and 8 with ovarian phenotypes. These loci include established risk genes and several novel candidate risk loci, such as ACAP3, whose associations are predominantly driven by trans-eQTLs. We replicated several associations using summary statistics from an independent GWAS of these cancer phenotypes. We further used genotype and expression data in normal and tumor breast tissue from the Cancer Genome Atlas to examine the performance of our trained expression imputation models. This work represents an in-depth look into the role of trans eQTLs in the complex molecular mechanisms underlying these diseases.
Collapse
Affiliation(s)
- S Taylor Head
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Felipe Dezem
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Andrei Todor
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
| | - Jingjing Yang
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
| | - Jasmine Plummer
- Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Simon Gayther
- Department of Biomedical Sciences, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
| | - Siddhartha Kar
- Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge CB2 0XZ, UK
| | - Joellen Schildkraut
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Michael P Epstein
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA.
| |
Collapse
|
8
|
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 141 risk genes for Alzheimer's disease dementia. Alzheimers Res Ther 2024; 16:120. [PMID: 38824563 PMCID: PMC11144322 DOI: 10.1186/s13195-024-01488-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2023] [Accepted: 05/27/2024] [Indexed: 06/03/2024]
Abstract
BACKGROUND Transcriptome-wide association study (TWAS) is an influential tool for identifying genes associated with complex diseases whose genetic effects are likely mediated through transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate effect sizes of genetic variants on gene expression (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are employed as variant weights in gene-based association tests, facilitating the mapping of risk genes with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia are limited to studying only cis-eQTL proximal to the test gene. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method to leveraging both cis- and trans- eQTL of brain and blood tissues, in order to enhance mapping risk genes for AD dementia. METHODS We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans- eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per gene per tissue type. Then we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. RESULTS We identified 85 significant genes in prefrontal cortex, 82 in cortex, and 76 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 141 significant risk genes including 34 genes primarily due to trans-eQTL and 35 mapped risk genes in GWAS Catalog. With these 141 significant risk genes, we detected functional clusters comprised of both known mapped GWAS risk genes of AD in GWAS Catalog and our identified TWAS risk genes by protein-protein interaction network analysis, as well as several enriched phenotypes related to AD. CONCLUSION We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans- eQTL data of brain and blood tissues with GWAS summary data, identifying 141 TWAS risk genes of AD dementia. These identified risk genes provide novel insights into the underlying biological mechanisms of AD dementia and potential gene targets for therapeutics development.
Collapse
Affiliation(s)
- Shuyi Guo
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
| |
Collapse
|
9
|
Zhang Y, Wang M, Li Z, Yang X, Li K, Xie A, Dong F, Wang S, Yan J, Liu J. An overview of detecting gene-trait associations by integrating GWAS summary statistics and eQTLs. SCIENCE CHINA. LIFE SCIENCES 2024; 67:1133-1154. [PMID: 38568343 DOI: 10.1007/s11427-023-2522-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Accepted: 01/29/2024] [Indexed: 06/07/2024]
Abstract
Detecting genes that affect specific traits (such as human diseases and crop yields) is important for treating complex diseases and improving crop quality. A genome-wide association study (GWAS) provides new insights and directions for understanding complex traits by identifying important single nucleotide polymorphisms. Many GWAS summary statistics data related to various complex traits have been gathered recently. Studies have shown that GWAS risk loci and expression quantitative trait loci (eQTLs) often have a lot of overlaps, which makes gene expression gradually become an important intermediary to reveal the regulatory role of GWAS. In this review, we review three types of gene-trait association detection methods of integrating GWAS summary statistics and eQTLs data, namely colocalization methods, transcriptome-wide association study-oriented approaches, and Mendelian randomization-related methods. At the theoretical level, we discussed the differences, relationships, advantages, and disadvantages of various algorithms in the three kinds of gene-trait association detection methods. To further discuss the performance of various methods, we summarize the significant gene sets that influence high-density lipoprotein, low-density lipoprotein, total cholesterol, and triglyceride reported in 16 studies. We discuss the performance of various algorithms using the datasets of the four lipid traits. The advantages and limitations of various algorithms are analyzed based on experimental results, and we suggest directions for follow-up studies on detecting gene-trait associations.
Collapse
Affiliation(s)
- Yang Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Mengyao Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Zhenguo Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xuan Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Keqin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Ao Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Fang Dong
- College of Life Sciences, Nankai University, Tianjin, 300071, China
| | - Shihan Wang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China.
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China.
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
| |
Collapse
|
10
|
Wang L, Khunsriraksakul C, Markus H, Chen D, Zhang F, Chen F, Zhan X, Carrel L, Liu DJ, Jiang B. Integrating single cell expression quantitative trait loci summary statistics to understand complex trait risk genes. Nat Commun 2024; 15:4260. [PMID: 38769300 DOI: 10.1038/s41467-024-48143-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Accepted: 04/22/2024] [Indexed: 05/22/2024] Open
Abstract
Transcriptome-wide association study (TWAS) is a popular approach to dissect the functional consequence of disease associated non-coding variants. Most existing TWAS use bulk tissues and may not have the resolution to reveal cell-type specific target genes. Single-cell expression quantitative trait loci (sc-eQTL) datasets are emerging. The largest bulk- and sc-eQTL datasets are most conveniently available as summary statistics, but have not been broadly utilized in TWAS. Here, we present a new method EXPRESSO (EXpression PREdiction with Summary Statistics Only), to analyze sc-eQTL summary statistics, which also integrates 3D genomic data and epigenomic annotation to prioritize causal variants. EXPRESSO substantially improves existing methods. We apply EXPRESSO to analyze multi-ancestry GWAS datasets for 14 autoimmune diseases. EXPRESSO uniquely identifies 958 novel gene x trait associations, which is 26% more than the second-best method. Among them, 492 are unique to cell type level analysis and missed by TWAS using whole blood. We also develop a cell type aware drug repurposing pipeline, which leverages EXPRESSO results to identify drug compounds that can reverse disease gene expressions in relevant cell types. Our results point to multiple drugs with therapeutic potentials, including metformin for type 1 diabetes, and vitamin K for ulcerative colitis.
Collapse
Affiliation(s)
- Lida Wang
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
| | - Chachrit Khunsriraksakul
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Institute for Personalized Medicine; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
| | - Havell Markus
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Institute for Personalized Medicine; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
| | - Dieyi Chen
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
| | - Fan Zhang
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
| | - Fang Chen
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
| | - Xiaowei Zhan
- Department of Statistical Science, Southern Methodist University, Dallas, TX, US
- Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX, US
- Center for Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, US
| | - Laura Carrel
- Department of Biochemistry and Molecular Biology; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA.
| | - Dajiang J Liu
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA.
- Bioinformatics and Genomics PhD Program; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA.
- Department of Statistical Science, Southern Methodist University, Dallas, TX, US.
| | - Bibo Jiang
- Department of Public Health Sciences; Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA.
| |
Collapse
|
11
|
Parrish RL, Buchman AS, Tasaki S, Wang Y, Avey D, Xu J, De Jager PL, Bennett DA, Epstein MP, Yang J. SR-TWAS: Leveraging Multiple Reference Panels to Improve TWAS Power by Ensemble Machine Learning. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.06.20.23291605. [PMID: 37425698 PMCID: PMC10327185 DOI: 10.1101/2023.06.20.23291605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2023]
Abstract
Multiple reference panels of a given tissue or multiple tissues often exist, and multiple regression methods could be used for training gene expression imputation models for TWAS. To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we develop a Stacked Regression based TWAS (SR-TWAS) tool which can obtain optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies showed that SR-TWAS improved power, due to increased effective training sample sizes and borrowed strength across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our real application studies identified 6 independent significant risk genes for Alzheimer's disease (AD) dementia for supplementary motor area tissue and 9 independent significant risk genes for Parkinson's disease (PD) for substantia nigra tissue. Relevant biological interpretations were found for these significant risk genes.
Collapse
|
12
|
Wu YS, Zheng WH, Liu TH, Sun Y, Xu YT, Shao LZ, Cai QY, Tang YQ. Joint-tissue integrative analysis identifies high-risk genes for Parkinson's disease. Front Neurosci 2024; 18:1309684. [PMID: 38576865 PMCID: PMC10991821 DOI: 10.3389/fnins.2024.1309684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2023] [Accepted: 02/22/2024] [Indexed: 04/06/2024] Open
Abstract
The loss of dopaminergic neurons in the substantia nigra and the abnormal accumulation of synuclein proteins and neurotransmitters in Lewy bodies constitute the primary symptoms of Parkinson's disease (PD). Besides environmental factors, scholars are in the early stages of comprehending the genetic factors involved in the pathogenic mechanism of PD. Although genome-wide association studies (GWAS) have unveiled numerous genetic variants associated with PD, precisely pinpointing the causal variants remains challenging due to strong linkage disequilibrium (LD) among them. Addressing this issue, expression quantitative trait locus (eQTL) cohorts were employed in a transcriptome-wide association study (TWAS) to infer the genetic correlation between gene expression and a particular trait. Utilizing the TWAS theory alongside the enhanced Joint-Tissue Imputation (JTI) technique and Mendelian Randomization (MR) framework (MR-JTI), we identified a total of 159 PD-associated genes by amalgamating LD score, GTEx eQTL data, and GWAS summary statistic data from a substantial cohort. Subsequently, Fisher's exact test was conducted on these PD-associated genes using 5,152 differentially expressed genes sourced from 12 PD-related datasets. Ultimately, 29 highly credible PD-associated genes, including CTX1B, SCNA, and ARSA, were uncovered. Furthermore, GO and KEGG enrichment analyses indicated that these genes primarily function in tissue synthesis, regulation of neuron projection development, vesicle organization and transportation, and lysosomal impact. The potential PD-associated genes identified in this study not only offer fresh insights into the disease's pathophysiology but also suggest potential biomarkers for early disease detection.
Collapse
Affiliation(s)
- Ya-Shi Wu
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Wen-Han Zheng
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Tai-Hang Liu
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Yan Sun
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Yu-Ting Xu
- Department of Cell Biology and Medical Genetics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Li-Zhen Shao
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Qin-Yu Cai
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Ya Qin Tang
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| |
Collapse
|
13
|
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations. Hum Mol Genet 2024; 33:624-635. [PMID: 38129112 PMCID: PMC10954367 DOI: 10.1093/hmg/ddad205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying "silver standard" genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.
Collapse
Affiliation(s)
- Hunter J Melton
- Department of Statistics, Florida State University, 214 Rogers Building, 117 N. Woodward Avenue, Tallahassee, FL 32306, United States
| | - Zichen Zhang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
| |
Collapse
|
14
|
Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024; 22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024] Open
Abstract
BACKGROUND The term eGene has been applied to define a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). It is both theoretically and empirically important to identify eQTLs and eGenes in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently leverage useful knowledge acquired from auxiliary samples into target studies. METHODS We propose a multilocus-based eGene identification method called TLegene by integrating shared genetic similarity information available from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers which have an explicit relevant tissue in the GTEx project, and learn genetic effect of variant in TCGA from GTEx. We also adopt TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies. RESULTS We observed substantial genetic effect correlation of cis-variants between TCGA and GTEx for a larger number of genes. Furthermore, consistent with the results of our simulations, we found that TLegene was more powerful than existing methods and thus identified 169 distinct candidate eGenes, which was much larger than the approach that did not consider knowledge transfer across target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the associations of discovered eGenes, and it also showed evidence of allelic heterogeneity of gene expression. Furthermore, TLegene identified more eGenes in Geuvadis and revealed that these eGenes were mainly enriched in cells EBV transformed lymphocytes tissue. CONCLUSION Overall, TLegene represents a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.
Collapse
Affiliation(s)
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
15
|
Liu L, Yan R, Guo P, Ji J, Gong W, Xue F, Yuan Z, Zhou X. Conditional transcriptome-wide association study for fine-mapping candidate causal genes. Nat Genet 2024; 56:348-356. [PMID: 38279040 DOI: 10.1038/s41588-023-01645-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2023] [Accepted: 12/08/2023] [Indexed: 01/28/2024]
Abstract
Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies with expression-mapping studies to identify genes with genetically predicted expression (GReX) associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that performs conditional TWAS analysis by explicitly controlling for GReX of all other genes residing in a local region to fine-map putatively causal genes. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT narrows down the set size of putatively causal genes by 32.16-91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressures and lipid metabolism for regulating lipid levels.
Collapse
Affiliation(s)
- Lu Liu
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Ran Yan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Ping Guo
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Jiadong Ji
- Institute for Financial Studies, Shandong University, Jinan, China
| | - Weiming Gong
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
| | - Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China.
- Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China.
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
| |
Collapse
|
16
|
Yang J, Liu X, Oveisgharan S, Zammit AR, Nag S, Bennett DA, Buchman AS. Inferring Alzheimer's disease pathologic traits from clinical measures in living adults. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2023.05.08.23289668. [PMID: 37214885 PMCID: PMC10197717 DOI: 10.1101/2023.05.08.23289668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
Background Alzheimer's disease neuropathologic changes (AD-NC) are important for identify people with high risk for AD dementia (ADD) and subtyping ADD. Objective Develop imputation models based on clinical measures to infer AD-NC. Methods We used penalized generalized linear regression to train imputation models for four AD-NC traits (amyloid-β, tangles, global AD pathology, and pathologic AD) in Rush Memory and Aging Project decedents, using clinical measures at the last visit prior to death as predictors. We validated these models by inferring AD-NC traits with clinical measures at the last visit prior to death for independent Religious Orders Study (ROS) decedents. We inferred baseline AD-NC traits for all ROS participants at study entry, and then tested if inferred AD-NC traits at study entry predicted incident ADD and postmortem pathologic AD. Results Inferred AD-NC traits at the last visit prior to death were related to postmortem measures with R2=(0.188,0.316,0.262) respectively for amyloid-β, tangles, and global AD pathology, and prediction Area Under the receiver operating characteristic Curve (AUC) 0.765 for pathologic AD. Inferred baseline levels of all four AD-NC traits predicted ADD. The strongest prediction was obtained by the inferred baseline probabilities of pathologic AD with AUC=(0.919,0.896) for predicting the development of ADD in 3 and 5 years from baseline. The inferred baseline levels of all four AD-NC traits significantly discriminated pathologic AD profiled eight years later with p-values<1.4 × 10-10. Conclusion Inferred AD-NC traits based on clinical measures may provide effective AD biomarkers that can estimate the burden of AD-NC traits in aging adults.
Collapse
Affiliation(s)
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, 615 Michael St, Atlanta, GA, 30322, USA
| | - Xizhu Liu
- Department of Biostatistics, Yale University School of Public Health, 60 College St, New Haven, CT, 06510, USA
| | - Shahram Oveisgharan
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - Andrea R. Zammit
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - Sukriti Nag
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - David A Bennett
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - Aron S Buchman
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| |
Collapse
|
17
|
Yang J, Liu X, Oveisgharan S, Zammit AR, Nag S, Bennett DA, Buchman AS. Inferring Alzheimer's Disease Pathologic Traits from Clinical Measures in Living Adults. J Alzheimers Dis 2024; 98:95-107. [PMID: 38427476 PMCID: PMC11034758 DOI: 10.3233/jad-230639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/03/2024]
Abstract
Background Alzheimer's disease neuropathologic changes (AD-NC) are important to identify people with high risk for AD dementia (ADD) and subtyping ADD. Objective Develop imputation models based on clinical measures to infer AD-NC. Methods We used penalized generalized linear regression to train imputation models for four AD-NC traits (amyloid-β, tangles, global AD pathology, and pathologic AD) in Rush Memory and Aging Project decedents, using clinical measures at the last visit prior to death as predictors. We validated these models by inferring AD-NC traits with clinical measures at the last visit prior to death for independent Religious Orders Study (ROS) decedents. We inferred baseline AD-NC traits for all ROS participants at study entry, and then tested if inferred AD-NC traits at study entry predicted incident ADD and postmortem pathologic AD. Results Inferred AD-NC traits at the last visit prior to death were related to postmortem measures with R2 = (0.188,0.316,0.262) respectively for amyloid-β, tangles, and global AD pathology, and prediction Area Under the receiver operating characteristic Curve (AUC) 0.765 for pathologic AD. Inferred baseline levels of all four AD-NC traits predicted ADD. The strongest prediction was obtained by the inferred baseline probabilities of pathologic AD with AUC = (0.919,0.896) for predicting the development of ADD in 3 and 5 years from baseline. The inferred baseline levels of all four AD-NC traits significantly discriminated pathologic AD profiled eight years later with p-values < 1.4×10-10. Conclusions Inferred AD-NC traits based on clinical measures may provide effective AD biomarkers that can estimate the burden of AD-NC traits in aging adults.
Collapse
Affiliation(s)
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, 615 Michael St, Atlanta, GA, 30322, USA
| | - Xizhu Liu
- Department of Biostatistics, Yale University School of Public Health, 60 College St, New Haven, CT, 06510, USA
| | - Shahram Oveisgharan
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - Andrea R. Zammit
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - Sukriti Nag
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - David A Bennett
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| | - Aron S Buchman
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, 1620 W Harrison St, Chicago, IL, 60612, USA
| |
Collapse
|
18
|
Chhotaray S, Vohra V, Uttam V, Santhosh A, Saxena P, Gahlyan RK, Gowane G. TWAS revealed significant causal loci for milk production and its composition in Murrah buffaloes. Sci Rep 2023; 13:22401. [PMID: 38104199 PMCID: PMC10725422 DOI: 10.1038/s41598-023-49767-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2023] [Accepted: 12/12/2023] [Indexed: 12/19/2023] Open
Abstract
Milk yield is the most complex trait in dairy animals, and mapping all causal variants even with smallest effect sizes has been difficult with the genome-wide association study (GWAS) sample sizes available in geographical regions with small livestock holdings such as Indian sub-continent. However, Transcriptome-wide association studies (TWAS) could serve as an alternate for fine mapping of expression quantitative trait loci (eQTLs). This is a maiden attempt to identify milk production and its composition related genes using TWAS in Murrah buffaloes (Bubalus bubalis). TWAS was conducted on a test (N = 136) set of Murrah buffaloes genotyped through ddRAD sequencing. Their gene expression level was predicted using reference (N = 8) animals having both genotype and mammary epithelial cell (MEC) transcriptome information. Gene expression prediction was performed using Elastic-Net and Dirichlet Process Regression (DPR) model with fivefold cross-validation and without any cross-validation. DPR model without cross-validation predicted 80.92% of the total genes in the test group of Murrah buffaloes which was highest compared to other methods. TWAS in test individuals based on predicted gene expression, identified a significant association of one unique gene for Fat%, and two for SNF% at Bonferroni corrected threshold. The false discovery rates (FDR) corrected P-values of the top ten SNPs identified through GWAS were comparatively higher than TWAS. Gene ontology of TWAS-identified genes was performed to understand the function of these genes, it was revealed that milk production and composition genes were mainly involved in Relaxin, AMPK, and JAK-STAT signaling pathway, along with CCRI, and several key metabolic processes. The present study indicates that TWAS offers a lower false discovery rate and higher significant hits than GWAS for milk production and its composition traits. Hence, it is concluded that TWAS can be effectively used to identify genes and cis-SNPs in a population, which can be used for fabricating a low-density genomic chip for predicting milk production in Murrah buffaloes.
Collapse
Affiliation(s)
- Supriya Chhotaray
- Division of Animal Genetics and Breeding, ICAR-Central Institute for Research on Buffaloes, Hisar, Haryana, 125001, India
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
| | - Vikas Vohra
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India.
| | - Vishakha Uttam
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
| | - Ameya Santhosh
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
| | - Punjika Saxena
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
| | - Rajesh Kumar Gahlyan
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
| | - Gopal Gowane
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
| |
Collapse
|
19
|
Head ST, Dezem F, Todor A, Yang J, Plummer J, Gayther S, Kar S, Schildkraut J, Epstein MP. Cis- and trans-eQTL TWAS of breast and ovarian cancer identify more than 100 risk associated genes in the BCAC and OCAC consortia. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.09.566218. [PMID: 38014246 PMCID: PMC10680675 DOI: 10.1101/2023.11.09.566218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
Transcriptome-wide association studies (TWAS) have investigated the role of genetically regulated transcriptional activity in the etiologies of breast and ovarian cancer. However, methods performed to date have only considered regulatory effects of risk associated SNPs thought to act in cis on a nearby target gene. With growing evidence for distal (trans) regulatory effects of variants on gene expression, we performed TWAS of breast and ovarian cancer using a Bayesian genome-wide TWAS method (BGW-TWAS) that considers effects of both cis- and trans-expression quantitative trait loci (eQTLs). We applied BGW-TWAS to whole genome and RNA sequencing data in breast and ovarian tissues from the Genotype-Tissue Expression project to train expression imputation models. We applied these models to large-scale GWAS summary statistic data from the Breast Cancer and Ovarian Cancer Association Consortia to identify genes associated with risk of overall breast cancer, non-mucinous epithelial ovarian cancer, and 10 cancer subtypes. We identified 101 genes significantly associated with risk with breast cancer phenotypes and 8 with ovarian phenotypes. These loci include established risk genes and several novel candidate risk loci, such as ACAP3, whose associations are predominantly driven by trans-eQTLs. We replicated several associations using summary statistics from an independent GWAS of these cancer phenotypes. We further used genotype and expression data in normal and tumor breast tissue from the Cancer Genome Atlas to examine the performance of our trained expression imputation models. This work represents a first look into the role of trans-eQTLs in the complex molecular mechanisms underlying these diseases.
Collapse
Affiliation(s)
- S. Taylor Head
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Felipe Dezem
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Andrei Todor
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
| | - Jingjing Yang
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
| | - Jasmine Plummer
- Department of Developmental Neurobiology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Simon Gayther
- Department of Biomedical Sciences, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
| | - Siddhartha Kar
- Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge CB2 0XZ, UK
| | - Joellen Schildkraut
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Michael P. Epstein
- Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA 30322, USA
| |
Collapse
|
20
|
Araujo DS, Nguyen C, Hu X, Mikhaylova AV, Gignoux C, Ardlie K, Taylor KD, Durda P, Liu Y, Papanicolaou G, Cho MH, Rich SS, Rotter JI, Im HK, Manichaikul A, Wheeler HE. Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations. HGG ADVANCES 2023; 4:100216. [PMID: 37869564 PMCID: PMC10589725 DOI: 10.1016/j.xhgg.2023.100216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 06/27/2023] [Indexed: 10/24/2023] Open
Abstract
Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions, in this case, across different populations, may improve cross-population transcriptome prediction. To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWASs) using different methods (elastic net, joint-tissue imputation [JTI], matrix expression quantitative trait loci [Matrix eQTL], multivariate adaptive shrinkage in R [MASHR], and transcriptome-integrated genetic association resource [TIGAR]) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWASs, we integrated publicly available multiethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study and Pan-ancestry genetic analysis of the UK Biobank (PanUKBB) with our developed transcriptome prediction models. In regard to transcriptome prediction accuracy, MASHR models performed better or the same as other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multiethnic TWASs, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB across all methods analyzed, including loci previously mapped in GWASs and loci previously not found in GWASs. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWASs for multiethnic or underrepresented populations.
Collapse
Affiliation(s)
- Daniel S. Araujo
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
| | - Chris Nguyen
- Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
| | - Xiaowei Hu
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
| | - Anna V. Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Kristin Ardlie
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Peter Durda
- Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
| | - Yongmei Liu
- Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
| | - George Papanicolaou
- Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
| | - Michael H. Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Stephen S. Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - NHLBI TOPMed Consortium
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO 80045, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, the Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT 05446, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA
- Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
| | - Heather E. Wheeler
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL 60660, USA
- Department of Biology, Loyola University Chicago, Chicago, IL 60660, USA
| |
Collapse
|
21
|
Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. Am J Hum Genet 2023; 110:1673-1689. [PMID: 37716346 PMCID: PMC10577082 DOI: 10.1016/j.ajhg.2023.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Revised: 08/18/2023] [Accepted: 08/27/2023] [Indexed: 09/18/2023] Open
Abstract
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Collapse
Affiliation(s)
- Chang Xu
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
| | - Santhi K Ganesh
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
| |
Collapse
|
22
|
Mai J, Lu M, Gao Q, Zeng J, Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun Biol 2023; 6:899. [PMID: 37658226 PMCID: PMC10474133 DOI: 10.1038/s42003-023-05279-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023] Open
Abstract
Genome-wide association study has identified fruitful variants impacting heritable traits. Nevertheless, identifying critical genes underlying those significant variants has been a great task. Transcriptome-wide association study (TWAS) is an instrumental post-analysis to detect significant gene-trait associations focusing on modeling transcription-level regulations, which has made numerous progresses in recent years. Leveraging from expression quantitative loci (eQTL) regulation information, TWAS has advantages in detecting functioning genes regulated by disease-associated variants, thus providing insight into mechanisms of diseases and other phenotypes. Considering its vast potential, this review article comprehensively summarizes TWAS, including the methodology, applications and available resources.
Collapse
Affiliation(s)
- Jialin Mai
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Mingming Lu
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Qianwen Gao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jingyao Zeng
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China.
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China.
| | - Jingfa Xiao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China.
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
| |
Collapse
|
23
|
de Leeuw C, Werme J, Savage JE, Peyrot WJ, Posthuma D. On the interpretation of transcriptome-wide association studies. PLoS Genet 2023; 19:e1010921. [PMID: 37676898 PMCID: PMC10508613 DOI: 10.1371/journal.pgen.1010921] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 09/19/2023] [Accepted: 08/15/2023] [Indexed: 09/09/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) aim to detect relationships between gene expression and a phenotype, and are commonly used for secondary analysis of genome-wide association study (GWAS) results. Results from TWAS analyses are often interpreted as indicating a genetic relationship between gene expression and a phenotype, but this interpretation is not consistent with the null hypothesis that is evaluated in the traditional TWAS framework. In this study we provide a mathematical outline of this TWAS framework, and elucidate what interpretations are warranted given the null hypothesis it actually tests. We then use both simulations and real data analysis to assess the implications of misinterpreting TWAS results as indicative of a genetic relationship between gene expression and the phenotype. Our simulation results show considerably inflated type 1 error rates for TWAS when interpreted this way, with 41% of significant TWAS associations detected in the real data analysis found to have insufficient statistical evidence to infer such a relationship. This demonstrates that in current implementations, TWAS cannot reliably be used to investigate genetic relationships between gene expression and a phenotype, but that local genetic correlation analysis can serve as a potential alternative.
Collapse
Affiliation(s)
- Christiaan de Leeuw
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
| | - Josefin Werme
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
| | - Jeanne E. Savage
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
| | - Wouter J. Peyrot
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Department of Psychiatry, Amsterdam UMC, location VUmc, Amsterdam, the Netherlands
| | - Danielle Posthuma
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychology and Psychiatry, section Complex Trait Genetics, Amsterdam Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
| |
Collapse
|
24
|
Guo S, Yang J. Bayesian genome-wide TWAS with reference transcriptomic data of brain and blood tissues identified 93 risk genes for Alzheimer's disease dementia. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.06.23292336. [PMID: 37503151 PMCID: PMC10370241 DOI: 10.1101/2023.07.06.23292336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Background Transcriptome-wide association study (TWAS) is an influential tool for identifying novel genes associated with complex diseases, where their genetic effects may be mediated through transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate genetic effect sizes on expression quantitative traits of target genes (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are then employed as variant weights in burden gene-based association test statistics, facilitating the mapping of risk genes for complex diseases with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia have primarily focused on cis -eQTL, disregarding potential trans -eQTL. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method which incorporated both cis - and trans -eQTL of brain and blood tissues to enhance mapping risk genes for AD dementia. Methods We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis - and trans -eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Subsequently, estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per tissue type. Finally, we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. Results We identified 37 genes in prefrontal cortex, 55 in cortex, and 51 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 93 significant risk genes including 29 genes primarily due to trans -eQTL and 50 novel genes. Utilizing protein-protein interaction network and phenotype enrichment analyses with these 93 significant risk genes, we detected 5 functional clusters comprised of both known and novel AD risk genes and 7 enriched phenotypes. Conclusion We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis - and trans -eQTL data of brain and blood tissues with GWAS summary data to identify risk genes of AD dementia. The risk genes we identified provide novel insights into the underlying biological pathways implicated in AD dementia.
Collapse
|
25
|
Wang YH, Luo PP, Geng AY, Li X, Liu TH, He YJ, Huang L, Tang YQ. Identification of highly reliable risk genes for Alzheimer's disease through joint-tissue integrative analysis. Front Aging Neurosci 2023; 15:1183119. [PMID: 37416324 PMCID: PMC10320295 DOI: 10.3389/fnagi.2023.1183119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2023] [Accepted: 05/30/2023] [Indexed: 07/08/2023] Open
Abstract
Numerous genetic variants associated with Alzheimer's disease (AD) have been identified through genome-wide association studies (GWAS), but their interpretation is hindered by the strong linkage disequilibrium (LD) among the variants, making it difficult to identify the causal variants directly. To address this issue, the transcriptome-wide association study (TWAS) was employed to infer the association between gene expression and a trait at the genetic level using expression quantitative trait locus (eQTL) cohorts. In this study, we applied the TWAS theory and utilized the improved Joint-Tissue Imputation (JTI) approach and Mendelian Randomization (MR) framework (MR-JTI) to identify potential AD-associated genes. By integrating LD score, GTEx eQTL data, and GWAS summary statistic data from a large cohort using MR-JTI, a total of 415 AD-associated genes were identified. Then, 2873 differentially expressed genes from 11 AD-related datasets were used for the Fisher test of these AD-associated genes. We finally obtained 36 highly reliable AD-associated genes, including APOC1, CR1, ERBB2, and RIN3. Moreover, the GO and KEGG enrichment analysis revealed that these genes are primarily involved in antigen processing and presentation, amyloid-beta formation, tau protein binding, and response to oxidative stress. The identification of these potential AD-associated genes not only provides insights into the pathogenesis of AD but also offers biomarkers for early diagnosis of the disease.
Collapse
Affiliation(s)
- Yong Heng Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development, Chongqing Medical University, Chongqing, China
| | - Pan Pan Luo
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Ao Yi Geng
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Xinwei Li
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
| | - Tai-Hang Liu
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
- Joint International Research Laboratory of Reproduction and Development, Chongqing Medical University, Chongqing, China
| | - Yi Jie He
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Lin Huang
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| | - Ya Qin Tang
- Department of Bioinformatics, School of Basic Medical Sciences, Chongqing Medical University, Chongqing, China
| |
Collapse
|
26
|
Gedik H, Nguyen TH, Peterson RE, Chatzinakos C, Vladimirov VI, Riley BP, Bacanu SA. Identifying potential risk genes and pathways for neuropsychiatric and substance use disorders using intermediate molecular mediator information. Front Genet 2023; 14:1191264. [PMID: 37415601 PMCID: PMC10320396 DOI: 10.3389/fgene.2023.1191264] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 05/23/2023] [Indexed: 07/08/2023] Open
Abstract
Neuropsychiatric and substance use disorders (NPSUDs) have a complex etiology that includes environmental and polygenic risk factors with significant cross-trait genetic correlations. Genome-wide association studies (GWAS) of NPSUDs yield numerous association signals. However, for most of these regions, we do not yet have a firm understanding of either the specific risk variants or the effects of these variants. Post-GWAS methods allow researchers to use GWAS summary statistics and molecular mediators (transcript, protein, and methylation abundances) infer the effect of these mediators on risk for disorders. One group of post-GWAS approaches is commonly referred to as transcriptome/proteome/methylome-wide association studies, which are abbreviated as T/P/MWAS (or collectively as XWAS). Since these approaches use biological mediators, the multiple testing burden is reduced to the number of genes (∼20,000) instead of millions of GWAS SNPs, which leads to increased signal detection. In this work, our aim is to uncover likely risk genes for NPSUDs by performing XWAS analyses in two tissues-blood and brain. First, to identify putative causal risk genes, we performed an XWAS using the Summary-data-based Mendelian randomization, which uses GWAS summary statistics, reference xQTL data, and a reference LD panel. Second, given the large comorbidities among NPSUDs and the shared cis-xQTLs between blood and the brain, we improved XWAS signal detection for underpowered analyses by performing joint concordance analyses between XWAS results i) across the two tissues and ii) across NPSUDs. All XWAS signals i) were adjusted for heterogeneity in dependent instruments (HEIDI) (non-causality) p-values and ii) used to test for pathway enrichment. The results suggest that there were widely shared gene/protein signals within the major histocompatibility complex region on chromosome 6 (BTN3A2 and C4A) and elsewhere in the genome (FURIN, NEK4, RERE, and ZDHHC5). The identification of putative molecular genes and pathways underlying risk may offer new targets for therapeutic development. Our study revealed an enrichment of XWAS signals in vitamin D and omega-3 gene sets. So, including vitamin D and omega-3 in treatment plans may have a modest but beneficial effect on patients with bipolar disorder.
Collapse
Affiliation(s)
- Huseyin Gedik
- Integrative Life Sciences, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, United States
| | - Tan Hoang Nguyen
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, United States
| | - Roseann E. Peterson
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, United States
- Department of Psychiatry and Behavioral Sciences, SUNY Downstate Health Sciences University, Brooklyn, NY, United States
| | - Christos Chatzinakos
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Department of Psychiatry, McLean Hospital and Harvard Medical School, Belmont, MA, United States
| | - Vladimir I. Vladimirov
- Department of Psychiatry, College of Medicine, University of Arizona Phoenix, Phoenix, AZ, United States
| | - Brien P. Riley
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, United States
| | - Silviu-Alin Bacanu
- Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, United States
| |
Collapse
|
27
|
Cai W, Zhang Y, Chang T, Wang Z, Zhu B, Chen Y, Gao X, Xu L, Zhang L, Gao H, Song J, Li J. The eQTL colocalization and transcriptome-wide association study identify potentially causal genes responsible for economic traits in Simmental beef cattle. J Anim Sci Biotechnol 2023; 14:78. [PMID: 37165455 PMCID: PMC10173583 DOI: 10.1186/s40104-023-00876-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Accepted: 04/05/2023] [Indexed: 05/12/2023] Open
Abstract
BACKGROUND A detailed understanding of genetic variants that affect beef merit helps maximize the efficiency of breeding for improved production merit in beef cattle. To prioritize the putative variants and genes, we ran a comprehensive genome-wide association studies (GWAS) analysis for 21 agronomic traits using imputed whole-genome variants in Simmental beef cattle. Then, we applied expression quantitative trait loci (eQTL) mapping between the genotype variants and transcriptome of three tissues (longissimus dorsi muscle, backfat, and liver) in 120 cattle. RESULTS We identified 1,580 association signals for 21 beef agronomic traits using GWAS. We then illuminated 854,498 cis-eQTLs for 6,017 genes and 46,970 trans-eQTLs for 1,903 genes in three tissues and built a synergistic network by integrating transcriptomics with agronomic traits. These cis-eQTLs were preferentially close to the transcription start site and enriched in functional regulatory regions. We observed an average of 43.5% improvement in cis-eQTL discovery using multi-tissue eQTL mapping. Fine-mapping analysis revealed that 111, 192, and 194 variants were most likely to be causative to regulate gene expression in backfat, liver, and muscle, respectively. The transcriptome-wide association studies identified 722 genes significantly associated with 11 agronomic traits. Via the colocalization and Mendelian randomization analyses, we found that eQTLs of several genes were associated with the GWAS signals of agronomic traits in three tissues, which included genes, such as NADSYN1, NDUFS3, LTF and KIFC2 in liver, GRAMD1C, TMTC2 and ZNF613 in backfat, as well as TIGAR, NDUFS3 and L3HYPDH in muscle that could serve as the candidate genes for economic traits. CONCLUSIONS The extensive atlas of GWAS, eQTL, fine-mapping, and transcriptome-wide association studies aid in the suggestion of potentially functional variants and genes in cattle agronomic traits and will be an invaluable source for genomics and breeding in beef cattle.
Collapse
Affiliation(s)
- Wentao Cai
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Yapeng Zhang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Tianpeng Chang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Zezhao Wang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Bo Zhu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Yan Chen
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Xue Gao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Lingyang Xu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Lupei Zhang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Huijiang Gao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
| | - Jiuzhou Song
- Department of Animal and Avian Science, University of Maryland, College Park, MD, 20742, USA.
| | - Junya Li
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
| |
Collapse
|
28
|
Wang X, Lu Z, Bhattacharya A, Pasaniuc B, Mancuso N. twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis. Bioinformatics 2023; 39:btad288. [PMID: 37099718 PMCID: PMC10172036 DOI: 10.1093/bioinformatics/btad288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2022] [Revised: 04/06/2023] [Accepted: 04/13/2023] [Indexed: 04/28/2023] Open
Abstract
SUMMARY Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, complicating identifying their proximal target gene. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. AVAILABILITY AND IMPLEMENTATION Software and documentation are available at https://github.com/mancusolab/twas_sim.
Collapse
Affiliation(s)
- Xinran Wang
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
| | - Zeyun Lu
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States
- Institute of Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States
| | - Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States
| | - Nicholas Mancuso
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, United States
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90097, United States
| |
Collapse
|
29
|
Hess JL, Quinn TP, Zhang C, Hearn GC, Chen S, Kong SW, Cairns M, Tsuang MT, Faraone SV, Glatt SJ. BrainGENIE: The Brain Gene Expression and Network Imputation Engine. Transl Psychiatry 2023; 13:98. [PMID: 36949060 PMCID: PMC10033657 DOI: 10.1038/s41398-023-02390-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Revised: 02/23/2023] [Accepted: 02/28/2023] [Indexed: 03/24/2023] Open
Abstract
In vivo experimental analysis of human brain tissue poses substantial challenges and ethical concerns. To address this problem, we developed a computational method called the Brain Gene Expression and Network-Imputation Engine (BrainGENIE) that leverages peripheral-blood transcriptomes to predict brain tissue-specific gene-expression levels. Paired blood-brain transcriptomic data collected by the Genotype-Tissue Expression (GTEx) Project was used to train BrainGENIE models to predict gene-expression levels in ten distinct brain regions using whole-blood gene-expression profiles. The performance of BrainGENIE was compared to PrediXcan, a popular method for imputing gene expression levels from genotypes. BrainGENIE significantly predicted brain tissue-specific expression levels for 2947-11,816 genes (false-discovery rate-adjusted p < 0.05), including many transcripts that cannot be predicted significantly by a transcriptome-imputation method such as PrediXcan. BrainGENIE recapitulated measured diagnosis-related gene-expression changes in the brain for autism, bipolar disorder, and schizophrenia better than direct correlations from blood and predictions from PrediXcan. We developed a convenient software toolset for deploying BrainGENIE, and provide recommendations for how best to implement models. BrainGENIE complements and, in some ways, outperforms existing transcriptome-imputation tools, providing biologically meaningful predictions and opening new research avenues.
Collapse
Affiliation(s)
- Jonathan L Hess
- Department of Psychiatry & Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA
| | - Thomas P Quinn
- Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia
| | - Chunling Zhang
- Department of Neuroscience & Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA
| | - Gentry C Hearn
- Department of Neuroscience & Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA
| | - Samuel Chen
- Department of Psychiatry & Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA
| | - Sek Won Kong
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Murray Cairns
- School of Biomedical Sciences & Pharmacy, Faculty of Health, The University of Newcastle, New South Wales, Callaghan, New South Wales, Australia
- Hunter Medical Research Institute, Newcastle, Australia
- Centre for Brain & Mental Health Research, The University of Newcastle, Callaghan, Australia
| | - Ming T Tsuang
- Center for Behavioral Genomics, Department of Psychiatry, Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
- Harvard Institute of Psychiatric Epidemiology and Genetics, Boston, MA, USA
| | - Stephen V Faraone
- Department of Psychiatry & Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA
- Department of Neuroscience & Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA
| | - Stephen J Glatt
- Department of Psychiatry & Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA.
- Department of Neuroscience & Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, USA.
| |
Collapse
|
30
|
Dai Q, Zhou G, Zhao H, Võsa U, Franke L, Battle A, Teumer A, Lehtimäki T, Raitakari OT, Esko T, Epstein MP, Yang J. OTTERS: a powerful TWAS framework leveraging summary-level reference data. Nat Commun 2023; 14:1271. [PMID: 36882394 PMCID: PMC9992663 DOI: 10.1038/s41467-023-36862-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Accepted: 02/20/2023] [Indexed: 03/09/2023] Open
Abstract
Most existing TWAS tools require individual-level eQTL reference data and thus are not applicable to summary-level reference eQTL datasets. The development of TWAS methods that can harness summary-level reference data is valuable to enable TWAS in broader settings and enhance power due to increased reference sample size. Thus, we develop a TWAS framework called OTTERS (Omnibus Transcriptome Test using Expression Reference Summary data) that adapts multiple polygenic risk score (PRS) methods to estimate eQTL weights from summary-level eQTL reference data and conducts an omnibus TWAS. We show that OTTERS is a practical and powerful TWAS tool by both simulations and application studies.
Collapse
Affiliation(s)
- Qile Dai
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA, 30322, USA
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
| | - Urmo Võsa
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 50090, Tartu, Estonia
| | - Lude Franke
- Department of Genetics, University of Groningen, University Medical Center Groningen, 9700 RB, Groningen, The Netherlands
- Oncode Institute, 3521 AL, Utrecht, The Netherlands
| | - Alexis Battle
- Department of Computer Science, and Departments of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Alexander Teumer
- Institute for Community Medicine, University Medicine Greifswald, 17489, Greifswald, Germany
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Centre for Cardiovascular Disease Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, 33520, Finland
| | - Olli T Raitakari
- Centre for Population Health Research, University of Turku and Turku University Hospital, 20520, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, 20520, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, 20521, Turku, Finland
| | - Tõnu Esko
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 50090, Tartu, Estonia
| | - Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA.
| |
Collapse
|
31
|
CoNet: Efficient Network Regression for Survival Analysis in Transcriptome-Wide Association Studies—With Applications to Studies of Breast Cancer. Genes (Basel) 2023; 14:genes14030586. [PMID: 36980857 PMCID: PMC10048118 DOI: 10.3390/genes14030586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023] Open
Abstract
Transcriptome-wide association studies (TWASs) aim to detect associations between genetically predicted gene expression and complex diseases or traits through integrating genome-wide association studies (GWASs) and expression quantitative trait loci (eQTL) mapping studies. Most current TWAS methods analyze one gene at a time, ignoring the correlations between multiple genes. Few of the existing TWAS methods focus on survival outcomes. Here, we propose a novel method, namely a COx proportional hazards model for NEtwork regression in TWAS (CoNet), that is applicable for identifying the association between one given network and the survival time. CoNet considers the general relationship among the predicted gene expression as edges of the network and quantifies it through pointwise mutual information (PMI), which is under a two-stage TWAS. Extensive simulation studies illustrate that CoNet can not only achieve type I error calibration control in testing both the node effect and edge effect, but it can also gain more power compared with currently available methods. In addition, it demonstrates superior performance in real data application, namely utilizing the breast cancer survival data of UK Biobank. CoNet effectively accounts for network structure and can simultaneously identify the potential effecting nodes and edges that are related to survival outcomes in TWAS.
Collapse
|
32
|
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: A new resource for improved transcriptome imputation using functional annotations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.02.23285208. [PMID: 36798253 PMCID: PMC9934719 DOI: 10.1101/2023.02.02.23285208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), that improves the accuracy of gene expression prediction by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models using SUMMIT-FA with a comprehensive functional database MACIE and the eQTL summary-level data from the eQTLGen consortium. By applying the resulting models to GWASs for 24 complex traits and exploring it through a simulation study, we show that SUMMIT-FA improves the accuracy of gene expression prediction models in whole blood, identifies significantly more gene-trait associations, and improves predictive power for identifying "silver standard" genes compared to several benchmark methods.
Collapse
Affiliation(s)
- Hunter J. Melton
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Zichen Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| |
Collapse
|
33
|
Chen F, Wang X, Jang SK, Quach BC, Weissenkampen JD, Khunsriraksakul C, Yang L, Sauteraud R, Albert CM, Allred NDD, Arnett DK, Ashley-Koch AE, Barnes KC, Barr RG, Becker DM, Bielak LF, Bis JC, Blangero J, Boorgula MP, Chasman DI, Chavan S, Chen YDI, Chuang LM, Correa A, Curran JE, David SP, Fuentes LDL, Deka R, Duggirala R, Faul JD, Garrett ME, Gharib SA, Guo X, Hall ME, Hawley NL, He J, Hobbs BD, Hokanson JE, Hsiung CA, Hwang SJ, Hyde TM, Irvin MR, Jaffe AE, Johnson EO, Kaplan R, Kardia SLR, Kaufman JD, Kelly TN, Kleinman JE, Kooperberg C, Lee IT, Levy D, Lutz SM, Manichaikul AW, Martin LW, Marx O, McGarvey ST, Minster RL, Moll M, Moussa KA, Naseri T, North KE, Oelsner EC, Peralta JM, Peyser PA, Psaty BM, Rafaels N, Raffield LM, Reupena MS, Rich SS, Rotter JI, Schwartz DA, Shadyab AH, Sheu WHH, Sims M, Smith JA, Sun X, Taylor KD, Telen MJ, Watson H, Weeks DE, Weir DR, Yanek LR, Young KA, Young KL, Zhao W, Hancock DB, Jiang B, Vrieze S, Liu DJ. Multi-ancestry transcriptome-wide association analyses yield insights into tobacco use biology and drug repurposing. Nat Genet 2023; 55:291-300. [PMID: 36702996 PMCID: PMC9925385 DOI: 10.1038/s41588-022-01282-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Accepted: 12/08/2022] [Indexed: 01/27/2023]
Abstract
Most transcriptome-wide association studies (TWASs) so far focus on European ancestry and lack diversity. To overcome this limitation, we aggregated genome-wide association study (GWAS) summary statistics, whole-genome sequences and expression quantitative trait locus (eQTL) data from diverse ancestries. We developed a new approach, TESLA (multi-ancestry integrative study using an optimal linear combination of association statistics), to integrate an eQTL dataset with a multi-ancestry GWAS. By exploiting shared phenotypic effects between ancestries and accommodating potential effect heterogeneities, TESLA improves power over other TWAS methods. When applied to tobacco use phenotypes, TESLA identified 273 new genes, up to 55% more compared with alternative TWAS methods. These hits and subsequent fine mapping using TESLA point to target genes with biological relevance. In silico drug-repurposing analyses highlight several drugs with known efficacy, including dextromethorphan and galantamine, and new drugs such as muscle relaxants that may be repurposed for treating nicotine addiction.
Collapse
Affiliation(s)
- Fang Chen
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Xingyan Wang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Seon-Kyeong Jang
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | | | - J Dylan Weissenkampen
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychology, Penn State College of Medicine, Hershey, PA, USA
| | | | - Lina Yang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Renan Sauteraud
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Christine M Albert
- Department of Cardiology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | | | - Donna K Arnett
- College of Public Health, University of Kentucky, Lexington, KY, USA
| | - Allison E Ashley-Koch
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
- Department of Medicine, Duke University Medical Center, Durham, NC, USA
- Duke Comprehensive Sickle Cell Center, Duke University Medical Center, Durham, NC, USA
| | - Kathleen C Barnes
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, CO, USA
| | - R Graham Barr
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
| | - Diane M Becker
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Joshua C Bis
- Department of Medicine, Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Meher Preethi Boorgula
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, CO, USA
| | - Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Sameer Chavan
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, CO, USA
| | - Yii-Der I Chen
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Lee-Ming Chuang
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Adolfo Correa
- Department of Medicine, Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
| | - Joanne E Curran
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Sean P David
- University of Chicago, Chicago, IL, USA
- NorthShore University Health System, Evanston, IL, USA
| | - Lisa de Las Fuentes
- Department of Medicine, Division of Biostatistics and Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, USA
| | - Ranjan Deka
- Department of Environmental and Public Health Sciences, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
| | - Ravindranath Duggirala
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Jessica D Faul
- Institute for Social Research, Survey Research Center, University of Michigan, Ann Arbor, MI, USA
| | - Melanie E Garrett
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
- Department of Medicine, Duke University Medical Center, Durham, NC, USA
| | - Sina A Gharib
- Department of Medicine, Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Computational Medicine Core at Center for Lung Biology, Division of Pulmonary, Critical Care and Sleep Medicine, University of Washington, Seattle, WA, USA
| | - Xiuqing Guo
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Michael E Hall
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Nicola L Hawley
- Department of Epidemiology (Chronic Disease), School of Public Health, Yale University, New Haven, CT, USA
| | - Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
| | - Brian D Hobbs
- Harvard Medical School, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - John E Hokanson
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Chao A Hsiung
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
| | - Shih-Jen Hwang
- The Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- The Framingham Heart Study, Framingham, MA, USA
| | - Thomas M Hyde
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Andrew E Jaffe
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Mental Health and Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Human Genetics and Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | | | - Robert Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, The Bronx, NY, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Joel D Kaufman
- Departments of Environmental & Occupational Health Sciences, Medicine, and Epidemiology, University of Washington Seattle, Seattle, WA, USA
| | - Tanika N Kelly
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
| | - Joel E Kleinman
- Lieber Institute for Brain Development, Baltimore, MD, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | | | - I-Te Lee
- Department of Internal Medicine, Division of Endocrinology and Metabolism, Taichung Veterans General Hospital, Taichung, Taiwan
| | - Daniel Levy
- The Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sharon M Lutz
- Department of Population Medicine, Harvard Pilgrim Health Care, Boston, MA, USA
| | - Ani W Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Lisa W Martin
- Division of Cardiology, George Washington University School of Medicine and Health Sciences, Washington, DC, USA
| | - Olivia Marx
- Department of Biomedical Sciences, Penn State College of Medicine, Hershey, PA, USA
| | - Stephen T McGarvey
- Department of Epidemiology, International Health Institute, Brown University School of Public Health, Providence, RI, USA
| | - Ryan L Minster
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Matthew Moll
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Karine A Moussa
- Penn State Huck Institutes of Life Sciences, Penn State College of Medicine, University Park, PA, USA
| | - Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
| | - Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Elizabeth C Oelsner
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
| | - Juan M Peralta
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Bruce M Psaty
- Department of Medicine, Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Nicholas Rafaels
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Center, Aurora, CO, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Jerome I Rotter
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | | | - Aladdin H Shadyab
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | | | - Mario Sims
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Institute for Social Research, Survey Research Center, University of Michigan, Ann Arbor, MI, USA
| | - Xiao Sun
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
| | - Kent D Taylor
- Department of Pediatrics, Institute for Translational Genomics and Population Sciences, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Marilyn J Telen
- Department of Medicine, Duke University Medical Center, Durham, NC, USA
| | - Harold Watson
- Faculty of Medical Sciences, University of the West Indies, Cave Hill Campus, Barbados
| | - Daniel E Weeks
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - David R Weir
- Institute for Social Research, Survey Research Center, University of Michigan, Ann Arbor, MI, USA
| | - Lisa R Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Kendra A Young
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Kristin L Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Institute for Social Research, Survey Research Center, University of Michigan, Ann Arbor, MI, USA
| | | | - Bibo Jiang
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA.
| | - Scott Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA.
| | - Dajiang J Liu
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA, USA.
| |
Collapse
|
34
|
Song X, Ji J, Rothstein JH, Alexeeff SE, Sakoda LC, Sistig A, Achacoso N, Jorgenson E, Whittemore AS, Klein RJ, Habel LA, Wang P, Sieh W. MiXcan: a framework for cell-type-aware transcriptome-wide association studies with an application to breast cancer. Nat Commun 2023; 14:377. [PMID: 36690614 PMCID: PMC9871010 DOI: 10.1038/s41467-023-35888-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023] Open
Abstract
Human bulk tissue samples comprise multiple cell types with diverse roles in disease etiology. Conventional transcriptome-wide association study approaches predict genetically regulated gene expression at the tissue level, without considering cell-type heterogeneity, and test associations of predicted tissue-level expression with disease. Here we develop MiXcan, a cell-type-aware transcriptome-wide association study approach that predicts cell-type-level expression, identifies disease-associated genes via combination of cell-type-level association signals for multiple cell types, and provides insight into the disease-critical cell type. As a proof of concept, we conducted cell-type-aware analyses of breast cancer in 58,648 women and identified 12 transcriptome-wide significant genes using MiXcan compared with only eight genes using conventional approaches. Importantly, MiXcan identified genes with distinct associations in mammary epithelial versus stromal cells, including three new breast cancer susceptibility genes. These findings demonstrate that cell-type-aware transcriptome-wide analyses can reveal new insights into the genetic and cellular etiology of breast cancer and other diseases.
Collapse
Affiliation(s)
- Xiaoyu Song
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Jiayi Ji
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Joseph H Rothstein
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Stacey E Alexeeff
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Lori C Sakoda
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Adriana Sistig
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ninah Achacoso
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Eric Jorgenson
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Alice S Whittemore
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Robert J Klein
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Laurel A Habel
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Pei Wang
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Weiva Sieh
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| |
Collapse
|
35
|
Gedik H, Peterson RE, Riley BP, Vladimirov VI, Bacanu SA. Integrative Post-Genome-Wide Association Study Analyses Relevant to Psychiatric Disorders: Imputing Transcriptome and Proteome Signals. Complex Psychiatry 2023; 9:130-144. [PMID: 37588130 PMCID: PMC10425719 DOI: 10.1159/000530223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Accepted: 03/09/2023] [Indexed: 08/18/2023] Open
Abstract
Background The genome-wide association study (GWAS) is a common tool to identify genetic variants associated with complex traits, including psychiatric disorders (PDs). However, post-GWAS analyses are needed to extend the statistical inference to biologically relevant entities, e.g., genes, proteins, and pathways. To achieve this goal, researchers developed methods that incorporate biologically relevant intermediate molecular phenotypes, such as gene expression and protein abundance, which are posited to mediate the variant-trait association. Transcriptome-wide association study (TWAS) and proteome-wide association study (PWAS) are commonly used methods to test the association between these molecular mediators and the trait. Summary In this review, we discuss the most recent developments in TWAS and PWAS. These methods integrate existing "omic" information with the GWAS summary statistics for trait(s) of interest. Specifically, they impute transcript/protein data and test the association between imputed gene expression/protein level with phenotype of interest by using (i) GWAS summary statistics and (ii) reference transcriptomic/proteomic/genomic datasets. TWAS and PWAS are suitable as analysis tools for (i) primary association scan and (ii) fine-mapping to identify potentially causal genes for PDs. Key Messages As post-GWAS analyses, TWAS and PWAS have the potential to highlight causal genes for PDs. These prioritized genes could indicate targets for the development of novel drug therapies. For researchers attempting such analyses, we recommend Mendelian randomization tools that use GWAS statistics for both trait and reference datasets, e.g., summary Mendelian randomization (SMR). We base our recommendation on (i) being able to use the same tool for both TWAS and PWAS, (ii) not requiring the pre-computed weights (and thus easier to update for larger reference datasets), and (iii) most larger transcriptome reference datasets are publicly available and easy to transform into a compatible format for SMR analysis.
Collapse
Affiliation(s)
- Huseyin Gedik
- Integrative Life Sciences, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
| | - Roseann E. Peterson
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Brien P. Riley
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| | - Vladimir I. Vladimirov
- Department of Psychiatry, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, USA
| | - Silviu-Alin Bacanu
- Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
| |
Collapse
|
36
|
Zhang L, Ju T, Jin X, Ji J, Han J, Zhou X, Yuan Z. Network regression analysis for binary and ordinal categorical phenotypes in transcriptome-wide association studies. Genetics 2022; 222:iyac153. [PMID: 36227056 PMCID: PMC9713396 DOI: 10.1093/genetics/iyac153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Accepted: 09/28/2022] [Indexed: 12/13/2022] Open
Abstract
Transcriptome-wide association studies aim to integrate genome-wide association studies and expression quantitative trait loci mapping studies for exploring the gene regulatory mechanisms underlying diseases. Existing transcriptome-wide association study methods primarily focus on 1 gene at a time. However, complex diseases are seldom resulted from the abnormality of a single gene, but from the biological network involving multiple genes. In addition, binary or ordinal categorical phenotypes are commonly encountered in biomedicine. We develop a proportional odds logistic model for network regression in transcriptome-wide association study, Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study, to detect the association between a network and binary or ordinal categorical phenotype. Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study relies on 2-stage transcriptome-wide association study framework. It first adopts the distribution-robust nonparametric Dirichlet process regression model in expression quantitative trait loci study to obtain the SNP effect estimate on each gene within the network. Then, Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study uses pointwise mutual information to represent the general relationship among the network nodes of predicted gene expression in genome-wide association study, followed by the association analysis with all nodes and edges involved in proportional odds logistic model. A key feature of Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study is its ability to simultaneously identify the disease-related network nodes or edges. With extensive realistic simulations including those under various between-node correlation patterns, we show Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study can provide calibrated type I error control and yield higher power than other existing methods. We finally apply Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study to analyze bipolar and major depression status and blood pressure from UK Biobank to illustrate its benefits in real data analysis.
Collapse
Affiliation(s)
- Liye Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
| | - Tao Ju
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
| | - Xiuyuan Jin
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
| | - Jiadong Ji
- Institute for Financial Studies, Shandong University, Jinan, Shandong 250100, China
| | - Jiayi Han
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
| | - Xiang Zhou
- Department of Biostatistics, The University of Michigan, Ann Arbor, MI 48109, USA
| | - Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Institute for Medical Dataology, Shandong University, Jinan, Shandong 250003, China
| |
Collapse
|
37
|
Zhang Z, Bae YE, Bradley JR, Wu L, Wu C. SUMMIT: An integrative approach for better transcriptomic data imputation improves causal gene identification. Nat Commun 2022; 13:6336. [PMID: 36284135 PMCID: PMC9593997 DOI: 10.1038/s41467-022-34016-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 10/11/2022] [Indexed: 12/25/2022] Open
Abstract
Genes with moderate to low expression heritability may explain a large proportion of complex trait etiology, but such genes cannot be sufficiently captured in conventional transcriptome-wide association studies (TWASs), partly due to the relatively small available reference datasets for developing expression genetic prediction models to capture the moderate to low genetically regulated components of gene expression. Here, we introduce a method, the Summary-level Unified Method for Modeling Integrated Transcriptome (SUMMIT), to improve the expression prediction model accuracy and the power of TWAS by using a large expression quantitative trait loci (eQTL) summary-level dataset. We apply SUMMIT to the eQTL summary-level data provided by the eQTLGen consortium. Through simulation studies and analyses of genome-wide association study summary statistics for 24 complex traits, we show that SUMMIT improves the accuracy of expression prediction in blood, successfully builds expression prediction models for genes with low expression heritability, and achieves higher statistical power than several benchmark methods. Finally, we conduct a case study of COVID-19 severity with SUMMIT and identify 11 likely causal genes associated with COVID-19 severity.
Collapse
Affiliation(s)
- Zichen Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Ye Eun Bae
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Jonathan R Bradley
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
| |
Collapse
|
38
|
Bendl J, Hauberg ME, Girdhar K, Im E, Vicari JM, Rahman S, Fernando MB, Townsley KG, Dong P, Misir R, Kleopoulos SP, Reach SM, Apontes P, Zeng B, Zhang W, Voloudakis G, Brennand KJ, Nixon RA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P. The three-dimensional landscape of cortical chromatin accessibility in Alzheimer's disease. Nat Neurosci 2022; 25:1366-1378. [PMID: 36171428 PMCID: PMC9581463 DOI: 10.1038/s41593-022-01166-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Accepted: 08/16/2022] [Indexed: 02/06/2023]
Abstract
To characterize the dysregulation of chromatin accessibility in Alzheimer's disease (AD), we generated 636 ATAC-seq libraries from neuronal and nonneuronal nuclei isolated from the superior temporal gyrus and entorhinal cortex of 153 AD cases and 56 controls. By analyzing a total of ~20 billion read pairs, we expanded the repertoire of known open chromatin regions (OCRs) in the human brain and identified cell-type-specific enhancer-promoter interactions. We show that interindividual variability in OCRs can be leveraged to identify cis-regulatory domains (CRDs) that capture the three-dimensional structure of the genome (3D genome). We identified AD-associated effects on chromatin accessibility, the 3D genome and transcription factor (TF) regulatory networks. For one of the most AD-perturbed TFs, USF2, we validated its regulatory effect on lysosomal genes. Overall, we applied a systematic approach to understanding the role of the 3D genome in AD. We provide all data as an online resource for widespread community-based analysis.
Collapse
Affiliation(s)
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Mads E Hauberg
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative of Integrative Psychiatric Research (iPSYCH), Aarhus University, Aarhus, Denmark
- Centre for Integrative Sequencing (iSEQ), Aarhus University, Aarhus, Denmark
| | - Kiran Girdhar
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eunju Im
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Department of Psychiatry, New York University Langone Health, New York, NY, USA
| | - James M Vicari
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Samir Rahman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Michael B Fernando
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kayla G Townsley
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Pengfei Dong
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ruth Misir
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Steven P Kleopoulos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sarah M Reach
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Pasha Apontes
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Biao Zeng
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Wen Zhang
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Georgios Voloudakis
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kristen J Brennand
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Yale University, New Haven, CT, USA
| | - Ralph A Nixon
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Department of Psychiatry, New York University Langone Health, New York, NY, USA
- Department of Cell Biology, New York University Langone Health, New York, NY, USA
- New York University Neuroscience Institute, New York, NY, USA
| | - Vahram Haroutunian
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Gabriel E Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - John F Fullard
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA.
- Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA Medical Center, Bronx, NY, USA.
| |
Collapse
|
39
|
Jin X, Zhang L, Ji J, Ju T, Zhao J, Yuan Z. Network regression analysis in transcriptome-wide association studies. BMC Genomics 2022; 23:562. [PMID: 35933330 PMCID: PMC9356418 DOI: 10.1186/s12864-022-08809-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Accepted: 08/02/2022] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Transcriptome-wide association studies (TWASs) have shown great promise in interpreting the findings from genome-wide association studies (GWASs) and exploring the disease mechanisms, by integrating GWAS and eQTL mapping studies. Almost all TWAS methods only focus on one gene at a time, with exception of only two published multiple-gene methods nevertheless failing to account for the inter-dependence as well as the network structure among multiple genes, which may lead to power loss in TWAS analysis as complex disease often owe to multiple genes that interact with each other as a biological network. We therefore developed a Network Regression method in a two-stage TWAS framework (NeRiT) to detect whether a given network is associated with the traits of interest. NeRiT adopts the flexible Bayesian Dirichlet process regression to obtain the gene expression prediction weights in the first stage, uses pointwise mutual information to represent the general between-node correlation in the second stage and can effectively take the network structure among different gene nodes into account. RESULTS Comprehensive and realistic simulations indicated NeRiT had calibrated type I error control for testing both the node effect and edge effect, and yields higher power than the existed methods, especially in testing the edge effect. The results were consistent regardless of the GWAS sample size, the gene expression prediction model in the first step of TWAS, the network structure as well as the correlation pattern among different gene nodes. Real data applications through analyzing systolic blood pressure and diastolic blood pressure from UK Biobank showed that NeRiT can simultaneously identify the trait-related nodes as well as the trait-related edges. CONCLUSIONS NeRiT is a powerful and efficient network regression method in TWAS.
Collapse
Affiliation(s)
- Xiuyuan Jin
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China.,Institute for Medical Dataology, Shandong University, Jinan, 250003, Shandong, China
| | - Liye Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China.,Institute for Medical Dataology, Shandong University, Jinan, 250003, Shandong, China
| | - Jiadong Ji
- Institute for Financial Studies, Shandong University, Jinan, 250100, Shandong, China
| | - Tao Ju
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China.,Institute for Medical Dataology, Shandong University, Jinan, 250003, Shandong, China
| | - Jinghua Zhao
- Department of Public Health and Primary Care, Cardiovascular Epidemiology Unit, University of Cambridge, Cambridge, UK.
| | - Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, Shandong, China. .,Institute for Medical Dataology, Shandong University, Jinan, 250003, Shandong, China.
| |
Collapse
|
40
|
Khunsriraksakul C, McGuire D, Sauteraud R, Chen F, Yang L, Wang L, Hughey J, Eckert S, Dylan Weissenkampen J, Shenoy G, Marx O, Carrel L, Jiang B, Liu DJ. Integrating 3D genomic and epigenomic data to enhance target gene discovery and drug repurposing in transcriptome-wide association studies. Nat Commun 2022; 13:3258. [PMID: 35672318 PMCID: PMC9171100 DOI: 10.1038/s41467-022-30956-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2021] [Accepted: 05/25/2022] [Indexed: 02/08/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) are popular approaches to test for association between imputed gene expression levels and traits of interest. Here, we propose an integrative method PUMICE (Prediction Using Models Informed by Chromatin conformations and Epigenomics) to integrate 3D genomic and epigenomic data with expression quantitative trait loci (eQTL) to more accurately predict gene expressions. PUMICE helps define and prioritize regions that harbor cis-regulatory variants, which outperforms competing methods. We further describe an extension to our method PUMICE +, which jointly combines TWAS results from single- and multi-tissue models. Across 79 traits, PUMICE + identifies 22% more independent novel genes and increases median chi-square statistics values at known loci by 35% compared to the second-best method, as well as achieves the narrowest credible interval size. Lastly, we perform computational drug repurposing and confirm that PUMICE + outperforms other TWAS methods.
Collapse
Affiliation(s)
- Chachrit Khunsriraksakul
- grid.29857.310000 0001 2097 4281Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Daniel McGuire
- grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Renan Sauteraud
- grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Fang Chen
- grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Lina Yang
- grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Lida Wang
- grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Jordan Hughey
- grid.29857.310000 0001 2097 4281Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Scott Eckert
- grid.29857.310000 0001 2097 4281Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - J. Dylan Weissenkampen
- grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Ganesh Shenoy
- grid.29857.310000 0001 2097 4281Department of Neurosurgery, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Olivia Marx
- grid.29857.310000 0001 2097 4281Biomedical Science Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Laura Carrel
- grid.29857.310000 0001 2097 4281Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Bibo Jiang
- grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| | - Dajiang J. Liu
- grid.29857.310000 0001 2097 4281Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA ,grid.29857.310000 0001 2097 4281Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA 17033 USA
| |
Collapse
|
41
|
Wolc A, Dekkers JCM. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol 2022; 54:31. [PMID: 35562659 PMCID: PMC9103490 DOI: 10.1186/s12711-022-00724-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022] Open
Abstract
Background Bayesian genomic prediction methods were developed to simultaneously fit all genotyped markers to a set of available phenotypes for prediction of breeding values for quantitative traits, allowing for differences in the genetic architecture (distribution of marker effects) of traits. These methods also provide a flexible and reliable framework for genome-wide association (GWA) studies. The objective here was to review developments in Bayesian hierarchical and variable selection models for GWA analyses. Results By fitting all genotyped markers simultaneously, Bayesian GWA methods implicitly account for population structure and the multiple-testing problem of classical single-marker GWA. Implemented using Markov chain Monte Carlo methods, Bayesian GWA methods allow for control of error rates using probabilities obtained from posterior distributions. Power of GWA studies using Bayesian methods can be enhanced by using informative priors based on previous association studies, gene expression analyses, or functional annotation information. Applied to multiple traits, Bayesian GWA analyses can give insight into pleiotropic effects by multi-trait, structural equation, or graphical models. Bayesian methods can also be used to combine genomic, transcriptomic, proteomic, and other -omics data to infer causal genotype to phenotype relationships and to suggest external interventions that can improve performance. Conclusions Bayesian hierarchical and variable selection methods provide a unified and powerful framework for genomic prediction, GWA, integration of prior information, and integration of information from other -omics platforms to identify causal mutations for complex quantitative traits.
Collapse
Affiliation(s)
- Anna Wolc
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA.,Hy-Line International, 2583 240th Street, Dallas Center, IA, 50063, USA
| | - Jack C M Dekkers
- Department of Animal Science, Iowa State University, 806 Stange Road, 239 Kildee Hall, Ames, IA, 50010, USA.
| |
Collapse
|
42
|
Cao X, Wang X, Zhang S, Sha Q. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep 2022; 12:3553. [PMID: 35241742 PMCID: PMC8894384 DOI: 10.1038/s41598-022-07465-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2021] [Accepted: 02/11/2022] [Indexed: 01/29/2023] Open
Abstract
Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and identified many genetic variants underlying complex diseases via single marker tests, there is still a considerable heritability of complex diseases that could not be explained by GWAS. One alternative approach to overcome the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the interpretability in biological mechanisms of identified trait associated genes. In this study, we propose a powerful and computationally efficient gene-based association test, called Overall. Using extended Simes procedure, Overall aggregates information from three types of traditional gene-based association tests and also incorporates expression quantitative trait locus (eQTL) information into a gene-based association test using GWAS summary statistics. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the p values of Overall can be calculated analytically. Simulation studies show that Overall can control type I error rates very well and has higher power than the tests that we compared with. We also apply Overall to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that this newly developed method can identify more significant genes than other methods we compared with.
Collapse
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, TX, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA.
| |
Collapse
|
43
|
Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022; 23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2021] [Indexed: 11/14/2022] Open
Abstract
Integration of expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising manner to reveal functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in post-GWAS era. However, how to efficiently incorporate eQTL mapping study into GWAS for prioritization of causal genes remains elusive. We herein proposed a novel method termed as Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV) by modeling the effects of cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified framework via mixed models and therefore includes many prior methods/tests as special cases. We further justified MTV from another two statistical perspectives of mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV is superior for pronounced features including the processing of direct effects of cis-SNPs on phenotypes, the powerful likelihood ratio test for assessment of joint effects of cis-SNPs and genetically regulated gene expression (GReX), two useful quantities to measure relative genetic contributions of GReX and cis-SNPs to phenotypic variance, and the computationally efferent parameter expansion expectation maximum algorithm. With extensive simulations, we identified that MTV correctly controlled the type I error in joint evaluation of the total genetic effect and proved more powerful to discover true association signals across various scenarios compared to existing methods. We finally applied MTV to 41 complex traits/diseases available from three GWASs and discovered many new associated genes that had otherwise been missed by existing methods. We also revealed that a small but substantial fraction of phenotypic variation was mediated by GReX. Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.
Collapse
Affiliation(s)
- Ting Wang
- Department of Biostatistics at Xuzhou Medical University, China
| | - Jiahao Qiao
- Department of Biostatistics at Xuzhou Medical University, China
| | - Shuo Zhang
- Department of Biostatistics at Xuzhou Medical University, China
| | - Yongyue Wei
- Department of Biostatistics at Nanjing Medical University, China
| | - Ping Zeng
- Department of Biostatistics, Center for Medical Statistics and Data Analysis and Key Laboratory of Human Genetics and Environmental Medicine at Xuzhou Medical University, China
| |
Collapse
|
44
|
Parrish RL, Gibson GC, Epstein MP, Yang J. TIGAR-V2: Efficient TWAS tool with nonparametric Bayesian eQTL weights of 49 tissue types from GTEx V8. HGG ADVANCES 2022; 3:100068. [PMID: 35047855 PMCID: PMC8756507 DOI: 10.1016/j.xhgg.2021.100068] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2021] [Accepted: 11/01/2021] [Indexed: 01/12/2023] Open
Abstract
Standard transcriptome-wide association study (TWAS) methods first train gene expression prediction models using reference transcriptomic data and then test the association between the predicted genetically regulated gene expression and phenotype of interest. Most existing TWAS tools require cumbersome preparation of genotype input files and extra coding to enable parallel computation. To improve the efficiency of TWAS tools, we developed Transcriptome-Integrated Genetic Association Resource V2 (TIGAR-V2), which directly reads Variant Call Format (VCF) files, enables parallel computation, and reduces up to 90% of computation cost (mainly due to loading genotype data) compared to the original version. TIGAR-V2 can train gene expression imputation models using either nonparametric Bayesian Dirichlet process regression (DPR) or Elastic-Net (as used by PrediXcan), perform TWASs using either individual-level or summary-level genome-wide association study (GWAS) data, and implement both burden and variance-component statistics for gene-based association tests. We trained gene expression prediction models by DPR for 49 tissues using Genotype-Tissue Expression (GTEx) V8 by TIGAR-V2 and illustrated the usefulness of these Bayesian cis-expression quantitative trait locus (eQTL) weights through TWASs of breast and ovarian cancer utilizing public GWAS summary statistics. We identified 88 and 37 risk genes, respectively, for breast and ovarian cancer, most of which are either known or near previously identified GWAS (∼95%) or TWAS (∼40%) risk genes and three novel independent TWAS risk genes with known functions in carcinogenesis. These findings suggest that TWASs can provide biological insight into the transcriptional regulation of complex diseases. The TIGAR-V2 tool, trained Bayesian cis-eQTL weights, and linkage disequilibrium (LD) information from GTEx V8 are publicly available, providing a useful resource for mapping risk genes of complex diseases.
Collapse
Affiliation(s)
- Randy L. Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Greg C. Gibson
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Michael P. Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
| |
Collapse
|
45
|
Yang J, Oveisgharan S, Liu X, Wilson RS, Bennett DA, Buchman AS. Risk Models Based on Non-Cognitive Measures May Identify Presymptomatic Alzheimer's Disease. J Alzheimers Dis 2022; 89:1249-1262. [PMID: 35988224 PMCID: PMC10083073 DOI: 10.3233/jad-220446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
BACKGROUND Alzheimer's disease (AD) is a progressive disorder without a cure. Develop risk prediction models for detecting presymptomatic AD using non-cognitive measures is necessary to enable early interventions. OBJECTIVE Examine if non-cognitive metrics alone can be used to construct risk models to identify adults at risk for AD dementia and cognitive impairment. METHODS Clinical data from older adults without dementia from the Memory and Aging Project (MAP, n = 1,179) and Religious Orders Study (ROS, n = 1,103) were analyzed using Cox proportional hazard models to develop risk prediction models for AD dementia and cognitive impairment. Models using only non-cognitive covariates were compared to models that added cognitive covariates. All models were trained in MAP, tested in ROS, and evaluated by the AUC of ROC curve. RESULTS Models based on non-cognitive covariates alone achieved AUC (0.800,0.785) for predicting AD dementia (3.5) years from baseline. Including additional cognitive covariates improved AUC to (0.916,0.881). A model with a single covariate of composite cognition score achieved AUC (0.905,0.863). Models based on non-cognitive covariates alone achieved AUC (0.717,0.714) for predicting cognitive impairment (3.5) years from baseline. Including additional cognitive covariates improved AUC to (0.783,0.770). A model with a single covariate of composite cognition score achieved AUC (0.754,0.730). CONCLUSION Risk models based on non-cognitive metrics predict both AD dementia and cognitive impairment. However, non-cognitive covariates do not provide incremental predictivity for models that include cognitive metrics in predicting AD dementia, but do in models predicting cognitive impairment. Further improved risk prediction models for cognitive impairment are needed.
Collapse
Affiliation(s)
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, 615 Michael St, Atlanta, GA, 30322, USA
| | - Shahram Oveisgharan
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, Chicago, IL, 60612, USA
| | - Xizhu Liu
- Quantitative Theory and Methods program, College of Arts and Sciences, Emory University, Atlanta, GA, 30322, USA
| | - Robert S Wilson
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, Chicago, IL, 60612, USA
| | - David A Bennett
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, Chicago, IL, 60612, USA
| | - Aron S Buchman
- Rush Alzheimer’s Disease Center, Rush University Medicine Center, Chicago, IL, 60612, USA
| |
Collapse
|
46
|
Ahmed KT, Sun J, Cheng S, Yong J, Zhang W. Multi-omics data integration by generative adversarial network. Bioinformatics 2021; 38:179-186. [PMID: 34415323 DOI: 10.1093/bioinformatics/btab608] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Revised: 07/27/2021] [Accepted: 08/18/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Accurate disease phenotype prediction plays an important role in the treatment of heterogeneous diseases like cancer in the era of precision medicine. With the advent of high throughput technologies, more comprehensive multi-omics data is now available that can effectively link the genotype to phenotype. However, the interactive relation of multi-omics datasets makes it particularly challenging to incorporate different biological layers to discover the coherent biological signatures and predict phenotypic outcomes. In this study, we introduce omicsGAN, a generative adversarial network model to integrate two omics data and their interaction network. The model captures information from the interaction network as well as the two omics datasets and fuse them to generate synthetic data with better predictive signals. RESULTS Large-scale experiments on The Cancer Genome Atlas breast cancer, lung cancer and ovarian cancer datasets validate that (i) the model can effectively integrate two omics data (e.g. mRNA and microRNA expression data) and their interaction network (e.g. microRNA-mRNA interaction network). The synthetic omics data generated by the proposed model has a better performance on cancer outcome classification and patients survival prediction compared to original omics datasets. (ii) The integrity of the interaction network plays a vital role in the generation of synthetic data with higher predictive quality. Using a random interaction network does not allow the framework to learn meaningful information from the omics datasets; therefore, results in synthetic data with weaker predictive signals. AVAILABILITY AND IMPLEMENTATION Source code is available at: https://github.com/CompbioLabUCF/omicsGAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Khandakar Tanvir Ahmed
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Jiao Sun
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Sze Cheng
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Jeongsik Yong
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
| | - Wei Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| |
Collapse
|
47
|
Genetically regulated expression in late-onset Alzheimer's disease implicates risk genes within known and novel loci. Transl Psychiatry 2021; 11:618. [PMID: 34873149 PMCID: PMC8648734 DOI: 10.1038/s41398-021-01677-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/05/2021] [Revised: 09/27/2021] [Accepted: 10/06/2021] [Indexed: 12/22/2022] Open
Abstract
Late-onset Alzheimer disease (LOAD) is highly polygenic, with a heritability estimated between 40 and 80%, yet risk variants identified in genome-wide studies explain only ~8% of phenotypic variance. Due to its increased power and interpretability, genetically regulated expression (GReX) analysis is an emerging approach to investigate the genetic mechanisms of complex diseases. Here, we conducted GReX analysis within and across 51 tissues on 39 LOAD GWAS data sets comprising 58,713 cases and controls from the Alzheimer's Disease Genetics Consortium (ADGC) and the International Genomics of Alzheimer's Project (IGAP). Meta-analysis across studies identified 216 unique significant genes, including 72 with no previously reported LOAD GWAS associations. Cross-brain-tissue and cross-GTEx models revealed eight additional genes significantly associated with LOAD. Conditional analysis of previously reported loci using established LOAD-risk variants identified eight genes reaching genome-wide significance independent of known signals. Moreover, the proportion of SNP-based heritability is highly enriched in genes identified by GReX analysis. In summary, GReX-based meta-analysis in LOAD identifies 216 genes (including 72 novel genes), illuminating the role of gene regulatory models in LOAD.
Collapse
|
48
|
Bae YE, Wu L, Wu C. InTACT: An adaptive and powerful framework for joint-tissue transcriptome-wide association studies. Genet Epidemiol 2021; 45:848-859. [PMID: 34255882 PMCID: PMC8604767 DOI: 10.1002/gepi.22425] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/24/2021] [Indexed: 11/05/2022]
Abstract
Transcriptome-wide association studies (TWAS) that integrate transcriptomic reference data and genome-wide association studies (GWAS) have successfully enhanced the discovery of candidate genes for many complex traits. However, existing methods may suffer from substantial power loss because they fail to effectively consider that expression of many genes tends to be consistent across tissues. Here we propose a computationally efficient testing method, referred to as Integrative Test for Associations via Cauchy Transformation (InTACT), that effectively combines information across multiple tissues and thus improves the power of identifying associated genes. Through simulation studies, we show that InTACT maintains high power while properly controls for Type 1 error rates. We applied InTACT to the largest GWAS of Alzheimer's disease (AD) to date and identified 227 genome-wide significant genes, of which 130 were not identified by benchmark methods, TWAS and MultiXcan. Importantly, InTACT identified five novel loci for AD. We implemented InTACT in publicly available software, "InTACT."
Collapse
Affiliation(s)
- Ye Eun Bae
- Department of Statistics, Florida State University
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa
| | - Chong Wu
- Department of Statistics, Florida State University
| |
Collapse
|
49
|
Cao C, Kossinna P, Kwok D, Li Q, He J, Su L, Guo X, Zhang Q, Long Q. Disentangling genetic feature selection and aggregation in transcriptome-wide association studies. Genetics 2021; 220:6444993. [PMID: 34849857 DOI: 10.1093/genetics/iyab216] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Accepted: 11/04/2021] [Indexed: 12/14/2022] Open
Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research towards improving the predictive accuracy of its core component of Genetically Regulated eXpression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps-feature selection and feature aggregation-which can be independently conducted. In this work, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
Collapse
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Pathum Kossinna
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Devin Kwok
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
| | - Qing Li
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Jingni He
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Liya Su
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA
| | - Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Qingrun Zhang
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada.,Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
| | - Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada.,Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada.,Department of Medical Genetics, University of Calgary, Calgary, AB T2N 4N1, Canada.,Hotchkiss Brain Institute, O'Brien Institute for Public Health, University of Calgary, Calgary, AB T2N 4N1, Canada
| |
Collapse
|
50
|
Cao C, Wang J, Kwok D, Cui F, Zhang Z, Zhao D, Li MJ, Zou Q. webTWAS: a resource for disease candidate susceptibility genes identified by transcriptome-wide association study. Nucleic Acids Res 2021; 50:D1123-D1130. [PMID: 34669946 PMCID: PMC8728162 DOI: 10.1093/nar/gkab957] [Citation(s) in RCA: 117] [Impact Index Per Article: 39.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2021] [Revised: 09/24/2021] [Accepted: 10/05/2021] [Indexed: 12/20/2022] Open
Abstract
The development of transcriptome-wide association studies (TWAS) has enabled researchers to better identify and interpret causal genes in many diseases. However, there are currently no resources providing a comprehensive listing of gene-disease associations discovered by TWAS from published GWAS summary statistics. TWAS analyses are also difficult to conduct due to the complexity of TWAS software pipelines. To address these issues, we introduce a new resource called webTWAS, which integrates a database of the most comprehensive disease GWAS datasets currently available with credible sets of potential causal genes identified by multiple TWAS software packages. Specifically, a total of 235 064 gene-diseases associations for a wide range of human diseases are prioritized from 1298 high-quality downloadable European GWAS summary statistics. Associations are calculated with seven different statistical models based on three popular and representative TWAS software packages. Users can explore associations at the gene or disease level, and easily search for related studies or diseases using the MeSH disease tree. Since the effects of diseases are highly tissue-specific, webTWAS applies tissue-specific enrichment analysis to identify significant tissues. A user-friendly web server is also available to run custom TWAS analyses on user-provided GWAS summary statistics data. webTWAS is freely available at http://www.webtwas.net.
Collapse
Affiliation(s)
- Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Jianhua Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Devin Kwok
- School of Computer Science, McGill University, Montreal, Canada
| | - Feifei Cui
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Zilong Zhang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Da Zhao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|