1
Akinbiyi T, McPeek MS, Abney M. ADELLE: A global testing method for trans-eQTL mapping. PLoS Genet 2025; 21:e1011563. [PMID: 39792937 PMCID: PMC11756770 DOI: 10.1371/journal.pgen.1011563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 01/23/2025] [Accepted: 12/31/2024] [Indexed: 01/12/2025] Open
Abstract
Understanding the genetic regulatory mechanisms of gene expression is an ongoing challenge. Genetic variants that are associated with expression levels are readily identified when they are proximal to the gene (i.e., cis-eQTLs), but SNPs distant from the gene whose expression levels they are associated with (i.e., trans-eQTLs) have been much more difficult to discover, even though they account for a majority of the heritability in gene expression levels. A major impediment to the identification of more trans-eQTLs is the lack of statistical methods that are powerful enough to overcome the obstacles of small effect sizes and large multiple testing burden of trans-eQTL mapping. Here, we propose ADELLE, a powerful statistical testing framework that requires only summary statistics and is designed to be most sensitive to SNPs that are associated with multiple gene expression levels, a characteristic of many trans-eQTLs. In simulations, we show that for detecting SNPs that are associated with 0.1%-2% of 10,000 traits, among the 8 methods we consider ADELLE is clearly the most powerful overall, with either the highest power or power not significantly different from the highest for all settings in that range. We apply ADELLE to a mouse advanced intercross line data set and show its ability to find trans-eQTLs that were not significant under a standard analysis. We also apply ADELLE to trans-eQTL mapping in the eQTLGen data, and for 1,451 previously identified trans-eQTLs, we discover trans association with additional expression traits beyond those previously identified. This demonstrates that ADELLE is a powerful tool at uncovering trans regulators of genetic expression.
Affiliation(s)
- Takintayo Akinbiyi
  - Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Mark Abney
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
2
Sadeghi-Alavijeh O, Chan MM, Doctor GT, Voinescu CD, Stuckey A, Kousathanas A, Ho AT, Stanescu HC, Bockenhauer D, Sandford RN, Levine AP, Gale DP. Quantifying variant contributions in cystic kidney disease using national-scale whole-genome sequencing. J Clin Invest 2024; 134:e181467. [PMID: 39190624 PMCID: PMC11444187 DOI: 10.1172/jci181467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Accepted: 08/15/2024] [Indexed: 08/29/2024] Open
Abstract
BACKGROUND: Cystic kidney disease (CyKD) is a predominantly familial disease in which gene discovery has been led by family-based and candidate gene studies, an approach that is susceptible to ascertainment and other biases. METHODS: Using whole-genome sequencing data from 1,209 cases and 26,096 ancestry-matched controls participating in the 100,000 Genomes Project, we adopted hypothesis-free approaches to generate quantitative estimates of disease risk for each genetic contributor to CyKD, across genes, variant types and allelic frequencies. RESULTS: In 82.3% of cases, a qualifying potentially disease-causing rare variant in an established gene was found. There was an enrichment of rare coding, splicing, and structural variants in known CyKD genes, with statistically significant gene-based signals in COL4A3 and (monoallelic) PKHD1. Quantification of disease risk for each gene (with replication in the separate UK Biobank study) revealed substantially lower risk associated with genes more recently associated with autosomal dominant polycystic kidney disease, with odds ratios for some below what might usually be regarded as necessary for classical Mendelian inheritance. Meta-analysis of common variants did not reveal significant associations, but suggested this category of variation contributes 3%-9% to the heritability of CyKD across European ancestries. CONCLUSION: By providing unbiased quantification of risk effects per gene, this research suggests that not all rare variant genetic contributors to CyKD are equally likely to manifest as a Mendelian trait in families. This information may inform genetic testing and counseling in the clinic.
Affiliation(s)
- Omid Sadeghi-Alavijeh
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
- Melanie My Chan
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
- Gabriel T Doctor
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
- Catalin D Voinescu
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
- Alexander Stuckey
  - Genomics England, Queen Mary University of London, London, United Kingdom
- Alexander T Ho
  - Genomics England, Queen Mary University of London, London, United Kingdom
- Horia C Stanescu
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
- Detlef Bockenhauer
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
  - University Hospital and Katholic University Leuven, Leuven, Belgium
- Richard N Sandford
  - Academic Department of Medical Genetics, Cambridge University, Cambridge, United Kingdom
- Adam P Levine
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
  - Research Department of Pathology, University College London, London, United Kingdom
- Daniel P Gale
  - Centre for Kidney and Bladder Health, University College London, London, United Kingdom
3
Wong MMK, Sha Z, Lütje L, Kong XZ, van Heukelum S, van de Berg WDJ, Jonkman LE, Fisher SE, Francks C. The neocortical infrastructure for language involves region-specific patterns of laminar gene expression. Proc Natl Acad Sci U S A 2024; 121:e2401687121. [PMID: 39133845 PMCID: PMC11348331 DOI: 10.1073/pnas.2401687121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 06/27/2024] [Indexed: 08/29/2024] Open
Abstract
The language network of the human brain has core components in the inferior frontal cortex and superior/middle temporal cortex, with left-hemisphere dominance in most people. Functional specialization and interconnectivity of these neocortical regions is likely to be reflected in their molecular and cellular profiles. Excitatory connections between cortical regions arise and innervate according to layer-specific patterns. Here, we generated a gene expression dataset from human postmortem cortical tissue samples from core language network regions, using spatial transcriptomics to discriminate gene expression across cortical layers. Integration of these data with existing single-cell expression data identified 56 genes that showed differences in laminar expression profiles between the frontal and temporal language cortex together with upregulation in layer II/III and/or layer V/VI excitatory neurons. Based on data from large-scale genome-wide screening in the population, DNA variants within these 56 genes showed set-level associations with interindividual variation in structural connectivity between the left-hemisphere frontal and temporal language cortex, and with the brain-related disorders dyslexia and schizophrenia which often involve affected language. These findings identify region-specific patterns of laminar gene expression as a feature of the brain's language network.
Affiliation(s)
- Maggie M. K. Wong
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
- Zhiqiang Sha
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
- Lukas Lütje
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
- Xiang-Zhen Kong
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
  - Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou 310058, China
  - State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou 311121, China
- Sabrina van Heukelum
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GA, The Netherlands
- Wilma D. J. van de Berg
  - Section Clinical Neuroanatomy and Biobanking, Department of Anatomy and Neurosciences, Amsterdam University Medical Center, Location Vrije Universiteit Amsterdam, Amsterdam 1007 MB, The Netherlands
  - Neurodegeneration, Amsterdam Neuroscience, Amsterdam 1007 MB, The Netherlands
- Laura E. Jonkman
  - Section Clinical Neuroanatomy and Biobanking, Department of Anatomy and Neurosciences, Amsterdam University Medical Center, Location Vrije Universiteit Amsterdam, Amsterdam 1007 MB, The Netherlands
  - Neurodegeneration, Amsterdam Neuroscience, Amsterdam 1007 MB, The Netherlands
  - Brain Imaging, Amsterdam Neuroscience, Amsterdam 1007 MB, The Netherlands
- Simon E. Fisher
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GA, The Netherlands
- Clyde Francks
  - Language & Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GA, The Netherlands
  - Department of Cognitive Neuroscience, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
4
Akinbiyi T, McPeek MS, Abney M. ADELLE: A global testing method for Trans-eQTL mapping. bioRxiv 2024:2024.02.24.581871. [PMID: 38464248 PMCID: PMC10925110 DOI: 10.1101/2024.02.24.581871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Understanding the genetic regulatory mechanisms of gene expression is a challenging and ongoing problem. Genetic variants that are associated with expression levels are readily identified when they are proximal to the gene (i.e., cis-eQTLs), but SNPs distant from the gene whose expression levels they are associated with (i.e., trans-eQTLs) have been much more difficult to discover, even though they account for a majority of the heritability in gene expression levels. A major impediment to the identification of more trans-eQTLs is the lack of statistical methods that are powerful enough to overcome the obstacles of small effect sizes and large multiple testing burden of trans-eQTL mapping. Here, we propose ADELLE, a powerful statistical testing framework that requires only summary statistics and is designed to be most sensitive to SNPs that are associated with multiple gene expression levels, a characteristic of many trans-eQTLs. In simulations, we show that for detecting SNPs that are associated with 0.1%-2% of 10,000 traits, among the 7 methods we consider ADELLE is clearly the most powerful overall, with either the highest power or power not significantly different from the highest for all settings in that range. We apply ADELLE to a mouse advanced intercross line data set and show its ability to find trans-eQTLs that were not significant under a standard analysis. This demonstrates that ADELLE is a powerful tool at uncovering trans regulators of genetic expression.
5
Guo X, Chatterjee N, Dutta D. Subset-based method for cross-tissue transcriptome-wide association studies improves power and interpretability. HGG Advances 2024; 5:100283. [PMID: 38491773 PMCID: PMC10999697 DOI: 10.1016/j.xhgg.2024.100283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 03/09/2024] [Accepted: 03/09/2024] [Indexed: 03/18/2024] Open
Abstract
Integrating results from genome-wide association studies (GWASs) and studies of molecular phenotypes such as gene expressions can improve our understanding of the biological functions of trait-associated variants and can help prioritize candidate genes for downstream analysis. Using reference expression quantitative trait locus (eQTL) studies, several methods have been proposed to identify gene-trait associations, primarily based on gene expression imputation. To increase the statistical power by leveraging substantial eQTL sharing across tissues, meta-analysis methods aggregating such gene-based test results across multiple tissues or contexts have been developed as well. However, most existing meta-analysis methods have limited power to identify associations when the gene has weaker associations in only a few tissues and cannot identify the subset of tissues in which the gene is "activated." For this, we developed a cross-tissue subset-based transcriptome-wide association study (CSTWAS) meta-analysis method that improves power under such scenarios and can extract the set of potentially associated tissues. To improve applicability, CSTWAS uses only GWAS summary statistics and pre-computed correlation matrices to identify a subset of tissues that have the maximal evidence of gene-trait association. Through numerical simulations, we found that CSTWAS can maintain a well-calibrated type-I error rate, improves power especially when there is a small number of associated tissues for a gene-trait association, and identifies an accurate associated tissue set. By analyzing GWAS summary statistics of three complex traits and diseases, we demonstrate that CSTWAS could identify biological meaningful signals while providing an interpretation of disease etiology by extracting a set of potentially associated tissues.
Affiliation(s)
- Xinyu Guo
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90007, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Diptavo Dutta
  - Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology & Genetics, National Cancer Institute, Rockville, MD 20850, USA
6
Cao X, Wang X, Zhang S, Sha Q. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep 2022; 12:3553. [PMID: 35241742 PMCID: PMC8894384 DOI: 10.1038/s41598-022-07465-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Accepted: 02/11/2022] [Indexed: 01/29/2023] Open
Abstract
Although genome-wide association studies (GWAS) have been successfully applied to a variety of complex diseases and identified many genetic variants underlying complex diseases via single marker tests, there is still a considerable heritability of complex diseases that could not be explained by GWAS. One alternative approach to overcome the missing heritability caused by genetic heterogeneity is gene-based analysis, which considers the aggregate effects of multiple genetic variants in a single test. Another alternative approach is transcriptome-wide association study (TWAS). TWAS aggregates genomic information into functionally relevant units that map to genes and their expression. TWAS is not only powerful, but can also increase the interpretability in biological mechanisms of identified trait associated genes. In this study, we propose a powerful and computationally efficient gene-based association test, called Overall. Using extended Simes procedure, Overall aggregates information from three types of traditional gene-based association tests and also incorporates expression quantitative trait locus (eQTL) information into a gene-based association test using GWAS summary statistics. We show that after a small number of replications to estimate the correlation among the integrated gene-based tests, the p values of Overall can be calculated analytically. Simulation studies show that Overall can control type I error rates very well and has higher power than the tests that we compared with. We also apply Overall to two schizophrenia GWAS summary datasets and two lipids GWAS summary datasets. The results show that this newly developed method can identify more significant genes than other methods we compared with.
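For orientation, the classical Simes combination that the "extended Simes procedure" named in this abstract builds on can be sketched as follows. This is a minimal illustration only, not the authors' Overall implementation: the extended variant also adjusts for correlation among the combined tests, which is not shown here, and the function name `simes_p_value` is hypothetical.

```python
from typing import Sequence


def simes_p_value(p_values: Sequence[float]) -> float:
    """Combine p-values with the classical Simes procedure.

    With the m p-values sorted in ascending order as p_(1) <= ... <= p_(m),
    the combined p-value is min over i of m * p_(i) / i.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")

    m = len(p_values)
    ordered = sorted(p_values)
    # rank starts at 1, so rank corresponds to i in the formula above
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    # Cap at 1.0 as a guard against floating-point overshoot
    return min(1.0, combined)


if __name__ == "__main__":
    # Three hypothetical gene-based test p-values (illustrative numbers only)
    print(simes_p_value([0.01, 0.04, 0.03]))
```

The plain form treats all m tests symmetrically, which is why a method combining several correlated gene-based tests would need an adjusted version rather than this one.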
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, Denton, TX, USA
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, 49931, USA
7
Bi W, Lee S. Scalable and Robust Regression Methods for Phenome-Wide Association Analysis on Large-Scale Biobank Data. Front Genet 2021; 12:682638. [PMID: 34211504 PMCID: PMC8239389 DOI: 10.3389/fgene.2021.682638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023] Open
Abstract
With the advances in genotyping technologies and electronic health records (EHRs), large biobanks have been great resources to identify novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. To date, several phenome-wide association studies (PheWAS) have been performed on biobank data, which provides comprehensive insights into many aspects of human genetics and biology. Although inspiring, PheWAS on large-scale biobank data encounters new challenges including computational burden, unbalanced phenotypic distribution, and genetic relationship. In this paper, we first discuss these new challenges and their potential impact on data analysis. Then, we summarize approaches that are scalable and robust in GWAS and PheWAS. This review can serve as a practical guide for geneticists, epidemiologists, and other medical researchers to identify genetic variations associated with health-related phenotypes in large-scale biobank data analysis. Meanwhile, it can also help statisticians to gain a comprehensive and up-to-date understanding of the current technical tool development.
Affiliation(s)
- Wenjian Bi
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea