1
Loupe JM, Anderson AG, Rizzardi LF, Rodriguez-Nunez I, Moyers B, Trausch-Lowther K, Jain R, Bunney WE, Bunney BG, Cartagena P, Sequeira A, Watson SJ, Akil H, Cooper GM, Myers RM. Multiomic profiling of transcription factor binding and function in human brain. Nat Neurosci 2024; 27:1387-1399. [PMID: 38831039 DOI: 10.1038/s41593-024-01658-8] [Received: 06/20/2023] [Accepted: 04/19/2024] [Indexed: 06/05/2024]
Abstract
Transcription factors (TFs) orchestrate gene expression programs crucial for brain function, but we lack detailed information about TF binding in human brain tissue. We generated a multiomic resource (ChIP-seq, ATAC-seq, RNA-seq, DNA methylation) on bulk tissues and sorted nuclei from several postmortem brain regions, including binding maps for more than 100 TFs. We demonstrate improved measurements of TF activity, including motif recognition and gene expression modeling, upon identification and removal of high TF occupancy regions. Further, predictive TF binding models demonstrate a bias for these high-occupancy sites. Neuronal TFs SATB2 and TBR1 bind unique regions depleted for such sites and promote neuronal gene expression. Binding sites for TFs, including TBR1 and PKNOX1, are enriched for risk variants associated with neuropsychiatric disorders, predominantly in neurons. This work, titled BrainTF, is a powerful resource for future studies seeking to understand the roles of specific TFs in regulating gene expression in the human brain.
Affiliation(s)
- Jacob M Loupe
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Lindsay F Rizzardi
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Biochemistry and Molecular Biology, The University of Alabama at Birmingham, Birmingham, AL, USA
- Belle Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Rashmi Jain
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- William E Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Blynn G Bunney
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Preston Cartagena
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Adolfo Sequeira
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Stanley J Watson
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Huda Akil
- The Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA.
2
Kim SS, Truong B, Jagadeesh K, Dey KK, Shen AZ, Raychaudhuri S, Kellis M, Price AL. Leveraging single-cell ATAC-seq and RNA-seq to identify disease-critical fetal and adult brain cell types. Nat Commun 2024; 15:563. [PMID: 38233398 PMCID: PMC10794712 DOI: 10.1038/s41467-024-44742-0] [Received: 04/30/2022] [Accepted: 01/02/2024] [Indexed: 01/19/2024]
Abstract
Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and studies integrating GWAS with scRNA-seq have shown promise, but studies integrating GWAS with scATAC-seq have been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases/traits (average N = 298 K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (respectively adult) brain cell types for 22 (respectively 23) of 28 traits using scATAC-seq, and for 8 (respectively 17) of 28 traits using scRNA-seq. Significant scATAC-seq enrichments included fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases/traits and inform future analyses.
Affiliation(s)
- Samuel S Kim
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Buu Truong
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Karthik Jagadeesh
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Kushal K Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Amber Z Shen
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Soumya Raychaudhuri
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Manolis Kellis
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Alkes L Price
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
3
Hodonsky CJ, Turner AW, Khan MD, Barrientos NB, Methorst R, Ma L, Lopez NG, Mosquera JV, Auguste G, Farber E, Ma WF, Wong D, Onengut-Gumuscu S, Kavousi M, Peyser PA, van der Laan SW, Leeper NJ, Kovacic JC, Björkegren JLM, Miller CL. Multi-ancestry genetic analysis of gene regulation in coronary arteries prioritizes disease risk loci. Cell Genom 2024; 4:100465. [PMID: 38190101 PMCID: PMC10794848 DOI: 10.1016/j.xgen.2023.100465] [Received: 04/17/2023] [Revised: 09/07/2023] [Accepted: 11/19/2023] [Indexed: 01/09/2024]
Abstract
Genome-wide association studies (GWASs) have identified hundreds of risk loci for coronary artery disease (CAD). However, non-European populations are underrepresented in GWASs, and the causal gene-regulatory mechanisms of these risk loci during atherosclerosis remain unclear. We incorporated local ancestry and haplotypes to identify quantitative trait loci for expression (eQTLs) and splicing (sQTLs) in coronary arteries from 138 ancestrally diverse Americans. Of 2,132 eQTL-associated genes (eGenes), 47% were previously unreported in coronary artery; 19% exhibited cell-type-specific expression. Colocalization revealed subgroups of eGenes unique to CAD and blood pressure GWAS. Fine-mapping highlighted additional eGenes, including TBX20 and IL5. We also identified sQTLs for 1,690 genes, among which TOR1AIP1 and ULK3 sQTLs demonstrated the importance of evaluating splicing to accurately identify disease-relevant isoform expression. Our work provides a patient-derived coronary artery eQTL resource and exemplifies the need for diverse study populations and multifaceted approaches to characterize gene regulation in disease processes.
Affiliation(s)
- Chani J Hodonsky
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Adam W Turner
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Mohammad Daud Khan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Nelson B Barrientos
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
- Ruben Methorst
- Central Diagnostics Laboratory, Division Laboratories, Pharmacy, and Biomedical Genetics, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, the Netherlands
- Lijiang Ma
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Nicolas G Lopez
- Division of Vascular Surgery, Department of Surgery, Stanford University, Stanford, CA 94305, USA
- Jose Verdezoto Mosquera
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Gaëlle Auguste
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Emily Farber
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Wei Feng Ma
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Medical Scientist Training Program, Department of Pathology, University of Virginia, Charlottesville, VA 22908, USA
- Doris Wong
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908, USA
- Suna Onengut-Gumuscu
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Maryam Kavousi
- Department of Epidemiology, Erasmus University Medical Center, 3000 CA Rotterdam, the Netherlands
- Patricia A Peyser
- Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109, USA
- Sander W van der Laan
- Central Diagnostics Laboratory, Division Laboratories, Pharmacy, and Biomedical Genetics, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, the Netherlands
- Nicholas J Leeper
- Division of Vascular Surgery, Department of Surgery, Stanford University, Stanford, CA 94305, USA
- Jason C Kovacic
- Cardiovascular Research Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, University of New South Wales, Sydney, NSW 2052, Australia
- Johan L M Björkegren
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Huddinge, Karolinska Institutet, 141 52 Huddinge, Sweden
- Clint L Miller
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Division of Vascular Surgery, Department of Surgery, Stanford University, Stanford, CA 94305, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA.
4
Jeong R, Bulyk ML. Blood cell traits' GWAS loci colocalization with variation in PU.1 genomic occupancy prioritizes causal noncoding regulatory variants. Cell Genom 2023; 3:100327. [PMID: 37492098 PMCID: PMC10363807 DOI: 10.1016/j.xgen.2023.100327] [Received: 10/19/2022] [Revised: 02/10/2023] [Accepted: 04/25/2023] [Indexed: 07/27/2023]
Abstract
Genome-wide association studies (GWASs) have uncovered numerous trait-associated loci across the human genome, most of which are located in noncoding regions, making interpretation difficult. Moreover, causal variants are hard to statistically fine-map at many loci because of widespread linkage disequilibrium. To address this challenge, we present a strategy utilizing transcription factor (TF) binding quantitative trait loci (bQTLs) for colocalization analysis to identify trait associations likely mediated by TF occupancy variation and to pinpoint likely causal variants using motif scores. We applied this approach to PU.1 bQTLs in lymphoblastoid cell lines and blood cell trait GWAS data. Colocalization analysis revealed 69 blood cell trait GWAS loci putatively driven by PU.1 occupancy variation. We nominate PU.1 motif-altering variants as the likely shared causal variants at 51 loci. Such integration of TF bQTL data with other GWAS data may reveal transcriptional regulatory mechanisms and causal noncoding variants underlying additional complex traits.
Affiliation(s)
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
5
Jeong R, Bulyk ML. Colocalization of blood cell traits GWAS associations and variation in PU.1 genomic occupancy prioritizes causal noncoding regulatory variants. bioRxiv 2023:2023.03.29.534582. [PMID: 37034747 PMCID: PMC10081269 DOI: 10.1101/2023.03.29.534582] [Indexed: 06/19/2023]
Abstract
Genome-wide association studies (GWAS) have uncovered numerous trait-associated loci across the human genome, most of which are located in noncoding regions, making interpretations difficult. Moreover, causal variants are hard to statistically fine-map at many loci because of widespread linkage disequilibrium. To address this challenge, we present a strategy utilizing transcription factor (TF) binding quantitative trait loci (bQTLs) for colocalization analysis to identify trait associations likely mediated by TF occupancy variation and to pinpoint likely causal variants using motif scores. We applied this approach to PU.1 bQTLs in lymphoblastoid cell lines and blood cell traits GWAS data. Colocalization analysis revealed 69 blood cell trait GWAS loci putatively driven by PU.1 occupancy variation. We nominate PU.1 motif-altering variants as the likely shared causal variants at 51 loci. Such integration of TF bQTL data with other GWAS data may reveal transcriptional regulatory mechanisms and causal noncoding variants underlying additional complex traits.
Affiliation(s)
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
6
Hodonsky CJ, Turner AW, Khan MD, Barrientos NB, Methorst R, Ma L, Lopez NG, Mosquera JV, Auguste G, Farber E, Ma WF, Wong D, Onengut-Gumuscu S, Kavousi M, Peyser PA, van der Laan SW, Leeper NJ, Kovacic JC, Björkegren JLM, Miller CL. Integrative multi-ancestry genetic analysis of gene regulation in coronary arteries prioritizes disease risk loci. medRxiv 2023:2023.02.09.23285622. [PMID: 36824883 PMCID: PMC9949190 DOI: 10.1101/2023.02.09.23285622] [Indexed: 02/16/2023]
Abstract
Genome-wide association studies (GWAS) have identified hundreds of genetic risk loci for coronary artery disease (CAD). However, non-European populations are underrepresented in GWAS and the causal gene-regulatory mechanisms of these risk loci during atherosclerosis remain unclear. We incorporated local ancestry and haplotype information to identify quantitative trait loci (QTL) for gene expression and splicing in coronary arteries obtained from 138 ancestrally diverse Americans. Of 2,132 eQTL-associated genes (eGenes), 47% were previously unreported in coronary arteries and 19% exhibited cell-type-specific expression. Colocalization analysis with GWAS identified subgroups of eGenes unique to CAD and blood pressure. Fine-mapping highlighted additional eGenes of interest, including TBX20 and IL5. Splicing (s)QTLs for 1,690 genes were also identified, among which TOR1AIP1 and ULK3 sQTLs demonstrated the importance of evaluating splicing events to accurately identify disease-relevant gene expression. Our work provides the first human coronary artery eQTL resource from a patient sample and exemplifies the necessity of diverse study populations and multi-omic approaches to characterize gene regulation in critical disease processes.
7
Feng Z, Duren Z, Xin J, Yuan Q, He Y, Su B, Wong WH, Wang Y. Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification. eLife 2022; 11:82535. [PMID: 36525361 PMCID: PMC9810332 DOI: 10.7554/elife.82535] [Received: 08/08/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with genome-wide association studies (GWAS) summary statistics, identify relevant tissues, and estimate relevance correlation to depict common genetic factors acting in the shared regulatory networks between traits. Our method improves power upon existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar, copy archived at swh:1:rev:cf27438d3f8245c34c357ec5f077528e6befe829.
Affiliation(s)
- Zhanying Feng
- CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
- Zhana Duren
- Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, United States
- Jingxue Xin
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, United States
- Qiuyue Yuan
- Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, Greenwood, United States
- Yaoxi He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Bing Su
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Wing Hung Wong
- Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, United States
- Yong Wang
- CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
8
Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nat Genet 2022; 54:1479-1492. [PMID: 36175791 PMCID: PMC9910198 DOI: 10.1038/s41588-022-01187-9] [Received: 05/17/2021] [Accepted: 08/18/2022] [Indexed: 12/13/2022]
Abstract
Genome-wide association studies provide a powerful means of identifying loci and genes contributing to disease, but in many cases, the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. In the present study, we introduce sc-linker, a framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. The inferred disease enrichments recapitulated known biology and highlighted notable cell-disease relationships, including γ-aminobutyric acid-ergic neurons in major depressive disorder, a disease-dependent M-cell program in ulcerative colitis and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease-dependent immune cell-type programs were associated, whereas only disease-dependent epithelial cell programs were prominent, suggesting a role in disease response rather than initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
9
Dey KK, Gazal S, van de Geijn B, Kim SS, Nasser J, Engreitz JM, Price AL. SNP-to-gene linking strategies reveal contributions of enhancer-related and candidate master-regulator genes to autoimmune disease. Cell Genom 2022; 2:100145. [PMID: 35873673 PMCID: PMC9306342 DOI: 10.1016/j.xgen.2022.100145] [Indexed: 12/11/2022]
Abstract
We assess contributions to autoimmune disease of genes whose regulation is driven by enhancer regions (enhancer-related) and genes that regulate other genes in trans (candidate master-regulator). We link these genes to SNPs using several SNP-to-gene (S2G) strategies and apply heritability analyses to draw three conclusions about 11 autoimmune/blood-related diseases/traits. First, several characterizations of enhancer-related genes using functional genomics data are informative for autoimmune disease heritability after conditioning on a broad set of regulatory annotations. Second, candidate master-regulator genes defined using trans-eQTL in blood are also conditionally informative for autoimmune disease heritability. Third, integrating enhancer-related and master-regulator gene sets with protein-protein interaction (PPI) network information magnified their disease signal. The resulting PPI-enhancer gene score produced >2-fold stronger heritability signal and >2-fold stronger enrichment for drug targets, compared with the recently proposed enhancer domain score. In each case, functionally informed S2G strategies produced 4.1- to 13-fold stronger disease signals than conventional window-based strategies.
Affiliation(s)
- Kushal K. Dey
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Corresponding author
- Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Bryce van de Geijn
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Genentech, South San Francisco, CA 94080, USA
- Samuel Sungil Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Joseph Nasser
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jesse M. Engreitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- BASE Initiative, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Stanford University School of Medicine, Stanford, CA 94304, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alkes L. Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
10
Weighill D, Ben Guebila M, Glass K, Quackenbush J, Platig J. Predicting genotype-specific gene regulatory networks. Genome Res 2022; 32:524-533. [PMID: 35193937 PMCID: PMC8896459 DOI: 10.1101/gr.275107.120] [Received: 04/23/2021] [Accepted: 01/11/2022] [Indexed: 11/25/2022]
Abstract
Understanding how each person's unique genotype influences their individual patterns of gene regulation has the potential to improve our understanding of human health and development, and to refine genotype-specific disease risk assessments and treatments. However, the effects of genetic variants are not typically considered when constructing gene regulatory networks, despite the fact that many disease-associated genetic variants are thought to have regulatory effects, including the disruption of transcription factor (TF) binding. We developed EGRET (Estimating the Genetic Regulatory Effect on TFs), which infers a genotype-specific gene regulatory network for each individual in a study population. EGRET begins by constructing a genotype-informed TF-gene prior network derived using TF motif predictions, expression quantitative trait locus (eQTL) data, individual genotypes, and the predicted effects of genetic variants on TF binding. It then uses a technique known as message passing to integrate this prior network with gene expression and TF protein–protein interaction data to produce a refined, genotype-specific regulatory network. We used EGRET to infer gene regulatory networks for two blood-derived cell lines and identified genotype-associated, cell line–specific regulatory differences that we subsequently validated using allele-specific expression, chromatin accessibility QTLs, and differential ChIP-seq TF binding. We also inferred EGRET networks for three cell types from each of 119 individuals and identified cell type–specific regulatory differences associated with diseases related to those cell types. EGRET is, to our knowledge, the first method that infers networks reflective of individual genetic variation in a way that provides insight into the genetic regulatory associations driving complex phenotypes.
Affiliation(s)
- Deborah Weighill
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Kimberly Glass
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA; Harvard Medical School, Boston, Massachusetts 02115, USA
- John Quackenbush
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- John Platig
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA; Harvard Medical School, Boston, Massachusetts 02115, USA
11
Jagadeesh KA, Dey KK, Montoro DT, Mohan R, Gazal S, Engreitz JM, Xavier RJ, Price AL, Regev A. Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics. bioRxiv 2021:2021.03.19.436212. [PMID: 34845454 PMCID: PMC8629197 DOI: 10.1101/2021.03.19.436212] [Indexed: 06/01/2023]
Abstract
Genome-wide association studies (GWAS) provide a powerful means to identify loci and genes contributing to disease, but in many cases the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. Here, we introduce sc-linker, a framework for integrating single-cell RNA-seq (scRNA-seq), epigenomic maps and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. We analyzed 1.6 million scRNA-seq profiles from 209 individuals spanning 11 tissue types and 6 disease conditions, and constructed gene programs capturing cell types, disease progression, and cellular processes both within and across cell types. We evaluated these gene programs for disease enrichment by transforming them to SNP annotations with tissue-specific epigenomic maps and computing enrichment scores across 60 diseases and complex traits (average N = 297K). Cell type, disease progression, and cellular process programs captured distinct heritability signals even within the same cell type, as we show in multiple complex diseases that affect the brain (Alzheimer’s disease, multiple sclerosis), colon (ulcerative colitis) and lung (asthma, idiopathic pulmonary fibrosis, severe COVID-19). The inferred disease enrichments recapitulated known biology and highlighted novel cell-disease relationships, including GABAergic neurons in major depressive disorder (MDD), a disease progression M cell program in ulcerative colitis, and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease progression immune cell type programs were associated, whereas for epithelial cells, disease progression programs were most prominent, perhaps suggesting a role in disease progression over initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.
12
Evaluating the informativeness of deep learning annotations for human complex diseases. Nat Commun 2020; 11:4703. [PMID: 32943643 PMCID: PMC7499261 DOI: 10.1038/s41467-020-18515-4] [Received: 04/16/2020] [Accepted: 08/25/2020] [Indexed: 12/12/2022]
Abstract
Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations. Deep learning models have shown great promise in predicting regulatory effects from DNA sequence. Here the authors evaluate sequence-based epigenomic deep learning models and conclude that these models are not yet ready to inform our knowledge of human disease.