1. Shireen H, Batool F, Khatoon H, Parveen N, Sehar NU, Hussain I, Ali S, Abbasi AA. Predicting genome-wide tissue-specific enhancers via combinatorial transcription factor genomic occupancy analysis. FEBS Lett 2025; 599:100-119. [PMID: 39367524 DOI: 10.1002/1873-3468.15030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 08/27/2024] [Accepted: 09/13/2024] [Indexed: 10/06/2024]
Abstract
Enhancers are non-coding cis-regulatory elements crucial for transcriptional regulation. Mutations in enhancers can disrupt gene regulation, leading to disease phenotypes. Identifying enhancers and their tissue-specific activity is challenging because they lack stereotyped sequences. This study presents a sequence-based computational model that uses combinatorial transcription factor (TF) genomic occupancy to predict tissue-specific enhancers. Trained on diverse datasets, including ENCODE and VISTA Enhancer Browser data, the model predicted 25 000 forebrain-specific cis-regulatory modules (CRMs) in the human genome. Validation using biochemical features, disease-associated SNPs, and in vivo zebrafish analysis confirmed its effectiveness. This model aids in predicting enhancers lacking well-characterized chromatin features, complementing experimental approaches to tissue-specific enhancer discovery.
Affiliation(s)
- Huma Shireen
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Fatima Batool
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Hizran Khatoon
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Nazia Parveen
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Noor Us Sehar
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Irfan Hussain
- Centre for Regenerative Medicine and Stem Cells Research, Aga Khan University Hospital, Karachi, Pakistan
- Shahid Ali
- Department of Organismal Biology and Anatomy, The University of Chicago, Chicago, IL, USA
- Amir Ali Abbasi
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
2. Toneyan S, Koo PK. Interpreting cis-regulatory interactions from large-scale deep neural networks. Nat Genet 2024; 56:2517-2527. [PMID: 39284975 DOI: 10.1038/s41588-024-01923-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 08/21/2024] [Indexed: 09/25/2024]
Abstract
The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with orthogonal experimental data, providing insights into generalization but offering limited insights into their decision-making process. Existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we present cis-regulatory element model explanations (CREME), an in silico perturbation toolkit that interprets the rules of gene regulation learned by a genomic DNN. Applying CREME to Enformer, a state-of-the-art DNN, we identify cis-regulatory elements that enhance or silence gene expression and characterize their complex interactions. CREME can provide interpretations across multiple scales of genomic organization, from cis-regulatory elements to fine-mapped functional sequence elements within them, offering high-resolution insights into the regulatory architecture of the genome. CREME provides a powerful toolkit for translating the predictions of genomic DNNs into mechanistic insights of gene regulation.
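The in silico perturbation idea described above can be illustrated by shuffling one tile of a sequence and measuring the mean change in a model's prediction. This is a minimal sketch under stated assumptions, not CREME's API or exact procedure: `tile_shuffle_effect` and `toy_predict` are hypothetical names, and a toy GATA-motif counter stands in for a trained genomic DNN such as Enformer.

```python
import random
from typing import Callable


def tile_shuffle_effect(
    seq: str,
    start: int,
    end: int,
    predict: Callable[[str], float],
    n_shuffles: int = 20,
    seed: int = 0,
) -> float:
    """Mean change in predict() when seq[start:end] is randomly shuffled.

    Hypothetical helper: shuffling keeps the tile's base composition but
    scrambles its motifs. A negative result means the intact tile raised the
    prediction (enhancer-like); a positive result means it lowered it
    (silencer-like).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    baseline = predict(seq)
    tile = list(seq[start:end])
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(tile)  # in-place shuffle of the tile's bases
        perturbed = seq[:start] + "".join(tile) + seq[end:]
        total += predict(perturbed) - baseline
    return total / n_shuffles


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a genomic DNN: counts GATA motifs."""
    return float(sum(seq[i:i + 4] == "GATA" for i in range(len(seq) - 3)))


seq = "CCCC" + "GATAGATAGATA" + "CCCC"
print(tile_shuffle_effect(seq, 4, 16, toy_predict))  # negative: the motif tile matters
print(tile_shuffle_effect(seq, 0, 4, toy_predict))   # 0.0: shuffling CCCC is a no-op
```

Shuffling rather than deleting the tile keeps sequence length and base composition fixed, so in this toy setup any change in the prediction can be attributed to the disrupted motif arrangement.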
Affiliation(s)
- Shushan Toneyan
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, New York, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, New York, NY, USA
3. Asma H, Tieke E, Deem KD, Rahmat J, Dong T, Huang X, Tomoyasu Y, Halfon MS. Regulatory genome annotation of 33 insect species. eLife 2024; 13:RP96738. [PMID: 39392676 PMCID: PMC11469670 DOI: 10.7554/elife.96738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Annotation of newly sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and we provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences, along with the tissues where they are expected to be active, in a set of insect species spanning over 360 million years of evolution. Extensive analysis and validation of the data provide several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. One, we show that our predictions target specific loci rather than random genomic locations. Two, we predict enhancers in orthologous loci across a diverged set of species to a significantly higher degree than random expectation would allow. Three, we demonstrate that our predictions are highly enriched for regions of accessible chromatin. Four, we achieve a validation rate in excess of 70% using in vivo reporter gene assays. As we continue to annotate both new tissues and new species, our regulatory annotation resource will provide a rich source of data for the research community and will have utility for both small-scale (single gene, single species) and large-scale (many genes, many species) studies of gene regulation. In particular, the ability to search for functionally related regulatory elements in orthologous loci should greatly facilitate studies of enhancer evolution, even among distantly related species.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
- Ellen Tieke
- Department of Biology, Miami University, Oxford, United States
- Kevin D Deem
- Department of Biology, Miami University, Oxford, United States
- Jabale Rahmat
- Department of Biology, Miami University, Oxford, United States
- Tiffany Dong
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Xinbo Huang
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Marc S Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, United States
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, United States
4. Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024; 52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024]
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
Affiliation(s)
- Juan Mulero-Hernández
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
5. Ni P, Wu S, Su Z. Validated Negative Regions (VNRs) in the VISTA Database might be Truncated Forms of Bona Fide Enhancers. Advanced Genetics (Hoboken, N.J.) 2024; 5:2300209. [PMID: 38884049 PMCID: PMC11170074 DOI: 10.1002/ggn2.202300209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/16/2024] [Indexed: 06/18/2024]
Abstract
The VISTA enhancer database is a valuable resource for evaluating predicted enhancers in humans and mice. In addition to thousands of validated positive regions (VPRs) in the human and mouse genomes, the database also contains similar numbers of validated negative regions (VNRs). It was previously shown that the VPRs are on average half as long as the highly conserved predicted enhancers that overlap them, which led to the hypothesis that the VPRs may be truncated forms of long bona fide enhancers. Here, it is shown that, like the VPRs, the VNRs are under strong evolutionary constraint and overlap predicted enhancers in the genomes. The VNRs are also on average half as long as the highly conserved predicted enhancers that overlap them. Moreover, the VNRs and the VPRs display similar cell/tissue-specific modification patterns of key epigenetic marks of active enhancers. Furthermore, the VNRs and the VPRs show similar impact score spectra in in silico mutagenesis. These highly similar properties suggest that, like the VPRs, the VNRs may also be truncated forms of long bona fide enhancers.
Affiliation(s)
- Pengyu Ni
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Present address: Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, USA
- Siwen Wu
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Zhengchang Su
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
6. Scott TJ, Hansen TJ, McArthur E, Hodges E. Cross-tissue patterns of DNA hypomethylation reveal genetically distinct histories of cell development. BMC Genomics 2023; 24:623. [PMID: 37858046 PMCID: PMC10588161 DOI: 10.1186/s12864-023-09622-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2023] [Accepted: 08/24/2023] [Indexed: 10/21/2023]
Abstract
BACKGROUND Establishment of DNA methylation (DNAme) patterns is essential for balanced multi-lineage cellular differentiation, but exactly how these patterns drive cellular phenotypes is unclear. While >80% of CpG sites are stably methylated, tens of thousands of discrete CpG loci form hypomethylated regions (HMRs). Because they lack DNAme, HMRs are considered transcriptionally permissive, but not all HMRs actively regulate genes. Unlike promoter HMRs, a subset of non-coding HMRs is cell type-specific and enriched for tissue-specific gene regulatory functions. Our data further argue not only that HMR establishment is an important step in enforcing cell identity, but also that cross-cell type and spatial HMR patterns are functionally informative of gene regulation. RESULTS To understand the significance of non-coding HMRs, we systematically dissected HMR patterns across diverse human cell types and developmental timepoints, including embryonic, fetal, and adult tissues. Unsupervised clustering of 126,104 distinct HMRs revealed that levels of HMR specificity reflect a developmental hierarchy supported by enrichment of stage-specific transcription factors and gene ontologies. Using a pseudo-time course of development from embryonic stem cells to adult stem and mature hematopoietic cells, we find that most HMRs observed in differentiated cells (~60%) are established at early developmental stages and accumulate as development progresses. HMRs that arise during differentiation frequently (~35%) establish near existing HMRs (≤6 kb away), leading to the formation of HMR clusters associated with stronger enhancer activity. Using SNP-based partitioned heritability from GWAS summary statistics across diverse traits and clinical lab values, we discovered that genetic contribution to trait heritability is enriched within HMRs.
Moreover, the contribution of heritability to cell-relevant traits increases with both increasing HMR specificity and HMR clustering, supporting the role of distinct HMR subsets in regulating normal cell function. CONCLUSIONS Our results demonstrate that the entire HMR repertoire within a cell type, rather than just the cell type-specific HMRs, stores information that is key to understanding and predicting cellular phenotypes. Ultimately, these data provide novel insights into how DNA hypomethylation provides genetically distinct historical records of a cell's journey through development, highlighting HMRs as functionally distinct from other epigenomic annotations.
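The proximity rule above (new HMRs forming within 6 kb of existing ones) suggests a simple way to group HMRs into clusters. This is a minimal sketch, assuming single-chromosome, half-open intervals and single-linkage merging with a hypothetical `cluster_intervals` helper; the study's actual clustering procedure may differ.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def cluster_intervals(
    intervals: List[Interval], max_gap: int = 6000
) -> List[List[Interval]]:
    """Single-linkage grouping: an interval joins the current cluster if it
    starts at most max_gap bp after the cluster's rightmost end."""
    clusters: List[List[Interval]] = []
    cluster_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cluster_end is not None and start - cluster_end <= max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


# Hypothetical HMR coordinates (bp), not data from the study.
hmrs = [(1_000, 1_800), (5_000, 5_600), (20_000, 20_400), (11_000, 11_500)]
print(cluster_intervals(hmrs))
# [[(1000, 1800), (5000, 5600), (11000, 11500)], [(20000, 20400)]]
```

Because linkage is chained, the first cluster spans about 10 kb even though no two neighbors are more than 6 kb apart; whether chained clusters match the study's definition is an assumption.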
Affiliation(s)
- Timothy J Scott
- Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Tyler J Hansen
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, 60637, USA
- Evonne McArthur
- Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Department of Medicine, University of Washington, Seattle, WA, 98195, USA
- Emily Hodges
- Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
7. Gonçalves TM, Stewart CL, Baxley SD, Xu J, Li D, Gabel HW, Wang T, Avraham O, Zhao G. Towards a comprehensive regulatory map of Mammalian Genomes. Research Square 2023:rs.3.rs-3294408. [PMID: 37841836 PMCID: PMC10571623 DOI: 10.21203/rs.3.rs-3294408/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]
Abstract
Genome mapping studies have generated a nearly complete collection of genes for the human genome, but we still lack an equivalently vetted inventory of human regulatory sequences. Cis-regulatory modules (CRMs) play important roles in controlling when, where, and how much a gene is expressed. We developed a training data-free CRM-prediction algorithm, the Mammalian Regulatory MOdule Detector (MrMOD), for accurate CRM prediction in mammalian genomes. MrMOD provides genome position-fixed CRM models, analogous to fixed gene models, for the mouse and human genomes, using only genomic sequences as input and a single adjustable parameter: the significance p-value. Importantly, MrMOD predicts a comprehensive set of high-resolution CRMs in the mouse and human genomes, including all types of regulatory modules, not limited to any tissue, cell type, developmental stage, or condition. We computationally validated MrMOD predictions using a compendium of 21 orthogonal experimental data sets, including thousands of experimentally defined CRMs and millions of putative regulatory elements derived from hundreds of different tissues, cell types, and stimulus conditions obtained from multiple databases. An in ovo transgenic reporter assay demonstrates the power of our predictions in guiding experimental design. We analyzed CRMs located on chromosome 17 using unsupervised machine learning and identified groups of CRMs with multiple lines of evidence supporting their functionality, linking CRMs with upstream binding transcription factors and downstream target genes. Our work provides a comprehensive base-pair-resolution annotation of the functional regulatory elements and non-functional regions in mammalian genomes.
Affiliation(s)
- Jason Xu
- Missouri University of Science & Technology
- Daofeng Li
- Washington University School of Medicine
- Ting Wang
- Washington University School of Medicine
8. Bejjani F, Evanno E, Mahfoud S, Tolza C, Zibara K, Piechaczyk M, Jariel-Encontre I. Multiple Fra-1-bound enhancers showing different molecular and functional features can cooperate to repress gene transcription. Cell Biosci 2023; 13:129. [PMID: 37464380 DOI: 10.1186/s13578-023-01077-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 06/26/2023] [Indexed: 07/20/2023]
Abstract
BACKGROUND How transcription factors (TFs) down-regulate gene expression remains poorly understood, especially when they bind to multiple enhancers contacting the same gene promoter. In particular, it is not known whether they exert similar or significantly different molecular effects at these enhancers. RESULTS To address this issue, we used a particularly well-suited study model consisting of the down-regulation of the TGFB2 gene by the TF Fra-1 in Fra-1-overexpressing cancer cells, as Fra-1 binds to multiple enhancers interacting with the TGFB2 promoter. We show that Fra-1 represses TGFB2 transcription not by reducing RNA Pol II recruitment at the gene promoter but by decreasing the formation of its transcription-initiating form. This is associated with complex long-range chromatin interactions implicating multiple molecularly and functionally heterogeneous Fra-1-bound transcriptional enhancers distal to the TGFB2 transcriptional start site. In particular, these enhancers differ in their requirements for the presence and the activity of the lysine acetyltransferase p300/CBP. Furthermore, the final transcriptional output of the TGFB2 gene seems to depend on a balance between the positive and negative effects of Fra-1 at these enhancers. CONCLUSION Our work unveils complex molecular mechanisms underlying the repressive actions of Fra-1 on TGFB2 gene expression. This has consequences for our general understanding of the functioning of the ubiquitous transcriptional complex AP-1, of which Fra-1 is the most documented component for pro-oncogenic activities. In addition, it raises the general question of the heterogeneity of the molecular functions of TFs binding to different enhancers regulating the same gene.
Affiliation(s)
- Fabienne Bejjani
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- DSST, ER045, PRASE, Lebanese University, Beirut, Lebanon
- Samantha Mahfoud
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- DSST, ER045, PRASE, Lebanese University, Beirut, Lebanon
- Claire Tolza
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- Kazem Zibara
- DSST, ER045, PRASE, Lebanese University, Beirut, Lebanon
- Biology Department, Faculty of Sciences-I, Lebanese University, Beirut, Lebanon
- Isabelle Jariel-Encontre
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- Institut de Recherche en Cancérologie de Montpellier, IRCM, INSERM U1194, ICM, Université de Montpellier, Montpellier, France
9. Colbran LL, Ramos-Almodovar FC, Mathieson I. A gene-level test for directional selection on gene expression. Genetics 2023; 224:iyad060. [PMID: 37036411 PMCID: PMC10213495 DOI: 10.1093/genetics/iyad060] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 03/31/2023] [Indexed: 04/11/2023]
Abstract
Most variants identified in human genome-wide association studies and scans for selection are noncoding. Interpretation of their effects and the way in which they contribute to phenotypic variation and adaptation in human populations is therefore limited by our understanding of gene regulation and the difficulty of confidently linking noncoding variants to genes. To overcome this, we developed a gene-wise test for population-specific selection based on combinations of regulatory variants. Specifically, we use the QX statistic to test for polygenic selection on cis-regulatory variants based on whether the variance across populations in the predicted expression of a particular gene is higher than expected under neutrality. We then applied this approach to human data, testing for selection on 17,388 protein-coding genes in 26 populations from the 1000 Genomes Project. We identified 45 genes with significant evidence (FDR < 0.1) for selection, including FADS1, KHK, SULT1A2, ITGAM, and several genes in the HLA region. We further confirm that these signals correspond to plausible population-level differences in predicted expression. While the small number of significant genes (0.2%) is consistent with most cis-regulatory variation evolving under genetic drift or stabilizing selection, it remains possible that there are effects not captured in this study. Our gene-level QX score is independent of standard genomic tests for selection, and may therefore be useful in combination with traditional selection scans to specifically identify selection on regulatory variation. Overall, our results demonstrate the utility of combining population-level genomic data with functional data to understand the evolution of gene expression.
Affiliation(s)
- Laura L Colbran
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Iain Mathieson
- Corresponding author: Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 405B Clinical Research Building, 415 Curie Blvd, Philadelphia, PA 19104, USA
10. Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023]
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
11. Kravchuk EV, Ashniev GA, Gladkova MG, Orlov AV, Vasileva AV, Boldyreva AV, Burenin AG, Skirda AM, Nikitin PI, Orlova NN. Experimental Validation and Prediction of Super-Enhancers: Advances and Challenges. Cells 2023; 12:cells12081191. [PMID: 37190100 DOI: 10.3390/cells12081191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 04/07/2023] [Accepted: 04/14/2023] [Indexed: 05/17/2023]
Abstract
Super-enhancers (SEs) are cis-regulatory elements of the human genome that have been widely discussed since the term was introduced. Super-enhancers have been shown to be strongly associated with the expression of genes crucial for cell differentiation, cell stability maintenance, and tumorigenesis. Our goal was to systematize research studies dedicated to the investigation of the structure and functions of super-enhancers, as well as to define further perspectives of the field in various applications, such as drug development and clinical use. We reviewed the fundamental studies that provided experimental data on various pathologies and their associations with particular super-enhancers. The analysis of mainstream approaches for SE search and prediction allowed us to accumulate existing data and propose directions for further algorithmic improvements in the reliability and efficiency of SE identification. Here, we describe the most robust algorithms, such as ROSE, imPROSE, and DEEPSEN, and suggest their further use for various research and development tasks. Judging by the topics and number of published studies, the most promising research directions are cancer-associated super-enhancers and prospective SE-targeted therapy strategies, most of which are discussed in this review.
Affiliation(s)
- Ekaterina V Kravchuk
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, Leninskiye Gory, MSU, 1-12, 119991 Moscow, Russia
- German A Ashniev
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, Leninskiye Gory, MSU, 1-12, 119991 Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, GSP-1, Leninskiye Gory, MSU, 1-73, 119234 Moscow, Russia
- Marina G Gladkova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, GSP-1, Leninskiye Gory, MSU, 1-73, 119234 Moscow, Russia
- Alexey V Orlov
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Anastasiia V Vasileva
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Anna V Boldyreva
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Alexandr G Burenin
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Artemiy M Skirda
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Petr I Nikitin
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Natalia N Orlova
- Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
12. Lindhorst D, Halfon MS. Reporter gene assays and chromatin-level assays define substantially non-overlapping sets of enhancer sequences. BMC Genomics 2023; 24:17. [PMID: 36639739 PMCID: PMC9837977 DOI: 10.1186/s12864-023-09123-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023]
Abstract
BACKGROUND Transcriptional enhancers are essential for gene regulation, but how these regulatory elements are best defined remains a significant unresolved question. Traditional definitions rely on activity-based criteria such as reporter gene assays, while more recently, biochemical assays based on chromatin-level phenomena such as chromatin accessibility, histone modifications, and localized RNA transcription have gained prominence. RESULTS We examine here whether these two types of definitions, activity-based and chromatin-based, effectively identify the same sets of sequences. We find that, concerningly, the overlap between the two groups is strikingly limited. Few of the data sets we compared displayed statistically significant overlap, and even for those, the degree of overlap was typically small (below 40% of sequences). Moreover, a substantial batch effect was observed in which experiment set rather than experimental method was a primary driver of whether or not chromatin-defined enhancers showed a strong overlap with reporter gene-defined enhancers. CONCLUSIONS Our results raise important questions as to the appropriateness of both old and new enhancer definitions, and suggest that new approaches are required to reconcile the poor agreement among existing methods for defining enhancers.
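Overlap comparisons between activity-defined and chromatin-defined enhancer sets reduce to interval intersection. This is a minimal sketch, assuming half-open BED-style coordinates, a naive all-pairs scan, and hypothetical example regions (not data from the study); the hypothetical helper `fraction_overlapping` does not reproduce the paper's statistical test of overlap significance.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open like BED


def overlaps(a: Region, b: Region) -> bool:
    """True if a and b are on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query: List[Region], reference: List[Region]) -> float:
    """Fraction of query regions that overlap at least one reference region.

    Naive all-pairs scan; fine for small examples, slow for genome-scale sets.
    """
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


# Hypothetical regions, not data from the study.
reporter = [("chr1", 100, 600), ("chr1", 5_000, 5_800), ("chr2", 900, 1_400)]
chromatin = [("chr1", 550, 900), ("chr2", 2_000, 2_500)]
print(fraction_overlapping(reporter, chromatin))  # 1 of 3 overlap: 0.333...
print(fraction_overlapping(chromatin, reporter))  # 1 of 2 overlap: 0.5
```

The measure is asymmetric, as the two printed values show, so it is worth reporting the overlap in both directions; assessing whether an observed overlap is statistically significant is not covered here.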
Affiliation(s)
- Daniel Lindhorst
- Department of Biochemistry, University at Buffalo-State University of New York, 955 Main St. #5128, Buffalo, NY 14203 USA
- Present Address: Program in Biomedical Sciences, Columbia University, New York, NY 10032 USA
- Marc S. Halfon
- Department of Biochemistry, University at Buffalo-State University of New York, 955 Main St. #5128, Buffalo, NY 14203 USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203 USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14260 USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203 USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263 USA
13
Robinson KG, Marsh AG, Lee SK, Hicks J, Romero B, Batish M, Crowgey EL, Shrader MW, Akins RE. DNA Methylation Analysis Reveals Distinct Patterns in Satellite Cell-Derived Myogenic Progenitor Cells of Subjects with Spastic Cerebral Palsy. J Pers Med 2022; 12:jpm12121978. [PMID: 36556199 PMCID: PMC9780849 DOI: 10.3390/jpm12121978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022]
Abstract
Spastic type cerebral palsy (CP) is a complex neuromuscular disorder that involves altered skeletal muscle microanatomy and growth, but little is known about the mechanisms contributing to muscle pathophysiology and dysfunction. Traditional genomic approaches have provided limited insight regarding disease onset and severity, but recent epigenomic studies indicate that DNA methylation patterns can be altered in CP. Here, we examined whether a diagnosis of spastic CP is associated with intrinsic DNA methylation differences in myoblasts and myotubes derived from muscle resident stem cell populations (satellite cells; SCs). Twelve subjects were enrolled (6 CP; 6 control) with informed consent/assent. Skeletal muscle biopsies were obtained during orthopedic surgeries, and SCs were isolated and cultured to establish patient-specific myoblast cell lines capable of proliferation and differentiation in culture. DNA methylation analyses indicated significant differences at 525 individual CpG sites in proliferating SC-derived myoblasts (MB) and 1774 CpG sites in differentiating SC-derived myotubes (MT). Of these, 79 CpG sites were common in both culture types. The distribution of differentially methylated 1 Mbp chromosomal segments indicated distinct regional hypo- and hyper-methylation patterns, and significant enrichment of differentially methylated sites on chromosomes 12, 13, 14, 15, 18, and 20. Average methylation load across 2000 bp regions flanking transcriptional start sites was significantly different in 3 genes in MBs, and 10 genes in MTs. SC-derived MBs isolated from study participants with spastic CP exhibited fundamental differences in DNA methylation compared to controls at multiple levels of organization that may reveal new targets for studies of mechanisms contributing to muscle dysregulation in spastic CP.
Affiliation(s)
- Karyn G. Robinson
- Nemours Children’s Research, Nemours Children’s Health System, Wilmington, DE 19803, USA
- Adam G. Marsh
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19716, USA
- Stephanie K. Lee
- Nemours Children’s Research, Nemours Children’s Health System, Wilmington, DE 19803, USA
- Jonathan Hicks
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19716, USA
- Brigette Romero
- Medical and Molecular Sciences, University of Delaware, Newark, DE 19716, USA
- Mona Batish
- Medical and Molecular Sciences, University of Delaware, Newark, DE 19716, USA
- Erin L. Crowgey
- Nemours Children’s Research, Nemours Children’s Health System, Wilmington, DE 19803, USA
- M. Wade Shrader
- Department of Orthopedics, Nemours Children’s Hospital Delaware, Wilmington, DE 19803, USA
- Robert E. Akins
- Nemours Children’s Research, Nemours Children’s Health System, Wilmington, DE 19803, USA
- Correspondence: ; Tel.: +1-302-651-6779
14
Cooper YA, Guo Q, Geschwind DH. Multiplexed functional genomic assays to decipher the noncoding genome. Hum Mol Genet 2022; 31:R84-R96. [PMID: 36057282 PMCID: PMC9585676 DOI: 10.1093/hmg/ddac194] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/02/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/14/2022]
Abstract
Linkage disequilibrium and the incomplete regulatory annotation of the noncoding genome complicate the identification of functional noncoding genetic variants and their causal association with disease. Current computational methods for variant prioritization have limited predictive value, necessitating the application of highly parallelized experimental assays to efficiently identify functional noncoding variation. Here, we summarize two distinct approaches, massively parallel reporter assays and CRISPR-based pooled screens, and describe their flexible implementation to characterize human noncoding genetic variation at unprecedented scale. Each approach provides unique advantages and limitations, highlighting the importance of multimodal methodological integration. These multiplexed assays of variant effects are undoubtedly poised to play a key role in the experimental characterization of noncoding genetic risk, informing our understanding of the underlying mechanisms of disease-associated loci and the development of more robust predictive classification algorithms.
Affiliation(s)
- Yonatan A Cooper
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Medical Scientist Training Program, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Qiuyu Guo
- Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Program in Neurogenetics, Department of Neurology, University of California Los Angeles, Los Angeles, CA, USA
- Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California Los Angeles, Los Angeles, CA, USA
15
Martinez-Delgado B, Barrero MJ. Epigenomic Approaches for the Diagnosis of Rare Diseases. Epigenomes 2022; 6:21. [PMID: 35997367 PMCID: PMC9397041 DOI: 10.3390/epigenomes6030021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/25/2022] [Revised: 07/13/2022] [Accepted: 07/20/2022] [Indexed: 01/27/2023]
Abstract
Rare diseases affect more than 300 million people worldwide. Diagnosing rare diseases is a major challenge as they have different causes and etiologies. Careful assessment of clinical symptoms often leads to the testing of the most common genetic alterations that could explain the disease. Patients with negative results for these tests frequently undergo whole exome or genome sequencing, leading to the identification of the molecular cause of the disease in 50% of patients at best. Therefore, a significant proportion of patients remain undiagnosed after sequencing their genome. Recently, approaches based on functional aspects of the genome, including transcriptomics and epigenomics, are beginning to emerge. Here, we will review these approaches, including studies that have successfully provided diagnoses for complex undiagnosed cases.
Affiliation(s)
- Beatriz Martinez-Delgado
- Molecular Genetics Unit, Institute of Rare Diseases Research (IIER), Spanish National Institute of Health Carlos III (ISCIII), 28220 Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras, CIBERER U758, 28029 Madrid, Spain
- Maria J. Barrero
- Models and Mechanisms Unit, Institute of Rare Diseases Research (IIER), Spanish National Institute of Health Carlos III (ISCIII), 28220 Madrid, Spain
16
Keränen SVE, Villahoz-Baleta A, Bruno AE, Halfon MS. REDfly: An Integrated Knowledgebase for Insect Regulatory Genomics. Insects 2022; 13:618. [PMID: 35886794 PMCID: PMC9323752 DOI: 10.3390/insects13070618] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/16/2022] [Revised: 07/01/2022] [Accepted: 07/06/2022] [Indexed: 11/29/2022]
Abstract
We provide here an updated description of the REDfly (Regulatory Element Database for Fly) database of transcriptional regulatory elements, a unique resource that provides regulatory annotation for the genome of Drosophila and other insects. The genomic sequences regulating insect gene expression, namely transcriptional cis-regulatory modules (CRMs, e.g., "enhancers") and transcription factor binding sites (TFBSs), are not currently curated by any other major database resources. However, knowledge of such sequences is important, as CRMs play critical roles with respect to disease as well as normal development, phenotypic variation, and evolution. Characterized CRMs also provide useful tools for both basic and applied research, including developing methods for insect control. REDfly, which is the most detailed existing platform for metazoan regulatory-element annotation, includes over 40,000 experimentally verified CRMs and TFBSs along with their DNA sequences, their associated genes, and the expression patterns they direct. Here, we briefly describe REDfly's contents and data model, with an emphasis on the new features implemented since 2020. We then provide an illustrated walk-through of several common REDfly search use cases.
Affiliation(s)
- Angel Villahoz-Baleta
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Andrew E. Bruno
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Marc S. Halfon
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
17
Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022]
Abstract
Gene regulation operates as a network involving both genetic sequences and proteins, and it acts at multiple levels and through multiple mechanisms. Transcription is the main control mechanism for most genes, with downstream steps refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies focus on analyzing the contribution of enhancers to the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next-generation sequencing techniques. These techniques have generated a large volume of information, which has been deposited in a growing number of public repositories. In this article, we analyze the content of those public repositories that contain information about human enhancers, with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis is based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, that most types of enhancers are not represented in the databases, and that there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain
18
Velazquez-Arcelay K, Benton ML, Capra JA. Diverse functions associate with non-coding polymorphisms shared between humans and chimpanzees. BMC Ecol Evol 2022; 22:68. [PMID: 35606693 PMCID: PMC9125839 DOI: 10.1186/s12862-022-02020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Accepted: 05/09/2022] [Indexed: 11/24/2022]
Abstract
Background Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species in the state of identity-by-descent, hereafter “trans-species polymorphisms” (TSPs), can result from LTBS, often due to host–pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred candidate LTBS regions have been identified in humans and chimpanzees; however, because many are in non-protein-coding regions of the genome, the functions and potential adaptive roles for most remain unknown. Results We integrated diverse genomic annotations to explore the functions of 60 previously identified regions with multiple shared polymorphisms (SPs) between humans and chimpanzees, including 19 with strong evidence of LTBS. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS) for all the regions. We identify functional annotations for 59 regions, including 58 with evidence of gene regulatory function from GTEx or functional genomics data and 19 with evidence of trait association from GWAS or PheWAS. As expected, the SPs associate in humans with many immune system phenotypes, including response to pathogens, but we also find associations with a range of other phenotypes, including body size, alcohol intake, cognitive performance, risk-taking behavior, and urate levels. Conclusions The diversity of traits associated with non-coding regions with multiple SPs supports previous hypotheses that functions beyond the immune system are likely subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypotheses about behavioral diversity in human and chimpanzee populations, such as the importance of variation in risk sensitivity.
Supplementary Information The online version contains supplementary material available at 10.1186/s12862-022-02020-x.
19
Huang D, Ovcharenko I. Enhancer-silencer transitions in the human genome. Genome Res 2022; 32:437-448. [PMID: 35105669 PMCID: PMC8896465 DOI: 10.1101/gr.275992.121] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/12/2021] [Accepted: 01/27/2022] [Indexed: 11/24/2022]
Abstract
Dual-function regulatory elements (REs), acting as enhancers in some cellular contexts and as silencers in others, have been reported to facilitate the precise gene regulatory response to developmental signals in Drosophila melanogaster. However, with few isolated examples detected, dual-function REs in mammals have yet to be systematically studied. We herein investigated this class of REs in the human genome and profiled their activity across multiple cell types. Focusing on enhancer–silencer transitions specific to the development of T cells, we built an accurate deep learning classifier of REs and identified about 12,000 silencers active in primary peripheral blood T cells that act as enhancers in embryonic stem cells. Compared with regular silencers, these dual-function REs are evolving under stronger purifying selection and are enriched for mutations associated with disease phenotypes and altered gene expression. In addition, they are enriched in the loci of transcriptional regulators, such as transcription factors (TFs) and chromatin remodeling genes. Dual-function REs consist of two intertwined but largely distinct sets of binding sites bound by either activating or repressing TFs, depending on the type of RE function in a given cell line. This indicates the recruitment of different TFs for different regulatory modes and a complex DNA sequence composition of these REs with dual activating and repressive encoding. With an estimated >6% of cell type–specific human silencers acting as dual-function REs, this overlooked class of REs warrants specific investigation into how their inherent functional plasticity might contribute to human diseases.
20
Ding J, Frantzeskos A, Orozco G. Functional interrogation of autoimmune disease genetics using CRISPR/Cas9 technologies and massively parallel reporter assays. Semin Immunopathol 2022; 44:137-147. [PMID: 34508276 PMCID: PMC8837574 DOI: 10.1007/s00281-021-00887-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/30/2021] [Accepted: 08/13/2021] [Indexed: 02/07/2023]
Abstract
Genetic studies, including genome-wide association studies, have identified many common variants that are associated with autoimmune diseases. Strikingly, in addition to being frequently observed in healthy individuals, a number of these variants are shared across diseases with diverse clinical presentations. This highlights the potential for improved autoimmune disease understanding which could be achieved by characterising the mechanism by which variants lead to increased risk of disease. Of particular interest is the potential for identifying novel drug targets or of repositioning drugs currently used in other diseases. The majority of autoimmune disease variants do not alter coding regions and it is often difficult to generate a plausible hypothetical mechanism by which variants affect disease-relevant genes and pathways. Given the interest in this area, considerable effort has been invested in developing and applying appropriate methodologies. Two of the most important technologies in this space are low- and high-throughput genomic perturbation using the CRISPR/Cas9 system, and massively parallel reporter assays. In this review, we introduce the field of autoimmune disease functional genomics and use numerous examples to demonstrate the recent and potential future impact of these technologies.
Affiliation(s)
- James Ding
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9LJ, UK
- Antonios Frantzeskos
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9LJ, UK
- Gisela Orozco
- Centre for Genetics and Genomics Versus Arthritis, Division of Musculoskeletal and Dermatological Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9LJ, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
21
Biswas A, Narlikar L. A universal framework for detecting cis-regulatory diversity in DNA regulatory regions. Genome Res 2021; 31:1646-1662. [PMID: 34285090 PMCID: PMC8415372 DOI: 10.1101/gr.274563.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/22/2020] [Accepted: 07/09/2021] [Indexed: 12/02/2022]
Abstract
High-throughput sequencing-based assays measure different biochemical activities pertaining to gene regulation, genome-wide. These activities include transcription factor (TF)–DNA binding, enhancer activity, open chromatin, and more. A major goal is to understand underlying sequence components, or motifs, that can explain the measured activity. It is usually not one motif but a combination of motifs bound by cooperatively acting proteins that confers activity to such regions. Furthermore, regions can be diverse, governed by different combinations of TFs/motifs. Current approaches do not take into account this issue of combinatorial diversity. We present a new statistical framework, cisDIVERSITY, which models regions as diverse modules characterized by combinations of motifs while simultaneously learning the motifs themselves. Because cisDIVERSITY does not rely on knowledge of motifs, modules, cell type, or organism, it is general enough to be applied to regions reported by most high-throughput assays. For example, in enhancer predictions resulting from different assays—GRO-cap, STARR-seq, and those measuring chromatin structure—cisDIVERSITY discovers distinct modules and combinations of TF binding sites, some specific to the assay. From protein–DNA binding data, cisDIVERSITY identifies potential cofactors of the profiled TF, whereas from ATAC-seq data, it identifies tissue-specific regulatory modules. Finally, analysis of single-cell ATAC-seq data suggests that regions open in one cell-state encode information about future states, with certain modules staying open and others closing down in the next time point.
Affiliation(s)
- Anushua Biswas
- CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
- Leelavati Narlikar
- CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
22
Deregulation of Transcriptional Enhancers in Cancer. Cancers (Basel) 2021; 13:cancers13143532. [PMID: 34298745 PMCID: PMC8303223 DOI: 10.3390/cancers13143532] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/03/2021] [Revised: 06/29/2021] [Accepted: 07/08/2021] [Indexed: 12/14/2022]
Abstract
Simple Summary One of the major challenges in cancer treatment is the dynamic adaptation of tumour cells to cancer therapies. In this regard, tumour cells can modify their response to environmental cues without altering their DNA sequence. This cell plasticity enables cells to undergo morphological and functional changes, for example, during the process of tumour metastasis or when acquiring resistance to cancer therapies. Central to cell plasticity are the dynamic changes in gene expression that are controlled by a set of molecular switches called enhancers. Enhancers are DNA elements that determine when, where and to what extent genes should be switched on and off. Thus, defects in enhancer function can disrupt the gene expression program and can lead to tumour formation. Here, we review how enhancers control the activity of cancer-associated genes and how defects in these regulatory elements contribute to cell plasticity in cancer. Understanding enhancer (de)regulation can provide new strategies for modulating cell plasticity in tumour cells and can open new research avenues for cancer therapy. Abstract Epigenetic regulation can shape a cell’s identity by reversible modifications of the chromatin that ultimately control gene expression in response to internal and external cues. In this review, we first discuss the concept of cell plasticity in cancer, a process that is directly controlled by epigenetic mechanisms, with a particular focus on transcriptional enhancers as the cornerstone of epigenetic regulation. In the second part, we discuss mechanisms of enhancer deregulation in adult stem cells and epithelial-to-mesenchymal transition (EMT), as two paradigms of cell plasticity that are dependent on epigenetic regulation and serve as major sources of tumour heterogeneity. Finally, we review how genetic variations at enhancers and their epigenetic modifiers contribute to tumourigenesis, and we highlight examples of cancer drugs that target epigenetic modifications at enhancers.
23
Fong SL, Capra JA. Modeling the evolutionary architectures of transcribed human enhancer sequences reveals distinct origins, functions, and associations with human-trait variation. Mol Biol Evol 2021; 38:3681-3696. [PMID: 33973014 PMCID: PMC8382917 DOI: 10.1093/molbev/msab138] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
Despite the importance of gene regulatory enhancers in human biology and evolution, we lack a comprehensive model of enhancer evolution and function. This substantially limits our understanding of the genetic basis of species divergence and our ability to interpret the effects of noncoding variants on human traits. To explore enhancer sequence evolution and its relationship to regulatory function, we traced the evolutionary origins of transcribed human enhancer sequences with activity across diverse tissues and cellular contexts from the FANTOM5 consortium. The transcribed enhancers are enriched for sequences of a single evolutionary age (“simple” evolutionary architectures) compared with enhancers that are composites of sequences of multiple evolutionary ages (“complex” evolutionary architectures), likely indicating constraint against genomic rearrangements. Complex enhancers are older, more pleiotropic, and more active across species than simple enhancers. Genetic variants within complex enhancers are also less likely to associate with human traits and biochemical activity. Transposable-element-derived sequences (TEDS) have made diverse contributions to enhancers of both architectures; the majority of TEDS are found in enhancers with simple architectures, while a minority have remodeled older sequences to create complex architectures. Finally, we compare the evolutionary architectures of transcribed enhancers with histone-mark-defined enhancers. Our results reveal that most human transcribed enhancers are ancient sequences of a single age, and thus the evolution of most human enhancers was not driven by increases in evolutionary complexity over time. Our analyses further suggest that considering enhancer evolutionary histories provides context that can aid interpretation of the effects of variants on enhancer function. Based on these results, we propose a framework for analyzing enhancer evolutionary architecture.
Affiliation(s)
- Sarah L Fong
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, USA
24
Lewis EMA, Kaushik K, Sandoval LA, Antony I, Dietmann S, Kroll KL. Epigenetic regulation during human cortical development: Seq-ing answers from the brain to the organoid. Neurochem Int 2021; 147:105039. [PMID: 33915225 PMCID: PMC8387070 DOI: 10.1016/j.neuint.2021.105039] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/24/2020] [Revised: 03/23/2021] [Accepted: 03/27/2021] [Indexed: 01/22/2023]
Abstract
Epigenetic regulation plays an important role in controlling gene expression during complex processes, such as development of the human brain. Mutations in genes encoding chromatin modifying proteins and in the non-protein coding sequences of the genome can potentially alter transcription factor binding or chromatin accessibility. Such mutations can frequently cause neurodevelopmental disorders; therefore, understanding how epigenetic regulation shapes brain development is of particular interest. While epigenetic regulation of neural development has been extensively studied in murine models, significant species-specific differences in both the genome sequence and in brain development necessitate human models. However, access to human fetal material is limited and these tissues cannot be grown or experimentally manipulated ex vivo. Therefore, models that recapitulate particular aspects of human fetal brain development, such as the in vitro differentiation of human pluripotent stem cells (hPSCs), are instrumental for studying the epigenetic regulation of human neural development. Here, we examine recent studies that have defined changes in the epigenomic landscape during fetal brain development. We compare these studies with analogous data derived by in vitro differentiation of hPSCs into specific neuronal cell types or as three-dimensional cerebral organoids. Such comparisons can be informative regarding which aspects of fetal brain development are faithfully recapitulated by in vitro differentiation models and provide a foundation for using experimentally tractable in vitro models of human brain development to study neural gene regulation and the basis of its disruption to cause neurodevelopmental disorders.
Affiliation(s)
- Emily M A Lewis
- Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110, USA
- Komal Kaushik
- Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110, USA
- Luke A Sandoval
- Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110, USA
- Irene Antony
- Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110, USA
- Sabine Dietmann
- Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110, USA
- Kristen L Kroll
- Department of Developmental Biology, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110, USA
25
Kapourani CA, Argelaguet R, Sanguinetti G, Vallejos CA. scMET: Bayesian modeling of DNA methylation heterogeneity at single-cell resolution. Genome Biol 2021; 22:114. [PMID: 33879195 PMCID: PMC8056718 DOI: 10.1186/s13059-021-02329-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/22/2020] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression. scMET is available at https://github.com/andreaskapou/scMET .
Affiliation(s)
- Chantriolnt-Andreas Kapourani
- MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- School of Informatics, University of Edinburgh, Edinburgh, UK
- Guido Sanguinetti
- School of Informatics, University of Edinburgh, Edinburgh, UK
- SISSA, International School of Advanced Studies, Trieste, Italy
- Catalina A Vallejos
- MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- The Alan Turing Institute, London, UK
26
Bejjani F, Tolza C, Boulanger M, Downes D, Romero R, Maqbool M, Zine El Aabidine A, Andrau JC, Lebre S, Brehelin L, Parrinello H, Rohmer M, Kaoma T, Vallar L, Hughes J, Zibara K, Lecellier CH, Piechaczyk M, Jariel-Encontre I. Fra-1 regulates its target genes via binding to remote enhancers without exerting major control on chromatin architecture in triple negative breast cancers. Nucleic Acids Res 2021; 49:2488-2508. [PMID: 33533919 PMCID: PMC7968996 DOI: 10.1093/nar/gkab053] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/13/2020] [Revised: 12/21/2020] [Accepted: 01/25/2021] [Indexed: 12/12/2022]
Abstract
The ubiquitous family of dimeric transcription factors AP-1 is made up of Fos and Jun family proteins. It has long been thought to operate principally at gene promoters, and how it controls transcription is still poorly understood. The Fos family protein Fra-1 is overexpressed in triple negative breast cancers (TNBCs), where it contributes to tumor aggressiveness. To address its transcriptional actions in TNBCs, we combined transcriptomics, ChIP-seq, machine learning and NG Capture-C. Additionally, we studied its Fos family relative Fra-2, which is also expressed in TNBCs, albeit at much lower levels. Consistent with their pleiotropic effects, Fra-1 and Fra-2 up- and downregulate, individually, together or redundantly, many genes associated with a wide range of biological processes. Target gene regulation is principally due to binding of Fra-1 and Fra-2 at regulatory elements located distantly from cognate promoters, where Fra-1 modulates the recruitment of the transcriptional co-regulator p300/CBP and where differences in AP-1 variant motif recognition can underlie preferential Fra-1 or Fra-2 binding. Our work also shows no major role for Fra-1 in chromatin architecture control at target gene loci, but suggests collaboration between Fra-1-bound and -unbound enhancers within chromatin hubs sometimes including promoters for other Fra-1-regulated genes. Our work impacts our view of AP-1.
Affiliation(s)
- Fabienne Bejjani
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- PRASE, DSST, ER045, Lebanese University, Beirut, Lebanon
- Claire Tolza
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- Damien Downes
- Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Raphaël Romero
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
- Sophie Lebre
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- Hughes Parrinello
- Montpellier GenomiX, MGX, BioCampus Montpellier, CNRS, INSERM, Univ. Montpellier, F-34094 Montpellier, France
- Marine Rohmer
- Montpellier GenomiX, MGX, BioCampus Montpellier, CNRS, INSERM, Univ. Montpellier, F-34094 Montpellier, France
- Tony Kaoma
- Computational Biomedecine, Quantitative Biology Unit, Luxembourg Institute of Health, Strassen, Luxembourg
- Laurent Vallar
- Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Jim R Hughes
- Medical Research Council, Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, Oxford University, Oxford, UK
- Kazem Zibara
- PRASE, DSST, ER045, Lebanese University, Beirut, Lebanon
- Biology Department, Faculty of Sciences-I, Lebanese University, Beirut, Lebanon
- Charles-Henri Lecellier
- IGMM, Univ Montpellier, CNRS, Montpellier, France
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
27
Boulanger M, Chakraborty M, Tempé D, Piechaczyk M, Bossis G. SUMO and Transcriptional Regulation: The Lessons of Large-Scale Proteomic, Modifomic and Genomic Studies. Molecules 2021; 26:molecules26040828. [PMID: 33562565 PMCID: PMC7915335 DOI: 10.3390/molecules26040828] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/12/2021] [Revised: 01/29/2021] [Accepted: 02/01/2021] [Indexed: 12/12/2022]
Abstract
One major role of the eukaryotic peptidic post-translational modifier SUMO in the cell is transcriptional control. This occurs via modification of virtually all classes of transcriptional actors, which include transcription factors, transcriptional coregulators, diverse chromatin components, as well as the Pol I, Pol II and Pol III transcriptional machineries and their regulators. For many years, the role of SUMOylation has essentially been studied on individual proteins, or small groups of proteins, principally dealing with Pol II-mediated transcription. This provided only a fragmentary view of how SUMOylation controls transcription. The recent advent of large-scale proteomic, modifomic and genomic studies has, however, considerably refined our perception of the part played by SUMO in gene expression control. We review these developments and the new concepts they have given rise to, together with the limitations of our knowledge. How they illuminate the SUMO-dependent transcriptional mechanisms that have been characterized thus far and how they impact our view of SUMO-dependent chromatin organization are also considered.
Affiliation(s)
- Mathias Boulanger
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Mehuli Chakraborty
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Denis Tempé
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Marc Piechaczyk
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Corresponding author
- Guillaume Bossis
- Institut de Génétique Moléculaire de Montpellier (IGMM), University of Montpellier, CNRS, Montpellier, France
- Equipe Labellisée Ligue Contre le Cancer, Paris, France
- Corresponding author
28
Lambert JT, Su-Feher L, Cichewicz K, Warren TL, Zdilar I, Wang Y, Lim KJ, Haigh JL, Morse SJ, Canales CP, Stradleigh TW, Castillo Palacios E, Haghani V, Moss SD, Parolini H, Quintero D, Shrestha D, Vogt D, Byrne LC, Nord AS. Parallel functional testing identifies enhancers active in early postnatal mouse brain. eLife 2021; 10:69479. [PMID: 34605404 PMCID: PMC8577842 DOI: 10.7554/elife.69479] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/19/2021] [Accepted: 10/02/2021] [Indexed: 01/07/2023]
Abstract
Enhancers are cis-regulatory elements that play critical regulatory roles in modulating developmental transcription programs and driving cell-type-specific and context-dependent gene expression in the brain. The development of massively parallel reporter assays (MPRAs) has enabled high-throughput functional screening of candidate DNA sequences for enhancer activity. Tissue-specific screening of in vivo enhancer function at scale has the potential to greatly expand our understanding of the role of non-coding sequences in development, evolution, and disease. Here, we adapted a self-transcribing regulatory element MPRA strategy for delivery to early postnatal mouse brain via recombinant adeno-associated virus (rAAV). We identified and validated putative enhancers capable of driving reporter gene expression in mouse forebrain, including regulatory elements within an intronic CACNA1C linkage disequilibrium block associated with risk in neuropsychiatric disorder genetic studies. Paired screening and single enhancer in vivo functional testing, as we show here, represents a powerful approach towards characterizing regulatory activity of enhancers and understanding how enhancer sequences organize gene expression in the brain.
Affiliation(s)
- Jason T Lambert
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Linda Su-Feher
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Karol Cichewicz
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Tracy L Warren
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Iva Zdilar
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Yurong Wang
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Kenneth J Lim
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Jessica L Haigh
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Sarah J Morse
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Cesar P Canales
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Tyler W Stradleigh
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Erika Castillo Palacios
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Viktoria Haghani
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Spencer D Moss
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Hannah Parolini
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Diana Quintero
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Diwash Shrestha
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
- Daniel Vogt
- Department of Pediatrics and Human Development, Grand Rapids Research Center, Michigan State University, Grand Rapids, United States
- Leah C Byrne
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, United States
- Departments of Ophthalmology and Neurobiology, University of Pittsburgh, Pittsburgh, United States
- Alex S Nord
- Department of Psychiatry and Behavioral Sciences, University of California, Davis, Davis, United States
- Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States
29
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies (scalable oligonucleotide synthesis, RNA sequencing, and CRISPR [clustered regularly interspaced short palindromic repeats]) have opened new avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
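The sketch below is a rough illustration of how an MPRA readout is often summarized. It scores each candidate element as the log2 ratio of depth-normalized RNA barcode counts to DNA barcode counts. This is a simplified assumption, not a method taken from the review above. The function name, its inputs, and the pseudocount are hypothetical, and published analyses typically model counts statistically across replicates.

```python
import math
from collections import defaultdict


def mpra_activity(rna_counts, dna_counts, barcode_to_element, pseudocount=1.0):
    """Hypothetical helper: log2(RNA/DNA) activity per candidate element.

    rna_counts, dna_counts: dict mapping barcode -> raw read count
    barcode_to_element: dict mapping barcode -> candidate element ID
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    rna_cpm = defaultdict(float)
    dna_cpm = defaultdict(float)
    for barcode, element in barcode_to_element.items():
        # Counts-per-million scaling corrects for the sequencing depth of each library,
        # then barcodes are pooled per element.
        rna_cpm[element] += rna_counts.get(barcode, 0) / rna_total * 1e6
        dna_cpm[element] += dna_counts.get(barcode, 0) / dna_total * 1e6
    # The pseudocount avoids log2(0) and division by zero for elements whose barcodes dropped out.
    return {
        element: math.log2((rna_cpm[element] + pseudocount) / (dna_cpm[element] + pseudocount))
        for element in dna_cpm
    }


if __name__ == "__main__":
    rna = {"bc1": 200, "bc2": 200, "bc3": 100}
    dna = {"bc1": 100, "bc2": 100, "bc3": 300}
    mapping = {"bc1": "enhA", "bc2": "enhA", "bc3": "enhB"}
    print(mpra_activity(rna, dna, mapping))
```

In this toy input, enhA yields about twice as much RNA as its DNA representation (activity near +1). enhB is under-transcribed relative to its DNA (negative activity).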
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
30
Cao Y, Kitanovski S, Hoffmann D. intePareto: an R package for integrative analyses of RNA-Seq and ChIP-Seq data. BMC Genomics 2020; 21:802. [PMID: 33372591 PMCID: PMC7771091 DOI: 10.1186/s12864-020-07205-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/21/2020] [Accepted: 10/29/2020] [Indexed: 01/09/2023]
Abstract
BACKGROUND: RNA-Seq, the high-throughput sequencing (HT-Seq) of mRNAs, has become an essential tool for characterizing gene expression differences between cell types and conditions. Gene expression is regulated by several mechanisms, including epigenetically by post-translational histone modifications, which can be assessed by ChIP-Seq (Chromatin Immuno-Precipitation Sequencing). As more and more biological samples are analyzed by the combination of ChIP-Seq and RNA-Seq, the integrated analysis of the corresponding data sets becomes, theoretically, a unique option to study gene regulation. However, technically such analyses are still in their infancy. RESULTS: Here we introduce intePareto, a computational tool for the integrative analysis of RNA-Seq and ChIP-Seq data. With intePareto we match RNA-Seq and ChIP-Seq data at the level of genes, perform differential expression analysis between biological conditions, and prioritize genes with consistent changes in RNA-Seq and ChIP-Seq data using Pareto optimization. CONCLUSION: intePareto facilitates comprehensive understanding of high-dimensional transcriptomic and epigenomic data. Its superiority over a naive differential gene expression analysis of RNA-Seq data alone, and over an available integrative approach, is demonstrated by analyzing a public dataset.
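The general idea of Pareto-based prioritization can be illustrated with a short sketch. Assume each gene already carries two scores that should both be large for a consistent change, for example hypothetical z-scores of its RNA-Seq and ChIP-Seq log fold changes. Genes can then be split into successive non-dominated fronts: the first front holds genes that no other gene beats on both scores. This is a generic non-dominated sort written for illustration, not intePareto's own code or R interface.

```python
def pareto_fronts(scores):
    """Split items into successive Pareto fronts, maximizing every objective.

    scores: dict mapping item -> tuple of objective values (hypothetical input).
    Returns a list of fronts; front 0 holds the non-dominated items.
    """

    def dominates(a, b):
        # a dominates b if it is at least as good on every objective
        # and strictly better on at least one.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    remaining = dict(scores)
    fronts = []
    while remaining:
        # Items not dominated by any other remaining item form the next front.
        front = [
            item
            for item, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != item)
        ]
        fronts.append(sorted(front))
        for item in front:
            del remaining[item]
    return fronts


if __name__ == "__main__":
    genes = {"g1": (3, 3), "g2": (1, 4), "g3": (2, 2), "g4": (0, 0)}
    print(pareto_fronts(genes))
```

The loop always terminates because any finite set of score vectors contains at least one non-dominated item. To prioritize consistent down-regulation instead, the same function can be applied to the negated scores.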
Affiliation(s)
- Yingying Cao
- Bioinformatics and Computational Biophysics, Faculty of Biology and Center for Medical Biotechnology (ZMB), University of Duisburg-Essen, Universitätsstr. 2, Essen, 45141, Germany
- Simo Kitanovski
- Bioinformatics and Computational Biophysics, Faculty of Biology and Center for Medical Biotechnology (ZMB), University of Duisburg-Essen, Universitätsstr. 2, Essen, 45141, Germany
- Daniel Hoffmann
- Bioinformatics and Computational Biophysics, Faculty of Biology and Center for Medical Biotechnology (ZMB), University of Duisburg-Essen, Universitätsstr. 2, Essen, 45141, Germany
31
Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 2020; 64:426-448. [PMID: 32961076 DOI: 10.1139/gen-2020-0104] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Abstract
Enhancers are cis-regulatory sequences located distally to target genes. These sequences consolidate developmental and environmental cues to coordinate gene expression in a tissue-specific manner. Enhancer function and tissue specificity depend on the expressed set of transcription factors, which recognize binding sites and recruit cofactors that regulate local chromatin organization and gene transcription. Unlike other genomic elements, enhancers are challenging to identify because they function independently of orientation, are often distant from their promoters, have poorly defined boundaries, and display no reading frame. In addition, there are no defined genetic or epigenetic features that are unambiguously associated with enhancer activity. Over recent years there have been developments in both empirical assays and computational methods for enhancer prediction. We review genome-wide tools, CRISPR advancements, and high-throughput screening approaches that have improved our ability to both observe and manipulate enhancers in vitro at the level of primary genetic sequences, chromatin states, and spatial interactions. We also highlight contemporary animal models and their importance to enhancer validation. Together, these experimental systems and techniques complement one another and broaden our understanding of enhancer function in development, evolution, and disease.
Affiliation(s)
- Ian C Tobias
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Luis E Abatti
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Sakthi D Moorthy
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Shanelle Mullany
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Tiegh Taylor
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Nawrah Khader
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Mario A Filice
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
- Jennifer A Mitchell
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, M5S 3G5, Canada
32
Dukler N, Huang YF, Siepel A. Phylogenetic Modeling of Regulatory Element Turnover Based on Epigenomic Data. Mol Biol Evol 2020; 37:2137-2152. [PMID: 32176292 PMCID: PMC7306682 DOI: 10.1093/molbev/msaa073] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
Evolutionary changes in gene expression are often driven by gains and losses of cis-regulatory elements (CREs). The dynamics of CRE evolution can be examined using multispecies epigenomic data, but so far such analyses have generally been descriptive and model-free. Here, we introduce a probabilistic modeling framework for the evolution of CREs that operates directly on raw chromatin immunoprecipitation and sequencing (ChIP-seq) data and fully considers the phylogenetic relationships among species. Our framework includes a phylogenetic hidden Markov model, called epiPhyloHMM, for identifying the locations of multiply aligned CREs, and a combined phylogenetic and generalized linear model, called phyloGLM, for accounting for the influence of a rich set of genomic features in describing their evolutionary dynamics. We apply these methods to previously published ChIP-seq data for the H3K4me3 and H3K27ac histone modifications in liver tissue from nine mammals. We find that enhancers are gained and lost during mammalian evolution at about twice the rate of promoters, and that turnover rates are negatively correlated with DNA sequence conservation, expression level, and tissue breadth, and positively correlated with distance from the transcription start site, consistent with previous findings. In addition, we find that the predicted dosage sensitivity of target genes positively correlates with DNA sequence constraint in CREs but not with turnover rates, perhaps owing to differences in the effect sizes of the relevant mutations. Altogether, our probabilistic modeling framework enables a variety of powerful new analyses.
Affiliation(s)
- Noah Dukler
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Physiology, Biophysics, and Systems Biology, Weill Cornell Medical College, New York, NY
- Yi-Fei Huang
- Department of Biology and Huck Institute of Life Sciences, Pennsylvania State University, University Park, PA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
33
Broekema RV, Bakker OB, Jonkers IH. A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol 2020; 10:190221. [PMID: 31937202 PMCID: PMC7014684 DOI: 10.1098/rsob.190221] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 09/13/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022]
Abstract
Over the past 15 years, genome-wide association studies (GWASs) have enabled the systematic identification of genetic loci associated with traits and diseases. However, due to resolution issues and methodological limitations, the true causal variants and genes associated with traits remain difficult to identify. In this post-GWAS era, many biological and computational fine-mapping approaches now aim to solve these issues. Here, we review fine-mapping and gene prioritization approaches that, when combined, will improve the understanding of the underlying mechanisms of complex traits and diseases. Fine-mapping of genetic variants has become increasingly sophisticated: initially, variants were simply overlapped with functional elements, but now the impact of variants on regulatory activity and direct variant-gene 3D interactions can be identified. Moreover, gene manipulation by CRISPR/Cas9, the identification of expression quantitative trait loci and the use of co-expression networks have all increased our understanding of the genes and pathways affected by GWAS loci. However, despite this progress, limitations including the lack of cell-type- and disease-specific data and the ever-increasing complexity of polygenic models of traits pose serious challenges. Indeed, the combination of fine-mapping and gene prioritization by statistical, functional and population-based strategies will be necessary to truly understand how GWAS loci contribute to complex traits and diseases.
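One widely used statistical building block for fine-mapping is Wakefield's approximate Bayes factor, ABF = sqrt(V/(V+W)) · exp(z²W / (2(V+W))). Here V is the squared standard error of the effect estimate, W is the prior variance of the true effect, and z is the association z-score. Under the assumptions of a single causal variant per locus and an equal prior for every variant, normalizing the ABFs gives posterior inclusion probabilities (PIPs), which can then be accumulated into a credible set. The sketch below is a generic illustration of that calculation, not a method taken from the review above. The function names are hypothetical, and the prior standard deviation of 0.2 is an assumed value.

```python
import math


def wakefield_pips(betas, ses, prior_sd=0.2):
    """Posterior inclusion probabilities for the variants of one locus.

    Assumes exactly one causal variant and a uniform prior across variants.
    betas, ses: per-variant effect estimates and their standard errors.
    prior_sd: assumed standard deviation of the true effect under H1.
    """
    w = prior_sd ** 2
    log_abfs = []
    for beta, se in zip(betas, ses):
        v = se ** 2
        r = w / (v + w)  # note that 1 - r = V / (V + W)
        z = beta / se
        # log of Wakefield's ABF (association vs. null): sqrt(1 - r) * exp(r * z^2 / 2)
        log_abfs.append(0.5 * math.log(1.0 - r) + 0.5 * r * z * z)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_abfs)
    weights = [math.exp(x - top) for x in log_abfs]
    total = sum(weights)
    return [wt / total for wt in weights]


def credible_set(pips, coverage=0.95):
    """Indices of the highest-PIP variants whose PIPs first sum to >= coverage."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen


if __name__ == "__main__":
    pips = wakefield_pips([0.5, 0.1, 0.05], [0.05, 0.05, 0.05])
    print(pips, credible_set(pips))
```

The single-causal-variant assumption keeps the calculation to one pass over the variants. Loci with several independent signals need methods that relax it.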
Affiliation(s)
- I. H. Jonkers
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
34
Ponting CP. Big knowledge from big data in functional genomics. Emerg Top Life Sci 2017; 1:245-248. [PMID: 33525805 PMCID: PMC7288990 DOI: 10.1042/etls20170129] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/06/2017] [Revised: 09/12/2017] [Accepted: 09/12/2017] [Indexed: 02/07/2023]
Abstract
With so much genomics data being produced, it might be wise to pause and consider what purpose this data can or should serve. Some improve annotations, others predict molecular interactions, but few add directly to existing knowledge. This is because sequence annotations do not always implicate function, and molecular interactions are often irrelevant to a cell's or organism's survival or propagation. Merely correlative relationships found in big data fail to provide answers to the Why questions of human biology. Instead, those answers are expected from methods that causally link DNA changes to downstream effects without being confounded by reverse causation. These approaches require the controlled measurement of the consequences of DNA variants, for example, either those introduced in single cells using CRISPR/Cas9 genome editing or that are already present across the human population. Inferred causal relationships between genetic variation and cellular phenotypes or disease show promise to rapidly grow and underpin our knowledge base.
Affiliation(s)
- Chris P Ponting
- MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, U.K