1
Tenekeci S, Tekir S. Identifying promoter and enhancer sequences by graph convolutional networks. Comput Biol Chem 2024; 110:108040. [PMID: 38430611 DOI: 10.1016/j.compbiolchem.2024.108040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/09/2024] [Accepted: 02/27/2024] [Indexed: 03/05/2024]
Abstract
Identification of promoters, enhancers, and their interactions helps to understand genetic regulation. This study proposes a graph-based semi-supervised learning model (GCN4EPI) for the enhancer-promoter classification problem. We adopt a graph convolutional network (GCN) architecture to integrate interaction information with sequence features. Nodes of the constructed graph hold word embeddings of DNA sequences, while edges hold the Enhancer-Promoter Interaction (EPI) information. By means of semi-supervised learning, much less data (16%) and time are needed in model training. Comparisons on a benchmark dataset of six human cell lines show that the proposed approach outperforms the state-of-the-art methods by a large margin (10% higher F1 score) and has the fastest training time (up to 3 times faster). Moreover, GCN4EPI's performance on cross-cell line data is also better than the baselines (3% higher F1 score). Our qualitative analyses with graph explainability models indicate that GCN4EPI learns from both text and graph structure. The results suggest that integrating interaction information with sequence features improves predictive performance and compensates for the number of training instances.
Affiliation(s)
- Samet Tenekeci
- Department of Computer Engineering, Izmir Institute of Technology, Izmir, 35430, Turkiye
- Selma Tekir
- Department of Computer Engineering, Izmir Institute of Technology, Izmir, 35430, Turkiye.
2
Abnizova I, Stapel C, Boekhorst RT, Lee JTH, Hemberg M. Integrative analysis of transcriptomic and epigenomic data reveals distinct patterns for developmental and housekeeping gene regulation. BMC Biol 2024; 22:78. [PMID: 38600550 PMCID: PMC11005181 DOI: 10.1186/s12915-024-01869-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 03/14/2024] [Indexed: 04/12/2024] [Open Access]
Abstract
BACKGROUND: Regulation of transcription is central to the emergence of new cell types during development, and it often involves activation of genes via proximal and distal regulatory regions. The activity of regulatory elements is determined by transcription factors (TFs) and epigenetic marks, but despite extensive mapping of such patterns, the extraction of regulatory principles remains challenging.
RESULTS: Here we study differentially and similarly expressed genes along with their associated epigenomic profiles, chromatin accessibility and DNA methylation, during lineage specification at gastrulation in mice. Comparison of the three lineages allows us to identify genomic and epigenomic features that distinguish the two classes of genes. We show that differentially expressed genes are primarily regulated by distal elements, while similarly expressed genes are controlled by proximal housekeeping regulatory programs. Differentially expressed genes are relatively isolated within topologically associated domains, while similarly expressed genes tend to be located in gene clusters. Transcription of differentially expressed genes is associated with differentially open chromatin at distal elements including enhancers, while that of similarly expressed genes is associated with ubiquitously accessible chromatin at promoters.
CONCLUSION: Based on these associations of (linearly) distal genes' transcription start sites (TSSs) and putative enhancers for developmental genes, our findings allow us to link putative enhancers to their target promoters and to infer lineage-specific repertoires of putative driver transcription factors, within which we define subgroups of pioneers and co-operators.
Affiliation(s)
- Irina Abnizova
- Epigenetics Programme, Babraham Institute, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- Carine Stapel
- Epigenetics Programme, Babraham Institute, Cambridge, UK
- Martin Hemberg
- Wellcome Sanger Institute, Hinxton, UK.
- The Gene Lay Institute of Immunology and Inflammation Brigham & Women's Hospital and Harvard Medical School, Boston, USA.
3
Malfait J, Wan J, Spicuglia S. Epromoters are new players in the regulatory landscape with potential pleiotropic roles. Bioessays 2023; 45:e2300012. [PMID: 37246247 DOI: 10.1002/bies.202300012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 05/11/2023] [Accepted: 05/15/2023] [Indexed: 05/30/2023]
Abstract
Precise spatiotemporal control of gene expression during normal development and cell differentiation is achieved by the combined action of proximal (promoters) and distal (enhancers) cis-regulatory elements. Recent studies have reported that a subset of promoters, termed Epromoters, also work as enhancers to regulate distal genes. This new paradigm opens novel questions regarding the complexity of our genome and raises the possibility that genetic variation within Epromoters has pleiotropic effects on various physiological and pathological traits by differentially impacting multiple proximal and distal genes. Here, we discuss the different observations pointing to an important role of Epromoters in the regulatory landscape and summarize the evidence supporting a pleiotropic impact of these elements in disease. We further hypothesize that Epromoters might represent a major contributor to phenotypic variation and disease.
Affiliation(s)
- Juliette Malfait
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, LIGUE, Marseille, France
- Jing Wan
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, LIGUE, Marseille, France
- Salvatore Spicuglia
- Aix-Marseille University, Inserm, TAGC, UMR1090, Marseille, France
- Equipe Labélisée Ligue Contre le Cancer, LIGUE, Marseille, France
4
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] [Open Access]
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia
- The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia
- Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA
- Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
5
Abstract
The world of long non-coding RNAs (lncRNAs) has opened up massive new prospects in understanding the regulation of gene expression. Not only are there seemingly almost infinite numbers of lncRNAs in the mammalian cell, but they have highly diverse mechanisms of action. In the nucleus, some are chromatin-associated, transcribed from transcriptional enhancers (eRNAs) and/or direct changes in the epigenetic landscape with profound effects on gene expression. The pituitary gonadotrope is responsible for activation of reproduction through production and secretion of appropriate levels of the gonadotropic hormones. As such, it exemplifies a cell whose function is defined through changes in developmental and temporal patterns of gene expression, including those that are hormonally induced. Roles for diverse distal regulatory elements and eRNAs in gonadotrope biology have only just begun to emerge. Here, we will present an overview of the different kinds of lncRNAs that alter gene expression, and what is known about their roles in regulating some of the key gonadotrope genes. We will also review various screens that have detected differentially expressed pituitary lncRNAs associated with changes in reproductive state and those whose expression is found to play a role in gonadotrope-derived nonfunctioning pituitary adenomas. We hope to shed light on this exciting new field, emphasize the open questions, and encourage research to illuminate the roles of lncRNAs in various endocrine systems.
Affiliation(s)
- Tal Refael
- Faculty of Biology, Technion Israel Institute of Technology, Haifa 32000, Israel
- Philippa Melamed
- Faculty of Biology, Technion Israel Institute of Technology, Haifa 32000, Israel
- Correspondence: Philippa Melamed, PhD, Faculty of Biology, Technion - Israel Institute of Technology, Haifa 32000, Israel.
6
Zeng X, Park SJ, Nakai K. Characterizing Promoter and Enhancer Sequences by a Deep Learning Method. Front Genet 2021; 12:681259. [PMID: 34211503 PMCID: PMC8239401 DOI: 10.3389/fgene.2021.681259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Accepted: 05/20/2021] [Indexed: 11/21/2022] [Open Access]
Abstract
Promoters and enhancers are well-known regulatory elements modulating gene expression. As confirmed by high-throughput sequencing technologies, these regulatory elements are bidirectionally transcribed. That is, promoters produce stable mRNA in the sense direction and unstable RNA in the antisense direction, while enhancers transcribe unstable RNA in both directions. Although it is thought that enhancers and promoters share a similar architecture of transcription start sites (TSSs), how the transcriptional machinery distinctly uses these genomic regions as promoters or enhancers remains unclear. To address this issue, we developed a deep learning (DL) method by utilizing a convolutional neural network (CNN) and the saliency algorithm. In comparison with other classifiers, our CNN presented higher predictive performance, suggesting the overarching importance of the high-order sequence features, captured by the CNN. Moreover, our method revealed that there are substantial sequence differences between the enhancers and promoters. Remarkably, the 20–120 bp downstream regions from the center of bidirectional TSSs seemed to contribute to the RNA stability. These regions in promoters tend to have a larger number of guanines and cytosines compared to those in enhancers, and this feature contributed to the classification of the regulatory elements. Our CNN-based method can capture the complex TSS architectures. We found that the genomic regions around TSSs for promoters and enhancers contribute to RNA stability and show GC-biased characteristics as a critical determinant for promoter TSSs.
Affiliation(s)
- Xin Zeng
- Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
- Sung-Joon Park
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Kenta Nakai
- Department of Computational Biology and Medical Science, The University of Tokyo, Kashiwa, Japan
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
7
Majic P, Payne JL. Enhancers Facilitate the Birth of De Novo Genes and Gene Integration into Regulatory Networks. Mol Biol Evol 2021; 37:1165-1178. [PMID: 31845961 PMCID: PMC7086177 DOI: 10.1093/molbev/msz300] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/12/2022] [Open Access]
Abstract
Regulatory networks control the spatiotemporal gene expression patterns that give rise to and define the individual cell types of multicellular organisms. In eumetazoa, distal regulatory elements called enhancers play a key role in determining the structure of such networks, particularly the wiring diagram of “who regulates whom.” Mutations that affect enhancer activity can therefore rewire regulatory networks, potentially causing adaptive changes in gene expression. Here, we use whole-tissue and single-cell transcriptomic and chromatin accessibility data from mouse to show that enhancers play an additional role in the evolution of regulatory networks: They facilitate network growth by creating transcriptionally active regions of open chromatin that are conducive to de novo gene evolution. Specifically, our comparative transcriptomic analysis with three other mammalian species shows that young, mouse-specific intergenic open reading frames are preferentially located near enhancers, whereas older open reading frames are not. Mouse-specific intergenic open reading frames that are proximal to enhancers are more highly and stably transcribed than those that are not proximal to enhancers or promoters, and they are transcribed in a limited diversity of cellular contexts. Furthermore, we report several instances of mouse-specific intergenic open reading frames proximal to promoters showing evidence of being repurposed enhancers. We also show that open reading frames gradually acquire interactions with enhancers over macroevolutionary timescales, helping integrate genes—those that have arisen de novo or by other means—into existing regulatory networks. Taken together, our results highlight a dual role of enhancers in expanding and rewiring gene regulatory networks.
Affiliation(s)
- Paco Majic
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Corresponding author
8
Steinhaus R, Gonzalez T, Seelow D, Robinson PN. Pervasive and CpG-dependent promoter-like characteristics of transcribed enhancers. Nucleic Acids Res 2020; 48:5306-5317. [PMID: 32338759 PMCID: PMC7261191 DOI: 10.1093/nar/gkaa223] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/02/2019] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 12/17/2022] [Open Access]
Abstract
The temporal and spatial expression of genes is controlled by promoters and enhancers. Findings obtained over the last decade that not only promoters but also enhancers are characterized by bidirectional, divergent transcription have challenged the traditional notion that promoters and enhancers represent distinct classes of regulatory elements. Over half of human promoters are associated with CpG islands (CGIs), relatively CpG-rich stretches of generally several hundred nucleotides that are often associated with housekeeping genes. Only about 6% of transcribed enhancers defined by CAGE-tag analysis are associated with CGIs. Here, we present an analysis of enhancer and promoter characteristics and relate them to the presence or absence of CGIs. We show that transcribed enhancers share a number of CGI-dependent characteristics with promoters, including statistically significant local overrepresentation of core promoter elements. CGI-associated enhancers are longer, display higher directionality of transcription, greater expression, a lesser degree of tissue specificity, and a higher frequency of transcription-factor binding events than non-CGI-associated enhancers. Genes putatively regulated by CGI-associated enhancers are enriched for transcription regulator activity. Our findings show that CGI-associated transcribed enhancers display a series of characteristics related to sequence, expression and function that distinguish them from enhancers not associated with CGIs.
Affiliation(s)
- Robin Steinhaus
- Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Tonatiuh Gonzalez
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Dominik Seelow
- Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
- Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Institute for Systems Genomics, University of Connecticut, 263 Farmington Avenue, Farmington, CT 06030, USA
9
MMTR/Dmap1 Sets the Stage for Early Lineage Commitment of Embryonic Stem Cells by Crosstalk with PcG Proteins. Cells 2020; 9:cells9051190. [PMID: 32403252 PMCID: PMC7290897 DOI: 10.3390/cells9051190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2020] [Revised: 05/05/2020] [Accepted: 05/08/2020] [Indexed: 01/13/2023] [Open Access]
Abstract
Chromatin remodeling, including histone modification, chromatin (un)folding, and nucleosome remodeling, is a significant transcriptional regulation mechanism. By these epigenetic modifications, transcription factors and their regulators are recruited to the promoters of target genes, and thus gene expression is controlled through either transcriptional activation or repression. The Mat1-mediated transcriptional repressor (MMTR)/DNA methyltransferase 1 (DNMT1)-associated protein (Dmap1) is a transcription corepressor involved in chromatin remodeling, cell cycle regulation, DNA double-strand break repair, and tumor suppression. The Tip60-p400 complex proteins, including MMTR/Dmap1, interact with the oncogene Myc in embryonic stem cells (ESCs). These proteins interplay with the stem cell-related proteome networks and regulate gene expression. However, the detailed mechanisms of their functions are unknown. Here, we show that MMTR/Dmap1, along with other Tip60-p400 complex proteins, binds the promoters of differentiation commitment genes in mouse ESCs. Hence, MMTR/Dmap1 controls gene expression alterations during differentiation. Furthermore, we propose a novel mechanism of MMTR/Dmap1 function in early-stage lineage commitment of mouse ESCs by crosstalk with the polycomb group (PcG) proteins. The complex controls histone mark bivalency and transcriptional poising of commitment genes. Taken together, our comprehensive findings will help better understand the MMTR/Dmap1-mediated transcriptional regulation in ESCs and other cell types.
10
Determinants of enhancer and promoter activities of regulatory elements. Nat Rev Genet 2019; 21:71-87. [DOI: 10.1038/s41576-019-0173-8] [Citation(s) in RCA: 284] [Impact Index Per Article: 56.8] [Accepted: 09/04/2019] [Indexed: 12/13/2022]