1
Imamura K, Garland W, Schmid M, Jakobsen L, Sato K, Rouvière JO, Jakobsen KP, Burlacu E, Lopez ML, Lykke-Andersen S, Andersen JS, Jensen TH. A functional connection between the Microprocessor and a variant NEXT complex. Mol Cell 2024; 84:4158-4174.e6. [PMID: 39515294 DOI: 10.1016/j.molcel.2024.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 08/26/2024] [Accepted: 10/11/2024] [Indexed: 11/16/2024]
Abstract
In mammalian cells, primary miRNAs are cleaved at their hairpin structures by the Microprocessor complex, whose core is composed of DROSHA and DGCR8. Here, we show that 5' flanking regions, resulting from Microprocessor cleavage, are targeted by the RNA exosome in mouse embryonic stem cells (mESCs). This is facilitated by a physical link between DGCR8 and the nuclear exosome targeting (NEXT) component ZCCHC8. Surprisingly, however, both biochemical and mutagenesis studies demonstrate that a variant NEXT complex, containing the RNA helicase MTR4 but devoid of the RNA-binding protein RBM7, is the active entity. This Microprocessor-NEXT variant also targets stem-loop-containing RNAs expressed from other genomic regions, such as enhancers. By contrast, Microprocessor does not contribute to the turnover of less structured NEXT substrates. Our results therefore demonstrate that MTR4-ZCCHC8 can link to either RBM7 or DGCR8/DROSHA to target different RNA substrates depending on their structural context.
Affiliation(s)
- Katsutoshi Imamura
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark; Department of Systems Medicine, Chiba University Graduate School of Medicine, 1-8-1 Inohana, Chuo-ku, Chiba 260-8670, Japan
- William Garland
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark
- Manfred Schmid
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark
- Lis Jakobsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, Odense, Denmark
- Kengo Sato
- School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Jérôme O Rouvière
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark
- Kristoffer Pors Jakobsen
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark
- Elena Burlacu
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark
- Marta Loureiro Lopez
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, Odense, Denmark
- Søren Lykke-Andersen
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark
- Jens S Andersen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, Odense, Denmark
- Torben Heick Jensen
- Department of Molecular Biology and Genetics, Universitetsbyen 81, Aarhus University, Aarhus, Denmark.
2
Torre D, Fstkchyan YS, Ho JSY, Cheon Y, Patel RS, Degrace EJ, Mzoughi S, Schwarz M, Mohammed K, Seo JS, Romero-Bueno R, Demircioglu D, Hasson D, Tang W, Mahajani SU, Campisi L, Zheng S, Song WS, Wang YC, Shah H, Francoeur N, Soto J, Salfati Z, Weirauch MT, Warburton P, Beaumont K, Smith ML, Mulder L, Villalta SA, Kessenbrock K, Jang C, Lee D, De Rubeis S, Cobos I, Tam O, Hammell MG, Seldin M, Shi Y, Basu U, Sebastiano V, Byun M, Sebra R, Rosenberg BR, Benner C, Guccione E, Marazzi I. Nuclear RNA catabolism controls endogenous retroviruses, gene expression asymmetry, and dedifferentiation. Mol Cell 2023; 83:4255-4271.e9. [PMID: 37995687 PMCID: PMC10842741 DOI: 10.1016/j.molcel.2023.10.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 06/28/2023] [Accepted: 10/26/2023] [Indexed: 11/25/2023]
Abstract
Endogenous retroviruses (ERVs) are remnants of ancient parasitic infections and comprise sizable portions of most genomes. Although epigenetic mechanisms silence most ERVs by generating a repressive environment that prevents their expression (heterochromatin), little is known about mechanisms silencing ERVs residing in open regions of the genome (euchromatin). This is particularly important during embryonic development, where induction and repression of distinct classes of ERVs occur in short temporal windows. Here, we demonstrate that transcription-associated RNA degradation by the nuclear RNA exosome and Integrator is a regulatory mechanism that controls the productive transcription of most genes and many ERVs involved in preimplantation development. Disrupting nuclear RNA catabolism promotes dedifferentiation to a totipotent-like state characterized by defects in RNAPII elongation and decreased expression of long genes (gene-length asymmetry). Our results indicate that RNA catabolism is a core regulatory module of gene networks that safeguards RNAPII activity, ERV expression, cell identity, and developmental potency.
Affiliation(s)
- Denis Torre
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Center for OncoGenomics and Innovative Therapeutics (COGIT), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Yesai S Fstkchyan
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jessica Sook Yuin Ho
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Programme in Emerging Infectious Diseases, Duke-NUS Medical School, Singapore 169857, Singapore
- Youngseo Cheon
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea; Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA
- Roosheel S Patel
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Emma J Degrace
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Slim Mzoughi
- Center for OncoGenomics and Innovative Therapeutics (COGIT), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Megan Schwarz
- Center for OncoGenomics and Innovative Therapeutics (COGIT), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Kevin Mohammed
- Center for OncoGenomics and Innovative Therapeutics (COGIT), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ji-Seon Seo
- Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA
- Raquel Romero-Bueno
- Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA
- Deniz Demircioglu
- Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Dan Hasson
- Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Bioinformatics for Next Generation Sequencing (BiNGS) Shared Resource Facility, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Weijing Tang
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Sameehan U Mahajani
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Laura Campisi
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Simin Zheng
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Won-Suk Song
- Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA
- Ying-Chih Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Hardik Shah
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Nancy Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Juan Soto
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zelda Salfati
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Matthew T Weirauch
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Peter Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Kristin Beaumont
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Melissa L Smith
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
- Lubbertus Mulder
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- S Armando Villalta
- Department of Physiology and Biophysics, University of California Irvine, Irvine, CA 92697, USA
- Kai Kessenbrock
- Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA
- Cholsoon Jang
- Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA
- Daeyoup Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
- Silvia De Rubeis
- Seaver Autism Center for Research and Treatment, Department of Psychiatry, The Mindich Child Health and Development Institute, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Inma Cobos
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Oliver Tam
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Marcus Seldin
- Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA
- Yongsheng Shi
- Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA; Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, CA 92697, USA
- Uttiya Basu
- Department of Microbiology & Immunology, Columbia University Medical Center, New York, NY 10032, USA
- Vittorio Sebastiano
- Institute for Stem Cell Biology and Regenerative Medicine and the Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Minji Byun
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Brad R Rosenberg
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Chris Benner
- Department of Medicine, University of California, San Diego, San Diego, CA 92093, USA
- Ernesto Guccione
- Center for OncoGenomics and Innovative Therapeutics (COGIT), Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Pharmacological Sciences and Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
- Ivan Marazzi
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA; Center for Epigenetics and Metabolism, University of California Irvine, Irvine, CA 92697, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
3
Du J, Wang C, Wang L, Mao S, Zhu B, Li Z, Fan X. Automatic block-wise genotype-phenotype association detection based on hidden Markov model. BMC Bioinformatics 2023; 24:138. [PMID: 37029361 PMCID: PMC10082540 DOI: 10.1186/s12859-023-05265-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 03/31/2023] [Indexed: 04/09/2023] Open
Abstract
BACKGROUND For detecting genotype-phenotype association from case-control single nucleotide polymorphism (SNP) data, one class of methods relies on testing each genomic variant site individually. However, this approach ignores the tendency for associated variant sites to be spatially clustered instead of uniformly distributed along the genome. Therefore, a more recent class of methods looks for blocks of influential variant sites. Unfortunately, existing methods of this kind either assume prior knowledge of the blocks or rely on ad hoc moving windows. A principled method is needed to automatically detect genomic variant blocks that are associated with the phenotype. RESULTS In this paper, we introduce an automatic block-wise Genome-Wide Association Study (GWAS) method based on a hidden Markov model. Using case-control SNP data as input, our method detects the number of blocks associated with the phenotype and the locations of the blocks. Correspondingly, the minor allele of each variant site is classified as having negative influence, no influence, or positive influence on the phenotype. We evaluated our method using both datasets simulated from our model and datasets from a block model different from ours, and compared its performance with other methods. These included both simple methods based on Fisher's exact test, applied site-by-site, and more complex methods built into the recent Zoom-Focus Algorithm. Across all simulations, our method consistently outperformed the comparisons. CONCLUSIONS With its demonstrated better performance, we expect that our algorithm for detecting influential variant sites may help find more accurate signals across a wide range of case-control GWAS.
Affiliation(s)
- Jin Du
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
- Chaojie Wang
- School of Mathematical Science, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Lijun Wang
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Shanjun Mao
- College of Finance and Statistics, Hunan University, Changsha, Hunan Province, China
- Bencong Zhu
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Zheng Li
- Department of Surgery, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
4
Integrator is a genome-wide attenuator of non-productive transcription. Mol Cell 2020; 81:514-529.e6. [PMID: 33385327 DOI: 10.1016/j.molcel.2020.12.014] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 06/06/2020] [Revised: 10/11/2020] [Accepted: 11/20/2020] [Indexed: 12/28/2022]
Abstract
Termination of RNA polymerase II (RNAPII) transcription in metazoans relies largely on the cleavage and polyadenylation (CPA) and integrator (INT) complexes originally found to act at the ends of protein-coding and small nuclear RNA (snRNA) genes, respectively. Here, we monitor CPA- and INT-dependent termination activities genome-wide, including at thousands of previously unannotated transcription units (TUs), producing unstable RNA. We verify the global activity of CPA occurring at pA sites indiscriminately of their positioning relative to the TU promoter. We also identify a global activity of INT, which is largely sequence-independent and restricted to a ~3-kb promoter-proximal region. Our analyses suggest two functions of genome-wide INT activity: it dampens transcriptional output from weak promoters, and it provides quality control of RNAPII complexes that are unfavorably configured for transcriptional elongation. We suggest that the function of INT in stable snRNA production is an exception from its general cellular role, the attenuation of non-productive transcription.
5
Basile A, Campanaro S, Kovalovszki A, Zampieri G, Rossi A, Angelidaki I, Valle G, Treu L. Revealing metabolic mechanisms of interaction in the anaerobic digestion microbiome by flux balance analysis. Metab Eng 2020; 62:138-149. [PMID: 32905861 DOI: 10.1016/j.ymben.2020.08.013] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/05/2020] [Revised: 08/03/2020] [Accepted: 08/24/2020] [Indexed: 10/23/2022]
Abstract
Anaerobic digestion is a key biological process for renewable energy, yet mechanistic knowledge of its hidden microbial dynamics is still limited. The present work charted the interaction network in the anaerobic digestion microbiome via the full characterization of pairwise interactions and the associated metabolite exchanges. To this end, a novel collection of 836 genome-scale metabolic models was built to represent the functional capabilities of bacterial and archaeal species derived from genome-centric metagenomics. Dominant microbes were shown to prefer mutualistic, parasitic and commensalistic interactions over neutralism, amensalism and competition, and are more likely to behave as metabolite importers and profiteers of the coexistence. Additionally, external hydrogen injection positively influences microbiome dynamics by promoting commensalism over amensalism. Finally, exchanges of glucogenic amino acids were shown to overcome auxotrophies caused by an incomplete tricarboxylic acid cycle. Our novel strategy predicted the most favourable growth conditions for the microbes, overall suggesting strategies to increase biogas production efficiency. In principle, this approach could also be applied to microbial populations of biomedical importance, such as the gut microbiome, to allow a broad inspection of the microbial interplays.
Affiliation(s)
- Arianna Basile
- Department of Biology, University of Padova, Via U. Bassi 58/b, 35121, Padua, Italy
- Stefano Campanaro
- Department of Biology, University of Padova, Via U. Bassi 58/b, 35121, Padua, Italy; CRIBI Biotechnology Center, University of Padova, 35131, Padua, Italy.
- Adam Kovalovszki
- Department of Environmental Engineering, Technical University of Denmark, 2800, Kgs. Lyngby, Denmark
- Guido Zampieri
- Department of Biology, University of Padova, Via U. Bassi 58/b, 35121, Padua, Italy; Department of Computer Science and Information Systems, Teesside University, Middlesbrough, United Kingdom
- Alessandro Rossi
- Department of Biology, University of Padova, Via U. Bassi 58/b, 35121, Padua, Italy
- Irini Angelidaki
- Department of Environmental Engineering, Technical University of Denmark, 2800, Kgs. Lyngby, Denmark
- Giorgio Valle
- Department of Biology, University of Padova, Via U. Bassi 58/b, 35121, Padua, Italy
- Laura Treu
- Department of Biology, University of Padova, Via U. Bassi 58/b, 35121, Padua, Italy
6
Ge X, Zhang H, Xie L, Li WV, Kwon SB, Li JJ. EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences. Nucleic Acids Res 2019; 47:e77. [PMID: 31045217 PMCID: PMC6648345 DOI: 10.1093/nar/gkz287] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/15/2018] [Revised: 03/31/2019] [Accepted: 04/10/2019] [Indexed: 11/15/2022] Open
Abstract
The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that novelly incorporates varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on the real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
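As a rough illustration of local alignment over chromatin state sequences, the following Python sketch runs a Smith-Waterman-style dynamic program on run-length-compressed state strings. It is not the EpiAlign implementation: the function names, the gap and mismatch penalties, and the frequency weighting (rarer states score more when matched) are all assumptions made for this example, and run lengths are simply discarded here.

```python
from collections import Counter
from itertools import groupby
from math import log


def compress(states):
    """Collapse runs of identical states, e.g. "AAABB" -> ["A", "B"]."""
    return [s for s, _ in groupby(states)]


def local_align_score(a, b, mismatch=-1.0, gap=-1.0):
    """Best local alignment score between two chromatin state sequences.

    Hypothetical scoring: a match is worth -log(frequency of the state in the
    pooled compressed sequences), so matching a rare state counts for more.
    """
    ra, rb = compress(a), compress(b)
    counts = Counter(ra + rb)
    total = sum(counts.values())
    weight = {s: -log(c / total) for s, c in counts.items()}

    best = 0.0
    prev = [0.0] * (len(rb) + 1)  # previous row of the DP table
    for sa in ra:
        cur = [0.0]
        for j, sb in enumerate(rb, start=1):
            diag = prev[j - 1] + (weight[sa] if sa == sb else mismatch)
            # Local alignment: scores never drop below zero.
            cur.append(max(0.0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `local_align_score("AAABBBCCC", "DDBBCCDD")` scores the shared `B`, `C` segment, while two sequences with no state in common score zero.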
Affiliation(s)
- Xinzhou Ge
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Haowen Zhang
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- School of Life Sciences, Tsinghua University, Beijing 100084, China
- Lingjue Xie
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Wei Vivian Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Soo Bin Kwon
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA, USA
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA
- Department of Biomathematics, University of California, Los Angeles, CA 90095-1766, USA
7
Wang C, Zhang S. Reveal cell type-specific regulatory elements and their characterized histone code classes via a hidden Markov model. BMC Genomics 2018; 19:903. [PMID: 30598107 PMCID: PMC6311906 DOI: 10.1186/s12864-018-5274-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND With the maturation of next-generation sequencing technology, several large consortia have generated a huge amount of epigenomic data in the last decade. These plentiful resources offer the opportunity to use the data more fully to explore biological problems. RESULTS Here we developed an integrative and comparative method, CsreHMM, which is based on a hidden Markov model, to systematically reveal cell type-specific regulatory elements (CSREs) along the whole genome and simultaneously recognize the histone codes (mark combinations) characterizing them. This method also reveals the subclasses of CSREs and explicitly labels those shared by a few cell types. We applied this method to a data set of 9 cell types and 9 chromatin marks to demonstrate its effectiveness and found that the revealed CSREs relate significantly to different kinds of functional regulatory regions. Their proximal genes have consistent expression and are likely to participate in cell type-specific biological functions. CONCLUSIONS These results suggest CsreHMM has the potential to help understand cell identity and the diverse mechanisms of gene regulation.
Affiliation(s)
- Can Wang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Shihua Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China.
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China.
8
Abstract
Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 d.
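The idea of assigning each genomic segment its most probable state under a multivariate HMM with present/absent marks can be sketched in Python as follows. This is a minimal illustration, not ChromHMM's code: the function name `posterior_states`, the parameter layout, and the use of scaled forward-backward decoding are assumptions for this example, and the model parameters are taken as given rather than learned.

```python
def posterior_states(obs, init, trans, emit):
    """Most probable state per bin via forward-backward (posterior decoding).

    obs:   list of tuples of 0/1 mark calls, one tuple per genomic bin
    init:  init[k]     = P(first bin is in state k)
    trans: trans[j][k] = P(next state is k | current state is j)
    emit:  emit[k][m]  = P(mark m present | state k); marks are treated as
           independent Bernoulli variables given the state (an assumption)
    """
    n = len(init)

    def emission(k, marks):
        p = 1.0
        for m, present in enumerate(marks):
            p *= emit[k][m] if present else 1.0 - emit[k][m]
        return p

    e = [[emission(k, marks) for k in range(n)] for marks in obs]

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at every bin to avoid numerical underflow.
    fwd = [normalize([init[k] * e[0][k] for k in range(n)])]
    for t in range(1, len(obs)):
        prev = fwd[-1]
        fwd.append(normalize(
            [e[t][k] * sum(prev[j] * trans[j][k] for j in range(n)) for k in range(n)]
        ))

    # Backward pass with the same kind of rescaling.
    bwd = [[1.0] * n for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        bwd[t] = normalize(
            [sum(trans[j][k] * e[t + 1][k] * bwd[t + 1][k] for k in range(n)) for j in range(n)]
        )

    # The posterior of state k at bin t is proportional to fwd[t][k] * bwd[t][k].
    return [max(range(n), key=lambda k: f[k] * b[k]) for f, b in zip(fwd, bwd)]
```

With two states that each favor one of two marks and sticky transitions, bins showing only the first mark are assigned to state 0 and bins showing only the second mark to state 1.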
9
Machné R, Murray DB, Stadler PF. Similarity-Based Segmentation of Multi-Dimensional Signals. Sci Rep 2017; 7:12355. [PMID: 28955039 PMCID: PMC5617875 DOI: 10.1038/s41598-017-12401-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/07/2017] [Accepted: 08/30/2017] [Indexed: 11/25/2022] Open
Abstract
The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes in a single real-valued data track or that make explicit use of the vectorial nature of the data are not applicable in such scenaria. We develop here a framework for segmentation in arbitrary data domains that only requires a minimal notion of similarity. Using unsupervised clustering of (a sample of) the input yields an approximate segmentation algorithm that is efficient enough for genome-wide applications. As a showcase application we segment a time-series of transcriptome sequencing data from budding yeast, in high temporal resolution over ca. 2.5 cycles of the short-period respiratory oscillation. The algorithm is used with a similarity measure focussing on periodic expression profiles across the metabolic cycle rather than coverage per time point.
Affiliation(s)
- Rainer Machné
- Institute for Synthetic Microbiology, Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Universitätsstraße 1, D-40225, Düsseldorf, Germany; Department of Theoretical Chemistry of the University of Vienna, Währingerstrasse 17, Vienna, A-1090, Austria.
- Douglas B Murray
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, 997-0017, Japan
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, Härtelstrasse 16-18, D-04107, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Perlickstrasse 1, D-04103, Leipzig, Germany; Department of Theoretical Chemistry of the University of Vienna, Währingerstrasse 17, Vienna, A-1090, Austria; Center for RNA in Technology and Health, Univ. Copenhagen, Grønnegårdsvej 3, Frederiksberg C, Denmark; Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA.
10
Molitor J, Mallm JP, Rippe K, Erdel F. Retrieving Chromatin Patterns from Deep Sequencing Data Using Correlation Functions. Biophys J 2017; 112:473-490. [PMID: 28131315 DOI: 10.1016/j.bpj.2017.01.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/06/2016] [Revised: 11/30/2016] [Accepted: 01/04/2017] [Indexed: 01/31/2023] Open
Abstract
Epigenetic modifications and other chromatin features partition the genome on multiple length scales. They define chromatin domains with distinct biological functions that come in sizes ranging from single modified DNA bases to several megabases in the case of heterochromatic histone modifications. Due to chromatin folding, domains that are well separated along the linear nucleosome chain can form long-range interactions in three-dimensional space. It has now become a routine task to map epigenetic marks and chromatin structure by deep sequencing methods. However, assessing and comparing the properties of chromatin domains and their positional relationships across data sets without a priori assumptions remains challenging. Here, we introduce multiscale correlation evaluation (MCORE), which uses the fluctuation spectrum of mapped sequencing reads to quantify and compare chromatin patterns over a broad range of length scales in a model-independent manner. We applied MCORE to map the chromatin landscape in mouse embryonic stem cells and differentiated neural cells. We integrated sequencing data from chromatin immunoprecipitation, RNA expression, DNA methylation, and chromosome conformation capture experiments into network models that reflect the positional relationships among these features on different genomic scales. Furthermore, we used MCORE to compare our experimental data to models for heterochromatin reorganization during differentiation. The application of correlation functions to deep sequencing data complements current evaluation schemes and will support the development of quantitative descriptions of chromatin networks.
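To make the notion of a correlation function over read coverage concrete, here is a generic Python sketch that correlates two coverage tracks at increasing genomic shifts. It is not the MCORE algorithm, whose normalization and preprocessing are not described in the abstract: the function name, the binning into equal-length lists, and the single global normalization are assumptions for illustration, and both tracks are assumed to be non-constant.

```python
def correlation_function(x, y, max_shift):
    """Normalized correlation of two equal-length coverage tracks.

    Returns one value per shift 0..max_shift, where y is read `shift` bins
    downstream of x. Fluctuations are taken around each track's mean, and a
    single global normalization is used so that an identical pair of tracks
    gives 1.0 at shift 0.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    return [
        sum(dx[i] * dy[i + shift] for i in range(n - shift)) / norm
        for shift in range(max_shift + 1)
    ]
```

For a track that alternates between low and high coverage every bin, the autocorrelation is positive at even shifts and negative at odd shifts, which is the kind of periodic signature such a function exposes.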
Affiliation(s)
- Jana Molitor
- German Cancer Research Center (DKFZ) and Bioquant, Research Group Genome Organization & Function, Heidelberg, Germany
| | - Jan-Philipp Mallm
- German Cancer Research Center (DKFZ) and Bioquant, Research Group Genome Organization & Function, Heidelberg, Germany
| | - Karsten Rippe
- German Cancer Research Center (DKFZ) and Bioquant, Research Group Genome Organization & Function, Heidelberg, Germany
| | - Fabian Erdel
- German Cancer Research Center (DKFZ) and Bioquant, Research Group Genome Organization & Function, Heidelberg, Germany
| |
|
11
|
Zacher B, Michel M, Schwalb B, Cramer P, Tresch A, Gagneur J. Accurate Promoter and Enhancer Identification in 127 ENCODE and Roadmap Epigenomics Cell Types and Tissues by GenoSTAN. PLoS One 2017; 12:e0169249. [PMID: 28056037 PMCID: PMC5215863 DOI: 10.1371/journal.pone.0169249] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 11/04/2016] [Accepted: 12/14/2016] [Indexed: 12/22/2022]
Abstract
Accurate maps of promoters and enhancers are required for understanding transcriptional regulation. Promoters and enhancers are usually mapped by integration of chromatin assays charting histone modifications, DNA accessibility, and transcription factor binding. However, current algorithms are limited by unrealistic data distribution assumptions. Here we propose GenoSTAN (Genomic STate ANnotation), a hidden Markov model overcoming these limitations. We map promoters and enhancers for 127 cell types and tissues from the ENCODE and Roadmap Epigenomics projects, today’s largest compendium of chromatin assays. Extensive benchmarks demonstrate that GenoSTAN generally identifies promoters and enhancers with significantly higher accuracy than previous methods. Moreover, GenoSTAN-derived promoters and enhancers showed significantly higher enrichment of complex trait-associated genetic variants than current annotations. Altogether, GenoSTAN provides an easy-to-use tool to define promoters and enhancers in any system, and our annotation of human transcriptional cis-regulatory elements constitutes a rich resource for future research in biology and medicine.
Affiliation(s)
- Benedikt Zacher
- Gene Center and Department of Biochemistry, Center for Integrated Protein Science CIPSM, Ludwig-Maximilians-Universität Munich, Germany
- * E-mail: (BZ); (AT); (JG)
| | - Margaux Michel
- Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Björn Schwalb
- Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Patrick Cramer
- Department of Molecular Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Achim Tresch
- Department of Biology, University of Cologne, Cologne, Germany
- Max Planck Institute for Plant Breeding Research, Cologne, Germany
- * E-mail: (BZ); (AT); (JG)
| | - Julien Gagneur
- Gene Center and Department of Biochemistry, Center for Integrated Protein Science CIPSM, Ludwig-Maximilians-Universität Munich, Germany
- * E-mail: (BZ); (AT); (JG)
| |
|
12
|
Milligan L, Huynh-Thu VA, Delan-Forino C, Tuck A, Petfalski E, Lombraña R, Sanguinetti G, Kudla G, Tollervey D. Strand-specific, high-resolution mapping of modified RNA polymerase II. Mol Syst Biol 2016; 12:874. [PMID: 27288397 PMCID: PMC4915518 DOI: 10.15252/msb.20166869] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Indexed: 12/23/2022]
Abstract
Reversible modification of the RNAPII C‐terminal domain links transcription with RNA processing and surveillance activities. To better understand this, we mapped the location of RNAPII carrying the five types of CTD phosphorylation on the RNA transcript, providing strand‐specific, nucleotide‐resolution information, and we used a machine learning‐based approach to define RNAPII states. This revealed enrichment of Ser5P, and depletion of Tyr1P, Ser2P, Thr4P, and Ser7P in the transcription start site (TSS) proximal ~150 nt of most genes, with depletion of all modifications close to the poly(A) site. The TSS region also showed elevated RNAPII relative to regions further 3′, with high recruitment of RNA surveillance and termination factors, and correlated with the previously mapped 3′ ends of short, unstable ncRNA transcripts. A hidden Markov model identified distinct modification states associated with initiating, early elongating and later elongating RNAPII. The initiation state was enriched near the TSS of protein‐coding genes and persisted throughout exon 1 of intron‐containing genes. Notably, unstable ncRNAs apparently failed to transition into the elongation states seen on protein‐coding genes.
Affiliation(s)
- Laura Milligan
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
| | - Vân A Huynh-Thu
- School of Informatics, University of Edinburgh, Edinburgh, UK; Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
| | | | - Alex Tuck
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, UK; Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | - Elisabeth Petfalski
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
| | - Rodrigo Lombraña
- MRC Human Genetics Unit, IGMM, University of Edinburgh, Edinburgh, UK
| | | | - Grzegorz Kudla
- MRC Human Genetics Unit, IGMM, University of Edinburgh, Edinburgh, UK
| | - David Tollervey
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
| |
|
13
|
Glas J, Dümcke S, Zacher B, Poron D, Gagneur J, Tresch A. Simultaneous characterization of sense and antisense genomic processes by the double-stranded hidden Markov model. Nucleic Acids Res 2016; 44:e44. [PMID: 26578558 PMCID: PMC4797261 DOI: 10.1093/nar/gkv1184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/19/2015] [Accepted: 10/24/2015] [Indexed: 11/14/2022]
Abstract
Hidden Markov models (HMMs) have been extensively used to dissect the genome into functionally distinct regions using data such as RNA expression or DNA binding measurements. It is a challenge to disentangle processes occurring on complementary strands of the same genomic region. We present the double-stranded HMM (dsHMM), a model for the strand-specific analysis of genomic processes. We applied dsHMM to yeast using strand-specific transcription data, nucleosome data, and protein binding data for a set of 11 factors associated with the regulation of transcription. The resulting annotation recovers the mRNA transcription cycle (initiation, elongation, termination) while correctly predicting strand-specificity and directionality of the transcription process. We find that pre-initiation complex formation is an essentially undirected process, giving rise to a large number of bidirectional promoters and to pervasive antisense transcription. Notably, 12% of all transcriptionally active positions showed simultaneous activity on both strands. Furthermore, dsHMM reveals that antisense transcription is specifically suppressed by Nrd1, a yeast termination factor.
Affiliation(s)
- Julia Glas
- Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany
| | - Sebastian Dümcke
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany; Institute for Genetics, University of Cologne, Zülpicher Str. 47b, 50674 Cologne, Germany
| | - Benedikt Zacher
- Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany; Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany
| | - Don Poron
- Institute for Genetics, University of Cologne, Zülpicher Str. 47b, 50674 Cologne, Germany
| | - Julien Gagneur
- Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany
| | - Achim Tresch
- Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany; Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany; Institute for Genetics, University of Cologne, Zülpicher Str. 47b, 50674 Cologne, Germany
| |
|