101
Bioinformatics tools for lncRNA research. Biochim Biophys Acta 2016; 1859:23-30. [DOI: 10.1016/j.bbagrm.2015.07.014] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 03/30/2015] [Revised: 07/07/2015] [Accepted: 07/14/2015] [Indexed: 12/28/2022]
102
Computational Detection of piRNA in Human Using Support Vector Machine. Avicenna J Med Biotechnol 2016; 8:36-41. [PMID: 26855734 PMCID: PMC4717465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
BACKGROUND Piwi-interacting RNAs (piRNAs) are small non-coding RNAs (ncRNAs), about 24-32 nucleotides long, that have been discovered recently. These ncRNAs play an important role in germline development, transposon silencing, epigenetic regulation, protection of the genome from invasive transposable elements, and the pathophysiology of diseases such as cancer. piRNA identification is challenging due to the lack of conserved piRNA sequences and structural elements. METHODS To detect piRNAs, an appropriate feature set comprising 8 diverse feature groups was applied to encode each RNA. In addition, a Support Vector Machine (SVM) classifier with optimized parameters was used for RNA classification. According to the obtained results, the classification performance using the optimized feature subsets was much higher than that reported in previously published studies. RESULTS Our results revealed 98% accuracy, a Matthews correlation coefficient of 98% and 99% specificity in discriminating piRNAs from other RNAs. The results also show that the proposed method outperforms its competitors. CONCLUSION In this paper, a prediction method was proposed to identify piRNAs in human. Forty-eight heterogeneous features (sequence and structural features) were used to encode RNAs. To assess the performance of the method, a benchmark dataset containing 515 piRNAs and 1206 other RNAs was constructed. Our method reached an accuracy of 99% on the benchmark dataset. Our analysis also revealed that the structural features contribute most to piRNA prediction.
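The performance figures quoted above (accuracy, specificity and the Matthews correlation coefficient) all derive from a binary confusion matrix. The Python sketch below illustrates them using the standard formulas, with piRNAs as the positive class. The function name `binary_metrics` and the counts in the example call are hypothetical. They are not the paper's results.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient
    (MCC) from the four cells of a binary confusion matrix.

    Positive class = piRNA, negative class = any other RNA.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Fraction of true piRNAs that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of non-piRNAs correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC ranges from -1 to 1; by convention it is 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts sized to a 515-piRNA / 1206-other-RNA benchmark.
print(binary_metrics(tp=505, fp=12, tn=1194, fn=10))
```

In this benchmark there are roughly 2.3 other RNAs for every piRNA. At that ratio a classifier can score high accuracy while still missing many positives. MCC penalizes that failure, so it is often reported alongside accuracy.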
103
Ounzain S, Micheletti R, Arnan C, Plaisance I, Cecchi D, Schroen B, Reverter F, Alexanian M, Gonzales C, Ng SY, Bussotti G, Pezzuto I, Notredame C, Heymans S, Guigó R, Johnson R, Pedrazzini T. CARMEN, a human super enhancer-associated long noncoding RNA controlling cardiac specification, differentiation and homeostasis. J Mol Cell Cardiol 2015; 89:98-112. [PMID: 26423156 DOI: 10.1016/j.yjmcc.2015.09.016] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Received: 06/20/2015] [Revised: 09/24/2015] [Accepted: 09/25/2015] [Indexed: 01/14/2023]
Abstract
Long noncoding RNAs (lncRNAs) are emerging as important regulators of developmental pathways. However, their roles in human cardiac precursor cells (CPCs) remain unexplored. To characterize the long noncoding transcriptome during human CPC cardiac differentiation, we profiled the lncRNA transcriptome in CPCs isolated from the human fetal heart and identified 570 lncRNAs that were modulated during cardiac differentiation. Many of these were associated with active cardiac enhancers and super enhancers (SEs), and their expression correlated with that of proximal cardiac genes. One of the most upregulated lncRNAs was an SE-associated lncRNA that was named CARMEN, (CAR)diac (M)esoderm (E)nhancer-associated (N)oncoding RNA. CARMEN exhibits RNA-dependent enhancing activity and is upstream of the cardiac mesoderm-specifying gene regulatory network. Interestingly, CARMEN interacts with SUZ12 and EZH2, two components of polycomb repressive complex 2 (PRC2). We demonstrate that CARMEN knockdown inhibits cardiac specification and differentiation in cardiac precursor cells independently of the expression of miR-143 and miR-145, two microRNAs located proximal to the enhancer sequences. Importantly, CARMEN expression was activated during pathological remodeling in mouse and human hearts, and was necessary for maintaining cardiac identity in differentiated cardiomyocytes. This study therefore demonstrates that CARMEN is a crucial regulator of cardiac cell differentiation and homeostasis.
Affiliation(s)
- Samir Ounzain
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
- Rudi Micheletti
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
- Carme Arnan
  - Bioinformatics and Genomics Group, Centre for Genomic Regulation, Barcelona, Spain
- Isabelle Plaisance
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
- Dario Cecchi
  - Bioinformatics and Genomics Group, Centre for Genomic Regulation, Barcelona, Spain
- Blanche Schroen
  - Centre for Heart Failure Research, Cardiovascular Research Institute, Maastricht University, The Netherlands
- Ferran Reverter
  - Bioinformatics and Genomics Group, Centre for Genomic Regulation, Barcelona, Spain
- Michael Alexanian
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
- Christine Gonzales
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
- Shi Yan Ng
  - Stem Cell and Developmental Biology Group, Genome Institute of Singapore, Singapore
  - NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
- Giovanni Bussotti
  - Comparative Bioinformatics Group, Centre for Genomic Regulation, Barcelona, Spain
- Iole Pezzuto
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
- Cedric Notredame
  - Comparative Bioinformatics Group, Centre for Genomic Regulation, Barcelona, Spain
- Stephane Heymans
  - Centre for Heart Failure Research, Cardiovascular Research Institute, Maastricht University, The Netherlands
- Roderic Guigó
  - Bioinformatics and Genomics Group, Centre for Genomic Regulation, Barcelona, Spain
- Rory Johnson
  - Bioinformatics and Genomics Group, Centre for Genomic Regulation, Barcelona, Spain
- Thierry Pedrazzini
  - Experimental Cardiology Unit, Department of Medicine, University of Lausanne Medical School, Lausanne, Switzerland
104
Engelhardt J, Stadler PF. Evolution of the unspliced transcriptome. BMC Evol Biol 2015; 15:166. [PMID: 26289325 PMCID: PMC4546029 DOI: 10.1186/s12862-015-0437-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/16/2015] [Accepted: 07/29/2015] [Indexed: 12/17/2022]
Abstract
BACKGROUND Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is known, therefore, about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. In particular, their evolution has remained virtually unstudied. RESULTS We systematically study the evidence on unspliced transcripts available in EST annotation tracks for human and mouse, comprising 104,980 and 66,109 unspliced EST clusters, respectively. Roughly one third of these are located entirely inside introns of known genes (TINs) and another third overlap exonic regions (PINs). Eleven percent are "intergenic", far away from any annotated gene. Direct evidence for the independent transcription of many PINs and TINs is obtained from CAGE tag and chromatin data. We predict more than 2000 3'UTR-associated RNA candidates for each of human and mouse. Fifteen to twenty percent of the unspliced EST clusters are conserved between human and mouse. With the exception of TINs, the sequences of unspliced EST clusters evolve significantly slower than the genomic background. Furthermore, like spliced lincRNAs, they show highly tissue-specific expression patterns. CONCLUSIONS Unspliced long non-coding RNAs are an important, rapidly evolving component of mammalian transcriptomes. Their analysis is complicated by their preferential association with complex transcribed loci that usually also harbor a plethora of spliced transcripts. Unspliced EST data, although typically disregarded in transcriptome analysis, can be used to gain insights into this rarely investigated transcriptome component. The frequently postulated connection between lack of splicing and nuclear retention, together with the surprising overlap with chromatin-associated transcripts, suggests that this class of transcripts might be involved in chromatin organization and possibly other mechanisms of epigenetic control.
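The three location classes named above (TIN, PIN, intergenic) can be illustrated with a simple interval-overlap rule. The Python sketch below is a hypothetical simplification, not the authors' pipeline. It assumes all intervals lie on one chromosome, uses 0-based half-open coordinates, and ignores strand. The abstract does not give the distance that counts as "far away from any annotated gene", so the 10 kb cutoff (`min_intergenic_dist`) is an arbitrary placeholder.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    start: int                      # gene span, half-open [start, end)
    end: int
    exons: list[tuple[int, int]]    # half-open exon intervals inside the span


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def gap(start: int, end: int, gene: Gene) -> int:
    """Distance in bases between a cluster and a gene span (0 if they overlap)."""
    if overlaps(start, end, gene.start, gene.end):
        return 0
    return gene.start - end if end <= gene.start else start - gene.end


def classify_cluster(start: int, end: int, genes: list[Gene],
                     min_intergenic_dist: int = 10_000) -> str:
    # PIN: the cluster overlaps at least one annotated exon.
    for g in genes:
        if any(overlaps(start, end, s, e) for s, e in g.exons):
            return "PIN"
    # TIN: contained in a gene span without touching any exon, i.e. inside an intron.
    for g in genes:
        if g.start <= start and end <= g.end:
            return "TIN"
    # Intergenic: no gene within the (assumed) distance cutoff.
    nearest = min((gap(start, end, g) for g in genes), default=None)
    if nearest is None or nearest >= min_intergenic_dist:
        return "intergenic"
    return "other"  # near or partly overlapping a gene, but none of the above
```

The checks run in priority order, so any exon overlap makes a cluster a PIN even if it also lies inside a gene span. Clusters that fit none of the three abstract categories fall into a catch-all "other" bucket.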
Affiliation(s)
- Jan Engelhardt
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstraße 16-18, Leipzig, D-04107, Germany
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Haertelstraße 16-18, Leipzig, D-04107, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, A-1090, Austria
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg, 1870, Denmark
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
105
Housman G, Ulitsky I. Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs. Biochim Biophys Acta 2015; 1859:31-40. [PMID: 26265145 DOI: 10.1016/j.bbagrm.2015.07.017] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 03/31/2015] [Revised: 06/18/2015] [Accepted: 07/19/2015] [Indexed: 12/12/2022]
Abstract
Long noncoding RNAs (lncRNAs) are a diverse class of RNAs with increasingly appreciated functions in vertebrates, yet much of their biology remains poorly understood. In particular, it is unclear to what extent the current catalog of over 10,000 annotated lncRNAs is indeed devoid of genes coding for proteins. Here we review the available computational and experimental schemes for distinguishing between coding and noncoding transcripts and assess the conclusions from their recent genome-wide applications. We conclude that the model most consistent with the available data is that a large number of mammalian lncRNAs undergo translation, but only a very small minority of such translation events results in stable and functional peptides. The outcomes of the majority of the translation events and their potential biological purposes remain an intriguing topic for future investigation. This article is part of a Special Issue entitled: Clues to long noncoding RNA taxonomy, edited by Dr. Tetsuro Hirose and Dr. Shinichi Nakagawa.
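One of the simplest computational signals used to separate coding from noncoding transcripts is the length of the longest open reading frame (ORF). The Python sketch below finds the longest ATG-to-stop ORF on the forward strand under the standard genetic code. It is an illustrative assumption, not one of the specific methods assessed in the review, which typically combine many more features. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and half-open; the stop codon is included.
    Returns (0, 0) if no complete ORF exists.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = (0, 0)
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def orf_protein_length(seq: str) -> int:
    """Number of amino acids encoded by the longest ORF (stop codon excluded)."""
    start, end = longest_orf(seq)
    return max((end - start) // 3 - 1, 0)
```

ORF length alone is weak evidence in either direction. Random sequence contains ORFs by chance, and genuinely functional short ORFs exist. This is why the review weighs ORF-based scores against experimental evidence of translation.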
Affiliation(s)
- Gali Housman
  - Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
- Igor Ulitsky
  - Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
106
Kirsten H, Al-Hasani H, Holdt L, Gross A, Beutner F, Krohn K, Horn K, Ahnert P, Burkhardt R, Reiche K, Hackermüller J, Löffler M, Teupser D, Thiery J, Scholz M. Dissecting the genetics of the human transcriptome identifies novel trait-related trans-eQTLs and corroborates the regulatory relevance of non-protein coding loci. Hum Mol Genet 2015; 24:4746-63. [PMID: 26019233 PMCID: PMC4512630 DOI: 10.1093/hmg/ddv194] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 11/07/2014] [Accepted: 05/21/2015] [Indexed: 12/24/2022]
Abstract
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, the power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals, increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb from the transcript, challenging the ranges typically used to define cis-regulation. Pathway enrichment within regulated genes of GWAS-related eSNPs supported the functional relevance of identified eQTLs. We demonstrate that the nearest genes of GWAS-SNPs might frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS-SNPs of several phenotypes including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data to demonstrate biological effects. However, for strongly heritable transcripts we show that little trans-chromosomal heritability is explained by all identified trans-eSNPs, whereas most of the cis-heritability of these transcripts seems to be explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs for the regulation of coding genes. In summary, our study substantially increases the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes.
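The abstract's point about cis ranges depends on a simple rule: whether a SNP-transcript pair counts as local (cis) is decided by the distance cutoff chosen. The Python sketch below labels a pair as cis or trans for a given window. It also estimates the additive effect as the ordinary least-squares slope of expression on allele dosage (0/1/2). The function names, the dosage encoding and the treatment of distal same-chromosome pairs as "trans" are assumptions for illustration. This is not the study's pipeline.

```python
from statistics import fmean


def classify_pair(snp_chrom: str, snp_pos: int,
                  tx_chrom: str, tx_start: int, tx_end: int,
                  window: int = 5_000_000) -> str:
    """Label a SNP-transcript pair 'cis' if the SNP lies within `window` bases
    of the transcript on the same chromosome, otherwise 'trans'."""
    if snp_chrom != tx_chrom:
        return "trans"  # trans-chromosomal
    if snp_pos < tx_start:
        distance = tx_start - snp_pos
    elif snp_pos > tx_end:
        distance = snp_pos - tx_end
    else:
        distance = 0  # SNP inside the transcript
    return "cis" if distance <= window else "trans"


def eqtl_effect(dosages: list[float], expression: list[float]) -> float:
    """Additive eQTL effect: OLS slope of expression on allele dosage (0, 1, 2)."""
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have equal length")
    mean_g = fmean(dosages)
    mean_e = fmean(expression)
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    return sxy / sxx
```

Take a SNP about 3 Mb upstream of a transcript. It is "cis" under the 5 Mb window the study found local eSNPs enriched to, but "trans" under a narrower window (1 Mb is a common choice). That difference is exactly the kind of reclassification the abstract alludes to.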
Affiliation(s)
- Holger Kirsten
  - Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases, Cognitive Genetics, Department of Cell Therapy
- Hoor Al-Hasani
  - Department for Computer Science, Analysis Strategies Group, Department of Diagnostics, Young Investigators Group Bioinformatics and Transcriptomics, Department Proteomics, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany and
- Lesca Holdt
  - Institute of Laboratory Medicine, Ludwig-Maximilians-University, Munich, Germany
- Arnd Gross
  - Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
- Frank Beutner
  - LIFE - Leipzig Research Center for Civilization Diseases, Department of Internal Medicine/Cardiology, Heart Center
- Knut Krohn
  - Interdisciplinary Center for Clinical Research, Faculty of Medicine and
- Katrin Horn
  - Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
- Peter Ahnert
  - Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
- Ralph Burkhardt
  - LIFE - Leipzig Research Center for Civilization Diseases, Institute of Laboratory Medicine, University of Leipzig, Leipzig, Germany
- Kristin Reiche
  - Department for Computer Science, RNomics Group, Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology-IZI, Leipzig, Germany, Young Investigators Group Bioinformatics and Transcriptomics, Department Proteomics, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany and
- Jörg Hackermüller
  - Department for Computer Science, RNomics Group, Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology-IZI, Leipzig, Germany, Young Investigators Group Bioinformatics and Transcriptomics, Department Proteomics, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany and
- Markus Löffler
  - Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases
- Daniel Teupser
  - Institute of Laboratory Medicine, Ludwig-Maximilians-University, Munich, Germany
- Joachim Thiery
  - LIFE - Leipzig Research Center for Civilization Diseases, Institute of Laboratory Medicine, University of Leipzig, Leipzig, Germany
- Markus Scholz
  - Institute for Medical Informatics, Statistics and Epidemiology, LIFE - Leipzig Research Center for Civilization Diseases,
107
Kopf M, Hess WR. Regulatory RNAs in photosynthetic cyanobacteria. FEMS Microbiol Rev 2015; 39:301-15. [PMID: 25934122 PMCID: PMC6596454 DOI: 10.1093/femsre/fuv017] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Revised: 01/06/2015] [Accepted: 03/10/2015] [Indexed: 12/02/2022]
Abstract
Regulatory RNAs play versatile roles in bacteria in the coordination of gene expression during various physiological processes, especially during stress adaptation. Photosynthetic bacteria use sunlight as their major energy source. Therefore, they are particularly vulnerable to the damaging effects of excess light or UV irradiation. In addition, like all bacteria, photosynthetic bacteria must adapt to limiting nutrient concentrations and abiotic and biotic stress factors. Transcriptome analyses have identified hundreds of potential regulatory small RNAs (sRNAs) in model cyanobacteria such as Synechocystis sp. PCC 6803 or Anabaena sp. PCC 7120, and in environmentally relevant genera such as Trichodesmium, Synechococcus and Prochlorococcus. Some sRNAs have been shown to actually contain μORFs and encode short proteins. Examples include the 40-amino-acid product of the sml0013 gene, which encodes the NdhP subunit of the NDH1 complex. In contrast, the functional characterization of the non-coding sRNA PsrR1 revealed that the 131 nt long sRNA controls photosynthetic functions by targeting multiple mRNAs, providing a paradigm for sRNA functions in photosynthetic bacteria. We suggest that actuatons comprise a new class of genetic elements in which an sRNA gene is inserted upstream of a coding region to modify or enable transcription of that region.
Affiliation(s)
- Matthias Kopf
  - Faculty of Biology, Institute of Biology III, University of Freiburg, D-79104 Freiburg, Germany
- Wolfgang R Hess
  - Faculty of Biology, Institute of Biology III, University of Freiburg, D-79104 Freiburg, Germany
108
Nitsche A, Rose D, Fasold M, Reiche K, Stadler PF. Comparison of splice sites reveals that long noncoding RNAs are evolutionarily well conserved. RNA 2015; 21:801-12. [PMID: 25802408 PMCID: PMC4408788 DOI: 10.1261/rna.046342.114] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 05/12/2014] [Accepted: 12/24/2014] [Indexed: 05/03/2023]
Abstract
Large-scale RNA sequencing has revealed a large number of long mRNA-like transcripts (lncRNAs) that do not code for proteins. The evolutionary history of these lncRNAs has been notoriously hard to study systematically due to their low level of sequence conservation, which precludes comprehensive homology-based surveys and makes them nearly impossible to align. An increasing number of special cases, however, have been shown to be at least as old as the vertebrate lineage. Here we use the conservation of splice sites to trace the evolution of lncRNAs. We show that >85% of the human GENCODE lncRNAs were already present at the divergence of placental mammals and that many hundreds of these RNAs date back even further. Nevertheless, we observe a fast turnover of intron/exon structures. We conclude that lncRNA genes are evolutionarily ancient components of vertebrate genomes that show an unexpected and unprecedented evolutionary plasticity. We offer a public web service (http://splicemap.bioinf.uni-leipzig.de) that allows users to retrieve sets of orthologous splice sites and to produce overview maps of evolutionarily conserved splice sites for visualization and further analysis. An electronic supplement containing the ncRNA data sets used in this study is available at http://www.bioinf.uni-leipzig.de/publications/supplements/12-001.
Affiliation(s)
- Anne Nitsche
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, D-04107 Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
- Dominic Rose
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, D-79110 Freiburg, Germany
  - MML, Munich Leukemia Laboratory GmbH, D-81377 München, Germany
- Mario Fasold
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
  - ecSeq Bioinformatics, D-04275 Leipzig, Germany
- Kristin Reiche
  - Young Investigators Group Bioinformatics and Transcriptomics, Department of Proteomics, Helmholtz Centre for Environmental Research-UFZ, D-04318 Leipzig, Germany
  - Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology-IZI, D-04103 Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, D-04107 Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
  - Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology-IZI, D-04103 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
  - Department of Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, DK-1870 Frederiksberg C, Denmark
  - Santa Fe Institute, Santa Fe, New Mexico 87501, USA
109
Aprea J, Lesche M, Massalini S, Prenninger S, Alexopoulou D, Dahl A, Hiller M, Calegari F. Identification and expression patterns of novel long non-coding RNAs in neural progenitors of the developing mammalian cortex. Neurogenesis 2015; 2:e995524. [PMID: 27504473 PMCID: PMC4973583 DOI: 10.1080/23262133.2014.995524] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 09/09/2014] [Revised: 11/20/2014] [Accepted: 12/02/2014] [Indexed: 11/21/2022]
Abstract
Long non-coding (lnc)RNAs play key roles in many biological processes. Elucidating the function of lncRNAs in cell type specification during organ development requires knowledge about their expression in individual progenitor types rather than in whole tissues. To achieve this during cortical development, we used a dual-reporter mouse line to isolate coexisting proliferating neural stem cells, differentiating neurogenic progenitors and newborn neurons, and assessed the expression of lncRNAs by paired-end, high-throughput sequencing. We identified 379 genomic loci encoding novel lncRNAs and performed a comprehensive assessment of cell-specific expression patterns for all lncRNAs described to date, both annotated and novel. Our study provides a powerful new resource for studying these elusive transcripts during stem cell commitment and neurogenesis.
Affiliation(s)
- Julieta Aprea
  - DFG-Research Center and Cluster of Excellence for Regenerative Therapies, Dresden, Germany (equal contribution; joint first authors)
- Mathias Lesche
  - Deep Sequencing Group, Biotechnology Center, Dresden, Germany (equal contribution; joint first authors)
- Simone Massalini
  - DFG-Research Center and Cluster of Excellence for Regenerative Therapies, Dresden, Germany
- Silvia Prenninger
  - DFG-Research Center and Cluster of Excellence for Regenerative Therapies, Dresden, Germany
- Andreas Dahl
  - Deep Sequencing Group, Biotechnology Center, Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Federico Calegari
  - DFG-Research Center and Cluster of Excellence for Regenerative Therapies, Dresden, Germany
110
Mallory AC, Shkumatava A. LncRNAs in vertebrates: advances and challenges. Biochimie 2015; 117:3-14. [PMID: 25812751 DOI: 10.1016/j.biochi.2015.03.014] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/17/2015] [Accepted: 03/17/2015] [Indexed: 01/06/2023]
Abstract
Beyond the handful of classic, well-characterized long noncoding RNAs (lncRNAs), hundreds of thousands of lncRNAs have more recently been identified in multiple species including bacteria, plants and vertebrates, and the number of newly annotated lncRNAs continues to increase as more transcriptomes are analyzed. In vertebrates, the expression of many lncRNAs is highly regulated and displays discrete temporal and spatial patterns. This suggests roles in a wide range of developmental processes and sets them apart from classic housekeeping ncRNAs. In addition, the deregulation of a subset of these lncRNAs has been linked to the development of several diseases, including cancers, as well as developmental anomalies. However, the majority of vertebrate lncRNA functions remain enigmatic. As such, a major task at hand is to decipher the biological roles of lncRNAs and uncover the regulatory networks upon which they impinge. This review focuses on our emerging understanding of lncRNAs in vertebrate animals, highlighting some recent advances in their functional analyses across several species and emphasizing the current challenges researchers face to characterize lncRNAs and identify their in vivo functions.
Affiliation(s)
- Allison C Mallory
  - Institut Curie, 26 Rue d'Ulm, 75248 Paris Cedex 05, France
  - CNRS UMR3215, 75248 Paris Cedex 05, France
  - INSERM U934, 75248 Paris Cedex 05, France
- Alena Shkumatava
  - Institut Curie, 26 Rue d'Ulm, 75248 Paris Cedex 05, France
  - CNRS UMR3215, 75248 Paris Cedex 05, France
  - INSERM U934, 75248 Paris Cedex 05, France
111
Jalali S, Kapoor S, Sivadas A, Bhartiya D, Scaria V. Computational approaches towards understanding human long non-coding RNA biology. Bioinformatics 2015; 31:2241-51. [DOI: 10.1093/bioinformatics/btv148] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 10/16/2014] [Accepted: 03/10/2015] [Indexed: 12/18/2022]
112
Yang X, Xie X, Xiao YF, Xie R, Hu CJ, Tang B, Li BS, Yang SM. The emergence of long non-coding RNAs in the tumorigenesis of hepatocellular carcinoma. Cancer Lett 2015; 360:119-24. [PMID: 25721084 DOI: 10.1016/j.canlet.2015.02.035] [Citation(s) in RCA: 131] [Impact Index Per Article: 13.1] [Received: 12/27/2014] [Revised: 02/16/2015] [Accepted: 02/16/2015] [Indexed: 12/13/2022]
Abstract
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. However, treatments for HCC are limited, and most are available only at early stages. In later stages, traditional chemotherapy has only marginal effects and may cause toxicity. Thus, the identification of new predictive markers is urgently needed. New targets for non-conventional treatments will help to accelerate research on the molecular pathogenesis of HCC. A new class of transcripts, long non-coding RNAs (lncRNAs), has recently been found to be pervasively transcribed in the human genome. Aberrant expression of several lncRNAs has been found to be involved in the tumorigenesis of HCC. In this review, we describe the possible molecular mechanisms that underlie lncRNA expression changes in HCC, as well as potential future applications of lncRNA research in the diagnosis and treatment of HCC.
Affiliation(s)
- Xin Yang
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Xia Xie
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Yu-Feng Xiao
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Rei Xie
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Chang-Jiang Hu
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Bo Tang
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Bo-Sheng Li
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
- Shi-Ming Yang
  - Department of Gastroenterology, Xinqiao Hospital, Third Military Medical University, Chongqing, 400037, China
113
Santulli G. A Fleeting Glimpse Inside microRNA, Epigenetics, and Micropeptidomics. Adv Exp Med Biol 2015; 887:1-14. [PMID: 26662983 PMCID: PMC4871246 DOI: 10.1007/978-3-319-22380-3_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Abstract
MicroRNAs (miRs) are important regulators of gene expression in numerous biological processes. Their maturation process is described here, including the most recent insights from the current literature. Approximately 2000 miR sequences have been identified in the human genome, with over 50,000 miR-target interactions, including enzymes involved in epigenetic modulation of gene expression. Moreover, some "pieces of RNA" previously annotated as noncoding have recently been found to encode micropeptides that carry out critical mechanistic functions in the cell. Advanced techniques now available will certainly allow precise scanning of the genome for micropeptides hidden within "noncoding" RNA.
114
Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, Storz JF, Antunes A, Greenwold MJ, Meredith RW, Ödeen A, Cui J, Zhou Q, Xu L, Pan H, Wang Z, Jin L, Zhang P, Hu H, Yang W, Hu J, Xiao J, Yang Z, Liu Y, Xie Q, Yu H, Lian J, Wen P, Zhang F, Li H, Zeng Y, Xiong Z, Liu S, Zhou L, Huang Z, An N, Wang J, Zheng Q, Xiong Y, Wang G, Wang B, Wang J, Fan Y, da Fonseca RR, Alfaro-Núñez A, Schubert M, Orlando L, Mourier T, Howard JT, Ganapathy G, Pfenning A, Whitney O, Rivas MV, Hara E, Smith J, Farré M, Narayan J, Slavov G, Romanov MN, Borges R, Machado JP, Khan I, Springer MS, Gatesy J, Hoffmann FG, Opazo JC, Håstad O, Sawyer RH, Kim H, Kim KW, Kim HJ, Cho S, Li N, Huang Y, Bruford MW, Zhan X, Dixon A, Bertelsen MF, Derryberry E, Warren W, Wilson RK, Li S, Ray DA, Green RE, O'Brien SJ, Griffin D, Johnson WE, Haussler D, Ryder OA, Willerslev E, Graves GR, Alström P, Fjeldså J, Mindell DP, Edwards SV, Braun EL, Rahbek C, Burt DW, Houde P, Zhang Y, Yang H, Wang J, Jarvis ED, Gilbert MTP, Wang J.
Comparative genomics reveals insights into avian genome evolution and adaptation. Science 2014; 346:1311-20. [PMID: 25504712 PMCID: PMC4390078 DOI: 10.1126/science.1251385] [Show More Authors] [Citation(s) in RCA: 717] [Impact Index Per Article: 65.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Affiliation(s)
- Guojie Zhang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark.
- Cai Li
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Qiye Li
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Bo Li
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Denis M Larkin
- Royal Veterinary College, University of London, London, UK
- Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea. Cho and Kim Genomics, Seoul National University Research Park, Seoul 151-919, Republic of Korea
- Jay F Storz
- School of Biological Sciences, University of Nebraska, Lincoln, NE 68588, USA
- Agostinho Antunes
- Centro de Investigación en Ciencias del Mar y Limnología (CIMAR)/Centro Interdisciplinar de Investigação Marinha e Ambiental (CIIMAR), Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal. Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Matthew J Greenwold
- Department of Biological Sciences, University of South Carolina, Columbia, SC, USA
- Robert W Meredith
- Department of Biology and Molecular Biology, Montclair State University, Montclair, NJ 07043, USA
- Anders Ödeen
- Department of Animal Ecology, Uppsala University, Norbyvägen 18D, S-752 36 Uppsala, Sweden
- Jie Cui
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia. Program in Emerging Infectious Diseases, Duke-NUS Graduate Medical School, Singapore 169857, Singapore
- Qi Zhou
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
- Luohao Xu
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. College of Life Sciences, Wuhan University, Wuhan 430072, China
- Hailin Pan
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Zongji Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. School of Bioscience and Bioengineering, South China University of Technology, Guangzhou 510006, China
- Lijun Jin
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Pei Zhang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Haofu Hu
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Wei Yang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Jiang Hu
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Jin Xiao
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Zhikai Yang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Yang Liu
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Qiaolin Xie
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Hao Yu
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Jinmin Lian
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Ping Wen
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Fang Zhang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Hui Li
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Yongli Zeng
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Zijun Xiong
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Shiping Liu
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. School of Bioscience and Bioengineering, South China University of Technology, Guangzhou 510006, China
- Long Zhou
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Zhiyong Huang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Na An
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Jie Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
- Qiumei Zheng
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Yingqi Xiong
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Guangbiao Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Bo Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Jingjing Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Yu Fan
- Key Laboratory of Animal Models and Human Disease Mechanisms of Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Kunming, Yunnan 650223, China
- Rute R da Fonseca
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Alonzo Alfaro-Núñez
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Mikkel Schubert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Tobias Mourier
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Jason T Howard
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Ganeshkumar Ganapathy
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Andreas Pfenning
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Osceola Whitney
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Miriam V Rivas
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Erina Hara
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Julia Smith
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA
- Marta Farré
- Royal Veterinary College, University of London, London, UK
- Jitendra Narayan
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
- Gancho Slavov
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
- Rui Borges
- Centro de Investigación en Ciencias del Mar y Limnología (CIMAR)/Centro Interdisciplinar de Investigação Marinha e Ambiental (CIIMAR), Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal. Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- João Paulo Machado
- Centro de Investigación en Ciencias del Mar y Limnología (CIMAR)/Centro Interdisciplinar de Investigação Marinha e Ambiental (CIIMAR), Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal. Instituto de Ciências Biomédicas Abel Salazar (ICBAS), Universidade do Porto, Portugal
- Imran Khan
- Centro de Investigación en Ciencias del Mar y Limnología (CIMAR)/Centro Interdisciplinar de Investigação Marinha e Ambiental (CIIMAR), Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal. Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Mark S Springer
- Department of Biology, University of California Riverside, Riverside, CA 92521, USA
- John Gatesy
- Department of Biology, University of California Riverside, Riverside, CA 92521, USA
- Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA
- Juan C Opazo
- Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Valdivia, Chile
- Olle Håstad
- Department of Anatomy, Physiology and Biochemistry, Swedish University of Agricultural Sciences, Post Office Box 7011, S-750 07, Uppsala, Sweden
- Roger H Sawyer
- Department of Biological Sciences, University of South Carolina, Columbia, SC, USA
- Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea. Cho and Kim Genomics, Seoul National University Research Park, Seoul 151-919, Republic of Korea. Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-742, Republic of Korea
- Kyu-Won Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea
- Hyeon Jeong Kim
- Cho and Kim Genomics, Seoul National University Research Park, Seoul 151-919, Republic of Korea
- Seoae Cho
- Cho and Kim Genomics, Seoul National University Research Park, Seoul 151-919, Republic of Korea
- Ning Li
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing 100094, China
- Yinhua Huang
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing 100094, China. College of Animal Science and Technology, China Agricultural University, Beijing 100094, China
- Michael W Bruford
- Organisms and Environment Division, Cardiff School of Biosciences, Cardiff University, Cardiff CF10 3AX, Wales, UK
- Xiangjiang Zhan
- Organisms and Environment Division, Cardiff School of Biosciences, Cardiff University, Cardiff CF10 3AX, Wales, UK. Key Lab of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101 China
- Andrew Dixon
- International Wildlife Consultants, Carmarthen SA33 5YL, Wales, UK
- Mads F Bertelsen
- Centre for Zoo and Wild Animal Health, Copenhagen Zoo, Roskildevej 38, DK-2000 Frederiksberg, Denmark
- Elizabeth Derryberry
- Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA, USA. Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Wesley Warren
- The Genome Institute at Washington University, St. Louis, MO 63108, USA
- Richard K Wilson
- The Genome Institute at Washington University, St. Louis, MO 63108, USA
- Shengbin Li
- College of Medicine and Forensics, Xi'an Jiaotong University, Xi'an, 710061, China
- David A Ray
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
- Stephen J O'Brien
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia. Nova Southeastern University Oceanographic Center, 8000 N Ocean Drive, Dania, FL 33004, USA
- Darren Griffin
- School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK
- Warren E Johnson
- Smithsonian Conservation Biology Institute, National Zoological Park, 1500 Remount Road, Front Royal, VA 22630, USA
- David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
- Oliver A Ryder
- Genetics Division, San Diego Zoo Institute for Conservation Research, 15600 San Pasqual Valley Road, Escondido, CA 92027, USA
- Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Gary R Graves
- Department of Vertebrate Zoology, MRC-116, National Museum of Natural History, Smithsonian Institution, Post Office Box 37012, Washington, DC 20013-7012, USA. Center for Macroecology, Evolution and Climate, the Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen O, Denmark
- Per Alström
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China. Swedish Species Information Centre, Swedish University of Agricultural Sciences, Box 7007, SE-750 07 Uppsala, Sweden
- Jon Fjeldså
- Center for Macroecology, Evolution and Climate, the Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen O, Denmark
- David P Mindell
- Department of Biochemistry & Biophysics, University of California, San Francisco, CA 94158, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA
- Edward L Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Carsten Rahbek
- Center for Macroecology, Evolution and Climate, the Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen O, Denmark. Imperial College London, Grand Challenges in Ecosystems and the Environment Initiative, Silwood Park Campus, Ascot, Berkshire SL5 7PY, UK
- David W Burt
- Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, The Roslin Institute Building, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- Peter Houde
- Department of Biology, New Mexico State University, Box 30001 MSC 3AF, Las Cruces, NM 88003, USA
- Yong Zhang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Huanming Yang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China
- Jian Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China
- Erich D Jarvis
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC 27710, USA.
- M Thomas P Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark. Trace and Environmental DNA Laboratory, Department of Environment and Agriculture, Curtin University, Perth, Western Australia, 6102, Australia.
- Jun Wang
- China National GeneBank, Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, 518083, China. Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China. Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark. Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah 21589, Saudi Arabia. Department of Medicine, University of Hong Kong, Hong Kong.
115
Hu L, Di C, Kai M, Yang YCT, Li Y, Qiu Y, Hu X, Yip KY, Zhang MQ, Lu ZJ. A common set of distinct features that characterize noncoding RNAs across multiple species. Nucleic Acids Res 2014; 43:104-14. [PMID: 25505163 PMCID: PMC4288202 DOI: 10.1093/nar/gku1316] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022]
Abstract
To find signature features shared by various ncRNA subtypes and to characterize novel ncRNAs, we have developed a method, RNAfeature, to investigate >600 sets of genomic and epigenomic data with various evolutionary and biophysical scores. RNAfeature uses a fine-tuned intra-species wrapper algorithm followed by a novel cross-species feature selection strategy. It considers the long-distance effects of certain features (e.g. histone modification at the promoter region). We finally narrow these down to 10 informative features (including sequence, structure, expression profiles and epigenetic signals). These features are complementary to each other and, as a whole, can accurately distinguish canonical ncRNAs from CDSs and UTRs (accuracies: >92% in human, mouse, worm and fly). Moreover, the feature pattern is conserved across multiple species. For instance, the supervised 10-feature model derived from animal species can predict ncRNAs in Arabidopsis (accuracy: 82%). Subsequently, we integrate the 10 features to define a set of noncoding potential scores, which can identify, evaluate and characterize novel noncoding RNAs. The score covers all transcribed regions (including unconserved ncRNAs), without requiring assembly of full-length transcripts. Importantly, the noncoding potential allows us to identify and characterize potential functional domains with feature patterns similar to canonical ncRNAs (e.g. tRNA, snRNA, miRNA) on ∼70% of human long ncRNAs (lncRNAs).
Affiliation(s)
- Long Hu
- PKU-Tsinghua-NIBS Graduate Program, School of Life Sciences, Peking University, Beijing 100871, China. MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Chao Di
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Mingxuan Kai
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yu-Cheng T Yang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yang Li
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Yunjiang Qiu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xihao Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
- Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
- Michael Q Zhang
- Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, 800 West Campbell Road, RL11, Richardson, TX 75080-3021, USA. MOE Key Laboratory of Bioinformatics and Bioinformatics Division, Center for Synthetic and Systems Biology, TNLIST and School of Medicine, Tsinghua University, Beijing 100084, China
- Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
116
Lindgreen S, Umu SU, Lai ASW, Eldai H, Liu W, McGimpsey S, Wheeler NE, Biggs PJ, Thomson NR, Barquist L, Poole AM, Gardner PP. Robust identification of noncoding RNA from transcriptomes requires phylogenetically-informed sampling. PLoS Comput Biol 2014; 10:e1003907. [PMID: 25357249 PMCID: PMC4214555 DOI: 10.1371/journal.pcbi.1003907] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 07/22/2014] [Accepted: 09/11/2014] [Indexed: 02/03/2023]
Abstract
Noncoding RNAs are integral to a wide range of biological processes, including translation, gene regulation, host-pathogen interactions and environmental sensing. While genomics is now a mature field, our capacity to identify noncoding RNA elements in bacterial and archaeal genomes is hampered by the difficulty of de novo identification. The emergence of new technologies for characterizing transcriptome outputs, notably RNA-seq, is improving noncoding RNA identification and expression quantification. However, a major challenge is to robustly distinguish functional outputs from transcriptional noise. To establish whether annotation of existing transcriptome data has effectively captured all functional outputs, we analysed over 400 publicly available RNA-seq datasets spanning 37 different Archaea and Bacteria. Using comparative tools, we identify close to a thousand highly expressed candidate noncoding RNAs. However, our analyses reveal that the capacity to identify noncoding RNA outputs is strongly dependent on phylogenetic sampling. Surprisingly, and in stark contrast to protein-coding genes, the phylogenetic window for effective use of comparative methods is perversely narrow: aggregating public datasets only produced one phylogenetic cluster where these tools could be used to robustly separate unannotated noncoding RNAs from a null hypothesis of transcriptional noise. Our results show that for the full potential of transcriptomics data to be realized, a change in experimental design is paramount: effective transcriptomics requires phylogeny-aware sampling.
We have analysed more than 400 public transcriptomes, generated using RNA-seq, from almost 40 strains of Bacteria and Archaea. We discovered that the capacity to identify noncoding RNA outputs from these data is strongly dependent on phylogenetic sampling.
Our results show that, for the full potential of transcriptomics data as a discovery tool to be realized, a change in experimental design is critical: effective comparative transcriptomics requires phylogeny-aware sampling. We also examined how comparative transcriptomics experiments can be used to effectively identify RNA elements. We find that, for RNA element discovery, a phylogeny-informed sampling approach is more effective than analyses of individual species. Phylogeny-informed sampling reveals a narrow ‘Goldilocks Zone’ (where species are not too similar and not too divergent) for RNA identification using clusters of related species. In stark contrast to protein-coding genes, not only is the phylogenetic window for the effective use of comparative methods for noncoding RNA identification perversely narrow, but few existing datasets sit within this Goldilocks Zone: by aggregating public datasets, we were only able to create one phylogenetic cluster where comparative tools could be used to confidently separate unannotated noncoding RNAs from transcriptional noise.
MESH Headings
- Archaea/genetics
- Bacteria/genetics
- Cluster Analysis
- Computational Biology
- Databases, Genetic
- Gene Expression Profiling/methods
- Phylogeny
- RNA, Archaeal/chemistry
- RNA, Archaeal/classification
- RNA, Archaeal/genetics
- RNA, Bacterial/chemistry
- RNA, Bacterial/classification
- RNA, Bacterial/genetics
- RNA, Untranslated/chemistry
- RNA, Untranslated/classification
- RNA, Untranslated/genetics
- Transcriptome/genetics
Affiliation(s)
- Stinus Lindgreen
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Sinan Uğur Umu
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Alicia Sook-Wei Lai
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Hisham Eldai
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Wenting Liu
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Stephanie McGimpsey
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Nicole E. Wheeler
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Patrick J. Biggs
- Institute of Veterinary, Animal & Biomedical Sciences, Massey University, Palmerston North, New Zealand
- Allan Wilson Centre for Molecular Ecology & Evolution, Massey University, Palmerston North, New Zealand
- Nick R. Thomson
- Pathogen Genetics, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Lars Barquist
- Pathogen Genetics, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Institute for Molecular Infection Biology, University of Wuerzburg, Wuerzburg, Germany
- Anthony M. Poole
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Allan Wilson Centre for Molecular Ecology & Evolution, Massey University, Palmerston North, New Zealand
- Paul P. Gardner
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
117
Long non-coding RNAs differentially expressed between normal versus primary breast tumor tissues disclose converse changes to breast cancer-related protein-coding genes. PLoS One 2014; 9:e106076. [PMID: 25264628 PMCID: PMC4180073 DOI: 10.1371/journal.pone.0106076] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 10/11/2013] [Accepted: 07/29/2014] [Indexed: 12/04/2022]
Abstract
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses have predominantly assessed protein-coding genes; however, the majority of the mammalian genome is expressed as numerous non-coding transcripts. Emerging evidence suggests that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples using a custom expression microarray containing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal and breast tumor tissue; half of these regions were non-coding, without any evidence of functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), were frequently oriented antisense to protein-coding genes (14%), and were commonly distributed at promoters, transcription factor binding sites, or enhancers. Comparing the most divergent mRNA-defined breast cancer subtypes, Basal-like versus Luminal A and B, yielded 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared with their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated with the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail.
Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified in relation to cancer-related protein-coding genes.
118
The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101. Sci Rep 2014; 4:6187. [PMID: 25155278 PMCID: PMC4143802 DOI: 10.1038/srep06187] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 05/29/2014] [Accepted: 08/04/2014] [Indexed: 01/03/2023]
Abstract
Blooms of the dinitrogen-fixing marine cyanobacterium Trichodesmium contribute considerably to new nitrogen inputs into tropical oceans. Intriguingly, only 60% of the Trichodesmium erythraeum IMS101 genome sequence codes for protein, compared with ~85% in other sequenced cyanobacterial genomes. The extensive non-coding genome fraction suggests space for an unusually high number of unidentified, potentially regulatory non-protein-coding RNAs (ncRNAs). To identify the transcribed fraction of the genome, here we present a genome-wide map of transcriptional start sites (TSS) at single-nucleotide resolution, revealing the activity of 6,080 promoters. We demonstrate that T. erythraeum has the highest number of actively splicing group II introns and the highest percentage of TSS yielding ncRNAs of any bacterium examined to date. We identified a highly transcribed retroelement that serves as a template repeat for the targeted mutation of at least 12 different genes by mutagenic homing. Our findings explain the non-coding portion of the T. erythraeum genome by the transcription of an unusually high number of non-coding transcripts in addition to the known high incidence of transposable elements. We conclude that riboregulation and RNA maturation-dependent processes constitute a major part of the Trichodesmium regulatory apparatus.
119
Brain-specific noncoding RNAs are likely to originate in repeats and may play a role in up-regulating genes in cis. Int J Biochem Cell Biol 2014; 54:331-7. [PMID: 24993078 DOI: 10.1016/j.biocel.2014.06.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 03/06/2014] [Revised: 06/01/2014] [Accepted: 06/20/2014] [Indexed: 12/21/2022]
Abstract
The mouse and human brain express a large number of noncoding RNAs (ncRNAs). Some of these are known to participate in neural progenitor cell fate determination, cell differentiation, and neuronal and synaptic plasticity, and transposable element-derived ncRNAs contribute to somatic variation. Dysregulation of specific long ncRNAs (lncRNAs) has been shown in neurodevelopmental and neurodegenerative diseases, highlighting the importance of lncRNAs in brain function. Although lncRNAs are known to be expressed at low levels and in a tissue-specific manner, bioinformatics analyses of brain-specific ncRNAs have not been performed. We analyzed previously published custom microarray ncRNA expression data generated from twelve human tissues to identify tissue-specific ncRNAs. We find that among the 12 tissues studied, brain has the largest number of ncRNAs. Our analyses show that genes in the vicinity of brain-specific ncRNAs are significantly upregulated in the brain. Investigations of repeat representation show that brain-specific ncRNAs are significantly more likely than non-tissue-specific ncRNAs to originate in repeat regions, especially DNA/TcMar-Tigger elements. We find SINE/Alu elements depleted from the brain-specific dataset when compared with non-tissue-specific ncRNAs. Our data provide a bioinformatics comparison between brain-specific and non-tissue-specific ncRNAs. This article is part of a Directed Issue entitled: The Non-coding RNA Revolution.
120
Bai Y, Dai X, Harrison AP, Chen M. RNA regulatory networks in animals and plants: a long noncoding RNA perspective. Brief Funct Genomics 2014; 14:91-101. [PMID: 24914100 DOI: 10.1093/bfgp/elu017] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 02/04/2023]
Abstract
A recent highlight of genomics research has been the discovery of many families of transcripts that are functional but do not code for proteins. An important group is the long noncoding RNAs (lncRNAs), which are typically longer than 200 nt and originate from thousands of loci across genomes. We review progress in understanding the biogenesis and regulatory mechanisms of lncRNAs. We describe diverse computational and high-throughput technologies for identifying and studying lncRNAs. We discuss current knowledge of functional elements embedded in lncRNAs as well as insights into the lncRNA-based regulatory network in animals. We also describe genome-wide studies of large numbers of lncRNAs in plants, as well as knowledge of selected plant lncRNAs, with a focus on biotic/abiotic stress-responsive lncRNAs.
121
Wu Z, Wu C, Shao J, Zhu Z, Wang W, Zhang W, Tang M, Pei N, Fan H, Li J, Yao H, Gu H, Xu X, Lu C. The Streptococcus suis transcriptional landscape reveals adaptation mechanisms in pig blood and cerebrospinal fluid. RNA (NEW YORK, N.Y.) 2014; 20:882-898. [PMID: 24759092 PMCID: PMC4024642 DOI: 10.1261/rna.041822.113] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 08/04/2013] [Accepted: 03/11/2014] [Indexed: 06/03/2023]
Abstract
Streptococcus suis (SS) is an important pathogen of pigs, and it is also recognized as a zoonotic agent for humans. SS infection may result in septicemia or meningitis in the host. However, little is known about genes that contribute to the virulence process and survival within host blood or cerebrospinal fluid (CSF). Small RNAs (sRNAs) have emerged as key regulators of virulence in several bacteria, but they have not been investigated in SS. Here, using a differential RNA-sequencing approach and RNAs from SS strain P1/7 grown in rich medium, pig blood, or CSF, we present the SS genome-wide map of 793 transcriptional start sites and 370 operons. In addition to identifying 29 sRNAs, we show that five sRNA deletion mutants have attenuated virulence in a zebrafish infection model. Homology searches revealed that 10 sRNAs were predicted to be present in other pathogenic Streptococcus species. Compared with wild-type strain P1/7, sRNA rss03, rss05, and rss06 deletion mutants were significantly more sensitive to killing by pig blood. It is possible that rss06 contributes to SS virulence by indirectly activating expression of SSU0308, a virulence gene encoding a zinc-binding lipoprotein. In blood, genes involved in the synthesis of capsular polysaccharide (CPS) and subversion of host defenses were up-regulated. In contrast, in CSF, genes for CPS synthesis were down-regulated. Our study is the first analysis of SS sRNAs involved in virulence and has both improved our understanding of SS pathogenesis and increased the number of sRNAs known to play definitive roles in bacterial virulence.
Affiliation(s)
- Zongfu Wu
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
- Jing Shao
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
- Weixue Wang
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
- Min Tang
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
- Na Pei
- BGI-Shenzhen, Shenzhen 518083, China
- Hongjie Fan
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
- Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou 225009, China
- Huochun Yao
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
- Hongwei Gu
- Jiangsu Engineering Research Center for microRNA Biology and Biotechnology, State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210093, China
- Xun Xu
- BGI-Shenzhen, Shenzhen 518083, China
- Chengping Lu
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China
- Key Lab of Animal Bacteriology, Ministry of Agriculture, Nanjing 210095, China
- OIE Reference Laboratory for Swine Streptococcosis, Nanjing 210095, China
122
Lertampaiporn S, Thammarongtham C, Nukoolkit C, Kaewkamnerdpong B, Ruengjitchatchawalya M. Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm. Nucleic Acids Res 2014; 42:e93. [PMID: 24771344 PMCID: PMC4066759 DOI: 10.1093/nar/gku325] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 02/10/2014] [Revised: 04/02/2014] [Accepted: 04/07/2014] [Indexed: 12/13/2022]
Abstract
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long, complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated by a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
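The abstract reports accuracy, sensitivity and specificity on a well-balanced dataset. As a minimal sketch using the standard confusion-matrix definitions (the counts below are hypothetical and are not the paper's data), the three metrics can be computed as follows. When the two classes are exactly equal in size, accuracy reduces to the mean of sensitivity and specificity, so (90.7% + 93.5%) / 2 = 92.1%, close to the reported 92.11%; the small gap may reflect classes that are not perfectly equal in size, or rounding.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary ncRNA-vs-other classifier (hypothetical values)."""
    tp: int  # ncRNAs correctly predicted as ncRNA
    fn: int  # ncRNAs missed by the classifier
    tn: int  # non-ncRNA sequences correctly rejected
    fp: int  # non-ncRNA sequences wrongly called ncRNA


def sensitivity(c: Confusion) -> float:
    # Fraction of true ncRNAs that are recovered (recall on the positive class).
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Fraction of negative sequences that are correctly rejected.
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    # Fraction of all predictions that are correct.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


if __name__ == "__main__":
    # Hypothetical balanced set: 1,000 positives and 1,000 negatives,
    # chosen so that sensitivity = 0.907 and specificity = 0.935.
    c = Confusion(tp=907, fn=93, tn=935, fp=65)
    print(f"sensitivity={sensitivity(c):.3f}")
    print(f"specificity={specificity(c):.3f}")
    print(f"accuracy={accuracy(c):.3f}")
```

On an unbalanced set the shortcut no longer holds: accuracy is then weighted toward whichever class is larger, which is one reason a well-balanced training and test set makes these three numbers easier to compare.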
Affiliation(s)
- Supatcha Lertampaiporn
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology at King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
- Chakarida Nukoolkit
- School of Information Technology, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Boonserm Kaewkamnerdpong
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Marasri Ruengjitchatchawalya
- Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
- Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
123
Backofen R, Vogel T. Biological and bioinformatical approaches to study crosstalk of long-non-coding RNAs and chromatin-modifying proteins. Cell Tissue Res 2014; 356:507-26. [PMID: 24820400 DOI: 10.1007/s00441-014-1885-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/16/2014] [Accepted: 03/27/2014] [Indexed: 02/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) regulate gene expression in the nucleus through transcriptional and epigenetic regulation as well as alternative splicing. In addition, regulation is achieved at the levels of mRNA translation, storage and degradation in the cytoplasm. In recent years, several studies have described the interaction of lncRNAs with enzymes that confer so-called epigenetic modifications, such as DNA methylation, histone modifications and changes in chromatin structure or remodelling. LncRNA interaction with chromatin-modifying enzymes (CMEs) is an emerging field that adds another layer of complexity to transcriptional regulation. Given that CME-lncRNA interactions have been identified in many biological processes, ranging from development to disease, a comprehensive understanding of the underlying mechanisms is important to inspire future basic and translational research. In this review, we highlight recent findings that extend our understanding of the functional interdependencies between lncRNAs and CMEs that activate or repress gene expression. We focus on recent highlights of the molecular and functional roles of CME-lncRNAs and provide an interdisciplinary overview of recent technical and methodological developments that have improved biological and bioinformatical approaches for the detection and functional study of CME-lncRNA interactions.
Affiliation(s)
- Rolf Backofen
- Institute of Computer Science, Albert-Ludwigs-University, Freiburg, Germany
124
Backofen R, Amman F, Costa F, Findeiß S, Richter AS, Stadler PF. Bioinformatics of prokaryotic RNAs. RNA Biol 2014; 11:470-83. [PMID: 24755880 PMCID: PMC4152356 DOI: 10.4161/rna.28647] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/31/2014] [Revised: 03/17/2014] [Accepted: 03/25/2014] [Indexed: 02/02/2023]
Abstract
The genomes of most prokaryotes give rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly high levels of antisense transcripts. Comprehensive surveys of prokaryotic transcriptomes, and the need to also characterize their non-coding components, depend heavily on computational methods and workflows, many of which have been developed, or at least adapted, specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art in RNA bioinformatics, focusing on applications to prokaryotes.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Fabian Amman
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Fabrizio Costa
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Sven Findeiß
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics and Computational Biology Research Group; University of Vienna; Währingerstraße 29; A-1090 Wien, Austria
- Andreas S Richter
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Max Planck Institute of Immunobiology and Epigenetics; Stübeweg 51; D-79108 Freiburg, Germany
- Peter F Stadler
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences; Inselstraße 22; D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology – IZI; Perlickstraße 1; D-04103 Leipzig, Germany
- Santa Fe Institute; Santa Fe, NM USA
125
Hackermüller J, Reiche K, Otto C, Hösler N, Blumert C, Brocke-Heidrich K, Böhlig L, Nitsche A, Kasack K, Ahnert P, Krupp W, Engeland K, Stadler PF, Horn F. Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs. Genome Biol 2014; 15:R48. [PMID: 24594072 PMCID: PMC4054595 DOI: 10.1186/gb-2014-15-3-r48] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 08/16/2013] [Accepted: 03/04/2014] [Indexed: 12/16/2022]
Abstract
Background The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains challenging. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins, and their deregulation is frequently associated with diseases. We therefore systematically identified lncRNAs expressed differentially in response to oncologically relevant processes and cell-cycle, p53 and STAT3 pathways, using tiling arrays. Results We found that up to 80% of the pathway-triggered transcriptional responses are non-coding. Among these we identified very large macroRNAs with pathway-specific expression patterns and demonstrated that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein-coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. Conclusions It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events.
126
Coding sequence density estimation via topological pressure. J Math Biol 2014; 70:45-69. [PMID: 24448658 DOI: 10.1007/s00285-014-0754-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/09/2013] [Revised: 12/31/2013] [Indexed: 10/25/2022]
Abstract
We give a new approach to coding sequence (CDS) density estimation in genomic analysis based on topological pressure, which we develop from a well-known concept in ergodic theory. Topological pressure measures the 'weighted information content' of a finite word and incorporates 64 parameters, which can be interpreted as a choice of weight for each nucleotide triplet. We train the parameters so that the topological pressure fits the observed coding sequence density on the human genome, and use this to give ab initio predictions of CDS density over windows of size around 66,000 bp on the genomes of Mus musculus, rhesus macaque and Drosophila melanogaster. While the differences between these genomes are too great to expect that training on the human genome could predict, for example, the exact locations of genes, we demonstrate that our method gives reasonable estimates for the 'coarse scale' problem of predicting CDS density. Inspired again by ergodic theory, we use the weightings of the nucleotide triplets obtained from our training procedure to define a probability distribution on finite sequences, which can be used to distinguish between intron and exon sequences from the human genome of lengths between 750 and 5,000 bp. At the end of the paper, we explain the theoretical underpinning of our approach, which is the theory of thermodynamic formalism from the dynamical systems literature. Mathematica and MATLAB implementations of our method are available at http://sourceforge.net/projects/topologicalpres/ .
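To make the idea of a 'weighted information content' with one weight per triplet more concrete, here is a minimal sketch under an assumed, simplified definition; it is not the authors' Mathematica or MATLAB code, and their exact normalization may differ. Following the construction used for topological entropy of finite sequences, it picks the largest n with 4^n + n - 1 no longer than the word, collects the distinct length-n subwords of that prefix, weights each subword by exp of the summed triplet weights `phi`, and returns log(total weight) / n. With all weights set to zero it reduces to log(number of distinct n-words) / n. For reference, 4^8 + 8 - 1 = 65,543, which is close to the ~66,000 bp windows mentioned in the abstract. The function names and the `phi` dictionary are hypothetical.

```python
import math


def window_exponent(length: int) -> int:
    """Largest n with 4**n + n - 1 <= length.

    4**n + n - 1 is the shortest length that could contain every one of the
    4**n DNA words of length n.
    """
    if length < 4:
        raise ValueError("sequence must contain at least 4 nucleotides")
    n = 1
    while 4 ** (n + 1) + (n + 1) - 1 <= length:
        n += 1
    return n


def topological_pressure(word: str, phi: dict[str, float]) -> float:
    """Sketch of a finite-word topological pressure (assumed definition).

    phi maps nucleotide triplets (e.g. "ATG") to real weights; triplets that
    are missing from phi are given weight 0.
    """
    word = word.upper()
    n = window_exponent(len(word))
    prefix = word[: 4 ** n + n - 1]
    # Distinct subwords of length n occurring in the prefix.
    subwords = {prefix[i : i + n] for i in range(len(prefix) - n + 1)}
    total = 0.0
    for u in subwords:
        # Sum the weights of the overlapping triplets inside u.
        # For n < 3 there are no triplets, so each subword contributes exp(0) = 1.
        potential = sum(phi.get(u[j : j + 3], 0.0) for j in range(len(u) - 2))
        total += math.exp(potential)
    return math.log(total) / n


if __name__ == "__main__":
    seq = "ACGT" * 30  # toy sequence, length 120, so n = 3
    print(window_exponent(len(seq)))
    print(topological_pressure(seq, {}))             # unweighted case
    print(topological_pressure(seq, {"ACG": 2.0}))   # upweight one triplet
```

Training in the spirit of the abstract would then mean adjusting the 64 entries of `phi` so that this quantity, computed per window, tracks the observed CDS density; that fitting step is not shown here.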
127
Washietl S, Kellis M, Garber M. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res 2014; 24:616-28. [PMID: 24429298 PMCID: PMC3975061 DOI: 10.1101/gr.165035.113] [Citation(s) in RCA: 298] [Impact Index Per Article: 27.1] [Indexed: 12/21/2022]
Abstract
Long intergenic noncoding RNAs (lincRNAs) play diverse regulatory roles in human development and disease, but little is known about their evolutionary history and constraint. Here, we characterize human lincRNA expression patterns in nine tissues across six mammalian species and multiple individuals. Of the 1898 human lincRNAs expressed in these tissues, we find orthologous transcripts for 80% in chimpanzee, 63% in rhesus, 39% in cow, 38% in mouse, and 35% in rat. Mammalian-expressed lincRNAs show remarkably strong conservation of tissue specificity, suggesting that it is selectively maintained. In contrast, abundant splice-site turnover suggests that exact splice sites are not critical. Relative to evolutionarily young lincRNAs, mammalian-expressed lincRNAs show higher primary sequence conservation in their promoters and exons, increased proximity to protein-coding genes enriched for tissue-specific functions, fewer repeat elements, and more frequent single-exon transcripts. Remarkably, we find that ∼20% of human lincRNAs are not expressed beyond chimpanzee and are undetectable even in rhesus. These hominid-specific lincRNAs are more tissue specific, enriched for testis, and faster evolving within the human lineage.
Affiliation(s)
- Stefan Washietl
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02140, USA
128
Abstract
Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.
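One widely used way of feeding structure probing data into secondary structure prediction is the pseudo-free-energy term of Deigan et al. (2009), which converts each nucleotide's SHAPE reactivity r_i into an energy bonus or penalty ΔG_SHAPE(i) = m ln(r_i + 1) + b that is added to the folding energy of helices in which that nucleotide is paired. The values m = 2.6 and b = -0.8 kcal/mol below are those reported in that work; other tools may use different defaults. This is a minimal sketch that only computes the per-nucleotide terms. It is not the unified probabilistic framework described in the abstract, and treating missing or negative values as "no data" is an assumption of this example.

```python
import math
from typing import Optional, Sequence

# Slope and intercept (kcal/mol) as reported by Deigan et al. (2009).
SLOPE_M = 2.6
INTERCEPT_B = -0.8


def shape_pseudo_energy(reactivity: Optional[float],
                        m: float = SLOPE_M, b: float = INTERCEPT_B) -> float:
    """Pseudo-free energy (kcal/mol) for one nucleotide.

    Missing data (None) or negative reactivities are treated as 'no data'
    and contribute 0.0; this convention is an assumption of this sketch.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


def pseudo_energies(reactivities: Sequence[Optional[float]]) -> list[float]:
    # One term per nucleotide; a folding algorithm would add term i to the
    # energy of helices that pair nucleotide i.
    return [shape_pseudo_energy(r) for r in reactivities]


if __name__ == "__main__":
    # Hypothetical reactivity profile: low values give a negative (favourable)
    # term for pairing, high values give a positive penalty.
    profile = [0.0, 0.1, 1.5, None, 2.0]
    for i, dg in enumerate(pseudo_energies(profile), start=1):
        print(f"nt {i}: {dg:+.2f} kcal/mol")
```

The logarithm compresses very high reactivities, so a few extreme values cannot dominate the energy model, while the negative intercept rewards pairing of unreactive nucleotides.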
Affiliation(s)
- Sean R Eddy
- Howard Hughes Medical Institute Janelia Farm Research Campus, Ashburn, Virginia 20147
129
Nitsche A, Doose G, Tafer H, Robinson M, Saha NR, Gerdol M, Canapa A, Hoffmann S, Amemiya CT, Stadler PF. Atypical RNAs in the coelacanth transcriptome. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2013; 322:342-51. [PMID: 24174405 DOI: 10.1002/jez.b.22542] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/24/2013] [Revised: 07/22/2013] [Accepted: 08/16/2013] [Indexed: 01/15/2023]
Abstract
Circular and apparently trans-spliced RNAs have recently been reported as abundant types of transcripts in mammalian transcriptome data. Both types of non-colinear RNAs are also abundant in RNA-seq data from different tissues of both the African and the Indonesian coelacanth. We observe more than 8,000 lincRNAs with normal gene structure and several thousand circularized and trans-spliced products, showing that such atypical RNAs make a substantial contribution to the transcriptome. Surprisingly, the majority of the circularizing and trans-connecting splice junctions are unique to atypical forms, that is, they are not used in normal isoforms.
Affiliation(s)
- Anne Nitsche
- Department of Computer Science, Bioinformatics Group, University of Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
130
Abstract
Long intervening noncoding RNAs (lincRNAs) are transcribed from thousands of loci in mammalian genomes and might play widespread roles in gene regulation and other cellular processes. This Review outlines the emerging understanding of lincRNAs in vertebrate animals, with emphases on how they are being identified and current conclusions and questions regarding their genomics, evolution and mechanisms of action.
Affiliation(s)
- Igor Ulitsky
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
131
The developmental transcriptome of the mosquito Aedes aegypti, an invasive species and major arbovirus vector. G3-GENES GENOMES GENETICS 2013; 3:1493-509. [PMID: 23833213 PMCID: PMC3755910 DOI: 10.1534/g3.113.006742] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Indexed: 11/18/2022]
Abstract
Mosquitoes are vectors of a number of important human and animal diseases. The development of novel vector control strategies requires a thorough understanding of mosquito biology. To facilitate this, we used RNA-seq to identify novel genes and provide the first high-resolution view of the transcriptome throughout development and in response to blood feeding in a mosquito vector of human disease, Aedes aegypti, the primary vector for dengue and yellow fever. We characterized mRNA expression at 34 distinct time points throughout Aedes development, including adult somatic and germline tissues, by using polyA+ RNA-seq. We identify a total of 14,238 new transcribed regions corresponding to 12,597 new loci, as well as many novel transcript isoforms of previously annotated genes. Altogether, these results more than double the annotated fraction of the genome that is transcribed into long polyA+ RNAs. We also identified a number of patterns of shared gene expression, as well as genes and/or exons expressed sex-specifically or sex-differentially. Expression profiles of small RNAs in ovaries, early embryos, testes, and adult male and female somatic tissues were also determined, resulting in the identification of 38 new Aedes-specific miRNAs and ~291,000 small RNA new transcribed regions, many of which are likely to be endogenous small interfering RNAs and Piwi-interacting RNAs. Genes of potential interest for transgene-based vector control strategies are also highlighted. Our data have been incorporated into a user-friendly genome browser located at www.Aedes.caltech.edu, with relevant links to VectorBase (www.vectorbase.org).
132
Haudry A, Platts AE, Vello E, Hoen DR, Leclercq M, Williamson RJ, Forczek E, Joly-Lopez Z, Steffen JG, Hazzouri KM, Dewar K, Stinchcombe JR, Schoen DJ, Wang X, Schmutz J, Town CD, Edger PP, Pires JC, Schumaker KS, Jarvis DE, Mandáková T, Lysak MA, van den Bergh E, Schranz ME, Harrison PM, Moses AM, Bureau TE, Wright SI, Blanchette M. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet 2013; 45:891-8. [PMID: 23817568 DOI: 10.1038/ng.2684] [Citation(s) in RCA: 229] [Impact Index Per Article: 19.1] [Received: 10/13/2012] [Accepted: 06/04/2013] [Indexed: 12/17/2022]
Abstract
Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
Affiliation(s)
- Annabelle Haudry
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
133
Hiller M, Agarwal S, Notwell JH, Parikh R, Guturu H, Wenger AM, Bejerano G. Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish. Nucleic Acids Res 2013; 41:e151. [PMID: 23814184 PMCID: PMC3753653 DOI: 10.1093/nar/gkt557] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Abstract
Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.
Affiliation(s)
- Michael Hiller
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA, Department of Computer Science, Stanford University, Stanford, CA 94305, USA and Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
134
Wu P, Zuo X, Deng H, Liu X, Liu L, Ji A. Roles of long noncoding RNAs in brain development, functional diversification and neurodegenerative diseases. Brain Res Bull 2013; 97:69-80. [PMID: 23756188 DOI: 10.1016/j.brainresbull.2013.06.001] [Citation(s) in RCA: 289] [Impact Index Per Article: 24.1] [Received: 02/25/2013] [Revised: 05/31/2013] [Accepted: 06/01/2013] [Indexed: 12/11/2022]
Abstract
Long noncoding RNAs (lncRNAs) have been attracting immense research interest, while only a handful of lncRNAs have been characterized thoroughly. Their involvement in fundamental cellular processes, including the regulation of gene expression at the epigenetic, transcriptional, and post-transcriptional levels, highlights a central role in cell homeostasis. However, lncRNA studies are still at a relatively early stage, and the definition, conservation, functions, and mechanisms of action of lncRNAs remain fairly complicated. Here, we give a systematic and comprehensive summary of the existing knowledge of lncRNAs in order to provide a better understanding of this new field of study. It is becoming increasingly evident that lncRNAs play important roles in brain development, neuron function and maintenance, and neurodegenerative diseases. In this review, we also highlight recent studies related to lncRNAs in central nervous system (CNS) development and neurodegenerative diseases, including Alzheimer's disease (AD), Parkinson's disease (PD), Huntington's disease (HD) and amyotrophic lateral sclerosis (ALS), and describe specific lncRNAs that may be important for understanding the pathophysiology of neurodegenerative diseases and may also have potential as therapeutic targets.
Affiliation(s)
- Ping Wu
- Center for Drug Research and Development, Zhujiang Hospital, Southern Medical University, Guangzhou 510282, PR China
135
The Escherichia coli CydX protein is a member of the CydAB cytochrome bd oxidase complex and is required for cytochrome bd oxidase activity. J Bacteriol 2013; 195:3640-50. [PMID: 23749980 DOI: 10.1128/jb.00324-13] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Indexed: 12/27/2022] [Open access]
Abstract
Cytochrome bd oxidase operons from more than 50 species of bacteria contain a short gene encoding a small protein that ranges from ∼30 to 50 amino acids and is predicted to localize to the cell membrane. Although cytochrome bd oxidases have been studied for more than 70 years, little is known about the role of this small protein, denoted CydX, in oxidase activity. Here we report that Escherichia coli mutants lacking CydX exhibit phenotypes associated with reduced oxidase activity. In addition, cell membrane extracts from ΔcydX mutant strains have reduced oxidase activity in vitro. Consistent with data showing that CydX is required for cytochrome bd oxidase activity, copurification experiments indicate that CydX interacts with the CydAB cytochrome bd oxidase complex. Together, these data support the hypothesis that CydX is a subunit of the CydAB cytochrome bd oxidase complex that is required for complex activity. The results of mutation analysis of CydX suggest that few individual amino acids in the small protein are essential for function, at least in the context of protein overexpression. In addition, the results of analysis of the paralogous small transmembrane protein AppX show that the two proteins could have some overlapping functionality in the cell and that both have the potential to interact with the CydAB complex.
136
Müller SA, Findeiß S, Pernitzsch SR, Wissenbach DK, Stadler PF, Hofacker IL, von Bergen M, Kalkhof S. Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics. J Proteomics 2013; 86:27-42. [PMID: 23665149 DOI: 10.1016/j.jprot.2013.04.036] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 10/30/2012] [Revised: 03/29/2013] [Accepted: 04/26/2013] [Indexed: 12/16/2022]
Abstract
Correct annotation of protein-coding genes is the basis of conventional data analysis in proteomic studies. Nevertheless, most protein sequence databases rely almost exclusively on gene-finding software and therefore inevitably miss some protein annotations or contain errors. Proteogenomics tries to overcome these issues by matching MS data directly against a genome sequence database. Here we report an in-depth proteogenomics study of Helicobacter pylori strain 26695. MS data were searched against a combined database of the NCBI annotations and a six-frame translation of the genome. Database searches with Mascot and X! Tandem revealed 1115 proteins identified by at least two peptides with a peptide false discovery rate below 1%. This represents 71% of the predicted proteome. So far, this is the most extensive proteome study of Helicobacter pylori. Our proteogenomic approach unambiguously identified four previously missed annotations and furthermore allowed us to correct the sequences of six annotated proteins. Since secreted proteins are often involved in pathogenic processes, we further investigated signal peptidase cleavage sites. By applying a database search that accommodates the identification of semi-specifically cleaved peptides, 63 previously unknown signal peptides were detected. The motif LXA proved to be the predominant recognition sequence for signal peptidases. BIOLOGICAL SIGNIFICANCE The results of MS-based proteomic studies rely heavily on correct annotation of protein-coding genes, which is the basis of conventional data analysis. However, the annotation of protein-coding sequences in genomic data is usually based on gene-finding software. These tools have limited prediction accuracy, for example in determining exact gene boundaries. Thus, protein databases contain partly erroneous or incomplete sequences. Additionally, some protein sequences might also be missing from the databases.
Proteogenomics, a combination of proteomic and genomic data analyses, is well suited to detecting previously unannotated proteins and to correcting erroneous sequences. For this purpose, the existing database of the investigated species is typically supplemented with a six-frame translation of the genome. Here, we studied the proteome of the major human pathogen Helicobacter pylori, which is responsible for many gastric diseases such as duodenal ulcers and gastric cancer. Our in-depth proteomic study identified, with high reliability, 1115 proteins (FDR<0.01%) by at least two peptides (FDR<1%), which represent 71% of the predicted proteome deposited at NCBI. The proteogenomic analysis of our data set resulted in the unambiguous identification of four previously missed annotations, the correction of six annotated proteins, and the detection of 63 previously unknown signal peptides. We have annotated proteins of particular biological interest, such as the ferrous iron transport protein A, the coiled-coil-rich protein HP0058 and the lipopolysaccharide biosynthesis protein HP0619. For instance, the protein HP0619 could be a drug target for inhibiting the LPS synthesis pathway. Furthermore, the motif "LXA" was shown to be the predominant recognition sequence for signal peptidase I of H. pylori. Signal peptidases are essential for the viability of bacterial cells and are involved in pathogenesis; therefore, they could be novel targets for antibiotics. The inclusion of the corrected and newly annotated proteins, as well as the information on signal peptide cleavage sites, will help in the study of biological pathways involved in pathogenesis or drug response of H. pylori.
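The six-frame translation that supplements the annotation database can be illustrated with a short sketch. This is not the authors' pipeline, since they searched with Mascot and X! Tandem. The sketch assumes the standard genetic code. The helper names (`six_frame`, `candidate_peptides`) and the minimum length `min_len` for the stop-to-stop stretches that would be appended to a search database are hypothetical.

```python
# Minimal six-frame translation sketch (standard genetic code assumed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; stops become '*', codons with non-ACGT characters 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def candidate_peptides(seq: str, min_len: int = 8) -> list:
    """Hypothetical helper: stop-to-stop stretches of >= min_len residues from all six frames."""
    hits = []
    for frame, protein in six_frame(seq).items():
        for piece in protein.split("*"):
            if len(piece) >= min_len:
                hits.append((frame, piece))
    return hits


genome_fragment = "ATG" + "GCC" * 10 + "TAA"
for frame, peptide in candidate_peptides(genome_fragment):
    print(frame, peptide)
```

A real proteogenomic database would also record the frame and genomic coordinates of each stretch. This lets peptide hits be mapped back to the genome.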
Affiliation(s)
- Stephan A Müller
- Department of Proteomics, UFZ, Helmholtz-Centre for Environmental Research Leipzig, 04318 Leipzig, Germany
137
Wang L, Park HJ, Dasari S, Wang S, Kocher JP, Li W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res 2013; 41:e74. [PMID: 23335781 PMCID: PMC3616698 DOI: 10.1093/nar/gkt006] [Citation(s) in RCA: 1353] [Impact Index Per Article: 112.8] [Indexed: 01/25/2023] [Open access]
Abstract
Thousands of novel transcripts have been identified using deep transcriptome sequencing. The discovery of this large and ‘hidden’ transcriptome renews the demand for methods that can rapidly distinguish between coding and noncoding RNA. Here, we present a novel alignment-free method, Coding Potential Assessment Tool (CPAT), which rapidly recognizes coding and noncoding transcripts from a large pool of candidates. To this end, CPAT uses a logistic regression model built with four sequence features: open reading frame size, open reading frame coverage, Fickett TESTCODE statistic and hexamer usage bias. CPAT outperformed (sensitivity: 0.96, specificity: 0.97) other state-of-the-art alignment-based software such as Coding-Potential Calculator (sensitivity: 0.99, specificity: 0.74) and Phylo Codon Substitution Frequencies (sensitivity: 0.90, specificity: 0.63). In addition to its high accuracy, CPAT is approximately four orders of magnitude faster than Coding-Potential Calculator and Phylo Codon Substitution Frequencies, enabling its users to process thousands of transcripts within seconds. The software accepts input sequences in either FASTA- or BED-formatted data files. We also developed a web interface for CPAT that allows users to submit sequences and receive the prediction results almost instantly.
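The sketch below makes the first two CPAT features concrete. It computes ORF size and ORF coverage for a transcript and combines four feature values with a logistic function. It is not CPAT's code. The following are illustrative assumptions:

- the ORF definition (longest ATG-to-stop stretch on the forward strand);
- the function names;
- the coefficients `WEIGHTS` and `BIAS`.

The Fickett TESTCODE and hexamer scores are taken as precomputed inputs rather than calculated here.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical coefficients for (ORF size, ORF coverage, Fickett, hexamer);
# a real model would fit these on labeled coding and noncoding transcripts.
WEIGHTS = (0.01, 2.0, 1.0, 1.0)
BIAS = -5.0


def longest_orf(seq: str) -> int:
    """Length (nt, stop included) of the longest ATG...stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ATG in this frame
    return best


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def coding_probability(seq: str, fickett: float, hexamer: float,
                       weights=WEIGHTS, bias=BIAS) -> float:
    """Logistic combination of ORF size, ORF coverage, Fickett and hexamer scores."""
    orf_size = longest_orf(seq)
    orf_coverage = orf_size / len(seq) if seq else 0.0
    features = (orf_size, orf_coverage, fickett, hexamer)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


print(longest_orf("CCATGCCCTGAGG"))  # prints 9
print(coding_probability("CCATGCCCTGAGG", fickett=0.5, hexamer=0.1))
```

Because the model is a single weighted sum passed through a sigmoid, scoring each transcript takes time linear in its length. This is consistent with the speed advantage the abstract reports over alignment-based tools.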
Affiliation(s)
- Liguo Wang
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN 55905, USA
138
Ma H, Hao Y, Dong X, Gong Q, Chen J, Zhang J, Tian W. Molecular mechanisms and function prediction of long noncoding RNA. ScientificWorldJournal 2012; 2012:541786. [PMID: 23319885 PMCID: PMC3540756 DOI: 10.1100/2012/541786] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.2] [Received: 10/30/2012] [Accepted: 11/21/2012] [Indexed: 12/25/2022] [Open access]
Abstract
The central dogma of gene expression considers RNA as the carrier of genetic information from DNA to protein. However, it has become increasingly clear that RNA plays important roles beyond simply carrying information. Recently, whole genome transcriptomic analyses have identified large numbers of dynamically expressed long noncoding RNAs (lncRNAs), many of which are involved in a variety of biological functions. Even so, the functions and molecular mechanisms of most lncRNAs still remain elusive. Therefore, it is necessary to develop computational methods to predict the function of lncRNAs in order to accelerate the study of lncRNAs. Here, we review the recent progress in the identification of lncRNAs, the molecular functions and mechanisms of lncRNAs, and the computational methods for predicting the function of lncRNAs.
Affiliation(s)
- Handong Ma
- Institute of Biostatistics, School of Life Science, Fudan University, 220 Handan Road, Shanghai 200433, China
139
Hotto AM, Germain A, Stern DB. Plastid non-coding RNAs: emerging candidates for gene regulation. TRENDS IN PLANT SCIENCE 2012; 17:737-44. [PMID: 22981395 DOI: 10.1016/j.tplants.2012.08.002] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 05/11/2012] [Revised: 07/27/2012] [Accepted: 08/05/2012] [Indexed: 05/08/2023]
Abstract
Recent advances in transcriptomics and bioinformatics, specifically strand-specific RNA sequencing, have allowed high-throughput, comprehensive detection of low-abundance transcripts typical of the non-coding RNAs studied in bacteria and eukaryotes. Before this, few plastid non-coding RNAs (pncRNAs) had been identified, and even fewer had been investigated for any functional role in gene regulation. Relaxed plastid transcription initiation and termination result in full transcription of both chloroplast DNA strands. Following this, post-transcriptional processing produces a pool of metastable RNA species, including distinct pncRNAs. Here we review pncRNA biogenesis and possible functionality, and speculate that this RNA class may have an underappreciated role in plastid gene regulation.
Affiliation(s)
- Amber M Hotto
- Boyce Thompson Institute for Plant Research, Tower Road, Ithaca, NY 14853, USA
140
Pohl M, Theissen G, Schuster S. GC content dependency of open reading frame prediction via stop codon frequencies. Gene 2012; 511:441-6. [PMID: 23000023 DOI: 10.1016/j.gene.2012.09.031] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/22/2011] [Revised: 04/27/2012] [Accepted: 09/05/2012] [Indexed: 11/18/2022]
Abstract
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code, 3 of the 64 trinucleotides are stop codons. Hence, in random or non-coding DNA with equal base frequencies, one can expect on average every 21st trinucleotide (64/3 ≈ 21.3) to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular for detecting protein-coding ORFs. Traditional methods based on stop codon frequency assume that the GC content is about 50%. However, many genomes deviate significantly from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds for potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing an ORF of that length in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods, as well as the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking start codons into account as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to GC contents different from 50%. Usually, several gene-finding methods need to be combined, so our results concern one specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
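A simple model sketches the arithmetic behind the "every 21st trinucleotide" estimate and its dependence on GC content. The sketch rests on these assumptions:

- bases are independent and identically distributed, with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2;
- the number of codons before the first stop follows a geometric distribution.

It ignores start codons and higher-order sequence structure. It is therefore a simplification and not necessarily the exact model of Pohl et al. The function names and the false-positive level `alpha` are illustrative.

```python
import math


def stop_codon_probability(gc: float) -> float:
    """P(a random trinucleotide is TAA, TAG or TGA) for i.i.d. bases with GC content gc."""
    at = (1.0 - gc) / 2.0  # P(A) = P(T)
    g = gc / 2.0           # P(G) = P(C)
    return at ** 3 + 2 * at * at * g  # TAA + TAG + TGA


def expected_stop_spacing(gc: float) -> float:
    """Mean number of trinucleotides per stop codon in one reading frame."""
    return 1.0 / stop_codon_probability(gc)


def length_threshold(gc: float, alpha: float = 0.01) -> int:
    """Smallest length L (codons) with P(L stop-free codons in random sequence) <= alpha."""
    p = stop_codon_probability(gc)
    if p <= 0.0:
        raise ValueError("no stop codons possible at this GC content")
    # (1 - p)**L <= alpha  <=>  L >= log(alpha) / log(1 - p)
    return math.ceil(math.log(alpha) / math.log(1.0 - p))


for gc in (0.3, 0.5, 0.7, 0.8):
    print(f"GC={gc:.0%}: one stop per {expected_stop_spacing(gc):.1f} codons, "
          f"threshold {length_threshold(gc)} codons")
```

Under this model, the expected spacing between stops grows from about 21 codons at 50% GC to roughly 111 codons at 80% GC. This illustrates why a fixed ORF-length cutoff loses discriminating power in GC-rich genomes.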
Affiliation(s)
- Martin Pohl
- Department of Bioinformatics, Friedrich Schiller University Jena, Ernst-Abbe-Platz 2, 07745 Jena, Germany.
141
Washietl S, Will S, Hendrix DA, Goff LA, Rinn JL, Berger B, Kellis M. Computational analysis of noncoding RNAs. WILEY INTERDISCIPLINARY REVIEWS-RNA 2012; 3:759-78. [PMID: 22991327 DOI: 10.1002/wrna.1134] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 12/13/2022]
Abstract
Noncoding RNAs have emerged as key players in the cell. Understanding their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources.
Affiliation(s)
- Stefan Washietl
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
142
Abstract
Thousands of long noncoding RNAs (lncRNAs) have been found in vertebrate animals, a few of which have known biological roles. To better understand the genomics and features of lncRNAs in invertebrates, we used available RNA-seq, poly(A)-site, and ribosome-mapping data to identify lncRNAs of Caenorhabditis elegans. We found 170 long intervening ncRNAs (lincRNAs), which had single- or multiexonic structures that did not overlap protein-coding transcripts, and about sixty antisense lncRNAs (ancRNAs), which were complementary to protein-coding transcripts. Compared to protein-coding genes, the lncRNA genes tended to be expressed in a stage-dependent manner. Approximately 25% of the newly identified lincRNAs showed little signal for sequence conservation and mapped antisense to clusters of endogenous siRNAs, as would be expected if they serve as templates and targets for these siRNAs. The other 75% tended to be more conserved and included lincRNAs with intriguing expression and sequence features associating them with processes such as dauer formation, male identity, sperm formation, and interaction with sperm-specific mRNAs. Our study provides a glimpse into the lncRNA content of a nonvertebrate animal and a resource for future studies of lncRNA function.
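The distinction drawn above between lincRNAs and antisense lncRNAs comes down to interval overlap and strand. The minimal sketch below makes two simplifying assumptions. Each transcript is a single genomic span, so exon structure is ignored, which a real pipeline would not do. The class and function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class Transcript(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if two half-open spans share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(tx: Transcript, coding: Sequence[Transcript]) -> str:
    """Label a noncoding candidate relative to protein-coding transcripts."""
    hits = [c for c in coding if overlaps(tx, c)]
    if not hits:
        return "lincRNA"            # no overlap with any protein-coding transcript
    if all(c.strand != tx.strand for c in hits):
        return "antisense lncRNA"   # overlaps coding transcripts only on the opposite strand
    return "sense-overlapping"      # shares a strand with an overlapping coding transcript


coding = [Transcript("chrI", 1000, 2000, "+")]
print(classify_lncrna(Transcript("chrI", 5000, 6000, "+"), coding))  # lincRNA
print(classify_lncrna(Transcript("chrI", 1500, 2500, "-"), coding))  # antisense lncRNA
```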
Affiliation(s)
- Jin-Wu Nam
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
143
Seemann SE, Sunkin SM, Hawrylycz MJ, Ruzzo WL, Gorodkin J. Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain. BMC Genomics 2012; 13:214. [PMID: 22651826 PMCID: PMC3464589 DOI: 10.1186/1471-2164-13-214] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/09/2011] [Accepted: 05/31/2012] [Indexed: 01/24/2023] [Open access]
Abstract
Background Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional implications for long noncoding RNAs (lncRNAs). Recent transcriptional data has indicated the importance of lncRNAs in brain development and function. However, no methodical efforts to investigate this have been undertaken. Here, we aim to systematically analyze the potential for RNA structure in brain-expressed transcripts. Results By comprehensive spatial expression analysis of the adult mouse in situ hybridization data of the Allen Mouse Brain Atlas, we show that transcripts (coding as well as non-coding) associated with in silico predicted structured probes are highly and significantly enriched in almost all analyzed brain regions. Functional implications of these RNA structures and their role in the brain are discussed in detail along with specific examples. We observe that mRNAs with a structure prediction in their UTRs are enriched for binding, transport and localization gene ontology categories. In addition, after manual examination we observe agreement between RNA binding protein interaction sites near the 3’ UTR structures and correlated expression patterns. Conclusions Our results show a potential use for RNA structures in expressed coding as well as noncoding transcripts in the adult mouse brain, and describe the role of structured RNAs in the context of intracellular signaling pathways and regulatory networks. Based on this data we hypothesize that RNA structure is widely involved in transcriptional and translational regulatory mechanisms in the brain and ultimately plays a role in brain function.
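The abstract does not state which test underlies "highly and significantly enriched". One common choice for region-by-region enrichment of this kind is a one-sided hypergeometric test. The sketch below assumes that framing:

- N transcripts in total, K of them with a predicted structure;
- n transcripts expressed in a given brain region, k of which are structured.

The names and numbers are illustrative, not data from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed fraction of structured transcripts divided by the background fraction."""
    return (k / n) / (K / N)


# Hypothetical region: 2,000 transcripts, 300 structured; 150 expressed, 40 of them structured.
print(fold_enrichment(40, 150, 300, 2000))
print(hypergeom_upper_tail(40, 150, 300, 2000))
```

When many brain regions are tested, the resulting p-values would normally also be corrected for multiple testing.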
Affiliation(s)
- Stefan E Seemann
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark
144
Abstract
Several classes of exclusively (or at least predominantly) unspliced non-coding RNAs have been described in recent years, including totally and partially intronic transcripts and long intergenic RNAs. Functionally, they appear to be involved in regulating gene expression, at least in part by associating with chromatin. Intron-less transcripts have received little attention, even though recent findings indicate that intron-less protein-coding genes have several features that set them apart from the more abundant and much better understood spliced mRNAs. Even less is known about unspliced non-coding transcripts. We therefore systematically analyze the distribution of unspliced ESTs in the human genome. These ESTs form a large source of transcriptomic data that is almost always excluded from detailed studies. Most unspliced ESTs appear in clusters overlapping, or located in the close vicinity of, annotated RefSeq genes. Partially intronic unspliced ESTs show complex patterns of overlap with the intron/exon structure of the RefSeq gene. Distinctive patterns of CAGE tags indicate that a large class of unspliced EST clusters forms long extensions of 3'UTRs, at least several hundred of which probably also appear as independent 3'UTR-associated RNAs.
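At its simplest, grouping unspliced ESTs into clusters is a sweep over sorted genomic intervals. The sketch below makes three assumptions:

- coordinates are half-open (chrom, start, end);
- ESTs are merged when their spans overlap, touch, or lie within a hypothetical `max_gap`;
- strand is ignored.

It is an illustration, not the authors' clustering procedure.

```python
from collections import defaultdict


def cluster_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) intervals that overlap, touch, or lie within max_gap.

    Returns (chrom, cluster_start, cluster_end, member_count) tuples sorted by position.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    clusters = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        cur_start, cur_end = spans[0]
        members = 1
        for start, end in spans[1:]:
            if start <= cur_end + max_gap:      # joins the current cluster
                cur_end = max(cur_end, end)
                members += 1
            else:                               # gap too large: close the cluster
                clusters.append((chrom, cur_start, cur_end, members))
                cur_start, cur_end, members = start, end, 1
        clusters.append((chrom, cur_start, cur_end, members))
    return clusters


ests = [("chr1", 0, 10), ("chr1", 5, 20), ("chr1", 30, 40), ("chr2", 0, 5)]
print(cluster_intervals(ests))
```

Overlapping each resulting cluster with RefSeq gene models would then distinguish intronic, exonic and 3'-flanking clusters.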
145
Molecular Functions of Long Non-Coding RNAs in Plants. Genes (Basel) 2012; 3:176-90. [PMID: 24704849 PMCID: PMC3899965 DOI: 10.3390/genes3010176] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Received: 02/02/2012] [Revised: 02/28/2012] [Accepted: 02/29/2012] [Indexed: 11/16/2022] [Open access]
Abstract
The past decade has seen dramatic changes in our understanding of the scale and complexity of the eukaryotic transcriptome, owing to the discovery of diverse types of short and long non-protein-coding RNAs (ncRNAs). While short ncRNA-mediated gene regulation has been extensively studied and its mechanisms are well understood, the function of long ncRNAs remains largely unexplored, especially in plants. Nevertheless, functional insights generated in recent studies with mammalian systems have indicated that long ncRNAs are key regulators of a variety of biological processes. They have been shown to act as transcriptional regulators and competing endogenous RNAs (ceRNAs), to serve as molecular cargos for protein re-localization, and to act as modular scaffolds that recruit and assemble multiple protein complexes for chromatin modifications. Some of these functions have been found to be conserved in plants. Here, we review our current understanding of long ncRNA functions in plants and discuss the challenges in functional characterization of plant long ncRNAs.
146
Abstract
Tiling array and novel sequencing technologies have made available the transcription profile of the entire human genome. However, the extent of transcription and the function of genetic elements that occur outside of protein-coding genes, particularly those involved in disease, are still a matter of debate. In this review, we focus on long non-coding RNAs (lncRNAs) that are involved in cancer. We define lncRNAs and present a cancer-oriented list of lncRNAs, list some tools (for example, public databases) that classify lncRNAs or that scan genome spans of interest to find whether known lncRNAs reside there, and describe some of the functions of lncRNAs and the possible genetic mechanisms that underlie lncRNA expression changes in cancer, as well as current and potential future applications of lncRNA research in the treatment of cancer.
147
Non-coding RNAs in marine Synechococcus and their regulation under environmentally relevant stress conditions. ISME JOURNAL 2012; 6:1544-57. [PMID: 22258101 DOI: 10.1038/ismej.2011.215] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 12/28/2022]
Abstract
Regulatory small RNAs (sRNAs) have crucial roles in the adaptive responses of bacteria to changes in the environment. Among marine picocyanobacteria, potential regulatory RNAs have thus far been studied mainly in the genetically intractable Prochlorococcus, which renders their molecular analysis difficult. Synechococcus sp. WH7803 is a model cyanobacterium, representative of the picocyanobacteria from the mesotrophic areas of the ocean. Like the closely related Prochlorococcus, it possesses a relatively streamlined genome and a small number of genes, but unlike Prochlorococcus it is genetically tractable. Here, a comparative genome analysis was performed for this and four additional marine Synechococcus strains to identify the suite of possible sRNAs and other RNA elements. Based on the prediction and on complementary microarray profiling, we identified several known as well as 32 novel sRNAs. Some sRNAs overlap adjacent coding regions, for instance that of the central photosynthetic gene psbA. Several of these novel sRNAs responded specifically to environmentally relevant stress conditions. Among them are six sRNAs whose accumulation level changes under cold stress, six responding to high light and two responding to iron limitation. Target predictions suggested that genes encoding components of the light-harvesting apparatus are targets of sRNAs originating from genomic islands, and that one of the iron-regulated sRNAs might be a functional homolog of RyhB. These data suggest that marine Synechococcus mount adaptive responses to these different stresses involving regulatory sRNAs.
148
Madhugiri R, Pessi G, Voss B, Hahn J, Sharma CM, Reinhardt R, Vogel J, Hess WR, Fischer HM, Evguenieva-Hackenberg E. Small RNAs of the Bradyrhizobium/Rhodopseudomonas lineage and their analysis. RNA Biol 2012; 9:47-58. [PMID: 22258152 DOI: 10.4161/rna.9.1.18008] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/19/2022] [Open access]
Abstract
Small RNAs (sRNAs) play a pivotal role in bacterial gene regulation. However, the sRNAs of the vast majority of bacteria with sequenced genomes still remain unknown, since sRNA genes are usually difficult to recognize and thus not annotated. Here, expression of seven sRNAs (BjrC2a, BjrC2b, BjrC2c, BjrC68, BjrC80, BjrC174 and BjrC1505), predicted by genome comparison of Bradyrhizobium and Rhodopseudomonas members, was verified by RNA gel blot hybridization, microarray and deep sequencing analyses of RNA from the soybean symbiont Bradyrhizobium japonicum USDA 110. BjrC2a, BjrC2b and BjrC2c belong to the RNA family RF00519, while the other sRNAs are novel. For some of the sRNAs we observed expression differences between free-living bacteria and bacteroids in root nodules. The amount of BjrC1505 was decreased in nodules. By contrast, the amount of BjrC2a, BjrC68, BjrC80, BjrC174 and the previously described 6S RNA was increased in nodules, and accumulation of truncated forms of these sRNAs was observed. Comparative genomics and deep sequencing suggest that BjrC2a is an antisense RNA regulating the expression of inositol-monophosphatase. The analyzed sRNAs show a different degree of conservation in Rhizobiales, and expression of homologs of BjrC2, BjrC68, BjrC1505, and 6S RNA was confirmed in the free-living purple bacterium Rhodopseudomonas palustris 5D.
MESH Headings
- Bradyrhizobium/enzymology
- Bradyrhizobium/genetics
- Bradyrhizobium/metabolism
- Computational Biology
- Culture Media/metabolism
- Databases, Genetic
- Gene Expression Regulation, Bacterial
- Gene Expression Regulation, Enzymologic
- Genome, Bacterial
- High-Throughput Nucleotide Sequencing/methods
- Oligonucleotide Array Sequence Analysis
- Phosphoric Monoester Hydrolases/genetics
- Phosphoric Monoester Hydrolases/metabolism
- RNA, Antisense/genetics
- RNA, Antisense/metabolism
- RNA, Bacterial/genetics
- RNA, Bacterial/metabolism
- RNA, Untranslated
- Rhodopseudomonas/enzymology
- Rhodopseudomonas/genetics
- Rhodopseudomonas/metabolism
- Root Nodules, Plant/genetics
- Root Nodules, Plant/metabolism
- Root Nodules, Plant/microbiology
- Glycine max/microbiology
- Symbiosis
149
Schmidtke C, Findeiss S, Sharma CM, Kuhfuss J, Hoffmann S, Vogel J, Stadler PF, Bonas U. Genome-wide transcriptome analysis of the plant pathogen Xanthomonas identifies sRNAs with putative virulence functions. Nucleic Acids Res 2011; 40:2020-31. [PMID: 22080557 PMCID: PMC3300014 DOI: 10.1093/nar/gkr904] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 01/21/2023] [Open access]
Abstract
The Gram-negative plant-pathogenic bacterium Xanthomonas campestris pv. vesicatoria (Xcv) is an important model to elucidate the mechanisms involved in the interaction with the host. To gain insight into the transcriptome of the Xcv strain 85–10, we took a differential RNA sequencing (dRNA-seq) approach. Using a novel method to automatically generate comprehensive transcription start site (TSS) maps we report 1421 putative TSSs in the Xcv genome. Genes in Xcv exhibit a poorly conserved −10 promoter element and no consensus Shine-Dalgarno sequence. Moreover, 14% of all mRNAs are leaderless and 13% of them have unusually long 5′-UTRs. Northern blot analyses confirmed 16 intergenic small RNAs and seven cis-encoded antisense RNAs in Xcv. Expression of eight intergenic transcripts was controlled by HrpG and HrpX, key regulators of the Xcv type III secretion system. More detailed characterization identified sX12 as a small RNA that controls virulence of Xcv by affecting the interaction of the pathogen and its host plants. The transcriptional landscape of Xcv is unexpectedly complex, featuring abundant antisense transcripts, alternative TSSs and clade-specific small RNAs.
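Once a TSS map is available, mRNAs can be classified as leaderless or as having long 5′-UTRs by measuring the distance from each TSS to the annotated start codon. The minimal sketch below rests on these assumptions:

- coordinates are 1-based;
- `start_codon` is the first base of the start codon on the transcript's own strand;
- the cutoffs `leaderless_max` and `long_min` are hypothetical and not taken from the study.

```python
def utr5_length(tss: int, start_codon: int, strand: str) -> int:
    """Length (nt) of the 5'-UTR between a TSS and the first base of the start codon."""
    if strand == "+":
        return start_codon - tss
    if strand == "-":
        return tss - start_codon
    raise ValueError(f"unknown strand: {strand!r}")


def classify_leader(tss: int, start_codon: int, strand: str,
                    leaderless_max: int = 0, long_min: int = 300) -> str:
    """Label an mRNA by 5'-UTR length; both cutoffs are illustrative assumptions."""
    n = utr5_length(tss, start_codon, strand)
    if n < 0:
        raise ValueError("TSS lies downstream of the start codon")
    if n <= leaderless_max:
        return "leaderless"
    if n >= long_min:
        return "long 5'-UTR"
    return "standard"


print(classify_leader(100, 100, "+"))   # leaderless
print(classify_leader(100, 130, "+"))   # standard
print(classify_leader(1000, 500, "-"))  # long 5'-UTR
```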
Affiliation(s)
- Cornelius Schmidtke
- Department of Genetics, Martin-Luther-Universität Halle-Wittenberg, Institute for Biology, D-06099 Halle, Germany.
150
Tinoco AD, Saghatelian A. Investigating endogenous peptides and peptidases using peptidomics. Biochemistry 2011; 50:7447-61. [PMID: 21786763 DOI: 10.1021/bi200417k] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 12/18/2022]
Abstract
Rather than simply being protein degradation products, peptides have proven to be important bioactive molecules. Bioactive peptides act as hormones, neurotransmitters, and antimicrobial agents in vivo. The dysregulation of bioactive peptide signaling is also known to be involved in disease, and targeting peptide hormone pathways has been a successful strategy in the development of novel therapeutics. The importance of bioactive peptides in biology has spurred research to elucidate the function and regulation of these molecules. Classical methods for peptide analysis have relied on targeted immunoassays, but certain scientific questions necessitated a broader and more detailed view of the peptidome, that is, all the peptides in a cell, tissue, or organism. In this review we discuss how peptidomics has emerged to fill this need through the application of advanced liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods that provide unique insights into peptide activity and regulation.
Affiliation(s)
- Arthur D Tinoco
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States