1
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
2
Li P, Zhang D, Su T, Wang W, Yu Y, Zhao X, Li Z, Yu S, Zhang F. Genome-wide analysis of mRNA and lncRNA expression and mitochondrial genome sequencing provide insights into the mechanisms underlying a novel cytoplasmic male sterility system, BVRC-CMS96, in Brassica rapa. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2020; 133:2157-2170. [PMID: 32399654 DOI: 10.1007/s00122-020-03587-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/28/2020] [Accepted: 03/31/2020] [Indexed: 05/26/2023]
Abstract
Characterization of a novel and valuable CMS system in Brassica rapa. Cytoplasmic male sterility (CMS) is extensively used to produce F1 hybrid seeds in a variety of crops. However, it has not been successfully used in Chinese cabbage (Brassica rapa L. ssp. pekinensis) because of degeneration or temperature sensitivity. Here, we characterize a novel CMS system, BVRC-CMS96, which originated in a B. napus cybrid obtained from INRAE, France and was transferred by us to B. rapa. Floral morphology and agronomic characteristics indicate that BVRC-CMS96 plants are 100% male sterile and show no degeneration in the BC7 generation, confirming its suitability for commercial use. We also sequenced the BVRC-CMS96 and maintainer line 18BCM mitochondrial genomes. Genomic analyses showed the presence of syntenic blocks and distinct structures between BVRC-CMS96 and 18BCM and the other known CMS systems. We found that BVRC-CMS96 has one orf222 from 'Nap'-type CMS and two copies of orf138 from 'Ogu'-type CMS. We analyzed expression of orf222, orf138, orf261b, and the mitochondrial energy genes (atp6, atp9, and cox1) in flower bud developmental stages S1-S5 and in four floral organs. orf138 and orf222 were both highly expressed in S4- and S5-stage buds, the calyx, and the stamen. RNA-seq identified differentially expressed mRNAs and lncRNAs (long non-coding RNAs) that were significantly enriched in pollen wall assembly, pollen development, and pollen coat. Our findings suggest that an energy supply disorder caused by orf222/orf138/orf261b may inhibit a series of nuclear pollen development-related genes. Our study shows that BVRC-CMS96 is a valuable CMS system, and our detailed molecular analysis will facilitate its application in Chinese cabbage breeding.
Affiliation(s)
- Peirong Li
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Deshuang Zhang
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Tongbing Su
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Weihong Wang
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Yangjun Yu
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Xiuyun Zhao
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Zhenxing Li
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China
- Shuancang Yu
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China.
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China.
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China.
- Fenglan Zhang
- Beijing Vegetable Research Center (BVRC), Beijing Academy of Agriculture and Forestry Sciences (BAAFS), Beijing, 100097, China.
- Key Laboratory of Biology and Genetic Improvement of Horticultural Crops (North China), Ministry of Agriculture, Beijing, 100097, China.
- Beijing Key Laboratory of Vegetable Germplasm Improvement, Beijing, 100097, China.
3
Bai Y, Dai X, Ye T, Zhang P, Yan X, Gong X, Liang S, Chen M. PlncRNADB: A Repository of Plant lncRNAs and lncRNA-RBP Protein Interactions. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190131161002] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/22/2022]
Abstract
Background:
Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, arbitrarily
longer than 200 nucleotides, that play critical roles in diverse biological processes.
LncRNAs exist in different genomes ranging from animals to plants.
Objective:
PlncRNADB is a searchable database of lncRNA sequences and annotation in plants.
Methods:
We built a pipeline for lncRNA prediction in plants, providing a convenient utility for
users to quickly distinguish potential noncoding RNAs from protein-coding transcripts.
Results:
More than five thousand lncRNAs are collected from four plant species (Arabidopsis thaliana,
Arabidopsis lyrata, Populus trichocarpa and Zea mays) in PlncRNADB. Moreover, our database
provides the relationship between lncRNAs and various RNA-binding proteins (RBPs),
which can be displayed through a user-friendly web interface.
Conclusion:
PlncRNADB can serve as a reference database to investigate the lncRNAs and their
interaction with RNA-binding proteins in plants. The PlncRNADB is freely available at
http://bis.zju.edu.cn/PlncRNADB/.
Affiliation(s)
- Youhuang Bai
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaozhuan Dai
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Tiantian Ye
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Peijing Zhang
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Xu Yan
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaonan Gong
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Siliang Liang
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
4
Noviello TMR, Di Liddo A, Ventola GM, Spagnuolo A, D’Aniello S, Ceccarelli M, Cerulo L. Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics. BMC Bioinformatics 2018; 19:407. [PMID: 30400819 PMCID: PMC6220562 DOI: 10.1186/s12859-018-2441-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 08/08/2018] [Accepted: 10/19/2018] [Indexed: 11/12/2022]
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) represent a novel class of non-coding RNAs having a crucial role in many biological processes. The identification of long non-coding homologs among different species is essential to investigate such roles in model organisms as homologous genes tend to retain similar molecular and biological functions. Alignment-based metrics are able to effectively capture the conservation of transcribed coding sequences and then the homology of protein coding genes. However, unlike protein coding genes the poor sequence conservation of long non-coding genes makes the identification of their homologs a challenging task. RESULTS: In this study we compare alignment-based and alignment-free string similarity metrics and look at promoter regions as a possible source of conserved information. We show that promoter regions encode relevant information for the conservation of long non-coding genes across species and that such information is better captured by alignment-free metrics. We perform a genome-wide test of this hypothesis in human, mouse, and zebrafish. CONCLUSIONS: The obtained results persuaded us to postulate the new hypothesis that, unlike protein coding genes, long non-coding genes tend to preserve their regulatory machinery rather than their transcribed sequence. All datasets, scripts, and the prediction tools adopted in this study are available at https://github.com/bioinformatics-sannio/lncrna-homologs.
Affiliation(s)
- Teresa M. R. Noviello
- Dep. of Science and Technology, University of Sannio, via Port’Arsa, 11, Benevento, 82100 Italy
- BioGeM, Institute of Genetic Research “Gaetano Salvatore”, Camporeale, Ariano Irpino (AV), 83031 Italy
- Antonella Di Liddo
- Buchmann Institute for Molecular Life Sciences, Goethe University, Max-von-Laue-Straße 13, Frankfurt am Main, 60438 Germany
- Antonietta Spagnuolo
- Dep. of Biology and Evolution of Marine Organisms, Stazione Zoologica “A. Dohrn”, Villa Comunale, Napoli, 80121 Italy
- Salvatore D’Aniello
- Dep. of Biology and Evolution of Marine Organisms, Stazione Zoologica “A. Dohrn”, Villa Comunale, Napoli, 80121 Italy
- Michele Ceccarelli
- Dep. of Science and Technology, University of Sannio, via Port’Arsa, 11, Benevento, 82100 Italy
- BioGeM, Institute of Genetic Research “Gaetano Salvatore”, Camporeale, Ariano Irpino (AV), 83031 Italy
- Luigi Cerulo
- Dep. of Science and Technology, University of Sannio, via Port’Arsa, 11, Benevento, 82100 Italy
- BioGeM, Institute of Genetic Research “Gaetano Salvatore”, Camporeale, Ariano Irpino (AV), 83031 Italy
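As a rough illustration of the alignment-free comparison discussed in the Noviello et al. abstract above, the sketch below scores two sequences by the cosine similarity of their k-mer count vectors. This is one common alignment-free measure, chosen here as an assumption for illustration; the abstract does not say which metrics the study used. The function names and the default k = 3 are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b, k=3):
    """Cosine of the angle between the k-mer count vectors of a and b.

    Returns 0.0 when either sequence is shorter than k, because its
    count vector is then empty.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy sequences, not real promoter data.
    print(cosine_similarity("ACGTACGTAC", "ACGTTCGTAC"))
```

Because no alignment is computed, the score depends only on shared word content and is insensitive to where in the sequence those words occur, which is the property that makes such measures attractive for poorly conserved regions.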
5
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
6
Lagarde J, Uszczynska-Ratajczak B, Carbonell S, Pérez-Lluch S, Abad A, Davis C, Gingeras TR, Frankish A, Harrow J, Guigo R, Johnson R. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing. Nat Genet 2017; 49:1731-1740. [PMID: 29106417 PMCID: PMC5709232 DOI: 10.1038/ng.3988] [Citation(s) in RCA: 171] [Impact Index Per Article: 24.4] [Received: 02/01/2017] [Accepted: 10/11/2017] [Indexed: 12/20/2022]
Abstract
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Affiliation(s)
- Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Barbara Uszczynska-Ratajczak
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Silvia Carbonell
- R&D Department, Quantitative Genomic Medicine Laboratories (qGenomics), Barcelona, Spain
- Sílvia Pérez-Lluch
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Amaya Abad
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Carrie Davis
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Thomas R. Gingeras
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Adam Frankish
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK CB10 1HH
- Jennifer Harrow
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK CB10 1HH
- Roderic Guigo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Rory Johnson
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
7
Abstract
Piwi-interacting RNAs (piRNAs) are non-coding RNAs of 24-32 nucleotides (nt). They exhibit stark differences in length, expression pattern, abundance, and genomic organization when compared to micro-RNAs (miRNAs). There are hundreds of thousands of unique piRNA sequences in each species. Numerous piRNAs have been identified and deposited in public databases. Although piRNAs were originally discovered and have been well studied in the germline, a few studies have reported the presence of piRNAs in somatic cells, including neurons. This paper reviewed the common features, biogenesis, functions, and distributions of piRNAs and summarized their specific functions in the brain. This review may provide new insights and research directions for brain disorders.
Affiliation(s)
- Lingjun Zuo
- Division of Human Genetics, Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Zhiren Wang
- Biological Psychiatry Research Center, Beijing Huilongguan Hospital, Beijing, China
- Yunlong Tan
- Biological Psychiatry Research Center, Beijing Huilongguan Hospital, Beijing, China
- Xiangning Chen
- Nevada Institute of Personalized Medicine and Department of Psychology, University of Nevada, Las Vegas, NV, USA
- Xingguang Luo
- Division of Human Genetics, Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA; Biological Psychiatry Research Center, Beijing Huilongguan Hospital, Beijing, China
8
Wucher V, Legeai F, Hédan B, Rizk G, Lagoutte L, Leeb T, Jagannathan V, Cadieu E, David A, Lohi H, Cirera S, Fredholm M, Botherel N, Leegwater PA, Le Béguec C, Fieten H, Johnson J, Alföldi J, André C, Lindblad-Toh K, Hitte C, Derrien T. FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res 2017; 45:e57. [PMID: 28053114 PMCID: PMC5416892 DOI: 10.1093/nar/gkw1306] [Citation(s) in RCA: 189] [Impact Index Per Article: 27.0] [Received: 09/23/2016] [Revised: 12/13/2016] [Accepted: 12/14/2016] [Indexed: 12/13/2022]
Abstract
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
Affiliation(s)
- Valentin Wucher
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Fabrice Legeai
- IGEPP, BIPAA, INRA, Campus Beaulieu, Le Rheu 35653, France
- Institut National de Recherche en Informatique et en Automatique, Institut de Recherche en Informatique et Systèmes Aléatoires, Genscale, Campus Beaulieu, Rennes 35042, France
- Benoît Hédan
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Guillaume Rizk
- Institut National de Recherche en Informatique et en Automatique, Institut de Recherche en Informatique et Systèmes Aléatoires, Genscale, Campus Beaulieu, Rennes 35042, France
- Lætitia Lagoutte
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern 3001, Switzerland
- Vidhya Jagannathan
- Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern 3001, Switzerland
- Edouard Cadieu
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Audrey David
- IGEPP, BIPAA, INRA, Campus Beaulieu, Le Rheu 35653, France
- Hannes Lohi
- Department of Veterinary Biosciences and Research Programs Unit, Molecular Neurology, University of Helsinki, PO Box 63, Helsinki 00014, Finland
- The Folkhälsan Institute of Genetics, Helsinki 00014, Finland
- Susanna Cirera
- Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 1870, Denmark
- Merete Fredholm
- Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 1870, Denmark
- Nadine Botherel
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Peter A.J. Leegwater
- Department of Clinical Sciences of Companion Animals, Faculty of Veterinary Medicine, Utrecht University, Utrecht 3584CM, the Netherlands
- Céline Le Béguec
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Hille Fieten
- Department of Clinical Sciences of Companion Animals, Faculty of Veterinary Medicine, Utrecht University, Utrecht 3584CM, the Netherlands
- Jeremy Johnson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jessica Alföldi
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Catherine André
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 751 23, Sweden
- Christophe Hitte
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
- Thomas Derrien
- Institut Génétique et Développement de Rennes, CNRS, UMR6290, University Rennes1, Rennes, Cedex 35043, France
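The FEELnc abstract above names open reading frames as one of the general features fed to its classifier. The sketch below computes a single such feature, the fraction of a transcript covered by its longest ORF, using a strict definition (ATG start, in-frame stop, forward strand only). It is an illustrative assumption, not FEELnc's code: the function names are hypothetical, and the "relaxed" ORFs mentioned in the abstract presumably loosen the start or stop requirement, which this sketch does not attempt.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in nt of the longest ATG...stop ORF on the forward strand.

    The stop codon is included in the length. ORFs lacking an in-frame
    stop are ignored (strict definition).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_coverage(seq):
    """Fraction of the transcript occupied by its longest strict ORF."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    # Toy transcript with one 9-nt ORF (ATG AAA TGA) starting at offset 2.
    print(orf_coverage("CCATGAAATGACC"))
```

A low coverage value is weak evidence for a non-coding transcript; in practice such a feature would be combined with others, for example the k-mer frequencies the abstract also mentions, inside a trained classifier.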
9
Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022]
Abstract
This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
10
Moss WN, Steitz JA. In silico discovery and modeling of non-coding RNA structure in viruses. Methods 2015; 91:48-56. [PMID: 26116541 DOI: 10.1016/j.ymeth.2015.06.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/06/2015] [Revised: 06/17/2015] [Accepted: 06/22/2015] [Indexed: 11/30/2022]
Abstract
This review covers several computational methods for discovering structured non-coding RNAs in viruses and modeling their putative secondary structures. Here we will use examples from two target viruses to highlight these approaches: influenza A virus, a relatively small, segmented RNA virus, and Epstein-Barr virus, a relatively large DNA virus with a complex transcriptome. Each system has unique challenges to overcome and unique characteristics to exploit. From these particular cases, generically useful approaches can be derived for the study of additional viral targets.
Affiliation(s)
- Walter N Moss
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA
- Joan A Steitz
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA.
11
Wang D, Yu J. Plastid-LCGbase: a collection of evolutionarily conserved plastid-associated gene pairs. Nucleic Acids Res 2014; 43:D990-5. [PMID: 25378306 PMCID: PMC4383908 DOI: 10.1093/nar/gku1070] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Plastids carry their own genetic material that encodes a variable set of genes that are limited in number but functionally important. Aside from orthology, the lineage-specific order and orientation of these genes are also relevant. Here, we develop a database, Plastid-LCGbase (http://lcgbase.big.ac.cn/plastid-LCGbase/), which focuses on organizational variability of plastid genes and genomes from diverse taxonomic groups. The current Plastid-LCGbase contains information from 470 plastid genomes and exhibits several unique features. First, through a genome-overview page generated from OrganellarGenomeDRAW, it displays general arrangement of all plastid genes (circular or linear). Second, it shows patterns and modes of all paired plastid genes and their physical distances across user-defined lineages, which are facilitated by a step-wise stratification of taxonomic groups. Third, it divides the paired genes into three categories (co-directionally-paired genes or CDPGs, convergently-paired genes or CPGs and divergently-paired genes or DPGs) and three patterns (separation, overlap and inclusion) and provides basic statistics for each species. Fourth, the gene pairing scheme is expandable, where neighboring genes can also be included in species-/lineage-specific comparisons. We hope that Plastid-LCGbase facilitates gene variation (insertion-deletion, translocation and rearrangement) and transcription-level studies of plastid genomes.
Affiliation(s)
- Dapeng Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China; Stem Cell Laboratory, UCL Cancer Institute, University College London, London WC1E 6BT, UK
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China
12
StemSearch: RNA search tool based on stem identification and indexing. Methods 2014; 69:326-34. [PMID: 25009129 DOI: 10.1016/j.ymeth.2014.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/23/2014] [Revised: 06/11/2014] [Accepted: 06/15/2014] [Indexed: 11/23/2022]
Abstract
The discovery and functional analysis of noncoding RNA (ncRNA) systems in different organisms motivates the development of tools for aiding ncRNA research. Several tools exist that search for occurrences of a given RNA structural profile in genomic sequences. Yet, there is a need for an "RNA BLAST" tool, i.e., a tool that takes a putative functional RNA sequence as input, and efficiently searches for similar sequences in genomic databases, taking into consideration potential secondary structure features of the input query sequence. This work aims at providing such a tool. Our tool, denoted StemSearch, is based on a structural representation of an RNA sequence by its potential stems. Potential stems in genomic sequences are identified in a preprocessing stage, and indexed. A user-provided query sequence is likewise processed, and stems from the target genomes that are similar to the query stems are retrieved from the index. Then, relevant genomic regions are identified and ranked according to their similarity to the query stem-set while enforcing conservation of cross-stem topology. Experiments using RFAM families show significantly improved recall for StemSearch over BLAST, with small loss of precision. We further demonstrate our system's capability to handle eukaryotic genomes by successfully searching for members of the 7SK family in chromosome 2 of the human genome. StemSearch is freely available on the web at: http://www.cs.bgu.ac.il/∼negevcb/StemSearch.
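The idea of representing a sequence by its potential stems can be sketched as follows. This is a simplified illustration, not the StemSearch algorithm: it assumes exact Watson-Crick pairing only (no G-U pairs, bulges, or mismatches), and the function name and the parameters `stem_len`, `min_loop`, and `max_span` are hypothetical.

```python
# Watson-Crick pairs only; real tools typically also allow G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def potential_stems(seq, stem_len=4, min_loop=3, max_span=40):
    """Return (i, j) start positions of two arms of length stem_len that can pair.

    Base i+k of the 5' arm pairs with base j+stem_len-1-k of the 3' arm,
    at least min_loop unpaired bases separate the arms, and the whole
    stem-loop spans at most max_span bases.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    stems = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        last_j = min(n - stem_len, i + max_span - stem_len)
        for j in range(i + stem_len + min_loop, last_j + 1):
            if all((seq[i + k], seq[j + stem_len - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                stems.append((i, j))
    return stems
```

A stem list of this kind could then be stored in an index keyed by arm sequence; that indexing step, and the ranking by cross-stem topology described in the abstract, are not shown here.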
13
Tárraga J, Arnau V, Martínez H, Moreno R, Cazorla D, Salavert-Torres J, Blanquer-Espert I, Dopazo J, Medina I. Acceleration of short and long DNA read mapping without loss of accuracy using suffix array. Bioinformatics 2014; 30:3396-8. [PMID: 25143289 PMCID: PMC4816028 DOI: 10.1093/bioinformatics/btu553] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/10/2023]
Abstract
HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies. Availability and implementation: https://github.com/opencb/hpg-aligner.
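As background on the data structure named in this abstract, the following is a generic textbook sketch of exact matching with a suffix array. It is not HPG Aligner's implementation; the naive construction and the function names are assumptions chosen for brevity.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin.

    Fine for short examples; production mappers use far more efficient builders.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern in text via two binary searches."""
    m = len(pattern)

    def prefix(rank):
        # The first m characters of the suffix at this rank.
        return text[sa[rank]:sa[rank] + m]

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])
```

Because the suffixes are sorted, all suffixes that begin with the read form one contiguous block, so each lookup needs only a logarithmic number of comparisons in the reference length.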
Affiliation(s)
- Joaquín Tárraga, Vicente Arnau, Héctor Martínez, Raul Moreno, Diego Cazorla, José Salavert-Torres, Ignacio Blanquer-Espert, Joaquín Dopazo, Ignacio Medina
- Department of Computational Genomics, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB) at CIPF 46012, Departamento de Informática, Universidad de Valencia, 46100 Valencia, Departamento de Ingeniería y Ciencia de Computadores, Universitat Jaume I, 12071 Castellón de la Plana, Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, Campus Universitario, 02071 Albacete, Universitat Politècnica de València, Instituto de Instrumentación para Imagen Molecular, 46022 Valencia, Grupo de Investigación Biomédica de Imagen (GIBI 2^30), La Fe Polytechnic University Hospital, 46022 Valencia and Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
14
Wolf M, Koetschan C, Müller T. ITS2, 18S, 16S or any other RNA - simply aligning sequences and their individual secondary structures simultaneously by an automatic approach. Gene 2014; 546:145-9. [PMID: 24881812 DOI: 10.1016/j.gene.2014.05.065] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 04/16/2014] [Revised: 05/28/2014] [Accepted: 05/29/2014] [Indexed: 11/29/2022]
Abstract
Secondary structures of RNA sequences are increasingly being used as additional information in reconstructing phylogenies and/or in distinguishing species by compensatory base change (CBC) analyses. However, in most cases just one secondary structure is used in manually correcting an automatically generated multiple sequence alignment and/or just one secondary structure is used in guiding a sequence alignment still completely generated by hand. With the advent of databases and tools offering individual RNA secondary structures, here we re-introduce a twelve-letter code already implemented in 4SALE - a tool for synchronous sequence and secondary structure alignment and editing - that enables one to align RNA sequences and their individual secondary structures synchronously and fully automatically, while dramatically increasing the phylogenetic information content. We further introduce a scaled-down non-GUI version of 4SALE particularly designed for big data analysis, and available at: http://4sale.bioapps.biozentrum.uni-wuerzburg.de.
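A twelve-letter code of this kind arises from combining four nucleotides with three structural states (unpaired, opening bracket, closing bracket). The sketch below illustrates that combination only; the letter assignment `A` to `L` is hypothetical and is not the mapping used by 4SALE.

```python
from itertools import product

# Hypothetical assignment: 4 nucleotides x 3 dot-bracket states = 12 letters.
CODE = dict(zip(product("ACGU", ".()"), "ABCDEFGHIJKL"))
DECODE = {letter: pair for pair, letter in CODE.items()}


def encode(seq, structure):
    """Fuse a sequence and its dot-bracket structure into one 12-letter string."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return "".join(CODE[(nt, st)] for nt, st in zip(seq, structure))


def decode(encoded):
    """Recover the (sequence, structure) pair from an encoded string."""
    pairs = [DECODE[ch] for ch in encoded]
    return "".join(nt for nt, _ in pairs), "".join(st for _, st in pairs)
```

Once encoded this way, each sequence-structure pair is an ordinary string over a 12-letter alphabet, so a standard alignment procedure with a suitable 12×12 scoring matrix can align sequence and structure at the same time.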
Affiliation(s)
- Matthias Wolf
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Christian Koetschan
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Tobias Müller
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
15
Backofen R, Amman F, Costa F, Findeiß S, Richter AS, Stadler PF. Bioinformatics of prokaryotic RNAs. RNA Biol 2014; 11:470-83. [PMID: 24755880 PMCID: PMC4152356 DOI: 10.4161/rna.28647] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/31/2014] [Revised: 03/17/2014] [Accepted: 03/25/2014] [Indexed: 02/02/2023]
Abstract
The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to also characterize their non-coding components depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art of RNA bioinformatics, focusing on applications to prokaryotes.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Fabian Amman
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Fabrizio Costa
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Sven Findeiß
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics and Computational Biology Research Group; University of Vienna; Währingerstraße 29; A-1090 Wien, Austria
- Andreas S Richter
- Bioinformatics Group; Department of Computer Science; University of Freiburg; Georges-Köhler-Allee 106; D-79110 Freiburg, Germany
- Max Planck Institute of Immunobiology and Epigenetics; Stübeweg 51; D-79108 Freiburg, Germany
- Peter F Stadler
- Center for non-coding RNA in Technology and Health; University of Copenhagen; Grønnegårdsvej 3; DK-1870 Frederiksberg C, Denmark
- Institute for Theoretical Chemistry; University of Vienna; Währingerstraße 17; A-1090 Wien, Austria
- Bioinformatics Group; Department of Computer Science, and Interdisciplinary Center for Bioinformatics; University of Leipzig; Härtelstraße 16-18; D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences; Inselstraße 22; D-04103 Leipzig, Germany
- Fraunhofer Institute for Cell Therapy and Immunology – IZI; Perlickstraße 1; D-04103 Leipzig, Germany
- Santa Fe Institute; Santa Fe, NM USA
16
Abstract
Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.
Affiliation(s)
- Sean R Eddy
- Howard Hughes Medical Institute Janelia Farm Research Campus, Ashburn, Virginia 20147
17
Abstract
Long intervening noncoding RNAs (lincRNAs) are transcribed from thousands of loci in mammalian genomes and might play widespread roles in gene regulation and other cellular processes. This Review outlines the emerging understanding of lincRNAs in vertebrate animals, with emphases on how they are being identified and current conclusions and questions regarding their genomics, evolution and mechanisms of action.
Affiliation(s)
- Igor Ulitsky
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
18
Sun L, Luo H, Bu D, Zhao G, Yu K, Zhang C, Liu Y, Chen R, Zhao Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res 2013; 41:e166. [PMID: 23892401 PMCID: PMC3783192 DOI: 10.1093/nar/gkt646] [Citation(s) in RCA: 1213] [Impact Index Per Article: 110.3] [Received: 03/05/2013] [Revised: 06/28/2013] [Accepted: 07/01/2013] [Indexed: 02/01/2023]
Abstract
It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, which demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
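Profiling adjoining nucleotide triplets starts from counting how often one triplet is immediately followed by another in a given reading frame. The sketch below shows only that counting step under assumed conventions (function name, `frame` parameter, DNA alphabet); the scoring and classification used by CNCI are not reproduced here.

```python
from collections import Counter


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of adjacent, non-overlapping triplets in one reading frame."""
    seq = seq.upper().replace("U", "T")
    # Split into complete triplets starting at the chosen frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Pair each triplet with the one that directly follows it.
    return Counter(zip(triplets, triplets[1:]))
```

Comparing such counts between known coding and non-coding training sequences would give a 64×64 usage table from which per-pair scores could be derived; how those scores are defined is left open in this sketch.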
Affiliation(s)
- Liang Sun, Haitao Luo, Dechao Bu, Guoguang Zhao, Kuntao Yu, Changhai Zhang, Yuanning Liu, Runsheng Chen, Yi Zhao
- Bioinformatics Research Group, Advanced Computing Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, College of Computer Science and Technology, Jilin University, Changchun 130012, China and Laboratory of Bioinformatics and Non-coding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
19
Bussotti G, Notredame C, Enright AJ. Detecting and comparing non-coding RNAs in the high-throughput era. Int J Mol Sci 2013; 14:15423-58. [PMID: 23887659 PMCID: PMC3759867 DOI: 10.3390/ijms140815423] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/31/2013] [Revised: 07/16/2013] [Accepted: 07/17/2013] [Indexed: 02/07/2023]
Abstract
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.
Affiliation(s)
- Giovanni Bussotti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cedric Notredame
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Aiguader, 88, 08003 Barcelona, Spain
- Anton J. Enright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
20
Will S, Siebauer MF, Heyne S, Engelhardt J, Stadler PF, Reiche K, Backofen R. LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology search. Algorithms Mol Biol 2013; 8:14. [PMID: 23601347 PMCID: PMC3716875 DOI: 10.1186/1748-7188-8-14] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/21/2013] [Accepted: 03/28/2013] [Indexed: 12/15/2022]
Abstract
Background: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task? Results: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA’s algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query. We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence. Conclusions: Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high-throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side. Availability: Source code of the free software LocARNAscan 1.0 and supplementary data are available at http://www.bioinf.uni-leipzig.de/Software/LocARNAscan.
21
Kemena C, Bussotti G, Capriotti E, Marti-Renom MA, Notredame C. Using tertiary structure for the computation of highly accurate multiple RNA alignments with the SARA-Coffee package. Bioinformatics 2013; 29:1112-9. [PMID: 23449094 DOI: 10.1093/bioinformatics/btt096] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
Abstract
Motivation: Aligning RNAs is useful to search for homologous genes, study evolutionary relationships, detect conserved regions and identify any patterns that may be of biological relevance. Poor levels of conservation among homologs, however, make it difficult to compare RNA sequences, even when considering closely evolutionarily related sequences. Results: We describe SARA-Coffee, a tertiary structure-based multiple RNA aligner, which has been validated using BRAliDARTS, a new benchmark framework designed for evaluating tertiary structure-based multiple RNA aligners. We provide two methods to measure the capacity of alignments to match corresponding secondary and tertiary structure features. On this benchmark, SARA-Coffee outperforms both regular aligners and those using secondary structure information. Furthermore, we show that on sequences in which <60% of the nucleotides form base pairs, primary sequence methods usually perform better than secondary-structure aware aligners. Availability and implementation: The package and the datasets are available from http://www.tcoffee.org/Projects/saracoffee and http://structure.biofold.org/sara/.
Affiliation(s)
- Carsten Kemena
- Bioinformatics and Genomics Program, Centre for Genomic Regulation, 08003 Barcelona, Spain
|
22
|
Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii). PLoS One 2012; 7:e47174. [PMID: 23056606 PMCID: PMC3466250 DOI: 10.1371/journal.pone.0047174] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/25/2012] [Accepted: 09/10/2012] [Indexed: 01/05/2023] Open
Abstract
Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNAs and 10,701 full-length noncoding cDNAs were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNAs showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole genome sequencing.
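As a rough illustration of what a polyadenylation-signal analysis can involve, the sketch below scans the 3' end of a transcript for a signal hexamer. Everything specific here is an assumption rather than the authors' procedure: the hexamer list (AATAAA plus the common variant ATTAAA), the 40-nt search window, the requirement that the poly(A) tail has already been trimmed, and the hypothetical function name `find_polya_signal`.

```python
# Assumed signal hexamers in DNA alphabet: AATAAA and the common variant ATTAAA.
SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signal(utr: str, window: int = 40):
    """Return (hexamer, distance) for the signal closest to the 3' end, or None.

    `utr` is assumed to be the transcript 3' end with the poly(A) tail
    already removed. `distance` counts nucleotides between the end of the
    hexamer and the end of the sequence. The 40-nt window is an
    illustrative choice, not a value taken from the study.
    """
    seq = utr.upper().replace("U", "T")  # accept RNA or DNA input
    region = seq[max(0, len(seq) - window):]
    best = None
    for hexamer in SIGNALS:
        pos = region.rfind(hexamer)  # rightmost occurrence in the window
        if pos != -1 and (best is None or pos > best[1]):
            best = (hexamer, pos)
    if best is None:
        return None
    hexamer, pos = best
    return hexamer, len(region) - (pos + len(hexamer))
```

Applied to every coding and noncoding transcript, the share of sequences for which this returns a hit would give the kind of proportion the abstract compares between the two classes.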
|
23
|
Krzyzanowski PM, Muro EM, Andrade-Navarro MA. Computational approaches to discovering noncoding RNA. Wiley Interdisciplinary Reviews: RNA 2012; 3:567-79. [PMID: 22555938 DOI: 10.1002/wrna.1121] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
New developments are being brought to the field of molecular biology with the mounting evidence that RNA transcripts not translated into protein (noncoding RNAs, ncRNAs) hold a variety of biological functions. Computational discovery of ncRNAs is one of these developments, fueled not only by the urge to characterize these sequences but also by the necessity to prioritize those with the most relevant functions for experimental verification. The heterogeneity in size and mode of activity of ncRNAs is reflected in the corresponding diversity of computational methods for their study. Sequence and structural analysis, conservation across species, and relative position to other genomic elements are being used for ncRNA detection. In addition, the recent development of techniques that allow deep sequencing of cell transcripts either globally or from isolated ncRNA-related material is leading the field toward increased use of such high-throughput data. We expect that imminent breakthroughs will include the classification of newer types of ncRNA and new insights into miRNA and piRNA biology, eventually leading toward the completion of a catalog of all human ncRNAs.
|
24
|
Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C. Use of ChIP-Seq data for the design of a multiple promoter-alignment method. Nucleic Acids Res 2012; 40:e52. [PMID: 22230796 PMCID: PMC3326335 DOI: 10.1093/nar/gkr1292] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023] Open
Abstract
We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. This not only has interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
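The relative improvement quoted in the abstract can be checked with a few lines of arithmetic. The counts (284, 316, 331) are taken from the abstract; the helper name `relative_gain` is hypothetical and introduced only for this illustration.

```python
def relative_gain(value: float, baseline: float) -> float:
    """Percentage by which `value` exceeds `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Average numbers of correctly aligned binding sites reported in the abstract.
default_sites, trained_sites, procoffee_sites = 284, 316, 331

print(round(relative_gain(procoffee_sites, default_sites), 1))  # 16.5
print(round(relative_gain(trained_sites, default_sites), 1))    # 11.3
```

The first value reproduces the 16.5% stated in the abstract; the second shows that training alone accounts for roughly 11.3% of the gain over the default average.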
Affiliation(s)
- Ionas Erb
- Bioinformatics and Genomics program, Centre for Genomic Regulation and UPF, 08003 Barcelona, Spain
|