1
Velandia-Huerto CA, Fallmann J, Stadler PF. miRNAture-Computational Detection of microRNA Candidates. Genes (Basel) 2021; 12:348. [PMID: 33673400 PMCID: PMC7996739 DOI: 10.3390/genes12030348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Revised: 02/19/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022] Open
Abstract
Homology-based annotation of short RNAs, including microRNAs, is a difficult problem because their inherently small size limits the available information. Highly sensitive methods, including parameter-optimized blast, nhmmer, or cmsearch runs designed to increase sensitivity, inevitably lead to large numbers of false positives, which can be detected only by detailed analysis of specific features typical of an RNA family and/or the analysis of conservation patterns in structure-annotated multiple sequence alignments. The miRNAture pipeline implements a workflow specific to animal microRNAs that automates homology search and validation steps. The miRNAture pipeline yields very good results for a large number of "typical" miRBase families. However, it also highlights difficulties with atypical cases, in particular microRNAs deriving from repetitive elements and microRNAs with unusual, branched precursor structures and atypical locations of the mature product, which require specific curation by domain experts.
Affiliation(s)
- Cristian A. Velandia-Huerto
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, CO-111321 Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM 87501, USA
2
Tomáška Ľ, Nosek J. Co-evolution in the Jungle: From Leafcutter Ant Colonies to Chromosomal Ends. J Mol Evol 2020; 88:293-318. [PMID: 32157325 DOI: 10.1007/s00239-020-09935-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2019] [Accepted: 02/25/2020] [Indexed: 02/06/2023]
Abstract
Biological entities are multicomponent systems where each part is directly or indirectly dependent on the others. In effect, a change in a single component might have a consequence on the functioning of its partners, thus affecting the fitness of the entire system. In this article, we provide a few examples of such complex biological systems, ranging from ant colonies to a population of amino acids within a single-polypeptide chain. Based on these examples, we discuss one of the central and still challenging questions in biology: how do such multicomponent consortia co-evolve? More specifically, we ask how telomeres, nucleo-protein complexes protecting the integrity of linear DNA chromosomes, originated from the ancestral organisms having circular genomes and thus not dealing with end-replication and end-protection problems. Using the examples of rapidly evolving topologies of mitochondrial genomes in eukaryotic microorganisms, we show what means of co-evolution were employed to accommodate various types of telomere-maintenance mechanisms in mitochondria. We also describe an unprecedented runaway evolution of telomeric repeats in nuclei of ascomycetous yeasts accompanied by co-evolution of telomere-associated proteins. We propose several scenarios derived from research on telomeres and supported by other studies from various fields of biology, while emphasizing that the relevant answers are still not in sight. It is this uncertainty and a lack of a detailed roadmap that makes the journey through the jungle of biological systems still exciting and worth undertaking.
Affiliation(s)
- Ľubomír Tomáška
- Department of Genetics, Faculty of Natural Sciences, Comenius University in Bratislava, Ilkovičova 6, 842 15, Bratislava, Slovakia.
- Jozef Nosek
- Department of Biochemistry, Faculty of Natural Sciences, Comenius University in Bratislava, Ilkovičova 6, 842 15, Bratislava, Slovakia
3
Li X, Chen F, Xiao J, Chou SH, Li X, He J. Genome-wide Analysis of the Distribution of Riboswitches and Function Analyses of the Corresponding Downstream Genes in Prokaryotes. Curr Bioinform 2018. [DOI: 10.2174/1574893613666180423145812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: Riboswitches are structured elements that usually reside in the noncoding regions of mRNAs, where they bind various ligands to control the expression of a wide variety of downstream genes. To date, more than twenty different classes of riboswitches have been characterized that sense various metabolites, including purines and their derivatives, coenzymes, amino acids, and metal ions.
Objective: This study aims to analyze the genome-wide distribution of riboswitches and the functions of the corresponding downstream genes in prokaryotes.
Results: We completed a genome context analysis of 27 riboswitches to elucidate the metabolic capacities of riboswitch-mediated gene regulation across 3,079 completely sequenced prokaryotic genomes. Furthermore, Cluster of Orthologous Groups of proteins (COG) annotation was applied to predict and classify the possible functions of the corresponding downstream genes of these riboswitches. We found that they could all be annotated and grouped into 20 different COG functional categories, of which the two main clusters, "coenzyme metabolism [H]" and "amino acid transport and metabolism [E]", were the most significantly enriched.
Conclusion: Riboswitches are widespread in bacteria, with the three main classes (TPP, cobalamin, and SAM riboswitches) being the most widely distributed. A wide variety of functions were associated with the corresponding downstream genes, suggesting that these riboswitches mediate a wide range of regulatory roles in prokaryotes.
Affiliation(s)
- Xinfeng Li
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Fang Chen
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Jinfeng Xiao
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Shan-Ho Chou
- Institute of Biochemistry and NCHU Agricultural Biotechnology Center, National Chung Hsing University, Taichung 40227, Taiwan
- Xuming Li
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Jin He
- State Key Laboratory of Agricultural Microbiology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
4
蒋 洋, 朱 浩, 张 海. [Analysis of orthologous lncRNAs in humans and mice and their species-specific epigenetic target genes]. NAN FANG YI KE DA XUE XUE BAO = JOURNAL OF SOUTHERN MEDICAL UNIVERSITY 2018; 38:731-735. [PMID: 29997097 PMCID: PMC6765704 DOI: 10.3969/j.issn.1673-4254.2018.06.14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2017] [Indexed: 06/08/2023]
Abstract
OBJECTIVE To identify orthologous lncRNAs in humans and mice and the species specificity of their epigenetic regulatory functions. METHODS The human/mouse whole-genome pairwise alignment (hg19/mm10, genome.UCSC.edu) was used to identify the orthologues among 13 562 and 10 481 GENCODE-annotated human and mouse lncRNAs, respectively. The Infernal program was used to search the mouse genome (mm10) for orthologous sequences of all exons of the 13 562 human lncRNAs to identify the highly conserved orthologues in mice. The LongTarget program was used to predict the DNA binding sites of the orthologous lncRNAs in their local genomic regions. Gene Ontology analysis was carried out to examine the functions of genes. RESULTS Only 158 orthologous lncRNAs were identified in humans and mice, and many of these orthologues had species-specific DNA binding sites and epigenetic target genes. Some of the epigenetic target genes executed important functions in determining human and mouse phenotypes. CONCLUSIONS Only a few human and mouse lncRNAs are orthologues, and most lncRNAs are species-specific. The orthologous lncRNAs have species-specific epigenetic target genes, and species-specific epigenetic regulation greatly contributes to the differences between humans and mice.
Affiliation(s)
- 洋洋 蒋
- Network Center, Southern Medical University, Guangzhou 510515, China
- 浩 朱
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- 海 张
- Network Center, Southern Medical University, Guangzhou 510515, China
5
Kutchko KM, Madden EA, Morrison C, Plante KS, Sanders W, Vincent HA, Cruz Cisneros MC, Long KM, Moorman NJ, Heise MT, Laederach A. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res 2018; 46:3657-3670. [PMID: 29361131 PMCID: PMC6283419 DOI: 10.1093/nar/gky012] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 07/21/2017] [Revised: 12/10/2017] [Accepted: 01/05/2018] [Indexed: 12/03/2022] Open
Abstract
Alphaviruses are mosquito-borne pathogens that cause human diseases ranging from debilitating arthritis to lethal encephalitis. Studies with Sindbis virus (SINV), which causes fever, rash, and arthralgia in humans, and Venezuelan equine encephalitis virus (VEEV), which causes encephalitis, have identified RNA structural elements that play key roles in replication and pathogenesis. However, a complete genomic structural profile has not been established for these viruses. We used the structural probing technique SHAPE-MaP to identify structured elements within the SINV and VEEV genomes. Our SHAPE-directed structural models recapitulate known RNA structures, while also identifying novel structural elements, including a new functional element in the nsP1 region of SINV whose disruption causes a defect in infectivity. Although RNA structural elements are important for multiple aspects of alphavirus biology, we found the majority of RNA structures were not conserved between SINV and VEEV. Our data suggest that alphavirus RNA genomes are highly divergent structurally despite similar genomic architecture and sequence conservation; still, RNA structural elements are critical to the viral life cycle. These findings reframe traditional assumptions about RNA structure and evolution: rather than structures being conserved, alphaviruses frequently evolve new structures that may shape interactions with host immune systems or co-evolve with viral proteins.
Affiliation(s)
- Katrina M Kutchko
- Department of Biology, UNC-Chapel Hill, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, USA
- Emily A Madden
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Wes Sanders
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
- Nathaniel J Moorman
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
- Mark T Heise
- Department of Microbiology and Immunology, UNC-Chapel Hill, USA
- Department of Genetics, UNC-Chapel Hill, USA
- Alain Laederach
- Department of Biology, UNC-Chapel Hill, USA
- Curriculum in Bioinformatics and Computational Biology, UNC-Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, UNC-Chapel Hill, USA
6
Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez Banet J, Billis K, García Girón C, Hourlier T, Howe K, Kähäri A, Kokocinski F, Martin FJ, Murphy DN, Nag R, Ruffier M, Schuster M, Tang YA, Vogel JH, White S, Zadissa A, Flicek P, Searle SMJ. The Ensembl gene annotation system. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw093. [PMID: 27337980 PMCID: PMC4919035 DOI: 10.1093/database/baw093] [Citation(s) in RCA: 714] [Impact Index Per Article: 89.3] [Received: 01/11/2016] [Accepted: 05/09/2016] [Indexed: 12/12/2022]
Abstract
The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html.
Affiliation(s)
- Bronwen L Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sarah Ayling
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Present address: The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Daniel Barrell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Eagle Genomics Ltd, Babraham Research Campus, Cambridge CB22 3AT, UK
- Laura Clarke
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Valery Curwen
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Susan Fairley
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julio Fernandez Banet
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Pfizer Inc, 10646 Science Center Dr, San Diego, CA 92121, USA
- Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Kevin Howe
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andreas Kähäri
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Institutionen för cell-och molekylärbiologi, Uppsala University, Husargatan 3, Uppsala 752 37, Sweden
- Felix Kokocinski
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Daniel N Murphy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Rishi Nag
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Magali Ruffier
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Michael Schuster
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna A-1090, Austria
- Y Amy Tang
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jan-Hinnerk Vogel
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Genentech Inc, 1 DNA Way, South San Francisco, CA 94080, USA
- Simon White
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Amonida Zadissa
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Stephen M J Searle
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
7
Barquist L, Burge SW, Gardner PP. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families. CURRENT PROTOCOLS IN BIOINFORMATICS 2016; 54:12.13.1-12.13.25. [PMID: 27322404 PMCID: PMC5010141 DOI: 10.1002/cpbi.4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Lars Barquist
- Institute for Molecular Infection Biology, University of Würzburg, Würzburg, D-97080 Germany
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
- Sarah W. Burge
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA United Kingdom; Fax: +44 (0)1223 494919
- Paul P. Gardner
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
8
Pignatelli M, Vilella AJ, Muffato M, Gordon L, White S, Flicek P, Herrero J. ncRNA orthologies in the vertebrate lineage. Database (Oxford) 2016; 2016:bav127. [PMID: 26980512 PMCID: PMC4792531 DOI: 10.1093/database/bav127] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/02/2015] [Revised: 12/23/2015] [Accepted: 12/23/2015] [Indexed: 01/07/2023]
Abstract
Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes, and the evolutionary models used by these methods normally assume that the positions in the protein evolve independently. However, as our appreciation for the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families represented in Rfam and for which a specific covariance model is provided. For each family, ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics ('Compara') API and on our FTP site, and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org.
Affiliation(s)
- Miguel Pignatelli
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Albert J Vilella
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Leo Gordon
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Simon White
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute
- UCL Cancer Institute, University College London, London WC1E 6BT, UK
9
Yotsukura S, duVerle D, Hancock T, Natsume-Kitatani Y, Mamitsuka H. Computational recognition for long non-coding RNA (lncRNA): Software and databases. Brief Bioinform 2016; 18:9-27. [DOI: 10.1093/bib/bbv114] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 10/01/2015] [Revised: 12/10/2015] [Indexed: 01/22/2023] Open
10
Abstract
The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.
Affiliation(s)
- Jennifer Daub
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
11
Lertampaiporn S, Thammarongtham C, Nukoolkit C, Kaewkamnerdpong B, Ruengjitchatchawalya M. Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm. Nucleic Acids Res 2014; 42:e93. [PMID: 24771344 PMCID: PMC4066759 DOI: 10.1093/nar/gku325] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 02/10/2014] [Revised: 04/02/2014] [Accepted: 04/07/2014] [Indexed: 12/13/2022] Open
Abstract
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a new proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features-structure, sequence, modularity, structural robustness and coding potential-to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacteria genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
Affiliation(s)
- Supatcha Lertampaiporn
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology at King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
- Chakarida Nukoolkit
- School of Information Technology, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Boonserm Kaewkamnerdpong
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
- Marasri Ruengjitchatchawalya
- Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
- Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
12
He S, Gu W, Li Y, Zhu H. ANRIL/CDKN2B-AS shows two-stage clade-specific evolution and becomes conserved after transposon insertions in simians. BMC Evol Biol 2013; 13:247. [PMID: 24225082 PMCID: PMC3831594 DOI: 10.1186/1471-2148-13-247] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/06/2013] [Accepted: 11/08/2013] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Many long non-coding RNA (lncRNA) genes identified in mammals have multiple exons and functional domains, allowing them to bind to polycomb proteins, DNA methyltransferases, and specific DNA sequences to regulate genome methylation. Little is known about the origin and evolution of lncRNAs. ANRIL/CDKN2B-AS consists of 19 exons on human chromosome 9p21 and regulates the expression of three cyclin-dependent kinase inhibitors (CDKN2A/ARF/CDKN2B). RESULTS ANRIL/CDKN2B-AS originated in placental mammals, obtained additional exons during mammalian evolution but gradually lost them during rodent evolution, and reached 19 exons only in simians. ANRIL lacks splicing signals in mammals. In simians, multiple transposons were inserted and transformed into exons of the ANRIL gene, after which ANRIL became highly conserved. A further survey reveals that multiple transposons exist in many lncRNAs. CONCLUSIONS ANRIL shows a two-stage, clade-specific evolutionary process and is fully developed only in simians. The domestication of multiple transposons indicates an impressive pattern of "evolutionary tinkering" and is likely to be important for ANRIL's structure and function. The evolution of lncRNAs and that of transposons may be highly co-opted in primates. Many lncRNAs may be functional only in simians.
Affiliation(s)
- Hao Zhu
- Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Shatai Road, Guangzhou 510515, China.
|
13
|
Moss WN, Steitz JA. Genome-wide analyses of Epstein-Barr virus reveal conserved RNA structures and a novel stable intronic sequence RNA. BMC Genomics 2013; 14:543. [PMID: 23937650 PMCID: PMC3751371 DOI: 10.1186/1471-2164-14-543] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 06/20/2013] [Accepted: 08/07/2013] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Epstein-Barr virus (EBV) is a human herpesvirus implicated in cancer and autoimmune disorders. Little is known concerning the roles of RNA structure in this important human pathogen. This study provides the first comprehensive genome-wide survey of RNA and RNA structure in EBV. RESULTS Novel EBV RNAs and RNA structures were identified by computational modeling and RNA-Seq analyses of EBV. Scans of the genomic sequences of four EBV strains (EBV-1, EBV-2, GD1, and GD2) and of the closely related Macacine herpesvirus 4 using the RNAz program discovered 265 regions with high probability of forming conserved RNA structures. Secondary structure models are proposed for these regions based on a combination of free energy minimization and comparative sequence analysis. The analysis of RNA-Seq data uncovered the first observation of a stable intronic sequence RNA (sisRNA) in EBV. The abundance of this sisRNA rivals that of the well-known and highly expressed EBV-encoded non-coding RNAs (EBERs). CONCLUSION This work identifies regions of the EBV genome likely to generate functional RNAs and RNA structures, provides structural models for these regions, and discusses potential functions suggested by the modeled structures. Enhanced understanding of the EBV transcriptome will guide future experimental analyses of the discovered RNAs and RNA structures.
Affiliation(s)
- Walter N Moss
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, Connecticut 06536, USA
- Joan A Steitz
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, Connecticut 06536, USA
|
14
|
Chen XS, Brown CM. Computational identification of new structured cis-regulatory elements in the 3'-untranslated region of human protein coding genes. Nucleic Acids Res 2012; 40:8862-73. [PMID: 22821558 PMCID: PMC3467077 DOI: 10.1093/nar/gks684] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/19/2012] [Revised: 06/15/2012] [Accepted: 06/20/2012] [Indexed: 01/14/2023] Open
Abstract
Messenger ribonucleic acids (RNAs) contain a large number of cis-regulatory RNA elements that function in many types of post-transcriptional regulation. These cis-regulatory elements are often characterized by conserved structures and/or sequences. Although some classes are well known, given the wide range of RNA-interacting proteins in eukaryotes, it is likely that many new classes of cis-regulatory elements are yet to be discovered. One approach is to use computational methods, which have the advantage of analysing genomic data, particularly comparative data, on a large scale. In this study, a set of structural discovery algorithms was applied, followed by support vector machine (SVM) classification. We trained a new classification model (CisRNA-SVM) on a set of known structured cis-regulatory elements from 3'-untranslated regions (UTRs) and successfully distinguished these, as well as groups of cis-regulatory elements it had not been trained on, from control genomic and shuffled sequences. The new method outperformed previous methods in classification of cis-regulatory RNA elements. This model was then used to predict new elements from cross-species conserved regions of human 3'-UTRs. Clustering of these elements identified new classes of potential cis-regulatory elements. The model, training and testing sets and novel human predictions are available at: http://mRNA.otago.ac.nz/CisRNA-SVM.
Affiliation(s)
- Xiaowei Sylvia Chen
- Department of Biochemistry and Genetics Otago, University of Otago, Dunedin 9054, New Zealand.
|
15
|
Bussotti G, Raineri E, Erb I, Zytnicki M, Wilm A, Beaudoing E, Bucher P, Notredame C. BlastR--fast and accurate database searches for non-coding RNAs. Nucleic Acids Res 2011; 39:6886-95. [PMID: 21624887 PMCID: PMC3167602 DOI: 10.1093/nar/gkr335] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 12/02/2022] Open
Abstract
We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.
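The recoding step described in this abstract can be illustrated with a minimal Python sketch. It assumes overlapping dinucleotides and uses an arbitrary 16-letter assignment (`DINUC_TO_LETTER`) that is purely hypothetical; it is not the table used by BlastR, and the BlosumR matrix itself is not reproduced here.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Hypothetical assignment of 16 protein-like letters to the 16 dinucleotides.
# This is an illustrative choice, NOT the encoding table shipped with BlastR.
LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    a + b: LETTERS[i]
    for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))
}


def recode_dinucleotides(rna: str) -> str:
    """Recode an RNA (or DNA) sequence into a protein-like string.

    Each overlapping dinucleotide becomes one letter, so a sequence of
    length n yields n - 1 letters. Dinucleotides containing ambiguity
    codes are mapped to 'X'.
    """
    seq = rna.upper().replace("T", "U")
    return "".join(
        DINUC_TO_LETTER.get(seq[i:i + 2], "X") for i in range(len(seq) - 1)
    )


if __name__ == "__main__":
    print(recode_dinucleotides("GGCUAGCC"))
```

A string recoded in this way could then, in principle, be passed to a protein search tool together with a dinucleotide substitution matrix, which is the role the abstract assigns to BlosumR and BlastP.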
Affiliation(s)
- Giovanni Bussotti
- Bioinformatics and Genomics program, Center for Genomic Regulation (CRG) and UPF, Barcelona, C/ D. Aiguader, 88, 08003 Barcelona, Spain
|
16
|
Abstract
Small noncoding RNAs regulate a variety of cellular processes, including genomic imprinting, chromatin remodeling, replication, transcription, and translation. Here, we report small replication-regulating RNAs (srRNAs) that specifically inhibit DNA replication of the human BK polyomavirus (BKV) in vitro and in vivo. srRNAs from FM3A murine mammary tumor cells were enriched by DNA replication assay-guided fractionation and hybridization to the BKV noncoding control region (NCCR) and synthesized as cDNAs. Selective mutagenesis of the cDNA sequences and their putative targets suggests that the inhibition of BKV DNA replication is mediated by srRNAs binding to the viral NCCR, hindering early steps in the initiation of DNA replication. Ectopic expression of srRNAs in human cells inhibited BKV DNA replication in vivo. Additional srRNAs were designed and synthesized that specifically inhibit simian virus 40 (SV40) DNA replication in vitro. These observations point to novel mechanisms for regulating DNA replication and suggest the design of synthetic agents for inhibiting replication of polyomaviruses and possibly other viruses.
|
17
|
Otto C, Hoffmann S, Gorodkin J, Stadler PF. Fast local fragment chaining using sum-of-pair gap costs. Algorithms Mol Biol 2011; 6:4. [PMID: 21418573 PMCID: PMC3072320 DOI: 10.1186/1748-7188-6-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/29/2010] [Accepted: 03/18/2011] [Indexed: 01/11/2023] Open
Abstract
Background Fast seed-based alignment heuristics such as BLAST and BLAT have become indispensable tools in comparative genomics for all studies aiming at the evolutionary relations of proteins, genes, and non-coding RNAs. This is true in particular for the large mammalian genomes. The sensitivity and specificity of these tools, however, crucially depend on parameters such as seed sizes or maximum expectation values. In settings that require high sensitivity, the number of short local match fragments easily becomes intractable. In such cases, fragment chaining is a powerful way to quickly connect, score, and rank the fragments to improve the specificity. Results Here we present a fast and flexible fragment chainer that for the first time also supports a sum-of-pair gap cost model. This model has proven to achieve a higher accuracy and sensitivity in its own field of application. Due to a highly time-efficient index structure, our method outperforms the only existing tool for fragment chaining under the linear gap cost model. It can easily be applied to the output generated by alignment tools such as segemehl or BLAST. As an example we consider homology-based searches for human and mouse snoRNAs, demonstrating that a highly sensitive BLAST search with subsequent chaining is an attractive option. The sum-of-pair gap costs provide a substantial advantage in this context. Conclusions Chaining of short match fragments helps to quickly and accurately identify regions of homology that may not be found using local alignment heuristics alone. By providing both the linear and the sum-of-pair gap cost model, a wider range of applications can be covered. The software clasp is available at http://www.bioinf.uni-leipzig.de/Software/clasp/.
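The idea of chaining match fragments can be sketched as a simple quadratic-time dynamic program. The sketch below is not the clasp algorithm or its index structure. The `Fragment` type, the `gap_weight` parameter, and the gap penalty (the number of unmatched positions between consecutive fragments, summed over both sequences) are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """A local match with inclusive coordinates on query (q) and target (t)."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int
    score: float


def chain_fragments(
    fragments: List[Fragment], gap_weight: float = 0.5
) -> Tuple[float, List[Fragment]]:
    """Return the best-scoring colinear chain and its score.

    Chain score = sum of fragment scores minus gap_weight times the
    number of skipped positions between consecutive fragments
    (an assumed, simplified linear gap cost).
    """
    frags = sorted(fragments, key=lambda f: (f.q_start, f.t_start))
    if not frags:
        return 0.0, []
    best = [f.score for f in frags]   # best chain score ending at each fragment
    prev = [-1] * len(frags)          # back-pointers for traceback
    for j, b in enumerate(frags):
        for i in range(j):
            a = frags[i]
            # a may precede b only if it ends before b starts on both sequences
            if a.q_end < b.q_start and a.t_end < b.t_start:
                gap = (b.q_start - a.q_end - 1) + (b.t_start - a.t_end - 1)
                candidate = best[i] + b.score - gap_weight * gap
                if candidate > best[j]:
                    best[j], prev[j] = candidate, i
    end = max(range(len(frags)), key=best.__getitem__)
    best_score = best[end]
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    return best_score, chain[::-1]
```

Because every chain may start at any fragment, this is a local chaining: isolated high-scoring fragments are kept, while fragments that would require an expensive gap are left out of the chain.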
|