1
Rahaman MM, Zhang S. RNAMotifProfile: a graph-based approach to build RNA structural motif profiles. NAR Genom Bioinform 2024; 6:lqae128. [PMID: 39328267 PMCID: PMC11426329 DOI: 10.1093/nargab/lqae128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/24/2024] [Accepted: 09/09/2024] [Indexed: 09/28/2024]
Abstract
RNA structural motifs are the recurrent segments in RNA three-dimensional structures that play a crucial role in the functional diversity of RNAs. Understanding the similarities and variations within these recurrent motif groups is essential for gaining insights into RNA structure and function. While recurrent structural motifs are generally assumed to be composed of the same isosteric base interactions, this consistent pattern is not observed across all examples of these motifs. Existing methods for analyzing and comparing RNA structural motifs may overlook variations in base interactions and associated nucleotides. RNAMotifProfile is a novel profile-to-profile alignment algorithm that generates a comprehensive profile from a group of structural motifs, incorporating all base interactions and associated nucleotides at each position. By structurally aligning input motif instances using a guide-tree-based approach, RNAMotifProfile captures the similarities and variations within recurrent motif groups. Additionally, RNAMotifProfile can function as a motif search tool, enabling the identification of instances of a specific motif family by searching with the corresponding profile. The ability to generate accurate and comprehensive profiles for RNA structural motif families, and to search for these motifs, facilitates a deeper understanding of RNA structure-function relationships and potential applications in RNA engineering and therapeutic design.
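The abstract does not say how RNAMotifProfile builds its guide tree. Purely as an illustrative assumption, the sketch below uses UPGMA (average-linkage) clustering on hypothetical pairwise motif dissimilarities. Each nested pair is one merge, so the nesting gives the order in which instances, and later partial profiles, would be aligned. The function name and distance format are hypothetical.

```python
from itertools import combinations


def upgma_guide_tree(names, dist):
    """Build a UPGMA guide tree (hypothetical helper, not RNAMotifProfile's code).

    names: list of leaf labels (e.g. motif instance IDs).
    dist:  dict mapping frozenset({a, b}) -> distance for every pair of names.
    Returns a nested tuple; each tuple (x, y) records one merge.
    """
    size = {name: 1 for name in names}  # current cluster -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        n_a, n_b = size.pop(a), size.pop(b)
        # Average-linkage update: size-weighted mean of the old distances.
        for c in size:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        size[merged] = n_a + n_b
    return next(iter(size))
```

For three instances where A and B are closest, the result `("C", ("A", "B"))` means A and B are aligned first and C is then aligned to their profile.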
Affiliation(s)
- Md Mahfuzur Rahaman
- Department of Computer Science, University of Central Florida, 4328 Scorpius Street, Orlando, FL 32816-2362, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, 4328 Scorpius Street, Orlando, FL 32816-2362, USA
2
Quadrini M, Tesei L, Merelli E. Automatic generation of pseudoknotted RNAs taxonomy. BMC Bioinformatics 2023; 23:575. [PMID: 37322429 DOI: 10.1186/s12859-023-05362-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 05/25/2023] [Indexed: 06/17/2023]
Abstract
BACKGROUND The ability to compare RNA secondary structures is important for understanding their biological function and for grouping similar organisms into families by looking at evolutionarily conserved sequences such as 16S rRNA. Most comparison methods and benchmarks in the literature focus on pseudoknot-free structures because pseudoknots are difficult to map onto classical tree representations. Some approaches can cluster pseudoknotted RNAs, but there is no general framework for evaluating their performance. RESULTS We introduce an evaluation framework based on a similarity/dissimilarity measure obtained by a comparison method and on agglomerative clustering. Their combination automatically partitions a set of molecules into groups. To illustrate the framework, we define and make available a benchmark of pseudoknotted (16S and 23S) and pseudoknot-free (5S) rRNA secondary structures belonging to Archaea, Bacteria and Eukaryota. We also consider five different comparison methods from the literature that are able to handle pseudoknots. For each method, we cluster the molecules in the benchmark to obtain the taxa at the phylum rank according to the European Nucleotide Archive curated taxonomy. We compute appropriate metrics for each method and compare their suitability for reconstructing the taxa.
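A minimal sketch of the clustering half of such a framework, assuming a dissimilarity function is already supplied by some comparison method: single-linkage agglomeration stopped at k clusters, then cluster purity against reference labels (for example phylum names). The linkage criterion, stopping rule and purity metric are assumptions chosen for illustration; the paper's own choices may differ.

```python
from collections import Counter
from itertools import combinations


def single_linkage(n, dist, k):
    """Group items 0..n-1 into k clusters by single-linkage agglomeration.

    dist(i, j) returns the dissimilarity between items i and j.
    Returns a list with one cluster id per item.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = n
    # Merging along pairs in increasing dissimilarity is exactly single linkage.
    for i, j in sorted(combinations(range(n), 2), key=lambda p: dist(*p)):
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(n)]


def purity(assignment, labels):
    """Fraction of items whose label is the majority label of their cluster."""
    groups = {}
    for cluster, label in zip(assignment, labels):
        groups.setdefault(cluster, []).append(label)
    majority = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return majority / len(labels)
```

Purity is only one possible metric; it rewards homogeneous clusters but does not penalize splitting a taxon across several clusters.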
Affiliation(s)
- Michela Quadrini
- School of Sciences and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032, Camerino, MC, Italy
- Luca Tesei
- School of Sciences and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032, Camerino, MC, Italy
- Emanuela Merelli
- School of Sciences and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032, Camerino, MC, Italy
3
Metrics for RNA Secondary Structure Comparison. Methods Mol Biol 2023; 2586:79-88. [PMID: 36705899 DOI: 10.1007/978-1-0716-2768-6_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
RNA secondary structure comparison is one of the important analyses for elucidating the individual functions of RNAs, since it is widely accepted that their functions and structures are strongly correlated. However, although RNA secondary structures with pseudoknots play important roles in vivo, such structures are difficult to handle in silico because of their structural complexity, which is a major obstacle to the analysis of RNA functions. Here, we introduce an algorithm and a metric for comparing pseudoknotted RNA secondary structures based on topological centroid identification and tree edit distance, and describe the usage protocol of software that runs the comparison. This software is publicly available and works on both Microsoft Windows and Apple macOS.
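The comparison step relies on tree edit distance. As a minimal sketch (not the published software), unit-cost ordered tree edit distance can be computed with the classical recursion on the rightmost roots of two forests, memoized. The encoding of a tree as a `(label, children)` tuple and the unit costs are assumptions made here for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between ordered forests f and g.

    A forest is a tuple of trees; a tree is (label, children_forest).
    """
    if not f and not g:
        return 0
    if not f:
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1  # insert rightmost root of g
    if not g:
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1  # delete rightmost root of f
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + kv, g) + 1,  # delete v, promoting its children
        forest_dist(f, g[:-1] + kw) + 1,  # insert w
        # Match v with w: align the remaining forests and the child forests.
        forest_dist(f[:-1], g[:-1]) + forest_dist(kv, kw) + (lv != lw),
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))
```

Because removing the rightmost root always drops the last node in postorder, only polynomially many subforests are ever memoized, although this is slower than specialized implementations such as Zhang-Shasha.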
4
Abstract
Alignments of discrete objects can be constructed in a very general setting as super-objects from which the constituent objects are recovered by means of projections. Here, we focus on contact maps, i.e. undirected graphs with an ordered set of vertices. These serve as natural discretizations of RNA and protein structures. In the general case, the alignment problem for vertex-ordered graphs is NP-complete. In the special case of RNA secondary structures, i.e. crossing-free matchings, however, the alignments have a recursive structure. The alignment problem then can be solved by a variant of the Sankoff algorithm in polynomial time. Moreover, the tree or forest alignments of RNA secondary structure can be understood as the alignments of ordered edge sets.
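In contact-map terms, an RNA secondary structure is a crossing-free matching: every vertex lies on at most one edge, and no two edges (i, j), (k, l) satisfy i < k < j < l. A minimal membership check, assuming edges are given as pairs of integer vertex positions, is a single left-to-right scan with a stack. It only tests whether a contact map belongs to this class; it does not implement the Sankoff-style alignment.

```python
def is_crossing_free_matching(edges):
    """True if the edges form a matching with no crossing pair."""
    partner = {}
    for i, j in edges:
        if i == j or i in partner or j in partner:
            return False  # self-loop, or a vertex on two edges: not a matching
        partner[i], partner[j] = j, i
    stack = []
    for pos in sorted(partner):
        if partner[pos] > pos:
            stack.append(pos)  # left endpoint: open the arc
        elif stack.pop() != partner[pos]:
            return False  # arc closes while an inner arc is still open: crossing
    return True
```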
Affiliation(s)
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Centre for Scalable Data Services and Solutions Dresden-Leipzig, Leipzig Research Centre for Civilization Diseases, and Centre for Biotechnology and Biomedicine at Leipzig University, Universität Leipzig, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
5
Quadrini M. Structural relation matching: an algorithm to identify structural patterns into RNAs and their interactions. J Integr Bioinform 2021; 18:111-126. [PMID: 34051708 PMCID: PMC9382659 DOI: 10.1515/jib-2020-0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2020] [Accepted: 04/19/2021] [Indexed: 11/15/2022]
Abstract
RNA molecules play crucial roles in various biological processes. Their three-dimensional configurations determine their functions and, in turn, influence their interactions with other molecules. RNAs and their interaction structures, the so-called RNA-RNA interactions, can be abstracted in terms of secondary structures, i.e., lists of the nucleotide bases paired by hydrogen bonding within their nucleotide sequences. Each secondary structure, in turn, can be abstracted into cores and shadows. Both are determined by properly collapsing nucleotides and arcs. We formalize all of these abstractions as arc diagrams, whose arcs determine loops. A secondary structure represented by an arc diagram is pseudoknot-free if no two arcs in its diagram cross; otherwise, it is said to be pseudoknotted. In this study, we address the problem of identifying a given structural pattern within secondary structures, or within the associated cores or shadows, of both RNAs and RNA-RNA interactions characterized by arbitrary pseudoknots. These abstractions are mapped into a matrix whose elements represent the relations among loops, so the problem is reduced to one on matrices and submatrices. The algorithms, implemented in Python, work in polynomial time. We test our approach on a set of 16S ribosomal RNAs of Thermus thermophilus with inhibitors, and we quantify the structural effect of the inhibitors.
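A simplified sketch of matrix-based pattern matching, under assumptions that differ from the paper: each element is an arc (i, j) with distinct endpoints (the paper relates loops, not arcs), and two arcs are related by concatenation ("C"), nesting ("N") or crossing ("X"). A pattern occurs if its relation matrix appears as a submatrix over some increasing choice of structure arcs. This brute-force search is exponential in the pattern size and is for illustration only; the published algorithms run in polynomial time.

```python
from itertools import combinations


def relation(a, b):
    """Classify two arcs with distinct endpoints as "C", "N" or "X"."""
    (i, j), (k, l) = a, b
    if j < k or l < i:
        return "C"  # concatenation: arcs side by side
    if (i < k and l < j) or (k < i and j < l):
        return "N"  # nesting: one arc inside the other
    return "X"  # crossing: interleaved endpoints (pseudoknot)


def relation_matrix(arcs):
    """Relation matrix over arcs sorted by left endpoint ("-" on the diagonal)."""
    arcs = sorted(arcs)
    return [[relation(a, b) if a != b else "-" for b in arcs] for a in arcs]


def find_pattern(structure_arcs, pattern_arcs):
    """Indices (into the sorted structure arcs) of the first occurrence
    of the pattern's relation matrix as a submatrix, or None."""
    big = relation_matrix(structure_arcs)
    small = relation_matrix(pattern_arcs)
    m = len(small)
    for idx in combinations(range(len(big)), m):
        if all(
            big[idx[r]][idx[c]] == small[r][c]
            for r in range(m)
            for c in range(m)
            if r != c
        ):
            return idx
    return None
```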
Affiliation(s)
- Michela Quadrini
- University of Camerino, School of Science and Technology, via Madonna delle Carceri, Camerino, Italy
6
Quadrini M, Tesei L, Merelli E. ASPRAlign: a tool for the alignment of RNA secondary structures with arbitrary pseudoknots. Bioinformatics 2020; 36:3578-3579. [PMID: 32125359 DOI: 10.1093/bioinformatics/btaa147] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/23/2019] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 01/18/2023]
Abstract
SUMMARY Current methods for comparing RNA secondary structures are based on tree representations and exploit edit distance or alignment algorithms. Most of them can only process structures without pseudoknots. To overcome this limitation, we introduce ASPRAlign, a Java tool that aligns particular algebraic tree representations of RNA. These trees neglect the primary sequence and can handle structures with arbitrary pseudoknots. A measure of comparison, called ASPRA distance, is computed with a worst-case time complexity of O(n^2), where n is the number of nucleotides of the longer structure. AVAILABILITY AND IMPLEMENTATION ASPRAlign is implemented in Java and source code is released under the GNU GPLv3 license. Code and documentation are freely available at https://github.com/bdslab/aspralign. CONTACT luca.tesei@unicam.it. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Michela Quadrini
- Department of Information Engineering, University of Padua, Padova 35131, Italy
- Luca Tesei
- School of Sciences and Technology, University of Camerino, Camerino 62032, Italy
- Emanuela Merelli
- School of Sciences and Technology, University of Camerino, Camerino 62032, Italy
7
Zika Virus Subgenomic Flavivirus RNA Generation Requires Cooperativity between Duplicated RNA Structures That Are Essential for Productive Infection in Human Cells. J Virol 2020; 94:JVI.00343-20. [PMID: 32581095 DOI: 10.1128/jvi.00343-20] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/27/2020] [Accepted: 06/16/2020] [Indexed: 12/15/2022]
Abstract
Zika virus (ZIKV) is an emerging flavivirus, mainly transmitted by mosquitoes, that represents a global health threat. A common feature of flavivirus-infected cells is the accumulation of viral noncoding subgenomic RNAs, known as sfRNAs, which are produced by partial degradation of the viral genome and are involved in immune evasion and pathogenesis. Although great effort is being made to understand how these sfRNAs function during infection, the picture is still incomplete. In this study, we developed new genetic tools to dissect the functions of ZIKV RNA structures in viral replication and sfRNA production in mosquito and human hosts. ZIKV infections mostly accumulate two kinds of sfRNAs, sfRNA1 and sfRNA2, produced when genome degradation stalls upstream of duplicated stem loops (SLI and SLII) in the viral 3' untranslated region (UTR). Although the two SLs share conserved sequences and structures, they have different functions in ZIKV replication in human and mosquito cells. While both SLs are enhancers of viral infection in human cells, they play opposite roles in the mosquito host. Dissecting the determinants of sfRNA formation indicated strong cooperativity between SLI and SLII, supporting a higher-order organization of this region of the 3' UTR. Experiments with recombinant ZIKV carrying different SLI and SLII arrangements, which produce different types of sfRNAs or cannot generate these molecules, revealed that at least one sfRNA was necessary for efficient infection and transmission in Aedes aegypti mosquitoes. Importantly, we demonstrate an absolute requirement of sfRNAs for ZIKV propagation in human cells. Viruses lacking sfRNAs, constructed by deleting the region containing SLI and SLII, were able to infect human cells, but the infection was rapidly cleared by antiviral responses.
Our findings are unique to ZIKV, since in previous studies other flaviviruses with deletions of analogous genome regions, including dengue and West Nile viruses, accumulated distinct species of sfRNAs and were infectious in human cells. We conclude that flaviviruses share common strategies for sfRNA generation but have evolved mechanisms to produce different kinds of these RNAs to accomplish virus-specific functions.
IMPORTANCE Flaviviruses are important emerging and reemerging human pathogens. Understanding the molecular mechanisms of viral replication and evasion of host antiviral responses is relevant to the development of control strategies. Flavivirus infections produce viral noncoding RNAs, known as sfRNAs, involved in viral replication and pathogenesis. In this study, we dissected molecular determinants of Zika virus sfRNA generation in its two natural hosts, human cells and mosquitoes. We found that two RNA structures of the viral 3' UTR operate cooperatively to produce two species of sfRNAs and that deleting these elements has a profoundly different impact on viral replication in the two hosts. Generation of at least one sfRNA was necessary for efficient Zika virus infection of Aedes aegypti mosquitoes. Moreover, recombinant viruses with different 3' UTR arrangements revealed an essential role of sfRNAs in productive infection of human cells. In summary, we define molecular requirements for Zika virus sfRNA accumulation and provide new ideas about how flavivirus RNA structures have evolved to succeed in different hosts.
8
9
Control of the neuroprotective Lipocalin Apolipoprotein D expression by alternative promoter regions and differentially expressed mRNA 5' UTR variants. PLoS One 2020; 15:e0234857. [PMID: 32559215 PMCID: PMC7304576 DOI: 10.1371/journal.pone.0234857] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/12/2019] [Accepted: 06/03/2020] [Indexed: 02/02/2023]
Abstract
The Lipocalin Apolipoprotein D (ApoD) is one of the few genes consistently overexpressed in the aging brain and in most neurodegenerative and psychiatric diseases. Its functions include metabolism regulation, myelin management, neuroprotection, and longevity regulation. Knowledge of the endogenous regulatory mechanisms controlling brain disease-triggered ApoD expression is relevant if we want to pharmacologically boost its neuroprotective potential. In addition to classical transcriptional control, Lipocalins show remarkable variability in mRNA 5’UTR-dependent translation efficiency. Using bioinformatic analyses, we uncover strong selective pressures preserving ApoD 5’UTR properties, indicating unexpected functional conservation. PCR amplifications demonstrate the production of five 5’UTR variants (A-E) in mouse ApoD, with diverse expression levels across tissues and developmental stages. Importantly, Variant E is specifically expressed in the oxidative stress-challenged brain. Predictive analyses of 5’UTR secondary structures and of enrichment in elements restraining translation point to Variant E as a tight regulator of ApoD expression. We find two genomic regions conserved in human and mouse ApoD: a canonical (α) promoter region and a previously unknown region upstream of Variant E that could function as an alternative mouse promoter (β). Luciferase assays demonstrate that both α and β promoter regions can drive expression in cultured mouse astrocytes, and that Promoter β activity responds proportionally to incremental doses of the oxidative stress generator Paraquat. We postulate that Promoter β works in association with the Variant E 5’UTR as a regulatory tandem that organizes ApoD gene expression in the nervous system in response to oxidative stress, the most common factor in aging and neurodegeneration.
10
Wang F, Akutsu T, Mori T. Comparison of Pseudoknotted RNA Secondary Structures by Topological Centroid Identification and Tree Edit Distance. J Comput Biol 2020; 27:1443-1451. [PMID: 32058802 DOI: 10.1089/cmb.2019.0512] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Comparison of RNA structures is one of the most crucial analyses for elucidating their individual functions and promoting medical applications. Because it is widely accepted that the functions and structures of RNAs are strongly correlated, and because RNA three-dimensional structure is difficult to predict directly from sequence, various methods for RNA secondary structure analysis have been proposed. However, few methods deal with RNA secondary structures containing a specific and complex partial structure called a pseudoknot, despite its significance to biological processes, which is a major obstacle to analyzing their functions. In this study, we propose a novel tree representation of pseudoknotted RNA secondary structures based on topological centroid identification, together with methods for comparing them based on the tree edit distance. In the proposed method, a graph representing an RNA secondary structure is transformed into a tree rooted at one of the vertices of the topological centroid, which is identified by removing cycles through peeling processing of the graph. When RNA secondary structures collected from a public database were represented as trees, compared using the tree edit distance, and evaluated against functional gene groups defined by Gene Ontology (GO), the proposed method produced clusters that matched the GO groups better than canonical RNA sequence-based comparison. In addition, we report a case in which combining the tree edit distance with the sequence edit distance gives a better classification of pseudoknotted RNA secondary structures.
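The abstract does not give the exact peeling rules, so the following is only a sketch of the general idea under stated assumptions: vertices of degree at most 1 are removed layer by layer from a connected, undirected structure graph. On a tree this returns the center (one or two vertices), which can differ from the graph-theoretic centroid; on a graph with cycles it stops at the cyclic core, which the published method would reduce further by removing cycles (not shown). The function name is hypothetical.

```python
from collections import defaultdict


def peel(n, edges):
    """Peel degree<=1 vertices layer by layer (hypothetical helper).

    Vertices are 0..n-1 and the graph is assumed connected. On a tree this
    returns its center; if peeling stalls because only cycles (and paths
    between them) remain, that remaining core is returned.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(range(n))
    degree = {v: len(adj[v]) for v in alive}
    while len(alive) > 2:
        layer = [v for v in alive if degree[v] <= 1]
        if not layer:
            break  # every remaining vertex lies on or between cycles
        for v in layer:
            alive.discard(v)
            for u in adj[v]:
                if u in alive:
                    degree[u] -= 1
    return sorted(alive)
```

Removing a whole layer at once, rather than one leaf at a time, is what makes the survivors equidistant from the periphery.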
Affiliation(s)
- Feiqi Wang
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Tomoya Mori
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
11
Diel B, Dequivre M, Wisniewski‐Dyé F, Vial L, Hommais F. A novel plasmid‐transcribed regulatory sRNA, QfsR, controls chromosomal polycistronic gene expression in Agrobacterium fabrum. Environ Microbiol 2019; 21:3063-3075. [DOI: 10.1111/1462-2920.14704] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/23/2019] [Accepted: 06/04/2019] [Indexed: 01/06/2023]
Affiliation(s)
- Benjamin Diel
- Université de Lyon F‐69622 Lyon France
- Université Lyon 1 F‐69622 Villeurbanne France
- CNRS UMR 5240 Microbiologie Adaptation et Pathogénie F‐69622 Villeurbanne France
- CNRS UMR 5557 Ecologie Microbienne F‐69622 Villeurbanne France
- INRA UMR 1418 Ecologie Microbienne F‐69622 Villeurbanne France
- Magali Dequivre
- Université de Lyon F‐69622 Lyon France
- Université Lyon 1 F‐69622 Villeurbanne France
- CNRS UMR 5240 Microbiologie Adaptation et Pathogénie F‐69622 Villeurbanne France
- Florence Wisniewski‐Dyé
- Université de Lyon F‐69622 Lyon France
- Université Lyon 1 F‐69622 Villeurbanne France
- CNRS UMR 5557 Ecologie Microbienne F‐69622 Villeurbanne France
- INRA UMR 1418 Ecologie Microbienne F‐69622 Villeurbanne France
- Ludovic Vial
- Université de Lyon F‐69622 Lyon France
- Université Lyon 1 F‐69622 Villeurbanne France
- CNRS UMR 5557 Ecologie Microbienne F‐69622 Villeurbanne France
- INRA UMR 1418 Ecologie Microbienne F‐69622 Villeurbanne France
- Florence Hommais
- Université de Lyon F‐69622 Lyon France
- Université Lyon 1 F‐69622 Villeurbanne France
- CNRS UMR 5240 Microbiologie Adaptation et Pathogénie F‐69622 Villeurbanne France
12
Abstract
BACKGROUND RNA secondary structure comparison is a fundamental task in several studies, including RNA structure prediction and evolution. Currently, the comparison can be done efficiently only for pseudoknot-free structures, owing to their inherent tree representation. RESULTS In this work, we introduce an algebraic language to represent RNA secondary structures with arbitrary pseudoknots. Each structure is associated with a unique algebraic RNA tree derived from a tree grammar with concatenation, nesting and crossing as operators. From an algebraic RNA tree, an abstraction is defined in which the primary structure is neglected. The resulting structural RNA tree allows us to define a new similarity measure computed by exploiting classical tree alignment. CONCLUSIONS The tree grammar and its operators allow any RNA secondary structure to be represented uniquely as a tree. Structural RNA trees allow us to compare RNA secondary structures with arbitrary pseudoknots without taking the primary structure into account.
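Representations like these start from a secondary structure given as base-pair arcs. A minimal sketch, assuming the common extended dot-bracket notation in which pseudoknotted pairs use an additional bracket type such as `[ ]`: parse the string into arcs, ignoring the primary sequence, and report whether any two arcs cross. Building the algebraic RNA tree itself is not shown, and the function names are hypothetical.

```python
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_dot_bracket(structure):
    """Return base pairs (i, j), i < j, from an extended dot-bracket string."""
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    arcs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)  # opening bracket: remember its position
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            arcs.append((stack.pop(), pos))  # pair with the latest same-type opener
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(arcs)


def has_pseudoknot(arcs):
    """True if two arcs (i, j), (k, l) interleave as i < k < j < l."""
    return any(i < k < j < l for i, j in arcs for k, l in arcs)
```

Using one stack per bracket type is what lets crossing pairs be recovered; a single stack would force every structure to be nested.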
Affiliation(s)
- Michela Quadrini
- School of Science and Technology, University of Camerino, Via Madonna delle Carceri 9, Camerino, 62032 Italy
- Luca Tesei
- School of Science and Technology, University of Camerino, Via Madonna delle Carceri 9, Camerino, 62032 Italy
- Emanuela Merelli
- School of Science and Technology, University of Camerino, Via Madonna delle Carceri 9, Camerino, 62032 Italy
13
Tano A, Kadota Y, Morimune T, Jam FA, Yukiue H, Bellier JP, Sokoda T, Maruo Y, Tooyama I, Mori M. The juvenility-associated long noncoding RNA Gm14230 maintains cellular juvenescence. J Cell Sci 2019; 132:jcs.227801. [PMID: 30872457 DOI: 10.1242/jcs.227801] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/13/2018] [Accepted: 03/05/2019] [Indexed: 12/12/2022]
Abstract
Juvenile animals possess distinct properties that are missing in adults. These properties include capabilities for higher growth, faster wound healing, plasticity and regeneration. However, the molecular mechanisms underlying these juvenile physiological properties are not fully understood. To obtain insight into the distinctiveness of juveniles from adults at the molecular level, we assessed long noncoding RNAs (lncRNAs) that are highly expressed selectively in juvenile cells. The noncoding elements of the transcriptome were investigated in hepatocytes and cardiomyocytes isolated from juvenile and adult mice. Here, we identified 62 juvenility-associated lncRNAs (JAlncs), which are selectively expressed in both hepatocytes and cardiomyocytes from juvenile mice. Among these common (shared) JAlncs, Gm14230 is evolutionarily conserved and is essential for cellular juvenescence. Loss of Gm14230 impairs cell growth and causes cellular senescence. Gm14230 safeguards cellular juvenescence through recruiting the histone methyltransferase Ezh2 to Tgif2, thereby repressing the functional role of Tgif2 in cellular senescence. Thus, we identify Gm14230 as a juvenility-selective lncRNA required to maintain cellular juvenescence.
Affiliation(s)
- Ayami Tano
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Yosuke Kadota
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Takao Morimune
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Department of Pediatrics, Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Faidruz Azura Jam
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Haruka Yukiue
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Jean-Pierre Bellier
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Tatsuyuki Sokoda
- Department of Pediatrics, Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Yoshihiro Maruo
- Department of Pediatrics, Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Ikuo Tooyama
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
- Masaki Mori
- Molecular Neuroscience Research Center (MNRC), Shiga University of Medical Science, Seta Tsukinowa-cho, Otsu, Shiga 520-2192, Japan
14
Walter Costa MB, Höner zu Siederdissen C, Dunjić M, Stadler PF, Nowick K. SSS-test: a novel test for detecting positive selection on RNA secondary structure. BMC Bioinformatics 2019; 20:151. [PMID: 30898084 PMCID: PMC6429701 DOI: 10.1186/s12859-019-2711-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/19/2018] [Accepted: 03/03/2019] [Indexed: 12/23/2022]
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) play an important role in regulating gene expression and are thus important for determining phenotypes. Most attempts to measure selection in lncRNAs have focused on the primary sequence. The majority of small RNAs and at least some parts of lncRNAs must fold into specific structures to perform their biological function. Comprehensive assessments of selection acting on RNAs must therefore also encompass structure. Selection pressures acting on the structure of non-coding genes can be detected within multiple sequence alignments. Approaches of this type, however, have so far focused on negative selection. Thus, a computational method for identifying ncRNAs under positive selection is needed. RESULTS We introduce the SSS-test (test for Selection on Secondary Structure) to identify positive selection and thus adaptive evolution. Benchmarks with biological as well as synthetic controls yield coherent signals for both negative and positive selection, demonstrating the functionality of the test. A survey of a lncRNA collection comprising 15,443 families resulted in 110 candidates that appear to be under positive selection in human. In 26 lncRNAs that have been associated with psychiatric disorders, we identified local structures that show signs of positive selection in the human lineage. CONCLUSIONS It is feasible to assay positive selection acting on RNA secondary structures on a genome-wide scale. The detection of human-specific positive selection in lncRNAs associated with cognitive disorders provides a set of candidate genes for further experimental testing and may provide insights into the evolution of cognitive abilities in humans. AVAILABILITY The SSS-test and related software are available at: https://github.com/waltercostamb/SSS-test . The databases used in this work are available at: http://www.bioinf.uni-leipzig.de/Software/SSS-test/ .
Affiliation(s)
- Maria Beatriz Walter Costa
- Embrapa Agroenergia, Parque Estação Biológica (PqEB), Asa Norte, Brasília, DF, 70770-901 Brazil
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- Christian Höner zu Siederdissen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- Marko Dunjić
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195 Germany
- Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Studentski trg 16, PO box 43, Belgrade, 11000 Serbia
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, 04107 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig & Competence Center for Scalable Data Services and Solutions Dresden-Leipzig & Leipzig Research Center for Civilization Diseases, University Leipzig, Leipzig, 04107 Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, 04103 Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090 Austria
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
- Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, D.C., COL-111321 Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501 USA
- Katja Nowick
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195 Germany
- TFome Research Group, Bioinformatics Group, Interdisciplinary Center of Bioinformatics, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, 04107 Germany
- Paul-Flechsig-Institute for Brain Research, University of Leipzig, Liebigstraße 19, Haus C, Leipzig, 04103 Germany
- Bioinformatics, Faculty of Agricultural Sciences, Institute of Animal Science, University of Hohenheim, Garbenstraße 13, Stuttgart, 70593 Germany
15
Mejias A, Diez-Hermano S, Ganfornina MD, Gutierrez G, Sanchez D. Characterization of mammalian Lipocalin UTRs in silico: Predictions for their role in post-transcriptional regulation. PLoS One 2019; 14:e0213206. [PMID: 30840684 PMCID: PMC6402760 DOI: 10.1371/journal.pone.0213206] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/31/2018] [Accepted: 02/15/2019] [Indexed: 01/20/2023]
Abstract
The Lipocalin family is a group of homologous proteins characterized by a broad array of functional capabilities. As extracellular proteins, they can bind small hydrophobic ligands through a well-conserved β-barrel fold. The evolutionary history of Lipocalins spans many different taxa and shows great divergence even within chordates. This variability is also found in their heterogeneous tissue expression patterns. Although a handful of promoter regions have been previously described, studies on the regulatory roles of UTRs in Lipocalin gene expression are scarce. Here we report a comprehensive bioinformatic analysis showing that complex post-transcriptional regulation exists in Lipocalin genes, as suggested by the presence of alternative UTRs with substantial sequence conservation in mammals, alongside a high diversity of transcription start sites and alternative promoters. Strong selective pressure could have operated on Lipocalin UTRs, leading to an enrichment in particular sequence motifs that limit the choice of secondary structures. Mapping these regulatory features to the expression patterns of early- and late-diverging Lipocalins suggests that UTRs represent an additional phylogenetic signal, which may help to uncover how functional pleiotropy originated within the Lipocalin family.
Affiliation(s)
- Andres Mejias
- Departamento de Genetica, Universidad de Sevilla, Sevilla, Spain
- Sergio Diez-Hermano
- Instituto de Biologia y Genetica Molecular-Departamento de Bioquimica y Biologia Molecular y Fisiologia, Universidad de Valladolid-CSIC, Valladolid, Spain
- Departamento de Matemática Aplicada, Universidad Complutense, Madrid, Spain
- Maria D. Ganfornina
- Instituto de Biologia y Genetica Molecular-Departamento de Bioquimica y Biologia Molecular y Fisiologia, Universidad de Valladolid-CSIC, Valladolid, Spain
- Diego Sanchez
- Instituto de Biologia y Genetica Molecular-Departamento de Bioquimica y Biologia Molecular y Fisiologia, Universidad de Valladolid-CSIC, Valladolid, Spain
16
Abstract
Flaviviruses include a diverse group of medically important viruses that cycle between mosquitoes and humans. During this natural process of switching hosts, each species imposes different selective forces on the viral population. Using dengue virus (DENV) as a model, we found that paralogous RNA structures originating from duplications in the viral 3' untranslated region (UTR) are under different selective pressures in the two hosts. These RNA structures, known as dumbbells (DB1 and DB2), were originally proposed to be enhancers of viral replication. Analysis of viruses obtained from infected mosquitoes showed selection of mutations that mapped to DB2. Recombinant viruses carrying the identified variations confirmed that these mutations greatly increase viral replication in mosquito cells, with little or no impact in human cells. Use of viruses lacking each of the DB structures revealed opposite viral phenotypes. While deletion of DB1 reduced viral replication by about 10-fold, viruses lacking DB2 displayed a great increase in fitness in mosquitoes, confirming a functional diversification of these similar RNA elements. Mechanistic analysis indicated that DB1 and DB2 differentially modulate viral genome cyclization and RNA replication. We found that a pseudoknot formed within DB2 competes with long-range RNA-RNA interactions that are necessary for minus-strand RNA synthesis. Our results support a model in which a functional diversification of duplicated RNA elements in the viral 3' UTR is driven by host-specific requirements. This study provides new ideas for understanding molecular aspects of the evolution of RNA viruses that naturally jump between different species.
IMPORTANCE Flaviviruses constitute the most relevant group of arthropod-transmitted viruses, including important human pathogens such as the dengue, Zika, yellow fever, and West Nile viruses. 
The natural alternation of these viruses between vertebrate and invertebrate hosts shapes the viral genome population, which leads to selection of different viral variants with potential implications for epidemiological fitness and pathogenesis. However, the selective forces and mechanisms acting on the viral RNA during host adaptation are still largely unknown. Here, we found that two almost identical tandem RNA structures present at the viral 3' untranslated region are under different selective pressures in the two hosts. Mechanistic studies indicated that the two RNA elements, known as dumbbells, contain sequences that overlap essential RNA cyclization elements involved in viral RNA synthesis. The data support a model in which the duplicated RNA structures differentially evolved to accommodate distinct functions for viral replication in the two hosts.
17
Chiu JKH, Dillon TS, Chen YPP. Large-scale frequent stem pattern mining in RNA families. J Theor Biol 2018; 455:131-139. [PMID: 30036526 DOI: 10.1016/j.jtbi.2018.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2018] [Revised: 07/09/2018] [Accepted: 07/11/2018] [Indexed: 11/19/2022]
Abstract
Functionally similar non-coding RNAs are expected to be similar in certain regions of their secondary structures. These similar regions are called common structure motifs and are structurally conserved throughout evolution to maintain their functional roles. Common structure motif identification is one of the critical tasks in RNA secondary structure analysis. Nevertheless, current approaches suffer from several limitations and/or do not scale well with structure size or with the number of input secondary structures. In this work, we present a method to transform the conserved base pair stems into transaction items and apply frequent itemset mining to identify common structure motifs existing in a majority of input structures. Our experimental results on telomerase and ribosomal RNA secondary structures report frequent stem patterns that are of biological significance. Moreover, the algorithms utilized in our method are scalable, and frequent stem patterns can be identified efficiently among many large structures.
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne VIC 3086, Australia.
- Tharam S Dillon
- Department of Computer Science and Information Technology, La Trobe University, Melbourne VIC 3086, Australia.
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne VIC 3086, Australia.
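The abstract describes encoding conserved stems as transaction items and mining frequent itemsets. A minimal Apriori sketch in Python follows, assuming each secondary structure has already been reduced to a set of conserved-stem labels (the labels P1 to P4 are hypothetical). It is not the authors' implementation, and the stem-encoding step the paper performs is not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori: return {itemset: support} for itemsets found in at least
    `min_support` (a fraction) of the transactions."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    # Level-1 candidates: every individual stem label
    current = [frozenset([item]) for item in set().union(*sets)]
    result = {}
    k = 1
    while current:
        # Count how many structures contain each candidate itemset
        counts = {c: sum(1 for t in sets if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those
        # whose k-subsets are all frequent (the Apriori pruning rule)
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in frequent for s in combinations(union, k)
            ):
                candidates.add(union)
        current = list(candidates)
        k += 1
    return result


# Hypothetical input: each set lists the conserved stems found in one structure
structures = [
    {"P1", "P2", "P3"},
    {"P1", "P2"},
    {"P1", "P2", "P4"},
    {"P1", "P3"},
]
patterns = frequent_itemsets(structures, min_support=0.5)
for itemset, support in sorted(patterns.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), support)
```

With `min_support=0.5`, the pairs P1+P2 (support 0.75) and P1+P3 (0.5) are reported, while P1+P2+P3 is pruned because P2+P3 occurs in only one structure. Requiring a strict majority, as in the abstract, would mean a threshold just above 0.5.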
18
Deng H, Cheema J, Zhang H, Woolfenden H, Norris M, Liu Z, Liu Q, Yang X, Yang M, Deng X, Cao X, Ding Y. Rice In Vivo RNA Structurome Reveals RNA Secondary Structure Conservation and Divergence in Plants. Mol Plant 2018; 11:607-622. [PMID: 29409859 PMCID: PMC5886760 DOI: 10.1016/j.molp.2018.01.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 10/06/2017] [Revised: 01/11/2018] [Accepted: 01/25/2018] [Indexed: 05/07/2023]
Abstract
RNA secondary structure plays a critical role in gene regulation. Rice (Oryza sativa) is one of the most important food crops in the world. However, RNA structure in rice has scarcely been studied. Here, we have successfully generated in vivo Structure-seq libraries in rice. We found that the structural flexibility of mRNAs might be associated with the dynamics of biological function. mRNAs with higher N6-methyladenosine (m6A) modification tend to have less RNA structure in the 3' UTR, whereas GC content does not significantly affect in vivo mRNA structure, possibly to maintain efficient biological processes such as translation. Comparative analysis of the RNA structuromes of rice and Arabidopsis revealed that higher GC content does not lead to stronger structure or less RNA structural flexibility. Moreover, we found a weak correlation between sequence and structure conservation of the orthologs between rice and Arabidopsis. The conservation and divergence of both sequence and in vivo RNA structure correspond to diverse and specific biological processes. Our results indicate that RNA secondary structure might offer a layer of selection separate from sequence between monocots and dicots. Therefore, our study implies that RNA structure evolves differently in various biological processes to maintain robustness in development and adaptational flexibility during angiosperm evolution.
Affiliation(s)
- Hongjing Deng
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK; College of Life Sciences, University of Chinese Academy of Sciences, 100049, Beijing, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Jitender Cheema
- Department of Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Hang Zhang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Hugh Woolfenden
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Matthew Norris
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Zhenshan Liu
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Qi Liu
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Xiaofei Yang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Minglei Yang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Xian Deng
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Xiaofeng Cao
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research, CAS Center for Excellence in Molecular Plant Sciences, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
- Yiliang Ding
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK; CAS-JIC Centre of Excellence for Plant and Microbial Science (CEPAMS), John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.
19
A method to improve prediction of secondary structure for large single RNA sequences. Biochem Biophys Res Commun 2018; 496:523-528. [PMID: 29339162 DOI: 10.1016/j.bbrc.2018.01.086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2018] [Accepted: 01/11/2018] [Indexed: 11/20/2022]
Abstract
The function of a particular RNA molecule within a biological system is principally determined by its structure. The physical methods currently available for structure determination are time-consuming and expensive, so computational methods for structure prediction are often used. Predicting the structure of a large single RNA sequence remains a challenging problem that needs further research. In the present work, a method is introduced to improve the secondary structure predicted for large single RNA sequences by the Mfold program, which is based on the concept of minimum free energy. The Mfold program contains a constraint option that allows some helices to be forced in the predicted structure. In our method, some of the earliest-forming hairpins that are expected, based on a statistical study, to be present in the real structure are forced in the Mfold predicted structure. The results show that the Mfold predicted structure moves closer to the real structure, which provides evidence for kinetic folding of RNA.
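The helix-forcing step can be sketched as follows. This Python sketch assumes the selected hairpins are already given as `(i, j, k)` triples (outermost pair i-j, k stacked pairs, 1-based positions) and writes each as a line of the form `F i j k`, modeled on mfold's forced-helix constraint syntax; check the mfold documentation for the exact format your version accepts. The statistical selection of early-forming hairpins from the paper is not reproduced, and the helper names are hypothetical.

```python
def helix_pairs(i, j, k):
    """Base pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1) of a helix with k stacked pairs."""
    return [(i + n, j - n) for n in range(k)]


def crosses(p, q):
    """True if base pairs p and q interleave (a pseudoknot), which mfold does not fold."""
    (a, b), (c, d) = p, q
    return a < c < b < d or c < a < d < b


def build_force_constraints(helices, min_loop=3):
    """Validate hypothetical hairpin helices and emit one 'F i j k' line per helix."""
    accepted, used, lines = [], set(), []
    for i, j, k in helices:
        new = helix_pairs(i, j, k)
        inner_i, inner_j = new[-1]
        if inner_j - inner_i - 1 < min_loop:
            raise ValueError(f"helix {(i, j, k)} leaves a hairpin loop shorter than {min_loop} nt")
        bases = {b for pair in new for b in pair}
        if bases & used:
            raise ValueError(f"helix {(i, j, k)} reuses a base already forced to pair")
        if any(crosses(p, q) for p in new for q in accepted):
            raise ValueError(f"helix {(i, j, k)} crosses a previously forced helix")
        accepted.extend(new)
        used |= bases
        lines.append(f"F {i} {j} {k}")
    return lines


# Two hypothetical hairpins selected for forcing
constraints = build_force_constraints([(10, 30, 4), (40, 60, 5)])
print("\n".join(constraints))
```

Validating the helices before writing them avoids handing the folding program contradictory constraints, such as two forced helices that share a base or cross each other.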
20
Löwes B, Chauve C, Ponty Y, Giegerich R. The BRaliBase dent - a tale of benchmark design and interpretation. Brief Bioinform 2017; 18:306-311. [PMID: 26984616 PMCID: PMC5444242 DOI: 10.1093/bib/bbw022] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/31/2015] [Indexed: 11/25/2022] [Open access]
Abstract
BRaliBase is a widely used benchmark for assessing the accuracy of RNA secondary structure alignment methods. In most case studies based on the BRaliBase benchmark, one can observe a puzzling drop in accuracy in the 40–60% sequence identity range, the so-called ‘BRaliBase Dent’. In this article, we show that this dent is due to a bias in the composition of the BRaliBase benchmark, namely the inclusion of a disproportionate number of transfer RNAs, which exhibit a conserved secondary structure. Our analysis, aside from its interest regarding the specific case of the BRaliBase benchmark, also raises important questions regarding the design and use of benchmarks in computational biology.
Affiliation(s)
- Benedikt Löwes
- Division of Cardiology, University of Nebraska Medical Center, USA
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Yann Ponty
- LIX, CNRS/Inria AMIB, Ecole Polytechnique, Palaiseau, France
- Robert Giegerich
- Institute for Bioinformatics, Bielefeld University, Bielefeld, Germany
21
Chiu JKH, Chen YPP. A comprehensive study of RNA secondary structure alignment algorithms. Brief Bioinform 2017; 18:291-305. [PMID: 26984617 DOI: 10.1093/bib/bbw009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/15/2015] [Indexed: 01/04/2023] [Open access]
Abstract
RNA secondary structure alignment has received more attention since the discovery of the structure-function relationships in some non-protein-encoding RNAs. However, unlike the pure sequence alignment problem, which has been solved in polynomial time, secondary structure alignment incorporates the base pairings as another information dimension in addition to the base sequence. This problem therefore becomes more challenging. In this study, we classify the selected approaches, and algorithmically illustrate how these methods address the alignment problems with different structure types. Other features such as the types of base pair edit operations supported and the time complexity are also compared.
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, Australia
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, Australia
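The abstract contrasts structure alignment with pure sequence alignment, which is solvable in polynomial time. As a reference point, here is a minimal Needleman-Wunsch global alignment in Python that fills an (n+1) x (m+1) table in O(nm) time. The scores (match +1, mismatch -1, gap -2) are arbitrary illustrative choices, and the sketch ignores base pairing entirely, which is exactly the extra information dimension that secondary structure alignment adds.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GCAU", "GAU"))
```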
22
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we give an overview of the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes by post-transcriptionally regulating gene expression and whose dysregulation has been shown to be involved in several human diseases.
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
- Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
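As a concrete instance of the dynamic-programming idea behind many secondary structure predictors, the sketch below implements the classic Nussinov base-pair maximization in Python. It is a textbook simplification, not a method from the chapter: practical tools minimize free energy with nearest-neighbor parameters rather than counting pairs. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 nt are stated assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximize the number of nested base pairs; return (pair count, dot-bracket)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] = maximum number of pairs in seq[i..j]; spans <= min_loop stay 0
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]  # case 1: j is unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases inside
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best

    structure = ["."] * n

    def trace(i, j):
        # Recover one optimal structure by replaying the recurrence
        if j - i <= min_loop:
            return
        if N[i][j] == N[i][j - 1]:
            trace(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = N[i][k - 1] if k > i else 0
                if N[i][j] == left + N[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    trace(i, k - 1)
                    trace(k + 1, j - 1)
                    return

    trace(0, n - 1)
    return N[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))
```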
23
RNA Structure Duplications and Flavivirus Host Adaptation. Trends Microbiol 2016; 24:270-283. [PMID: 26850219 DOI: 10.1016/j.tim.2016.01.002] [Citation(s) in RCA: 121] [Impact Index Per Article: 15.1] [Received: 11/04/2015] [Revised: 01/04/2016] [Accepted: 01/08/2016] [Indexed: 01/11/2023]
Abstract
Flaviviruses include a highly diverse group of arboviruses with a global distribution and a high human disease burden. Most flaviviruses cycle between insects and vertebrate hosts; thus, they are obliged to use different cellular machinery for their replication and to mount different mechanisms to evade specific antiviral responses. In addition to coding for viral proteins, the viral genome contains signals in RNA structures that govern the amplification of viral components and participate in triggering or evading antiviral responses. In this review, we focus on new information about host-specific functions of RNA structures present in the 3' untranslated region (3' UTR) of flavivirus genomes. Models and conservation patterns of RNA elements of distinct flavivirus ecological groups are reviewed. An intriguing feature of the 3' UTR of insect-borne flavivirus genomes is the conservation of complex RNA structure duplications. Here, we discuss new hypotheses of how these RNA elements specialize for replication in vertebrate and invertebrate hosts, and present new ideas linking RNA structure duplication, small subgenomic flavivirus RNA formation, and host adaptation.
24
Hua L, Song Y, Kim N, Laing C, Wang JTL, Schlick T. CHSalign: A Web Server That Builds upon Junction-Explorer and RNAJAG for Pairwise Alignment of RNA Secondary Structures with Coaxial Helical Stacking. PLoS One 2016; 11:e0147097. [PMID: 26789998 PMCID: PMC4720362 DOI: 10.1371/journal.pone.0147097] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/06/2015] [Accepted: 12/29/2015] [Indexed: 01/01/2023] [Open access]
Abstract
RNA junctions are important structural elements of RNA molecules. They are formed when three or more helices come together in three-dimensional space. Recent studies have focused on the annotation and prediction of coaxial helical stacking (CHS) motifs within junctions. Here we exploit such predictions to develop an efficient alignment tool to handle RNA secondary structures with CHS motifs. Specifically, we build upon our Junction-Explorer software for predicting coaxial stacking and RNAJAG for modelling junction topologies as tree graphs to incorporate constrained tree matching and dynamic programming algorithms into a new method, called CHSalign, for aligning the secondary structures of RNA molecules containing CHS motifs. Thus, CHSalign is intended to be an efficient alignment tool for RNAs containing similar junctions. Experimental results based on thousands of alignments demonstrate that CHSalign can align two RNA secondary structures containing CHS motifs more accurately than other RNA secondary structure alignment tools. CHSalign yields a high score when aligning two RNA secondary structures with similar CHS motifs or helical arrangement patterns, and a low score otherwise. This new method has been implemented in a web server, and the program is also made freely available, at http://bioinformatics.njit.edu/CHSalign/.
Affiliation(s)
- Lei Hua
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Yang Song
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Namhee Kim
- Department of Chemistry, New York University, New York, New York, United States of America
- Christian Laing
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Jason T. L. Wang
- Bioinformatics Laboratory, Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, United States of America
- Tamar Schlick
- Department of Chemistry, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
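Junctions in the sense used in the CHSalign abstract (three or more helices meeting) can be located directly in a pseudoknot-free dot-bracket string: a loop closed by a base pair that contains two or more branching helices is a junction. The Python sketch below only counts helices per closed loop; it does not model coaxial stacking, tree matching, or anything specific to Junction-Explorer, RNAJAG, or CHSalign, and the function names are hypothetical. Helices that meet only in the unclosed exterior loop are not reported.

```python
def pair_table(db):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    return partner


def junctions(db):
    """Return [(closing_pair, helix_count)] for closed loops where >= 3 helices meet."""
    partner = pair_table(db)
    found = []
    for i, c in enumerate(db):
        if c != "(":
            continue
        j = partner[i]
        branches, k = 0, i + 1
        # Walk the loop enclosed by (i, j), jumping over each branching helix
        while k < j:
            if db[k] == "(":
                branches += 1
                k = partner[k] + 1
            else:
                k += 1
        helices = branches + 1  # the closing helix plus its branches
        if helices >= 3:
            found.append(((i, j), helices))
    return found


print(junctions("((..((...))..((...))..))"))  # one three-way junction
```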
25
Abstract
Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific structure-dependent biological functions. Hence, RNA secondary structure prediction is one of the most well-studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches, as it exploits information from multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary structure-based patterns from a set of related sequences. The input sequences could, for example, correspond to 3' or 5' untranslated regions of a set of orthologous genes, and the unearthed local patterns could correspond to regulatory motifs found in these regions. These sequences could also correspond to in vitro selected RNAs, genomic segments housing ncRNA genes from the same family, and so on. Here, we give a detailed review of the various computational techniques proposed in the literature to solve this general motif discovery problem. We also give empirical comparisons of some of the current state-of-the-art methods and point out future directions of research.
Affiliation(s)
- Avinash Achar
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
- Pål Sætrom
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
26
Chiu JKH, Chen YPP. Pairwise RNA secondary structure alignment with conserved stem pattern. Bioinformatics 2015; 31:3914-21. [PMID: 26275897 DOI: 10.1093/bioinformatics/btv471] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/08/2014] [Accepted: 08/07/2015] [Indexed: 12/23/2022] [Open access]
Abstract
MOTIVATION The regulatory functions performed by non-coding RNAs are related to their 3D structures, which are, in turn, determined by their secondary structures. Pairwise secondary structure alignment gives insight into the functional similarity between a pair of RNA sequences. Numerous exact or heuristic approaches have been proposed for computational alignment. However, the alignment becomes intractable when arbitrary pseudoknots are allowed. Also, since non-coding RNAs are, in general, more conserved in structures than sequences, it is more effective to perform alignment based on the common structural motifs discovered. RESULTS We devised a method to approximate the true conserved stem pattern for a secondary structure pair, and constructed the alignment from it. Experimental results suggest that our method identified similar RNA secondary structures better than the existing tools, especially for large structures. It also successfully indicated the conservation of some pseudoknot features with biological significance. More importantly, even for large structures with arbitrary pseudoknots, the alignment can usually be obtained efficiently. AVAILABILITY AND IMPLEMENTATION Our algorithm has been implemented in a tool called PSMAlign. The source code of PSMAlign is freely available at http://homepage.cs.latrobe.edu.au/ypchen/psmalign/.
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
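PSMAlign builds its alignment from conserved stem patterns and tolerates arbitrary pseudoknots. Two basic ingredients of that kind of analysis, grouping base pairs into stems and detecting stems that cross each other, can be sketched in Python as follows. The representation (base pairs as `(i, j)` tuples, each base paired at most once) is an assumption for illustration, and this is not PSMAlign's code.

```python
def stems(pairs):
    """Group base pairs into stems: maximal runs of stacked pairs (i, j), (i+1, j-1), ..."""
    ordered = sorted((min(p), max(p)) for p in pairs)
    result, current = [], []
    for i, j in ordered:
        if current and i == current[-1][0] + 1 and j == current[-1][1] - 1:
            current.append((i, j))  # stacks directly on the previous pair
        else:
            if current:
                result.append(current)
            current = [(i, j)]
    if current:
        result.append(current)
    return result


def crossing_stems(stem_list):
    """Index pairs of stems whose outermost base pairs interleave (pseudoknotted stems)."""
    crossings = []
    for a in range(len(stem_list)):
        for b in range(a + 1, len(stem_list)):
            (i, j), (k, m) = stem_list[a][0], stem_list[b][0]
            if i < k < j < m or k < i < m < j:
                crossings.append((a, b))
    return crossings


# Hypothetical H-type pseudoknot: stem A closes 0-10, stem B closes 5-15
pairs = [(0, 10), (1, 9), (2, 8), (5, 15), (6, 14)]
stem_list = stems(pairs)
print(stem_list)
print(crossing_stems(stem_list))
```

Working at the level of stems rather than individual pairs keeps the number of objects to compare small, which is one reason stem-based methods can stay efficient on large pseudoknotted structures.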
27
Janssen S, Giegerich R. Ambivalent covariance models. BMC Bioinformatics 2015; 16:178. [PMID: 26017195 PMCID: PMC4504443 DOI: 10.1186/s12859-015-0569-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/21/2014] [Accepted: 04/10/2015] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Evolutionary variations let us define a set of similar nucleic acid sequences as a family if these different molecules execute a common function. Capturing their sequence variation by using, e.g., position-specific scoring matrices significantly improves the sensitivity of detection tools. Members of a functional (non-coding) RNA family are affected by these variations not only at the sequence level but also at the structural level. For example, some transfer RNAs exhibit a fifth helix in addition to the typical cloverleaf structure. Current covariance models, the unrivaled homology search approach for structured RNA, do not benefit from structural variation within a family, but rather penalize it. This leads to artificial subdivision of families and loss of information in the RFAM database. RESULTS We propose an extension to the fundamental architecture of covariance models to allow for several compatible consensus structures. The resulting models are called ambivalent covariance models. Evaluation on several RFAM families shows that coalescence of structural variation within a family by using ambivalent consensus models is superior to subdividing the family into multiple classical covariance models. CONCLUSION A prototype and source code are available at http://bibiserv.cebitec.uni-bielefeld.de/acms.
Affiliation(s)
- Stefan Janssen
- Practical Computer Science, Faculty of Technology, Bielefeld University, Universitätsstraße 25, Bielefeld, 33615, Germany.
- Robert Giegerich
- Practical Computer Science, Faculty of Technology, Bielefeld University, Universitätsstraße 25, Bielefeld, 33615, Germany.
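The abstract names position-specific scoring matrices as the sequence-only way to capture family variation, which covariance models extend to structure. A minimal log-odds PSSM in Python is sketched below. The pseudocount of 1, the uniform background, and the gap-free input alignment are assumptions, and nothing here models base pairs or the ambivalent covariance models proposed in the paper.

```python
import math


def build_pssm(alignment, alphabet="ACGU", pseudocount=1.0, background=None):
    """Log2-odds scores per column of a gap-free alignment of equal-length sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same length")
    if background is None:
        background = {c: 1.0 / len(alphabet) for c in alphabet}
    matrix = []
    for pos in range(length):
        column = [s[pos] for s in alignment]
        total = len(column) + pseudocount * len(alphabet)
        # Pseudocounts keep residues unseen in a column from scoring -infinity
        matrix.append({
            c: math.log2((column.count(c) + pseudocount) / total / background[c])
            for c in alphabet
        })
    return matrix


def score(matrix, seq):
    """Sum of per-position log-odds scores for a sequence of the same length."""
    return sum(column[c] for column, c in zip(matrix, seq))


family = ["ACGU", "ACGU", "ACGA", "ACGU"]  # hypothetical aligned family members
pssm = build_pssm(family)
print(round(score(pssm, "ACGU"), 3), round(score(pssm, "UUUU"), 3))
```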
28
Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T. Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs. BMC Bioinformatics 2015; 16:128. [PMID: 25907438 PMCID: PMC4419412 DOI: 10.1186/s12859-015-0558-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/26/2014] [Accepted: 03/30/2015] [Indexed: 11/17/2022] [Open access]
Abstract
BACKGROUND Many tree structures are found in nature and in organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. However, these methods find construction rules for only a single tree. On the other hand, specified construction rules can be used to generate multiple similar trees. RESULTS Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs. CONCLUSIONS We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.
Affiliation(s)
- Yang Zhao
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan.
- Morihiro Hayashida
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan.
- Yue Cao
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan.
- Jaewook Hwang
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan.
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan.
29
Jalali S, Kapoor S, Sivadas A, Bhartiya D, Scaria V. Computational approaches towards understanding human long non-coding RNA biology. Bioinformatics 2015; 31:2241-51. [DOI: 10.1093/bioinformatics/btv148] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 10/16/2014] [Accepted: 03/10/2015] [Indexed: 12/18/2022] [Open access]
30
Song Y, Hua L, Shapiro BA, Wang JTL. Effective alignment of RNA pseudoknot structures using partition function posterior log-odds scores. BMC Bioinformatics 2015; 16:39. [PMID: 25727492 PMCID: PMC4339682 DOI: 10.1186/s12859-015-0464-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/28/2014] [Accepted: 01/13/2015] [Indexed: 11/18/2022] [Open access]
Abstract
Background RNA pseudoknots play important roles in many biological processes. Previous methods for comparative pseudoknot analysis mainly focus on simultaneous folding and alignment of RNA sequences. Little work has been done to align two known RNA secondary structures with pseudoknots taking into account both sequence and structure information of the two RNAs. Results In this article we present a novel method for aligning two known RNA secondary structures with pseudoknots. We adopt the partition function methodology to calculate the posterior log-odds scores of the alignments between bases or base pairs of the two RNAs with a dynamic programming algorithm. The posterior log-odds scores are then used to calculate the expected accuracy of an alignment between the RNAs. The goal is to find an optimal alignment with the maximum expected accuracy. We present a heuristic to achieve this goal. The performance of our method is investigated and compared with existing tools for RNA structure alignment. An extension of the method to multiple alignment of pseudoknot structures is also discussed. Conclusions The method described here has been implemented in a tool named RKalign, which is freely accessible on the Internet. As more and more pseudoknots are revealed, collected and stored in public databases, we anticipate that a tool like RKalign will play a significant role in data comparison, annotation, analysis, and retrieval in these databases.
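The maximum-expected-accuracy objective can be illustrated with a simplified Python sketch. It assumes a matrix `P` in which `P[i][j]` is a posterior probability that position i of the first RNA aligns with position j of the second (in RKalign such scores come from a partition-function computation over bases and base pairs, which is not reproduced here), and it finds the alignment that maximizes the summed posteriors, with gaps scoring zero. Because it ignores base pairs and pseudoknots, it shows only the sequence-level core of the idea, not RKalign's heuristic.

```python
def mea_alignment(P):
    """Return (expected accuracy, aligned index pairs) maximizing the sum of P[i][j]."""
    n = len(P)
    m = len(P[0]) if n else 0
    # S[i][j] = best summed posterior for aligning the first i and first j positions
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + P[i - 1][j - 1], S[i - 1][j], S[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if P[i - 1][j - 1] > 0 and S[i][j] == S[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))  # positions i-1 and j-1 are aligned
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1  # position i-1 of the first RNA is left unaligned (gap)
        else:
            j -= 1  # position j-1 of the second RNA is left unaligned (gap)
    return S[n][m], pairs[::-1]


# Hypothetical posterior matrix for a 2-nt and a 3-nt sequence
P = [[0.9, 0.05, 0.0],
     [0.1, 0.1, 0.8]]
accuracy, aligned = mea_alignment(P)
print(round(accuracy, 3), aligned)
```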
31
Belter A, Gudanis D, Rolle K, Piwecka M, Gdaniec Z, Naskręt-Barciszewska MZ, Barciszewski J. Mature miRNAs form secondary structure, which suggests their function beyond RISC. PLoS One 2014; 9:e113848. [PMID: 25423301 PMCID: PMC4244182 DOI: 10.1371/journal.pone.0113848] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 06/25/2014] [Accepted: 10/30/2014] [Indexed: 12/11/2022] [Open access]
Abstract
The generally accepted model of miRNA-guided RNA down-regulation suggests that mature miRNA targets mRNA in a nucleotide sequence-specific manner. However, we have shown that the nucleotide sequence of miRNA is not the only determinant of miRNA specificity. Using specific nucleases (T1, V1 and S1) as well as NMR, UV/Vis and CD spectroscopies, we found that miR-21, miR-93 and miR-296 can adopt hairpin and/or homoduplex structures. The secondary structure of these miRNAs in solution is a function of RNA concentration and ionic conditions. Additionally, we have shown that the formation of miRNA hairpins is facilitated by the cellular environment. Looking for functional consequences of this observation, we noticed that the structures of these miRNAs resemble RNA aptamers, short oligonucleotides that form stable 3D structures with high affinity and specificity for their targets. We compared the structures of anti-tenascin C (anti-Tn-C) aptamers, which inhibit the brain tumor glioblastoma multiforme (GBM, WHO IV), with those of selected miRNAs. The strong overexpression of miR-21 and miR-93 as well as Tn-C in GBM may imply some connections between them. The structural similarity of these miRNA hairpins and anti-Tn-C aptamers indicates that miRNAs may also function beyond RISC and are even more sophisticated regulators than previously expected. We think that knowledge of miRNA structure may give new insight into the mechanism of miRNA-dependent gene regulation and be a step forward in understanding miRNA function and involvement in carcinogenesis. This may improve the design of anti-miRNA therapeutics.
Affiliation(s)
- Agnieszka Belter
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704, Poznan, Poland
- Dorota Gudanis
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704, Poznan, Poland
- Katarzyna Rolle
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704, Poznan, Poland
- Monika Piwecka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704, Poznan, Poland
- Zofia Gdaniec
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704, Poznan, Poland
- Jan Barciszewski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704, Poznan, Poland
32
Middleton SA, Kim J. NoFold: RNA structure clustering without folding or alignment. RNA 2014; 20:1671-1683. [PMID: 25234928 PMCID: PMC4201820 DOI: 10.1261/rna.041913.113] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/09/2013] [Accepted: 07/28/2014] [Indexed: 06/03/2023]
Abstract
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, because most methods for clustering structures are based on folding individual sequences or performing many pairwise alignments, they face a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that requires neither folding nor pairwise alignment of the input sequences. Our method constructs a distance function between two objects from their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98, and that it generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures.
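The central idea, describing each sequence by its scores against a fixed panel of reference models and comparing sequences only through those descriptions, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not NoFold's implementation: the two toy scoring functions are hypothetical stand-ins for the Rfam covariance-model scores the abstract mentions, and plain Euclidean distance between score vectors is likewise an assumption.

```python
import math
from typing import Callable, List, Sequence

# A "model" here is any function that scores a sequence. In NoFold these would be
# Rfam covariance-model scores; the toy scorers used below are hypothetical stand-ins.
ScoreFn = Callable[[str], float]


def model_profile(seq: str, models: Sequence[ScoreFn]) -> List[float]:
    """Describe a sequence by its score against each reference model."""
    return [float(model(seq)) for model in models]


def model_space_distance(a: str, b: str, models: Sequence[ScoreFn]) -> float:
    """Euclidean distance between two sequences' model-score profiles.

    Sequences a and b are never folded or aligned to each other: each one is
    only compared against the fixed panel of models.
    """
    pa, pb = model_profile(a, models), model_profile(b, models)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


# Hypothetical toy "models": G+C count and sequence length.
toy_models: List[ScoreFn] = [
    lambda s: s.count("G") + s.count("C"),
    lambda s: len(s),
]

print(model_space_distance("GGGAAACCC", "AAAAAAAAA", toy_models))  # 6.0
```

Because every sequence is scored against the same panel once, N sequences need N times (number of models) scoring calls rather than N² pairwise comparisons; a clustering step could then operate on the resulting vectors.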
Affiliation(s)
- Sarah A Middleton
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Junhyong Kim
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
33
Faoro C, Ataide SF. Ribonomic approaches to study the RNA-binding proteome. FEBS Lett 2014; 588:3649-64. [PMID: 25150170 DOI: 10.1016/j.febslet.2014.07.039] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 05/22/2014] [Revised: 07/04/2014] [Accepted: 07/04/2014] [Indexed: 01/23/2023]
Abstract
Gene expression is controlled through a complex interplay among mRNAs, non-coding RNAs and RNA-binding proteins (RBPs), which all assemble, along with other RNA-associated factors, into dynamic and functional ribonucleoprotein complexes (RNPs). To date, our understanding of RBPs has been largely limited to proteins with known or predicted RNA-binding domains. However, various methods have recently been developed to capture an RNA of interest and comprehensively identify its associated RBPs. In this review, we discuss RNA-affinity purification followed by mass spectrometry analysis (AP-MS), RBP screening within protein libraries, and computational methods that can be used to study the RNA-binding proteome (RBPome).
Affiliation(s)
- Camilla Faoro
- School of Molecular Biosciences, University of Sydney, NSW, Australia
- Sandro F Ataide
- School of Molecular Biosciences, University of Sydney, NSW, Australia
34
Wolf M, Koetschan C, Müller T. ITS2, 18S, 16S or any other RNA - simply aligning sequences and their individual secondary structures simultaneously by an automatic approach. Gene 2014; 546:145-9. [PMID: 24881812 DOI: 10.1016/j.gene.2014.05.065] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 04/16/2014] [Revised: 05/28/2014] [Accepted: 05/29/2014] [Indexed: 11/29/2022]
Abstract
Secondary structures of RNA sequences are increasingly being used as additional information for reconstructing phylogenies and/or for distinguishing species by compensatory base change (CBC) analyses. However, in most cases just one secondary structure is used to manually correct an automatically generated multiple sequence alignment, and/or just one secondary structure is used to guide a sequence alignment that is still generated entirely by hand. With the advent of databases and tools offering individual RNA secondary structures, we re-introduce here a twelve-letter code, already implemented in 4SALE (a tool for synchronous sequence and secondary structure alignment and editing), that enables one to align RNA sequences and their individual secondary structures synchronously and fully automatically, while dramatically increasing the phylogenetic information content. We further introduce a scaled-down non-GUI version of 4SALE, designed particularly for big data analysis and available at: http://4sale.bioapps.biozentrum.uni-wuerzburg.de.
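The abstract does not spell out the twelve-letter code, but one natural reading is that each of the four nucleotides is fused with one of three structure states (opening bracket, closing bracket, unpaired) into a single symbol, so that an ordinary sequence aligner handles sequence and structure at once. The sketch below assumes exactly that; the letters `a` to `l` are arbitrary placeholders, not the symbols 4SALE actually uses.

```python
from itertools import product
from typing import Dict, Tuple

NUCLEOTIDES = "ACGU"
STATES = "()."  # paired with a downstream base, paired with an upstream base, unpaired

# Hypothetical 12-letter alphabet: one placeholder letter per (nucleotide, state) pair.
# The actual symbols used by 4SALE may differ.
PLACEHOLDERS = "abcdefghijkl"
ENCODE: Dict[Tuple[str, str], str] = {
    pair: PLACEHOLDERS[i] for i, pair in enumerate(product(NUCLEOTIDES, STATES))
}
DECODE: Dict[str, Tuple[str, str]] = {letter: pair for pair, letter in ENCODE.items()}


def encode(seq: str, structure: str) -> str:
    """Fuse a sequence and its dot-bracket structure into one 12-letter string."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return "".join(ENCODE[(n, s)] for n, s in zip(seq.upper(), structure))


def decode(code: str) -> Tuple[str, str]:
    """Split a 12-letter string back into sequence and structure."""
    pairs = [DECODE[c] for c in code]
    return "".join(n for n, _ in pairs), "".join(s for _, s in pairs)


fused = encode("GGGAAACCC", "(((...)))")
print(fused)          # gggccceee
print(decode(fused))  # ('GGGAAACCC', '(((...)))')
```

Aligning two such strings with any standard sequence aligner then moves each base together with its structural state; a substitution matrix over the 12 letters could, for instance, penalize aligning a paired base against an unpaired one. The design of that matrix is left open here.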
Affiliation(s)
- Matthias Wolf
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Christian Koetschan
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Tobias Müller
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
35
Energy-based RNA consensus secondary structure prediction in multiple sequence alignments. Methods Mol Biol 2014; 1097:125-41. [PMID: 24639158 DOI: 10.1007/978-1-62703-709-9_7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/11/2023]
Abstract
Many biologically important RNA structures are conserved in evolution, leading to characteristic mutational patterns. RNAalifold is a widely used program for predicting consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models but also makes it possible to study the structural conservation of functional RNAs.
36
Abstract
Many methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. In this chapter, we first consider structure representations and discuss their suitability for structure comparison. We then look at the more commonly used methods, restricting ourselves to structures without pseudoknots. For comparing structures of the same sequence, we study base pair distances. For structures of different sequences (and of different lengths), we study variants of the tree edit model. We name some of the available tools and give pointers to the literature. We end with a short review of comparing structures with pseudoknots, an unsolved problem and a topic of active research.
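For two structures of the same sequence, the base pair distance is commonly defined as the number of base pairs present in one structure but not in the other, that is, the size of the symmetric difference of the two pair sets. A minimal sketch, assuming pseudoknot-free structures in plain dot-bracket notation using only `(`, `)` and `.`:

```python
from typing import Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs, i < j."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("base pair distance compares structures of the same sequence")
    return len(base_pairs(s1) ^ base_pairs(s2))


print(base_pair_distance("(((...)))", "((.....))"))  # 1
print(base_pair_distance("((....))", "........"))    # 2
```

The length check reflects the restriction in the text: structures of different lengths share no common index space, which is where tree edit models come in.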
37
Mohammed J, Flynt AS, Siepel A, Lai EC. The impact of age, biogenesis, and genomic clustering on Drosophila microRNA evolution. RNA 2013; 19:1295-308. [PMID: 23882112 PMCID: PMC3753935 DOI: 10.1261/rna.039248.113] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 03/20/2013] [Accepted: 06/07/2013] [Indexed: 05/16/2023]
Abstract
The molecular evolutionary signatures of miRNAs inform our understanding of their emergence, biogenesis, and function. The known signatures of miRNA evolution have derived mostly from the analysis of deeply conserved, canonical loci. In this study, we examine the impact of age, biogenesis pathway, and genomic arrangement on the evolutionary properties of Drosophila miRNAs. Crucial to the accuracy of our results was our curation of high-quality miRNA alignments, which included nearly 150 corrections to ortholog calls and nucleotide sequences of the global 12-way Drosophilid alignments currently available. Using these data, we studied primary sequence conservation, normalized free-energy values, and types of structure-preserving substitutions. We expand upon common miRNA evolutionary patterns that reflect fundamental features of miRNAs that are under functional selection. We observe that melanogaster-subgroup-specific miRNAs, although recently emerged and rapidly evolving, nonetheless exhibit evolutionary signatures that are similar to well-conserved miRNAs and distinct from other structured noncoding RNAs and bulk conserved non-miRNA hairpins. This provides evidence that even young miRNAs may be selected for regulatory activities. More strikingly, we observe that mirtrons and clustered miRNAs both exhibit distinct evolutionary properties relative to solo, well-conserved miRNAs, even after controlling for sequence depth. These studies highlight the previously unappreciated impact of biogenesis strategy and genomic location on the evolutionary dynamics of miRNAs, and affirm that miRNAs do not evolve as a unitary class.
Affiliation(s)
- Jaaved Mohammed
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York 10065, USA
- Alex S. Flynt
- Sloan-Kettering Institute, Department of Developmental Biology, New York, New York 10065, USA
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Eric C. Lai
- Sloan-Kettering Institute, Department of Developmental Biology, New York, New York 10065, USA
38
Meyer F, Kurtz S, Beckstette M. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns. BMC Bioinformatics 2013; 14:226. [PMID: 23865810 PMCID: PMC3765529 DOI: 10.1186/1471-2105-14-226] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/27/2013] [Accepted: 07/11/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND It is well known that the search for homologous RNAs is more effective if both sequence and structure information are incorporated into the search. However, current tools for searching with RNA sequence-structure patterns either cannot fully handle mutations occurring on both of these levels or are simply not fast enough to search large sequence databases, because of the high computational cost of the underlying sequence-structure alignment problem. RESULTS We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns, supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme that optimally reuses the entries of the required dynamic programming matrices for all substrings, and we combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular matches of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to a factor of 45. Our best new index-based algorithm achieves a speedup by a factor of 560. CONCLUSIONS The presented methods achieve considerable speedups compared to the best previous method.
This, together with the expected sublinear running time of the presented index-based algorithms, allows, for the first time, approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator.
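The semi-global matching step, reporting every substring of the target whose edit cost against the pattern stays within a user-defined threshold, can be illustrated in its sequence-only form with the classic column-wise dynamic programming recurrence (often attributed to Sellers). This is a simplified sketch: it uses unit costs on single bases only, whereas RaligNAtor also scores base-pair edit operations and adds index-based search and chaining, none of which is shown here.

```python
from typing import List, Tuple


def approximate_matches(pattern: str, text: str, k: int) -> List[Tuple[int, int]]:
    """Return (end_index, cost) for every text position where some substring ending
    there matches `pattern` with unit-cost edit distance <= k (semi-global alignment)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    # prev[i] = best cost of aligning pattern[:i] to a substring ending just before
    # the current text position. For the empty text prefix that costs i deletions.
    prev = list(range(m + 1))
    hits = []
    for j, t in enumerate(text):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            mismatch = 0 if pattern[i - 1] == t else 1
            curr[i] = min(
                prev[i - 1] + mismatch,  # match, or substitute pattern[i-1] by t
                prev[i] + 1,             # t is an extra base inserted in the text
                curr[i - 1] + 1,         # pattern[i-1] is missing from the text
            )
        if curr[m] <= k:
            hits.append((j, curr[m]))
        prev = curr
    return hits


print(approximate_matches("GCA", "AAGCAAA", 0))  # [(4, 0)]
print(approximate_matches("GCA", "AAGCAAA", 1))  # [(3, 1), (4, 0), (5, 1)]
```

This online scan costs time proportional to the pattern length times the text length for every query; the index-based methods described in the abstract avoid rescanning the whole database by working on a precomputed suffix array.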
Affiliation(s)
- Fernando Meyer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, Hamburg 20146, Germany
39
Milo N, Zakov S, Katzenelson E, Bachmat E, Dinitz Y, Ziv-Ukelson M. Unrooted unordered homeomorphic subtree alignment of RNA trees. Algorithms Mol Biol 2013; 8:13. [PMID: 23590940 PMCID: PMC3765143 DOI: 10.1186/1748-7188-8-13] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/20/2012] [Accepted: 02/05/2013] [Indexed: 11/17/2022]
Abstract
We generalize some current approaches for RNA tree alignment, which are traditionally confined to ordered rooted mappings, to also consider unordered unrooted mappings. We define the Homeomorphic Subtree Alignment problem (HSA) and present a new algorithm that applies to several modes, combining global or local, ordered or unordered, and rooted or unrooted tree alignments. Our algorithm generalizes previous algorithms that either solved the problem in an asymmetric manner or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that for input trees $T$ and $S$, our algorithm has $O(n_T n_S + \min(d_T, d_S) L_T L_S)$ time complexity, where $n_T$, $L_T$ and $d_T$ are the number of nodes, the number of leaves, and the maximum node degree in $T$, respectively (satisfying $d_T \le L_T \le n_T$), and similarly for $n_S$, $L_S$ and $d_S$ with respect to the tree $S$. This improves the time complexity of previous algorithms for less general variants of the problem. In order to obtain this time bound for HSA, we developed new algorithms for a generalized variant of the Min-Cost Bipartite Matching problem (MCM), as well as for two derivatives of this problem, entitled All-Cavity-MCM and All-Pairs-Cavity-MCM. For two input sets of size $n$ and $m$, where $n \le m$, MCM and both its cavity derivatives are solved in $O(n^3 + nm)$ time, without the use of priority queues (e.g. Fibonacci heaps) or other complex data structures. This gives the first cubic-time algorithm for All-Pairs-Cavity-MCM and improves the running times of the MCM and All-Cavity-MCM problems in the unbalanced case where $n \ll m$.
We implemented the algorithm (in all modes mentioned above) as a graphical software tool that computes and displays similarities between RNA secondary structures given as input, and employed it in a preliminary experiment in which we ran all-against-all inter-family pairwise alignments of RNase P and Hammerhead RNA family members, exposing new similarities that could not be detected by the traditional rooted ordered alignment approaches. The results demonstrate that our approach can expose structural similarity between some RNAs with higher sensitivity than the traditional rooted ordered alignment approaches. Source code and a web interface for our tool can be found at http://www.cs.bgu.ac.il/~negevcb/FRUUT.
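Min-Cost Bipartite Matching arises in unordered tree alignment because the children of two aligned nodes may be paired in any order, so the cheapest pairing of child subtrees has to be chosen. The sketch below states the problem by exhaustive search over assignments for $n \le m$; it is exponential and only meant to make the objective concrete. It is not the $O(n^3 + nm)$ algorithm of the paper, and the cost matrix is hypothetical (in a tree aligner each entry would be the alignment cost of one pair of subtrees).

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def min_cost_matching(cost: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Assign each of the n rows to a distinct one of the m columns (n <= m)
    so that the total cost is minimal. Returns (total_cost, column_for_each_row).

    Exhaustive search over all injective assignments: exponential time,
    for illustration only.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if any(len(row) != m for row in cost):
        raise ValueError("cost matrix must be rectangular")
    if n > m:
        raise ValueError("expects no more rows than columns (n <= m)")
    best_total, best_cols = float("inf"), []
    for cols in permutations(range(m), n):
        total = sum(cost[i][c] for i, c in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, list(cols)
    return best_total, best_cols


# Hypothetical costs of aligning 2 child subtrees of one node with 3 of another.
print(min_cost_matching([[4, 1, 3], [2, 0, 5]]))  # (3, [1, 0])
```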
40
Abstract
MOTIVATION To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method that is swift ($O(n^2)$), sensitive, and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method. RESULTS Examples are given of database searching, of the detection of structural similarities that would not be found using sequence-based methods, and of comparisons with a previously published approach. AVAILABILITY AND IMPLEMENTATION Source code (C and Perl) and binaries for Linux are freely available at www.zbh.uni-hamburg.de/fries.
Affiliation(s)
- Tim Wiegels
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany
41
Johnson E, Srivastava R. Volatility in mRNA secondary structure as a design principle for antisense. Nucleic Acids Res 2013; 41:e43. [PMID: 23161691 PMCID: PMC3562002 DOI: 10.1093/nar/gks902] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 05/20/2012] [Revised: 09/05/2012] [Accepted: 09/07/2012] [Indexed: 11/28/2022]
Abstract
Designing effective antisense sequences is a formidable problem. A method for predicting efficacious antisense holds the potential to provide fundamental insight into this biophysical process. More practically, such an understanding increases the chance of successful antisense design and saves considerable time, money and labor. The secondary structure of an mRNA molecule is believed to be in a constant state of flux, sampling several different suboptimal states. We hypothesized that particularly volatile regions might provide better accessibility for antisense targeting. A computational framework, GenAVERT, was developed to evaluate this hypothesis. GenAVERT used UNAFold and RNAforester to generate and compare the predicted suboptimal structures of mRNA sequences. Subsequent analysis revealed regions that were particularly volatile in terms of intramolecular hydrogen bonding, and thus potentially superior antisense targets due to their high accessibility. Several mRNA sequences with known natural antisense target sites, as well as artificial antisense target sites, were evaluated. Antisense sequences predicted on the basis of the volatility hypothesis closely matched those of the naturally occurring antisense, as well as the artificial target sites that provided efficient down-regulation. These results suggest that this strategy may provide a powerful new approach to antisense design.
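One way to make "volatility" concrete, given a set of suboptimal structures for the same mRNA region (for example produced by a folding program such as UNAFold), is to measure per position how often the base is paired and score how evenly it flips between paired and unpaired. The sketch below uses the binary (Shannon) entropy of the paired fraction; this metric is an assumption chosen for illustration and is not claimed to be GenAVERT's hydrogen-bond-based measure.

```python
import math
from typing import List, Sequence


def pairing_volatility(structures: Sequence[str]) -> List[float]:
    """Per-position binary entropy (bits) of paired vs. unpaired state across
    equal-length dot-bracket structures: 0.0 = always the same state,
    1.0 = paired in exactly half of the structures."""
    if not structures:
        raise ValueError("need at least one structure")
    n = len(structures[0])
    if any(len(s) != n for s in structures):
        raise ValueError("all structures must have the same length")
    scores = []
    for i in range(n):
        p = sum(s[i] != "." for s in structures) / len(structures)  # fraction paired
        h = 0.0
        for q in (p, 1.0 - p):
            if q > 0:
                h -= q * math.log2(q)
        scores.append(h)
    return scores


# Hypothetical suboptimal structures of one 8-nt region.
subopts = ["((....))", "((....))", "........", "(......)"]
print([round(v, 2) for v in pairing_volatility(subopts)])
# [0.81, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.81]
```

Under the hypothesis in the abstract, the highest-scoring positions would be the first candidates for antisense targeting.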
Affiliation(s)
- Erik Johnson
- Department of Chemical, Materials and Biomolecular Engineering, University of Connecticut, Storrs, CT 06269 and Program in Head and Neck Cancer and Oral Oncology, Neag Comprehensive Cancer Center, University of Connecticut Health Center, Farmington, CT 06030, USA
- Ranjan Srivastava
- Department of Chemical, Materials and Biomolecular Engineering, University of Connecticut, Storrs, CT 06269 and Program in Head and Neck Cancer and Oral Oncology, Neag Comprehensive Cancer Center, University of Connecticut Health Center, Farmington, CT 06030, USA
42
Movila A, Morozov A, Sitnicova N. Genetic Polymorphism of 12S rRNA Gene among Dermacentor reticulatus Fabricius Ticks in the Chernobyl Nuclear Power Plant Exclusion Zone. J Parasitol 2013; 99:40-3. [DOI: 10.1645/ge-3225.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
43
Heyne S, Costa F, Rose D, Backofen R. GraphClust: alignment-free structural clustering of local RNA secondary structures. Bioinformatics 2013; 28:i224-32. [PMID: 22689765 PMCID: PMC3371856 DOI: 10.1093/bioinformatics/bts224] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/29/2022]
Abstract
Motivation: Clustering according to sequence–structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs. Results: We present a novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure. The approach scales to datasets of hundreds of thousands of sequences. The quality of the retrieved clusters has been benchmarked against known ncRNA datasets and is comparable to that of state-of-the-art sequence–structure methods, while achieving speedups of several orders of magnitude. A selection of applications aimed at the detection of novel structural ncRNAs is presented. As an example, we predicted local structural elements specific to lincRNAs that likely associate the involved transcripts functionally with vital processes of the human nervous system. In total, we predicted 349 local structural RNA elements. Availability: The GraphClust pipeline is available on request. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Steffen Heyne
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
44
El-Kalioby M, Abouelhoda M, Krüger J, Giegerich R, Sczyrba A, Wall DP, Tonellato P. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package. BMC Bioinformatics 2012; 13 Suppl 17:S22. [PMID: 23281941 PMCID: PMC3521398 DOI: 10.1186/1471-2105-13-s17-s22] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/20/2023]
Abstract
Background Bioinformatics services have traditionally been provided in the form of a web server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as for bioinformatics service providers aiming to provide personalized bioinformatics services to their users. We also present elasticHPC, a software package and library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud-based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web interface are available at http://www.elasticHPC.org.
45
Jensen SMR, Schmitz A, Pedersen FS, Kjems J, Bramsen JB. Functional selection of shRNA loops from randomized retroviral libraries. PLoS One 2012; 7:e43095. [PMID: 22912797 PMCID: PMC3422301 DOI: 10.1371/journal.pone.0043095] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/03/2011] [Accepted: 07/18/2012] [Indexed: 12/12/2022]
Abstract
Gene silencing by RNA interference (RNAi) can be achieved by the ectopic expression of tailored short hairpin RNAs (shRNAs), which, after export to the cytoplasm, are processed by Dicer and incorporated into the RNA-induced silencing complex (RISC). Design rules for shRNAs have been the focus of several studies, but only a few reports have turned their attention to the sequence of the loop region. In this work, we selected high-functional and low-functional shRNA loops from retroviral hairpin-loop libraries in an RNAi reporter assay. The procedure revealed a highly significant, stem sequence-dependent effect of the loop on shRNA function, and although neither a strong consensus loop sequence nor structural motifs could be identified, a preferred loop sequence (5'-UGUGCUU-3') was found to support robust knockdown with little stem sequence dependency. These findings will serve as a guide for designing shRNAs with improved knockdown capacity.
Affiliation(s)
- Alexander Schmitz
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Finn Skou Pedersen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Jørgen Kjems
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
- Jesper Bertram Bramsen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
46
Hua L, Wang JTL, Ji X, Malhotra A, Khaladkar M, Shapiro BA, Zhang K. A method for discovering common patterns from two RNA secondary structures and its application to structural repeat detection. J Bioinform Comput Biol 2012; 10:1250001. [PMID: 22809414 DOI: 10.1142/s0219720012500011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
We propose an ab initio method, named DiscoverR, for finding common patterns from two RNA secondary structures. The method works by representing RNA secondary structures as ordered labeled trees and performing tree pattern discovery with an efficient dynamic programming algorithm. DiscoverR is able to identify and extract the largest common substructures from two RNA molecules of different sizes without prior knowledge of the locations and topologies of these substructures. We also extend DiscoverR to find repeated regions in an RNA secondary structure, and we apply this extended method to detect structural repeats in the 3'-untranslated region of a protein kinase gene. We describe the biological significance of a repeated hairpin found by our method, demonstrating the usefulness of the method. DiscoverR is implemented in Java; a jar file including the source code of the program is available for download at http://bioinformatics.njit.edu/DiscoverR.
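To make the ordered labeled tree representation concrete, a pseudoknot-free dot-bracket structure can be turned into such a tree in a single stack pass. The labeling used below (one `P` node per base pair whose children are the elements it encloses, one `U` leaf per unpaired base, children kept in 5' to 3' order under a virtual `root`) is one common convention and an assumption here; the abstract does not specify DiscoverR's exact node labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str  # "root", "P" (base pair) or "U" (unpaired base): assumed labels
    children: List["Node"] = field(default_factory=list)


def to_tree(structure: str) -> Node:
    """Build an ordered labeled tree from a pseudoknot-free dot-bracket string."""
    root = Node("root")
    stack = [root]  # stack[-1] is the innermost open pair (or the root)
    for ch in structure:
        if ch == "(":
            pair = Node("P")
            stack[-1].children.append(pair)
            stack.append(pair)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            stack.pop()
        elif ch == ".":
            stack[-1].children.append(Node("U"))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return root


def to_string(node: Node) -> str:
    """Render the tree in bracketed form; child order is preserved."""
    if not node.children:
        return node.label
    return node.label + "(" + ",".join(to_string(c) for c in node.children) + ")"


print(to_string(to_tree("((..)).")))  # root(P(P(U,U)),U)
```

In this representation, a common substructure of two RNAs corresponds to a pair of matching subtrees, which is the kind of pattern a tree-based dynamic programming algorithm can search for.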
Affiliation(s)
- Lei Hua
- Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey 07102, USA
47
DeBlasio D, Bruand J, Zhang S. A memory efficient method for structure-based RNA multiple alignment. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1-11. [PMID: 21576754 DOI: 10.1109/tcbb.2011.86] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Structure-based RNA multiple alignment is particularly challenging because covarying mutations make sequence information alone insufficient. Existing tools for RNA multiple alignment first generate pairwise RNA structure alignments and then build the multiple alignment using only sequence information. Here we present PMFastR, an algorithm that iteratively uses a sequence-structure alignment procedure to build a structure-based RNA multiple alignment from one sequence with known structure and a database of sequences from the same family. PMFastR also has low memory consumption, allowing the alignment of large sequences such as 16S and 23S rRNA. The algorithm can also take advantage of a multicore environment. We present results on benchmark data sets from BRAliBase, which show that PMFastR performs comparably to other state-of-the-art programs. Finally, we regenerate 607 Rfam seed alignments and show that our automated process creates multiple alignments similar to the manually curated Rfam seed alignments. Thus, the techniques presented in this paper allow multiple alignments to be generated with sequence-structure guidance while limiting memory consumption. As a result, multiple alignments of long RNA sequences, such as 16S and 23S rRNAs, can easily be generated locally on a personal computer. The software and supplementary data are available at http://genome.ucf.edu/PMFastR.
48
Abstract
RNA is now appreciated to serve numerous cellular roles, and understanding RNA structure is important for understanding its mechanism of action. This contribution discusses the methods available for predicting RNA structure. Secondary structure is the set of canonical base pairs, and it can be accurately determined by comparative sequence analysis. Secondary structure can also be predicted; the most commonly used method is free energy minimization. The accuracy of structure prediction is improved either by using experimental mapping data or by predicting a structure conserved in a set of homologous sequences. Additionally, tertiary structure, the three-dimensional arrangement of atoms, can be modeled with guidance from comparative analysis and experimental techniques. New approaches are also available for predicting tertiary structure.
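Because secondary structure is defined here as a set of canonical base pairs, a predicted or curated structure can be checked directly against its sequence. A minimal sketch, assuming pseudoknot-free dot-bracket notation and counting the G-U wobble pair as canonical alongside the Watson-Crick pairs (a common convention in secondary structure prediction, but a choice made for this example):

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair (treating G-U as canonical is an assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def noncanonical_pairs(seq: str, structure: str) -> List[Tuple[int, int]]:
    """Return the (i, j) base pairs in `structure` that are not canonical for `seq`."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    stack: List[int] = []
    bad = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            if (seq[i], seq[j]) not in CANONICAL:
                bad.append((i, j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return bad


print(noncanonical_pairs("GGGAAACCC", "(((...)))"))  # []
print(noncanonical_pairs("GAGAAACCC", "(((...)))"))  # [(1, 7)]
```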
Affiliation(s)
- Matthew G Seetin
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY, USA
49
Achawanantakun R, Sun Y, Takyar SS. ncRNA consensus secondary structure derivation using grammar strings. J Bioinform Comput Biol 2011; 9:317-37. [PMID: 21523935 DOI: 10.1142/s0219720011005501] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/02/2011] [Revised: 02/28/2011] [Accepted: 03/01/2011] [Indexed: 11/18/2022]
Abstract
Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in current RNA research. State-of-the-art structure annotation tools are based on comparative analysis, which derives the consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA alignment and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar strings, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and of a full RNA grammar including pseudoknots. Because a grammar string is a string defined on a special alphabet constructed from a grammar, it converts ncRNA alignment into sequence alignment. Using grammar string alignment, we derive consensus secondary structures for hundreds of ncRNA families from BraliBase 2.1 and for 25 families containing pseudoknots. Our experiments show that grammar string-based structure derivation competes favorably with Murlet and RNASampler in consensus structure quality. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.
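To make the idea of turning structure alignment into sequence alignment more concrete, here is a toy analogue in Python. It is not the authors' grammar-string alphabet: instead of deriving symbols from a CFG, it simply fuses each nucleotide with its dot-bracket state into one composite symbol and then runs ordinary Needleman-Wunsch global alignment over those symbols. The scoring values and the function names `encode`, `score`, and `align_score` are hypothetical. Unlike a grammar-derived alphabet, this encoding does not record which positions pair with which.

```python
# Toy analogue only: NOT the grammar-string alphabet from the paper.
# Each symbol fuses a nucleotide with its dot-bracket state, so an
# ordinary sequence aligner can compare sequence and structure together.

def encode(seq: str, db: str) -> list[str]:
    """Map each position to a composite symbol such as 'G(' or 'A.'."""
    assert len(seq) == len(db), "sequence and structure must have equal length"
    return [n + s for n, s in zip(seq, db)]

def score(a: str, b: str) -> int:
    """Hypothetical scores: agreeing structural state counts more than nucleotide identity."""
    return (2 if a[1] == b[1] else -2) + (1 if a[0] == b[0] else 0)

def align_score(x: list[str], y: list[str], gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score over the encoded strings."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align two symbols
                dp[i - 1][j] + gap,                            # gap in y
                dp[i][j - 1] + gap,                            # gap in x
            )
    return dp[n][m]
```

For example, aligning `encode("GGAAACC", "((...))")` with itself scores 7 × 3 = 21, since every column matches in both nucleotide and structural state.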
Affiliation(s)
- Rujira Achawanantakun
- Computer Science and Engineering Department, Michigan State University, East Lansing, Michigan 48824, USA
50
Meyer F, Kurtz S, Backofen R, Will S, Beckstette M. Structator: fast index-based search for RNA sequence-structure patterns. BMC Bioinformatics 2011; 12:214. [PMID: 21619640 PMCID: PMC3154205 DOI: 10.1186/1471-2105-12-214] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 12/10/2010] [Accepted: 05/27/2011] [Indexed: 12/26/2022]
Abstract
BACKGROUND The secondary structure of RNA molecules is intimately related to their function and is often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base-pairing information to be exploited for search-space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular for patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases.
RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns, and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, with Structator we provide a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
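The inside-out strategy for stem-loops can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Structator's implementation: the loop is located with a plain linear `str.find` scan instead of an affix array, so it does not reproduce the sublinear expected running time. The exact-match loop, the accepted base pairs (Watson-Crick plus G-U wobble), and the function name `match_stem_loops` are hypothetical choices for illustration.

```python
# Sketch of inside-out stem-loop matching (loop first, then flanking pairs).
# Assumption: a linear str.find scan stands in for Structator's affix array.

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def match_stem_loops(text: str, loop: str, stem_len: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of hairpins whose loop equals `loop`
    and which are closed by `stem_len` consecutive base pairs."""
    hits = []
    pos = text.find(loop)  # step 1: match the loop region first
    while pos != -1:
        left, right = pos - 1, pos + len(loop)  # innermost closing pair
        paired = 0
        # Step 2: extend outward one pair at a time; stop at the first
        # non-pairing position or at the sequence boundary.
        while (paired < stem_len and left >= 0 and right < len(text)
               and (text[left], text[right]) in PAIRS):
            left -= 1
            right += 1
            paired += 1
        if paired == stem_len:
            hits.append((left + 1, right))
        pos = text.find(loop, pos + 1)  # continue scanning (overlaps allowed)
    return hits
```

For example, `match_stem_loops("GGGAAAACCC", "AAAA", 3)` returns `[(0, 10)]`: the loop is matched first, and then the three flanking G-C pairs are checked moving outward. Checking pairs from the loop outward means a non-complementary position rejects a candidate as soon as it is reached, which is the kind of pruning the abstract attributes to inside-out matching.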
Affiliation(s)
- Fernando Meyer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany