1
Du Z, Peng Z, Yang J. RNA threading with secondary structure and sequence profile. Bioinformatics 2024; 40:btae080. [PMID: 38341662 PMCID: PMC10893584 DOI: 10.1093/bioinformatics/btae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 01/05/2024] [Accepted: 02/09/2024] [Indexed: 02/12/2024]
Abstract
MOTIVATION: RNA threading aims to identify remote homologies for template-based modeling of RNA 3D structure. Existing RNA alignment methods primarily rely on secondary structure alignment. They are often time- and memory-consuming, limiting large-scale applications. In addition, the accuracy is far from satisfactory. RESULTS: Using RNA secondary structure and sequence profile, we developed a novel RNA threading algorithm, named RNAthreader. To enhance the alignment process and minimize memory usage, a novel approach has been introduced to simplify RNA secondary structures into compact diagrams. RNAthreader employs a two-step methodology. Initially, integer programming and dynamic programming are combined to create an initial alignment for the simplified diagram. Subsequently, the final alignment is obtained using dynamic programming, taking into account the initial alignment derived from the previous step. The benchmark test on 80 RNAs illustrates that RNAthreader generates more accurate alignments than other methods, especially for RNAs with pseudoknots. Another benchmark, involving 30 RNAs from the RNA-Puzzles experiments, shows that the models constructed using RNAthreader templates have a lower average RMSD than those created by alternative methods. Remarkably, RNAthreader takes less than two hours to complete alignments with ∼5000 RNAs, which is 3-40 times faster than other methods. These compelling results suggest that RNAthreader is a promising algorithm for RNA template detection. AVAILABILITY AND IMPLEMENTATION: https://yanglab.qd.sdu.edu.cn/RNAthreader.
Affiliation(s)
- Zongyang Du
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Zhenling Peng
- MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
- MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
2
Steger G. Predicting the Structure of a Viroid: Structure, Structure Distribution, Consensus Structure, and Structure Drawing. Methods Mol Biol 2022; 2316:331-371. [PMID: 34845705 DOI: 10.1007/978-1-0716-1464-8_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Viroids are small non-coding RNAs that require a special sequence and structure to be replicated and transported by the host machinery. Many of these features can be predicted and later experimentally verified. Here, we will present workflows to predict viroid structures and draw the predicted structures in a pleasing and descriptive way using recently developed software.
Affiliation(s)
- Gerhard Steger
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.
3
Bayegan AH, Clote P. RNAmountAlign: Efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment. PLoS One 2020; 15:e0227177. [PMID: 31978147 PMCID: PMC6980424 DOI: 10.1371/journal.pone.0227177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/10/2018] [Accepted: 12/13/2019] [Indexed: 11/19/2022]
Abstract
Alignment of structural RNAs is an important problem with a wide range of applications. Since function is often determined by molecular structure, RNA alignment programs should take into account both sequence and base-pairing information for structural homology identification. This paper describes C++ software, RNAmountAlign, for RNA sequence/structure alignment that runs in O(n^3) time and O(n^2) space for two sequences of length n; moreover, our software returns a p-value (transformable to expect value E) based on Karlin-Altschul statistics for local alignment, as well as parameter fitting for local and global alignment. Using incremental mountain height, a representation of structural information computable in cubic time, RNAmountAlign implements quadratic time pairwise local, global and global/semiglobal (query search) alignment using a weighted combination of sequence and structural similarity. RNAmountAlign is capable of performing progressive multiple alignment as well. Benchmarking of RNAmountAlign against LocARNA, LARA, FOLDALIGN, DYNALIGN, STRAL, MXSCARNA, and MUSCLE shows that RNAmountAlign has reasonably good accuracy and faster run time supporting all alignment types. Additionally, our extension of RNAmountAlign, called RNAmountAlignScan, which scans a target genome sequence to find hits having high sequence and structural similarity to a given query sequence, outperforms RSEARCH and sequence-only query scans and runs faster than FOLDALIGN query scan.
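The abstract does not define incremental mountain height, so the following is only a minimal sketch of the general idea under a simplifying assumption: a single secondary structure given in dot-bracket notation, where each position contributes +1 (opening bracket), -1 (closing bracket), or 0 (unpaired). The function names are hypothetical, and the quantity used by RNAmountAlign itself may be defined differently.

```python
from itertools import accumulate


def incremental_heights(dot_bracket: str) -> list[int]:
    """Per-position height change for one structure (assumed single-structure variant)."""
    step = {"(": 1, ")": -1, ".": 0}
    return [step[c] for c in dot_bracket]


def mountain_heights(dot_bracket: str) -> list[int]:
    """Running sum of the increments: the classic 'mountain' profile of the structure."""
    return list(accumulate(incremental_heights(dot_bracket)))


if __name__ == "__main__":
    structure = "((..))"
    print(incremental_heights(structure))
    print(mountain_heights(structure))
```

Because each position is reduced to one number, two structures can be compared position by position with ordinary sequence-alignment dynamic programming, which is consistent with the quadratic alignment time the abstract reports.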
Affiliation(s)
- Amir H. Bayegan
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
- Peter Clote
- Biology Department, Boston College, Chestnut Hill, MA, United States of America
4
Abstract
The structure of RNA has been a natural subject for mathematical modeling, inviting many innovative computational frameworks. This single-stranded polynucleotide chain can fold upon itself in numerous ways to form hydrogen-bonded segments, imperfect with single-stranded loops. Illustrating these paired and non-paired interaction networks, known as RNA's secondary (2D) structure, using mathematical graph objects has been illuminating for RNA structure analysis. Building upon such seminal work from the 1970s and 1980s, graph models are now used to study not only RNA structure but also describe RNA's recurring modular units, sample the conformational space accessible to RNAs, predict RNA's three-dimensional folds, and apply the combined aspects to novel RNA design. In this article, we outline the development of the RNA-As-Graphs (or RAG) approach and highlight current applications to RNA structure prediction and design.
Affiliation(s)
- Tamar Schlick
- Department of Chemistry, 100 Washington Square East, Silver Building, New York University, New York, NY 10003, USA; Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, NY 10012, USA; New York University ECNU - Center for Computational Chemistry at NYU Shanghai, 3663 North Zhongshan Road, Shanghai, 200062, China.
5
Churkin A, Barash D. RNA dot plots: an image representation for RNA secondary structure analysis and manipulations. Wiley Interdisciplinary Reviews: RNA 2013; 4:205-16. [PMID: 23386427 DOI: 10.1002/wrna.1154] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Dot plots were originally introduced in bioinformatics as dot-containing images used to compare biological sequences and identify regions of close similarity between them. In addition to similarity, dot plots were extended to possibly represent interactions between building blocks of biological sequences, where the dots can vary in size or color according to desired features. In this survey, we first review their use in representing an RNA secondary structure, which has mostly been applied for displaying the output secondary structures as a result of running RNA folding prediction algorithms. Such a result may often contain suboptimal solutions in addition to the optimal one, which can be easily incorporated in the dot plot. We then proceed from their passive use of providing RNA secondary structure snapshots to their active use of illustrating RNA secondary structure manipulations in beneficial ways. While comparison between RNA secondary structures can mostly be done efficiently using a string representation, there are notable advantages in using dot plots for analyzing the suboptimal solutions that convey important information about the structure of the RNA molecule. In addition, structure-based alignment of dot plots has been advanced considerably and the filtering of dot plots that considers chemical and enzymatic data from structure determination experiments has been suggested. We discuss these procedures and how they can be enhanced in the future by using an image representation to analyze RNA secondary structures and examine their manipulations.
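As a rough illustration of the representation this survey discusses, the sketch below renders a single secondary structure as a text dot plot: cell (i, j) in the upper triangle is marked when positions i and j are paired. The dot-bracket input format and the function name are assumptions made for the example; dot plots produced by folding programs typically also encode pairing probabilities or suboptimal structures, which this sketch omits.

```python
def dot_plot(dot_bracket: str) -> list[str]:
    """Return one text row per position; '#' marks a base pair (i, j) with i < j."""
    n = len(dot_bracket)
    grid = [["."] * n for _ in range(n)]
    stack = []  # indices of opening brackets awaiting their partner
    for j, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(j)
        elif c == ")":
            i = stack.pop()
            grid[i][j] = "#"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    for row in dot_plot("((..))"):
        print(row)
```

A stem shows up as an anti-diagonal run of marks, which is what makes image-style comparisons of such plots possible.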
Affiliation(s)
- Alexander Churkin
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
6
Hogeweg P. Toward a theory of multilevel evolution: long-term information integration shapes the mutational landscape and enhances evolvability. Advances in Experimental Medicine and Biology 2012; 751:195-224. [PMID: 22821460 DOI: 10.1007/978-1-4614-3567-9_10] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Most of evolutionary theory has abstracted away from how information is coded in the genome and how this information is transformed into traits on which selection takes place. While in the earliest stages of biological evolution, in the RNA world, the mapping from the genotype into function was largely predefined by the physical-chemical properties of the evolving entities (RNA replicators, e.g. from sequence to folded structure and catalytic sites), in present-day organisms, the mapping itself is the result of evolution. I will review results of several in silico evolutionary studies which examine the consequences of evolving the genetic coding, and the ways this information is transformed, while adapting to prevailing environments. Such multilevel evolution leads to long-term information integration. Through genome, network, and dynamical structuring, the occurrence and/or effect of random mutations becomes nonrandom, and facilitates rapid adaptation. This is what does happen in the in silico experiments. Is it also what did happen in biological evolution? I will discuss some data that suggest that it did. In any case, these results provide us with novel search images to tackle the wealth of biological data.
Affiliation(s)
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, Utrecht, The Netherlands.
7
Elväng A, Melik W, Bertrand Y, Lönn M, Johansson M. Sequencing of a tick-borne encephalitis virus from Ixodes ricinus reveals a thermosensitive RNA switch significant for virus propagation in ectothermic arthropods. Vector Borne Zoonotic Dis 2011; 11:649-58. [PMID: 21254926 DOI: 10.1089/vbz.2010.0105] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
Tick-borne encephalitis virus (TBEV) is a flavivirus with major impact on global health. The geographical TBEV distribution is expanding, thus making it pivotal to further characterize the natural virus populations. In this study, we completed the earlier partial sequencing of a TBEV pulled out of a pool of RNA extracted from 115 ticks collected on Torö in the Stockholm archipelago. The total RNA was sufficient for all sequencing of a TBEV genome (Torö-2003), without conventional enrichment procedures such as cell culturing or suckling mice amplification. To our knowledge, this is the first time that the genome of TBEV has been sequenced directly from an arthropod reservoir. The Torö-2003 sequence has been characterized and compared with other TBE viruses. In silico analyses of secondary RNA structures formed by the two untranslated regions revealed a temperature-sensitive structural shift between a closed replicative form and an open AUG accessible form, analogous to a recently described bacterial thermoswitch. Additionally, novel phylogenetic conserved structures were identified in the variable part of the 3'-untranslated region, and their sequence and structure similarity when compared with earlier identified structures suggests an enhancing function on virus replication and translation. We propose that the thermo-switch mechanism may explain the low TBEV prevalence often observed in environmentally sampled ticks. Finally, we were able to detect variations that help in the understanding of virus adaptations to varied environmental temperatures and mammalian hosts through a comparative approach that compares RNA folding dynamics between strains with different mammalian cell passage histories.
Affiliation(s)
- Annelie Elväng
- School of Life Sciences, Södertörn University, Huddinge, Sweden
8
Ivry T, Michal S, Avihoo A, Sapiro G, Barash D. An image processing approach to computing distances between RNA secondary structures dot plots. Algorithms Mol Biol 2009; 4:4. [PMID: 19203377 PMCID: PMC2677394 DOI: 10.1186/1748-7188-4-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/23/2007] [Accepted: 02/09/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Computing the distance between two RNA secondary structures can contribute to understanding the functional relationship between them. When used repeatedly, such a procedure may lead to finding a query RNA structure of interest in a database of structures. Several methods are available for computing distances between RNAs represented as strings or graphs, but none utilize the RNA representation with dot plots. Since dot plots are essentially digital images, there is a clear motivation to devise an algorithm for computing the distance between dot plots based on image processing methods. RESULTS: We have developed a new metric dubbed 'DoPloCompare', which compares two RNA structures. The method is based on comparing dot plot diagrams that represent the secondary structures. When analyzing two diagrams and motivated by image processing, the distance is based on a combination of histogram correlations and a geometrical distance measure. We introduce, describe, and illustrate the procedure by two applications that utilize this metric on RNA sequences. The first application is the RNA design problem, where the goal is to find the nucleotide sequence for a given secondary structure. Examples where our proposed distance measure outperforms others are given. The second application locates peculiar point mutations that induce significant structural alterations relative to the wild type predicted secondary structure. The approach reported in the past to solve this problem was tested on several RNA sequences with known secondary structures to affirm their prediction, as well as on a data set of ribosomal pieces. These pieces were computationally cut from a ribosome for which an experimentally derived secondary structure is available, and on each piece the prediction conveys similarity to the experimental result. Our newly proposed distance measure shows benefit in this problem as well when compared to standard methods used for assessing the distance similarity between two RNA secondary structures. CONCLUSION: Inspired by image processing and the dot plot representation for RNA secondary structure, we have managed to provide a conceptually new and potentially beneficial metric for comparing two RNA secondary structures. We illustrated our approach on the RNA design problem, as well as on an application that utilizes the distance measure to detect conformational rearranging point mutations in an RNA sequence.
9
Simossis V, Kleinjung J, Heringa J. An overview of multiple sequence alignment. Current Protocols in Bioinformatics 2008; Chapter 3:3.7.1-3.7.26. [PMID: 18428699 DOI: 10.1002/0471250953.bi0307s03] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
Multiple sequence alignment is perhaps the most commonly applied bioinformatics technique. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. In this unit, an overview of multiple sequence alignment techniques is presented, covering a history of nearly 30 years from the early pioneering methods to the current state-of-the-art techniques. Methodological and biological issues and end-user considerations, as well as alignment evaluation issues, are discussed.
Affiliation(s)
- Victor Simossis
- Integrative Bioinformatics Institute (IBIVU), Free University, Amsterdam, The Netherlands
10
Shu W, Bo X, Zheng Z, Wang S. A novel representation of RNA secondary structure based on element-contact graphs. BMC Bioinformatics 2008; 9:188. [PMID: 18402706 PMCID: PMC2373570 DOI: 10.1186/1471-2105-9-188] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 09/04/2007] [Accepted: 04/11/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Depending on their specific structures, noncoding RNAs (ncRNAs) play important roles in many biological processes. Interest in developing new topological indices based on RNA graphs has been revived in recent years, as such indices can be used to compare, identify and classify RNAs. Although the topological indices presented before characterize the main topological features of RNA secondary structures, information on RNA structural details is ignored to some degree. Therefore, it is necessary to identify topological features with low degeneracy based on complete and fine-grained RNA graphical representations. RESULTS: In this study, we present a complete and fine scheme for RNA graph representation as a new basis for constructing RNA topological indices. We propose a combination of three vertex-weighted element-contact graphs (ECGs) to describe the RNA element details and their adjacent patterns in RNA secondary structure. Both the stem and loop topologies are encoded completely in the ECGs. The relationship among the three typical topological index families defined by their ECGs and RNA secondary structures was investigated from a dataset of 6,305 ncRNAs. The applicability of topological indices is illustrated by three application case studies. Based on the applied small dataset, we find that the topological indices can distinguish true pre-miRNAs from pseudo pre-miRNAs with about 96% accuracy, and can cluster known types of ncRNAs with about 98% accuracy, respectively. CONCLUSION: The results indicate that the topological indices can characterize the details of RNA structures and may have a potential role in identifying and classifying ncRNAs. Moreover, these indices may lead to a new approach for discovering novel ncRNAs. However, further research is needed to fully resolve the challenging problem of predicting and classifying noncoding RNAs.
Affiliation(s)
- Wenjie Shu
- Beijing Institute of Radiation Medicine, Beijing 100850, China.
11
Gruber AR, Bernhart SH, Hofacker IL, Washietl S. Strategies for measuring evolutionary conservation of RNA secondary structures. BMC Bioinformatics 2008; 9:122. [PMID: 18302738 PMCID: PMC2335298 DOI: 10.1186/1471-2105-9-122] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 11/26/2007] [Accepted: 02/26/2008] [Indexed: 02/01/2023]
Abstract
BACKGROUND: Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential. RESULTS: We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding energy based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. The use of more complex metrics, such as tree editing, does not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when only little evolutionary information is available. Surprisingly, ensemble based methods that, in principle, could benefit from the additional information contained in sub-optimal structures, perform particularly poorly. As a general trend, we observed that methods that include a consensus structure prediction outperformed equivalent methods that only consider pairwise comparisons. CONCLUSION: Structural conservation can be measured accurately with relatively simple and intuitive metrics. They have the potential to form the basis of future RNA gene finders that face new challenges like finding lineage specific structures or detecting mis-aligned sequences.
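The base-pair distance mentioned in this abstract is commonly taken to be the number of base pairs present in one structure but not the other. The sketch below implements that common definition for two equal-length dot-bracket strings; the function names are hypothetical, and the exact normalization or averaging used in the paper's benchmark is not stated in the abstract and is not reproduced here.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Collect (i, j) index pairs from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(j)
        elif c == ")":
            pairs.add((stack.pop(), j))
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Size of the symmetric difference between the two base-pair sets."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(a) ^ base_pairs(b))


if __name__ == "__main__":
    print(base_pair_distance("((..))", "(....)"))
```

The measure needs only set operations on the pair lists, which fits the abstract's point that simple metrics can perform as well as more elaborate ones such as tree editing.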
Affiliation(s)
- Andreas R Gruber
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Wien, Austria.
12
Shu W, Bo X, Ni M, Zheng Z, Wang S. In silico genetic robustness analysis of microRNA secondary structures: potential evidence of congruent evolution in microRNA. BMC Evol Biol 2007; 7:223. [PMID: 17997861 PMCID: PMC2222248 DOI: 10.1186/1471-2148-7-223] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 06/05/2007] [Accepted: 11/13/2007] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Robustness is a fundamental property of biological systems and is defined as the ability to maintain stable functioning in the face of various perturbations. Understanding how robustness has evolved has become one of the most attractive areas of research for evolutionary biologists, as it is still unclear whether genetic robustness evolved as a direct consequence of natural selection, as an intrinsic property of adaptations, or as a congruent correlate of environmental robustness. Recent studies have demonstrated that the stem-loop structures of microRNA (miRNA) are tolerant to some structural changes and show thermodynamic stability. We therefore hypothesize that genetic robustness may evolve as a correlated side effect of the evolution for environmental robustness. RESULTS: We examine the robustness of 1,082 miRNA genes covering six species. Our data suggest the stem-loop structures of miRNA precursors exhibit a significantly higher level of genetic robustness, which goes beyond the intrinsic robustness of the stem-loop structure and is not a byproduct of the base composition bias. Furthermore, we demonstrate that the phenotype of miRNA buffers against genetic perturbations, and at the same time is also insensitive to environmental perturbations. CONCLUSION: The results suggest that the increased robustness of miRNA stem-loops may result from congruent evolution for environmental robustness. Potential applications of our findings are also discussed.
Affiliation(s)
- Wenjie Shu
- Beijing Institute of Radiation Medicine, Beijing 100850, China.
13
Haslinger C, Stadler PF. RNA structures with pseudo-knots: graph-theoretical, combinatorial, and statistical properties. Bull Math Biol 2007; 61:437-67. [PMID: 17883226 PMCID: PMC7197269 DOI: 10.1006/bulm.1998.0085] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Indexed: 11/26/2022]
Abstract
The secondary structures of nucleic acids form a particularly important class of contact structures. Many important RNA molecules, however, contain pseudo-knots, a structural feature that is excluded explicitly from the conventional definition of secondary structures. We propose here a generalization of secondary structures incorporating ‘non-nested’ pseudo-knots, which we call bi-secondary structures, and discuss measures for the complexity of more general contact structures based on their graph-theoretical properties. Bi-secondary structures are planar trivalent graphs that are characterized by special embedding properties. We derive exact upper bounds on their number (as a function of the chain length n) implying that there are fewer different structures than sequences. Computational results show that the number of bi-secondary structures grows approximately like 2.35^n. Numerical studies based on kinetic folding and a simple extension of the standard energy model show that the global features of the sequence-structure map of RNA do not change when pseudo-knots are introduced into the secondary structure picture. We find a large fraction of neutral mutations and, in particular, networks of sequences that fold into the same shape. These neutral networks percolate through the entire sequence space.
Affiliation(s)
- Christian Haslinger
- Institut für Theoretische Chemie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Peter F. Stadler
- Institut für Theoretische Chemie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501 USA
14
Abstract
Biological robustness, defined as the ability to maintain stable functioning in the face of various perturbations, is an important and fundamental topic in current biology, and has become a focus of numerous studies in recent years. Although structural robustness has been explored in several types of RNA molecules, the origins of robustness are still controversial. Computational analysis results are needed to make up for the lack of evidence of robustness in natural biological systems. The RNA structural robustness evaluator (RSRE) web server presented here provides a freely available online tool to quantitatively evaluate the structural robustness of RNA based on the widely accepted definition of neutrality. Several classical structure comparison methods are employed; five randomization methods are implemented to generate control sequences; sub-optimal predicted structures can be optionally utilized to mitigate the uncertainty of secondary structure prediction. With a user-friendly interface, the web application is easy to use. Intuitive illustrations are provided along with the original computational results to facilitate analysis. The RSRE will be helpful in the wide exploration of RNA structural robustness and will catalyze our understanding of RNA evolution. The RSRE web server is freely available at http://biosrv1.bmi.ac.cn/RSRE/ or http://biotech.bmi.ac.cn/RSRE/.
Affiliation(s)
- Wenjie Shu
- Beijing Institute of Radiation Medicine, Beijing 100850, China and College of Electro-Mechanic and Automation, National University of Defense Technology, Changsha, Hunan 410073, China
- Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing 100850, China and College of Electro-Mechanic and Automation, National University of Defense Technology, Changsha, Hunan 410073, China
- Zhiqiang Zheng
- Beijing Institute of Radiation Medicine, Beijing 100850, China and College of Electro-Mechanic and Automation, National University of Defense Technology, Changsha, Hunan 410073, China
- Shengqi Wang
- Beijing Institute of Radiation Medicine, Beijing 100850, China and College of Electro-Mechanic and Automation, National University of Defense Technology, Changsha, Hunan 410073, China
- *To whom correspondence should be addressed. +86-10-66932211. Correspondence may also be addressed to Xiaochen Bo. +86-10-66932211
15
16
Abstract
BACKGROUND: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure is often conserved in evolution, the well known, but underused, mutual information measure for identifying covarying sites in an alignment can be useful for identifying structural elements. This article presents MIfold, a MATLAB toolbox that employs mutual information, or a related covariation measure, to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. RESULTS: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall performance of MIfold improves with the number of aligned sequences for certain types of RNA sequences. In addition, we show that, for these sequences, MIfold is more sensitive but less selective than the related RNAalifold structure prediction program and is comparable with the COVE structure prediction package. CONCLUSION: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam.
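To make the mutual information measure concrete, here is a minimal sketch of the standard definition for two alignment columns, MI = sum over (x, y) of f(x, y) * log2(f(x, y) / (f(x) * f(y))), with frequencies counted over the aligned sequences. It is written in Python for illustration and is not MIfold's MATLAB code; the function name is hypothetical, and gap handling and the "related covariation measure" the abstract mentions are left out.

```python
from collections import Counter
from math import log2


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two equal-length alignment columns."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    return sum(
        (count / n) * log2((count / n) / ((f_i[x] / n) * (f_j[y] / n)))
        for (x, y), count in f_ij.items()
    )


if __name__ == "__main__":
    # Perfectly covarying Watson-Crick columns versus fully conserved columns.
    print(mutual_information("GCAU", "CGUA"))
    print(mutual_information("GGGG", "CCCC"))
```

Covarying columns score high while fully conserved columns score zero, which is why the measure highlights compensatory mutations rather than plain sequence conservation.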
Affiliation(s)
- Eva Freyhult
- The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden.
17
Kafeero-Kiwanuka CK. Determination of genetic stability during cryopreservation and maintenance of yeasts. Afr J Ecol 2004. [DOI: 10.1111/j.1365-2028.2004.00470.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
18
Tanzer A, Stadler PF. Molecular evolution of a microRNA cluster. J Mol Biol 2004; 339:327-35. [PMID: 15136036 DOI: 10.1016/j.jmb.2004.03.065] [Citation(s) in RCA: 455] [Impact Index Per Article: 22.8] [Received: 01/26/2004] [Revised: 03/23/2004] [Accepted: 03/26/2004] [Indexed: 01/13/2023]
Abstract
Many of the known microRNAs are encoded in polycistronic transcripts. Here, we reconstruct the evolutionary history of the mir17 microRNA clusters which consist of miR-17, miR-18, miR-19a, miR-19b, miR-20, miR-25, miR-92, miR-93, miR-106a, and miR-106b. The history of this cluster is governed by an initial phase of local (tandem) duplications, a series of duplications of entire clusters and subsequent loss of individual microRNAs from the resulting paralogous clusters. The complex history of the mir17 microRNA family appears to be closely linked to the early evolution of the vertebrate lineage.
Affiliation(s)
- Andrea Tanzer
- Lehrstuhl für Bioinformatik am Institut für Informatik und Interdisziplinäres, Zentrum für Bioinformatik, Universität Leipzig, Germany
|
19
|
Thurner C, Witwer C, Hofacker IL, Stadler PF. Conserved RNA secondary structures in Flaviviridae genomes. J Gen Virol 2004; 85:1113-1124. [PMID: 15105528 DOI: 10.1099/vir.0.19462-0] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.4] [Indexed: 12/16/2022] Open
Abstract
Presented here is a comprehensive computational survey of evolutionarily conserved secondary structure motifs in the genomic RNAs of the family Flaviviridae: This virus family consists of the three genera Flavivirus, Pestivirus and Hepacivirus and the group of GB virus C/hepatitis G virus with a currently uncertain taxonomic classification. Based on the control of replication and translation, two subgroups were considered separately: the genus Flavivirus, with its type I cap structure at the 5' untranslated region (UTR) and a highly structured 3' UTR, and the remaining three groups, which exhibit translation control by means of an internal ribosomal entry site (IRES) in the 5' UTR and a much shorter less-structured 3' UTR. The main findings of this survey are strong hints for the possibility of genome cyclization in hepatitis C virus and GB virus C/hepatitis G virus in addition to the flaviviruses; a surprisingly large number of conserved RNA motifs in the coding regions; and a lower level of detailed structural conservation in the IRES and 3' UTR motifs than reported in the literature. An electronic atlas organizes the information on the more than 150 conserved, and therefore putatively functional, RNA secondary structure elements.
Affiliation(s)
- Caroline Thurner
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Christina Witwer
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Ivo L Hofacker
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Peter F Stadler
- The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstraße 7b, D-04103 Leipzig, Germany
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
|
20
|
Kitagawa J, Futamura Y, Yamamoto K. Analysis of the conformational energy landscape of human snRNA with a metric based on tree representation of RNA structures. Nucleic Acids Res 2003; 31:2006-13. [PMID: 12655018 PMCID: PMC152804 DOI: 10.1093/nar/gkg288] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
It is an outstanding problem to clarify how the RNA sequence is related to its structure and biological functions. We developed a simplified definition of a metric for tree representation of RNA secondary structures and analyzed the conformational energy landscapes of human spliceosomal snRNAs. We discuss the structural properties of the biological sequence by calculating the conformational energy landscapes based on the structural distance between each of the pairs in the set of suboptimal structures. The new index value is introduced for estimating the shapes of distribution patterns in conformational energy landscapes. We apply our method to the five human snRNAs and show that U1 snRNA has a multi-valley profile of the landscape, whereas the landscapes of the other four snRNAs have one steep valley. This result reflects different biological functions of these snRNAs in the pre-mRNA splicing process. The results of analyzing tRNAs and rRNAs show that the conformational energy landscapes of these sequences have multi-valley profiles.
Affiliation(s)
- Junji Kitagawa
- Information Science Seminar, Graduate School of Engineering, University of Tokyo, Japan.
|
21
|
Abstract
Most functional RNA molecules have characteristic secondary structures that are highly conserved in evolution. Here we present a method for computing the consensus structure of a set of aligned RNA sequences, taking into account both thermodynamic stability and sequence covariation. Comparison with phylogenetic structures of rRNAs shows that a reliability of prediction of more than 80% is achieved for only five related sequences. As an application we show that the Early Noduline mRNA contains significant secondary structure that is supported by sequence covariation.
MESH Headings
- Algorithms
- Archaea/genetics
- Base Sequence
- Consensus Sequence/genetics
- Databases, Nucleic Acid
- Escherichia coli/genetics
- Evolution, Molecular
- Molecular Sequence Data
- Nucleic Acid Conformation
- Phylogeny
- Prokaryotic Cells
- RNA/chemistry
- RNA/genetics
- RNA Stability
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 23S/chemistry
- RNA, Ribosomal, 23S/genetics
- Sequence Alignment
- Sequence Homology, Nucleic Acid
- Thermodynamics
Affiliation(s)
- Ivo L Hofacker
- Institut für Theoretische Chemie, Universität Wien, Währingerstrasse 17, Austria
|
22
|
Abstract
In this paper, we consider the evolutionary dynamics of catalytically active species with a distinct genotype-phenotype relationship. Folding landscapes of RNA molecules serve as a paradigm for this relationship with essential neutral properties. The landscape itself is partitioned by phenotypes (realized as RNA secondary structures). To each genotype (represented as a sequence) a structure is assigned in a unique way. The set of all sequences which map into a particular structure is modeled as a random graph in sequence space (the so-called neutral network). A catalytic network is realized as a random digraph with maximal out-degree two and secondary structures as vertex sets. A population of catalytic RNA molecules shows significantly different behavior compared to a deterministic description: hypercycles are able to co-exist and out-compete a parasite with superior catalytic support. A "switching" between different dynamic organizations of the network can be observed, dynamical stability of hypercyclic organizations against errors and the existence of an error-threshold of catalysis can be reported.
|
23
|
Abstract
New results for calculating nucleic acid secondary structure by free energy minimization and phylogenetic comparisons have recently been reported. A complete set of DNA energy parameters is now available and the RNA parameters have been improved. Although databases of RNA secondary structures are still derived and expanded using computer-assisted, ad hoc comparative analysis, a number of new computer algorithms combine covariation analysis with energy methods.
Affiliation(s)
- M Zuker
- Department of Biochemistry and Molecular Biophysics, Washington University, St Louis, 63110, USA.
|
24
|
Abstract
Many different programs have been developed for the prediction of the secondary structure of an RNA sequence. Some of these programs generate an ensemble of structures, all of which have free energy close to that of the optimal structure, making it important to be able to quantify how similar these different structures are. To deal with this problem, we define a new class of metrics, the mountain metrics, on the set of RNA secondary structures of a fixed length. We compare properties of these metrics with other well known metrics on RNA secondary structures. We also study some global and local properties of these metrics.
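The abstract does not spell out the metric, so the following is only a sketch under stated assumptions: a structure is given in dot-bracket notation, its "mountain" profile is taken to be the running count of opened-but-unclosed base pairs, and two equal-length structures are compared with an l_p norm of the profile difference. The names `mountain_profile` and `mountain_distance` are hypothetical, and the paper's exact definitions may differ.

```python
def mountain_profile(structure):
    """Height profile of a dot-bracket structure.

    The height at position k is the number of '(' minus the number of ')'
    seen up to and including k. Illustrative convention only.
    """
    heights = []
    height = 0
    for ch in structure:
        if ch == "(":
            height += 1
        elif ch == ")":
            height -= 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
        if height < 0:
            raise ValueError("unbalanced structure: ')' without '('")
        heights.append(height)
    if height != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return heights


def mountain_distance(s1, s2, p=2):
    """l_p distance between the mountain profiles of two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    m1, m2 = mountain_profile(s1), mountain_profile(s2)
    return sum(abs(a - b) ** p for a, b in zip(m1, m2)) ** (1.0 / p)
```

For example, `"((..))"` and `"(....)"` have profiles `[1, 2, 2, 2, 1, 0]` and `[1, 1, 1, 1, 1, 0]`, which differ at three positions by one unit each.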
Affiliation(s)
- V Moulton
- FMI (Physics and Mathematics Department), Mid-Sweden University, Sundsvall
|
25
|
|
26
|
Hofacker IL, Stadler PF. Automatic detection of conserved base pairing patterns in RNA virus genomes. COMPUTERS & CHEMISTRY 1999; 23:401-14. [PMID: 10404627 DOI: 10.1016/s0097-8485(99)00013-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
Abstract
Almost all RNA molecules (and consequently almost all subsequences of a large RNA molecule) form secondary structures. The presence of secondary structure in itself therefore does not indicate any functional significance. In fact, we cannot expect a conserved secondary structure for all parts of a viral genome or an mRNA, even if there is a significant level of sequence conservation. We present a novel method for detecting conserved RNA secondary structures in a family of related RNA sequences. The method is based on combining the prediction of base pair probability matrices and comparative sequence analysis. It can be applied to small sets of long sequences and does not require a priori knowledge of conserved sequence or structure motifs. As such it can be used to scan large amounts of sequence data for regions that warrant further experimental investigation. Applications to complete genomic RNAs of some viruses show that in all cases the known secondary structure features are identified. In addition, we predict a substantial number of conserved structural elements which have not been described so far.
Affiliation(s)
- I L Hofacker
- Institut für Theoretische Chemie, Universität Wien, Austria
|
27
|
Hofacker IL, Fekete M, Flamm C, Huynen MA, Rauscher S, Stolorz PE, Stadler PF. Automatic detection of conserved RNA structure elements in complete RNA virus genomes. Nucleic Acids Res 1998; 26:3825-36. [PMID: 9685502 PMCID: PMC147758 DOI: 10.1093/nar/26.16.3825] [Citation(s) in RCA: 101] [Impact Index Per Article: 3.9] [Indexed: 01/11/2023] Open
Abstract
We propose a new method for detecting conserved RNA secondary structures in a family of related RNA sequences. Our method is based on a combination of thermodynamic structure prediction and phylogenetic comparison. In contrast to purely phylogenetic methods, our algorithm can be used for small data sets of approximately 10 sequences, efficiently exploiting the information contained in the sequence variability. The procedure constructs a prediction only for those parts of sequences that are consistent with a single conserved structure. Our implementation produces reasonable consensus structures without user interference. As an example we have analysed the complete HIV-1 and hepatitis C virus (HCV) genomes as well as the small segment of hantavirus. Our method confirms the known structures in HIV-1 and predicts previously unknown conserved RNA secondary structures in HCV.
Affiliation(s)
- I L Hofacker
- Institut für Theoretische Chemie, Universität Wien, Wien, Austria, EMBL, Heidelberg, Germany, Max Delbrück Center, Berlin, Germany
|
28
|
Schuster P. Genotypes with phenotypes: Adventures in an RNA toy world. Biophys Chem 1997; 66:75-110. [PMID: 17029873 DOI: 10.1016/s0301-4622(97)00058-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.4] [Received: 04/17/1997] [Accepted: 04/17/1997] [Indexed: 11/28/2022]
Abstract
Evolution has created the complexity of the animate world and deciphering the language of evolution is the key towards understanding nature. The dynamics of evolution is simplified by considering it as a superposition of three less sophisticated processes: population dynamics, population support dynamics, and genotype-phenotype mapping. Evolution of molecules in laboratory assays provides a sufficiently simple system for the quantitative analysis of the three phenomena. Coarse-grained notions of structures like RNA secondary structures are used as model phenotypes. They provide an excellent tool for a comprehensive analysis of the entire complex of molecular evolution. The mapping from RNA genotypes into secondary structures is highly redundant. In order to find at least one sequence for every common structure one need only search a (relatively) small part of sequence space. The existence of selectively neutral phenotypes plays an important role for the success and the efficiency of evolutionary optimization. Molecular evolution found a highly promising technological application in the design of biomolecules with predefined properties.
Affiliation(s)
- P Schuster
- Institut für Theoretische Chemie und Strahlenchemie, Universität Wien, A-1090 Wien, Austria
|
29
|
Reidys C, Stadler PF, Schuster P. Generic properties of combinatory maps: neutral networks of RNA secondary structures. Bull Math Biol 1997; 59:339-97. [PMID: 9116604 DOI: 10.1007/bf02462007] [Citation(s) in RCA: 181] [Impact Index Per Article: 6.7] [Indexed: 02/04/2023]
Abstract
Random graph theory is used to model and analyse the relationships between sequences and secondary structures of RNA molecules, which are understood as mappings from sequence space into shape space. These maps are non-invertible since there are always many orders of magnitude more sequences than structures. Sequences folding into identical structures form neutral networks. A neutral network is embedded in the set of sequences that are compatible with the given structure. Networks are modeled as graphs and constructed by random choice of vertices from the space of compatible sequences. The theory characterizes neutral networks by the mean fraction of neutral neighbors (lambda). The networks are connected and percolate sequence space if the fraction of neutral nearest neighbors exceeds a threshold value (lambda > lambda *). Below threshold (lambda < lambda *), the networks are partitioned into a largest "giant" component and several smaller components. Structures are classified as "common" or "rare" according to the sizes of their pre-images, i.e. according to the fractions of sequences folding into them. The neutral networks of any pair of two different common structures almost touch each other, and, as expressed by the conjecture of shape space covering sequences folding into almost all common structures, can be found in a small ball of an arbitrary location in sequence space. The results from random graph theory are compared to data obtained by folding large samples of RNA sequences. Differences are explained in terms of specific features of RNA molecular structures.
Affiliation(s)
- C Reidys
- Santa Fe Institute, NM 87501, USA
|
30
|
Analysis of RNA sequence structure maps by exhaustive enumeration I. Neutral networks. MONATSHEFTE FUR CHEMIE 1996. [DOI: 10.1007/bf00810881] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.4] [Indexed: 10/26/2022]
|
31
|
Benedetti G, Morosetti S. A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophys Chem 1996; 59:179-84. [PMID: 8867337 DOI: 10.1016/0301-4622(95)00119-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.1] [Indexed: 02/02/2023]
Abstract
Secondary and tertiary RNA structures play an important role in many biological processes. Therefore the necessity arises to find similar higher-order structures for different but functionally homologous RNA sequences. We propose here a graph-topological approach to the problem, which shows two main features: (a) A simplified graph representation allows the recognition of similarity of RNA secondary structures with the same branching look despite minor differences. This allows comparison among foldings from different sequences, and "pruning" of the secondary structures not shared by all the sequences from the early stages of the search. (b) The graph representation is encoded by the Randić topological index, and the search for the folding similarity is reduced to checking the identity of single numbers. These characteristics make this approach significantly different, less dependent on empirical criteria, and less computationally heavy than previous methods, where the folding consensus has been measured by an alignment procedure or correlation of strings representing the secondary structures. Some U2 snRNA and viroid sequences are studied by this approach, which is embedded in our previous search method based on genetic algorithms.
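For readers unfamiliar with the index named in this abstract, the standard Randić connectivity index of a graph is the sum, over all edges (u, v), of 1/sqrt(deg(u) · deg(v)). The sketch below computes it from an edge list; how the authors derive the simplified graph from an RNA secondary structure is not described here, so the edge-list input and the name `randic_index` are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def randic_index(edges):
    """Randić connectivity index of an undirected simple graph.

    `edges` is an iterable of (u, v) pairs with u != v and no duplicates.
    Hypothetical helper; the graph construction from an RNA folding is
    outside the scope of this sketch.
    """
    edges = list(edges)
    degree = Counter()
    for u, v in edges:
        if u == v:
            raise ValueError("self-loops are not supported")
        degree[u] += 1
        degree[v] += 1
    # Each edge contributes the inverse square root of its endpoint degrees.
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)
```

A three-vertex path gives sqrt(2) and a star with three leaves gives sqrt(3). Distinct graphs can share the same index value, so equal numbers are a quick screen for similar branching patterns rather than proof that two graphs are identical.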
Affiliation(s)
- G Benedetti
- Dipartimento di Chimica, Università di Roma La Sapienza, Italy
|
32
|
Abstract
Shapes of biological macromolecules (RNA, DNA, and proteins) can be represented by abstract algebraic structures provided that a suitably coarse resolution is chosen. These abstract structures, for instance partially ordered sets and permutation groups, can be used for deriving new metric distances between biomolecular shapes and for proving surprising theorems on sequence-structure relations.
Affiliation(s)
- C Reidys
- Institut für Molekulare Biotechnologie, Beutenbergstrasse 11, PF 100813, D-07708 Jena, Germany
|
33
|
Huynen MA, Perelson A, Vieira WA, Stadler PF. Base pairing probabilities in a complete HIV-1 RNA. J Comput Biol 1996; 3:253-74. [PMID: 8811486 DOI: 10.1089/cmb.1996.3.253] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.9] [Indexed: 02/02/2023] Open
Abstract
We have calculated the base pair probability distribution for the secondary structure of a full length HIV-1 genome using the partition function approach introduced by McCaskill (1990). By analyzing the full distribution of base pair probabilities instead of a restricted number of secondary structures, we gain more complete and reliable information about the secondary structure of HIV-1. We introduce methods that condense the information in the probability distribution to one value per nucleotide in the sequence. Using these methods we represent the secondary structure as a weighted average of the base pair probabilities, and we can identify interesting secondary structures that have relatively well-defined base pairing. The results show high probabilities for the known secondary structures at the 5'-end of the molecule that have been predicted on the basis of biochemical data. The Rev response element (RRE) appears as a distinct element in the secondary structure. It has a meta-stable domain at the high affinity site for the binding of Rev. The overall structure decomposes into fairly small independent structures in the first 4,000 bases of the molecule. The remaining 5,000 bases (excluding the terminal repeat) form a single, large structure, on top of which the RRE is located.
Affiliation(s)
- M A Huynen
- Theoretical Division and Center for Non-Linear Studies, Los Alamos Natl. Lab. NM 87545, USA
|
34
|
Benedetti G, Morosetti S. A genetic algorithm to search for optimal and suboptimal RNA secondary structures. Biophys Chem 1995; 55:253-9. [PMID: 7542936 DOI: 10.1016/0301-4622(94)00130-c] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023]
Abstract
Genetic algorithms are a search method used in solving problems by selection, recombination and mutation of tentative solutions, until the better ones are achieved. They are very efficient when the 'building block' hypothesis is effective for the solutions, which means that a better solution can be obtained by assembling short 'motifs' or 'schemata' that can be retrieved in some other worse solutions. The additive nature of the secondary structure free energy rules suggests the validity of this hypothesis, and therefore the likely power of a genetic algorithm approach to search for RNA secondary structures. We describe in detail an original genetic algorithm specific for this problem. The sharing function used to obtain differentiated solutions is also described. It results in a greater effectiveness of the algorithm in retrieving a large number of suboptimal RNA foldings besides the optimal one. RNA sequences of different length are used to test the method. The PSTV viroid sequence has been studied.
Affiliation(s)
- G Benedetti
- Dipartimento di Chimica, Università di Roma La Sapienza, Italy
|
35
|
Nielsen DA, Novoradovsky A, Goldman D. SSCP primer design based on single-strand DNA structure predicted by a DNA folding program. Nucleic Acids Res 1995; 23:2287-91. [PMID: 7610057 PMCID: PMC307019 DOI: 10.1093/nar/23.12.2287] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023] Open
Abstract
To predict alterations in single-strand DNA mobility in non-denaturing electrophoretic gels, Zuker's RNA folding program was modified. Energy files utilized by the LRNA RNA folding algorithm were modified to emulate folding of single-strand DNA. Energy files were modified to disallow G-T base pairing. Stacking energies were corrected for DNA thermodynamics. Constraints on loop nucleotide sequences were removed. The LRNA RNA folding algorithm using the DNA fold energy files was applied to predict folding of PCR generated single-strand DNA molecules from polymorphic human ALDH2 and TPH alleles. The DNA-Fold version 1.0 program was used to design primers to create and abolish SSCP mobility shifts. Primers were made that add a 5' tag sequence or alter complementarity to an internal sequence. Differences in DNA secondary structure were assessed by SSCP analysis and compared to single-strand DNA secondary structure predictions. Results demonstrate that alterations in single-strand DNA conformation may be predicted using DNA-Fold 1.0.
Affiliation(s)
- D A Nielsen
- Section of Molecular Genetics, NIAAA, NIH, Bethesda, MD 20892-0001, USA
|
36
|
Schuster P, Stadler PF. Landscapes: complex optimization problems and biopolymer structures. COMPUTERS & CHEMISTRY 1994; 18:295-324. [PMID: 7524995 DOI: 10.1016/0097-8485(94)85025-9] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Abstract
The evolution of RNA molecules in replication assays, viroids and RNA viruses can be viewed as an adaptation process on a 'fitness' landscape. The dynamics of evolution is hence tightly linked to the structure of the underlying landscape. Global features of landscapes can be described by statistical measures like number of optima, lengths of walks and correlation functions. The evolution of a quasispecies on such landscapes exhibits three dynamical regimes depending on the replication fidelity: Above the "localization threshold" the population is centered around a (local) optimum. Between localization and "dispersion threshold" the population is still centered around a consensus sequence, which, however, changes in time. For very large mutation rates the population spreads in sequence space like a gas. The critical mutation rates separating the three domains depend strongly on characteristic properties of the fitness landscapes. Statistical characteristics of RNA landscapes are accessible by mathematical analysis and computer calculations on the level of secondary structures: these RNA landscapes belong to the same class as well known optimization problems and simple spin glass models. The notion of a landscape is extended to combinatory maps, thereby allowing for a direct statistical investigation of the sequence structure relationships of RNA at the level of secondary structures. Frequencies of structures are highly non-uniform: we find relatively few common and many rare ones, as expressed by a generalized form of Zipf's law. Using an algorithm for inverse folding we show that sequences sharing the same structure are distributed randomly over sequence space. Together with calculations of structure correlations and a survey of neutral mutations this provides convincing evidence that RNA landscapes are as simple as they could possibly be for evolutionary adaptation: Any desired secondary structure can be found close to an arbitrary initial sequence and at the same time almost all bases can be substituted sequentially without ever changing the shape of the molecule. Consequences of these results for evolutionary optimization, the early stages of life, and molecular biotechnology are discussed.
Affiliation(s)
- P Schuster
- Institut für Theoretische Chemie, Universität Wien, Austria
|
37
|
Huynen MA, Hogeweg P. Pattern generation in molecular evolution: exploitation of the variation in RNA landscapes. J Mol Evol 1994; 39:71-9. [PMID: 7520506 DOI: 10.1007/bf00178251] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.1] [Indexed: 01/25/2023]
Abstract
Evolution of RNA secondary structure is studied using simulation techniques and statistical analysis of fitness landscapes. The transition from RNA sequence to RNA secondary structure leads to fitness landscapes that have local variations in their "ruggedness." Evolution exploits these variations. In stable environments it moves the quasispecies toward relatively "flat" peaks, where not only the master sequence but also its mutants have a high fitness. In a rapidly changing environment, the situation is reversed; evolution moves the quasispecies to a region where the correlation between secondary structures of "neighboring" RNA sequences is relatively low. In selection for simple secondary structures the movement toward flat peaks leads to pattern generation in the RNA sequences. Patterns are generated at the level of polynucleotide frequencies and the distribution of purines and pyrimidines. The patterns increase the modularity of the sequence. They thereby prevent the formation of alternative secondary structures after mutations. The movement of the quasispecies toward relatively rugged parts of the landscape results in pattern generation at the level of the RNA secondary structure. The base-pairing frequency of the sequences increases. The patterns that are generated in the RNA sequences and the RNA secondary structures are not directly selected for and can be regarded as a side effect of the evolutionary dynamics of the system.
Affiliation(s)
- M A Huynen
- Bioinformatics Group, Utrecht University, Netherlands
|
38
|
Schuster P, Fontana W, Stadler PF, Hofacker IL. From sequences to shapes and back: a case study in RNA secondary structures. Proc Biol Sci 1994; 255:279-84. [PMID: 7517565 DOI: 10.1098/rspb.1994.0040] [Citation(s) in RCA: 476] [Impact Index Per Article: 15.9] [Indexed: 01/25/2023] Open
Abstract
RNA folding is viewed here as a map assigning secondary structures to sequences. At fixed chain length the number of sequences far exceeds the number of structures. Frequencies of structures are highly non-uniform and follow a generalized form of Zipf's law: we find relatively few common and many rare ones. By using an algorithm for inverse folding, we show that sequences sharing the same structure are distributed randomly over sequence space. All common structures can be accessed from an arbitrary sequence by a number of mutations much smaller than the chain length. The sequence space is percolated by extensive neutral networks connecting nearest neighbours folding into identical structures. Implications for evolutionary adaptation and for applied molecular evolution are evident: finding a particular structure by mutation and selection is much simpler than expected and, even if catalytic activity should turn out to be sparse of RNA structures, it can hardly be missed by evolutionary processes.
Affiliation(s)
- P Schuster
- Institut für Molekulare Biotechnologie, Jena, Germany
|
39
|
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. MONATSHEFTE FUR CHEMIE 1994. [DOI: 10.1007/bf00818163] [Citation(s) in RCA: 543] [Impact Index Per Article: 18.1] [Indexed: 10/26/2022]
|
40
|
Schinazi RF, Lloyd RM, Ramanathan CS, Taylor EW. Antiviral drug resistance mutations in human immunodeficiency virus type 1 reverse transcriptase occur in specific RNA structural regions. Antimicrob Agents Chemother 1994; 38:268-74. [PMID: 7514854 PMCID: PMC284439 DOI: 10.1128/aac.38.2.268] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023] Open
Abstract
A statistically significant correlation exists between the locations of drug resistance mutations (DRMs) observed for various reverse transcriptase inhibitors and features of the secondary structure predicted for the RNA coding for human immunodeficiency virus type 1 reverse transcriptase. The known DRMs map onto "unstable" bases, which are predominantly nonhelical regions (i.e., loops, bulges, and bends) of the predicted RNA secondary structure, whereas codons for the key conserved residues of polymerase sequence motifs map onto "stable" paired bases involved in helical regions. On the basis of these results, we hypothesize that the secondary structure of the RNA template (in this case, the reverse transcriptase gene itself) may be a previously unrecognized factor contributing to base misincorporation errors during reverse transcription and that, rather than being randomly distributed, mutations are more likely to occur in specific regions of the genome. The results suggest that these "mutation-prone" regions can be predicted by using a standard algorithm for RNA secondary structure.
Affiliation(s)
- R F Schinazi
- Veterans Affairs Medical Center, Atlanta, Decatur, Georgia 30033
|
41
|
Abstract
A statistical reference for RNA secondary structures with minimum free energies is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC, the biophysical AUGC and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements, such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules of chain lengths up to 100. The results of RNA structure statistics depend strongly on the particular alphabet chosen. The statistical reference is compared with the data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees. Tree editing provides a quantitative measure for the distance dt, between two structures. We compute a structure density surface as the conditional probability of two structures having distance t given that their sequences have distance h. This surface indicates that the vast majority of possible minimum free energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from probability densities. They are appropriate measures for the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate for the mean sensitivity of structures to point mutations.
Affiliation(s)
- W Fontana
- Theoretical Division, Los Alamos National Laboratory, New Mexico 87545
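The tree representation and tree-editing distance mentioned in the abstract above can be illustrated with a small Python sketch. It parses a dot-bracket string into an ordered forest and computes a unit-cost ordered tree edit distance with the standard rightmost-root recursion, plus the Hamming distance between two sequences. The node labels (`P` for a base pair, `U` for an unpaired base), the unit edit costs, and the names `parse`, `size`, `edit_distance`, and `hamming` are assumptions made for this example; they are not the tree encoding or cost scheme used by the authors.

```python
from functools import lru_cache


def parse(dot_bracket):
    """Turn a dot-bracket string into an ordered forest.

    A node is (label, children); 'P' marks a base pair, 'U' an unpaired base.
    """
    stack = [[]]
    for ch in dot_bracket:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            children = tuple(stack.pop())
            stack[-1].append(("P", children))
        elif ch == ".":
            stack[-1].append(("U", ()))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return tuple(stack[0])


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def edit_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)
    if not g:
        return size(f)
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    return min(
        # delete the rightmost root of f; its children move up in place
        edit_distance(f[:-1] + kids_v, g) + 1,
        # insert the rightmost root of g
        edit_distance(f, g[:-1] + kids_w) + 1,
        # match the two rightmost roots (relabel if the labels differ)
        edit_distance(kids_v, kids_w)
        + edit_distance(f[:-1], g[:-1])
        + (label_v != label_w),
    )


def hamming(s, t):
    """Number of mismatching positions between equal-length sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))


if __name__ == "__main__":
    print(edit_distance(parse("((..))"), parse("(...)")))
    print(hamming("GGCA", "GGUA"))
```

The memoized recursion is only practical for short structures; a production implementation would use a dedicated tree edit distance algorithm.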
|
42
|
Bonhoeffer S, McCaskill JS, Stadler PF, Schuster P. RNA multi-structure landscapes. A study based on temperature dependent partition functions. EUROPEAN BIOPHYSICS JOURNAL : EBJ 1993; 22:13-24. [PMID: 7685689 DOI: 10.1007/bf00205808] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.5] [Indexed: 01/26/2023]
Abstract
Statistical properties of RNA folding landscapes obtained by the partition function algorithm (McCaskill 1990) are investigated in detail. The pair correlation of free energies as a function of the Hamming distance is used as a measure for the ruggedness of the landscape. The calculation of the partition function contains information about the entire ensemble of secondary structures as a function of temperature and opens the door to all quantities of thermodynamic interest, in contrast with the conventional minimal free energy approach. A metric distance of structure ensembles is introduced and pair correlations at the level of the structures themselves are computed. Just as with landscapes based on most stable secondary structure prediction, the landscapes defined on the full biophysical GCAU alphabet are much smoother than the landscapes restricted to pure GC sequences, and the correlation lengths are almost constant fractions of the chain lengths. Correlation functions for multi-structure landscapes exhibit an increased correlation length, especially near the melting temperature. However, the main effect on evolution is rather an effective increase in sampling for finite populations where each sequence explores multiple structures.
Affiliation(s)
- S Bonhoeffer
- Institut für Theoretische Chemie, Universität Wien, Austria
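The pair correlation as a function of Hamming distance, used in the abstract above as a ruggedness measure, can be estimated by sampling. The Python sketch below is a toy under stated assumptions: instead of partition-function free energies it uses the maximum number of nested base pairs (a Nussinov-style dynamic program) as a stand-in landscape value, and the names `max_pairs`, `mutate`, and `autocorrelation`, the minimum loop length of 3, and the sampling parameters are all hypothetical choices for illustration.

```python
import random

# Watson-Crick and GU wobble pairs (assumed pairing rules for the toy model)
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs; a crude stand-in for free energy."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


def mutate(seq, d, alphabet, rng):
    """Return a copy of seq with exactly d positions changed."""
    out = list(seq)
    for pos in rng.sample(range(len(out)), d):
        out[pos] = rng.choice([c for c in alphabet if c != out[pos]])
    return out


def autocorrelation(d, length=20, samples=100, alphabet="GCAU", seed=0):
    """Estimate the correlation of landscape values at Hamming distance d."""
    rng = random.Random(seed)
    fx, fy = [], []
    for _ in range(samples):
        x = [rng.choice(alphabet) for _ in range(length)]
        y = mutate(x, d, alphabet, rng)
        fx.append(max_pairs(x))
        fy.append(max_pairs(y))
    pooled = fx + fy
    mean = sum(pooled) / len(pooled)
    var = sum((v - mean) ** 2 for v in pooled) / len(pooled)
    if var == 0:
        return float("nan")
    cov = sum((a - mean) * (b - mean) for a, b in zip(fx, fy)) / samples
    return cov / var


if __name__ == "__main__":
    for d in (1, 2, 4, 8):
        print(d, round(autocorrelation(d), 3))
```

Running the estimate for `alphabet="GC"` and `alphabet="GCAU"` gives a rough way to compare the two toy landscapes; the numbers say nothing quantitative about the thermodynamic landscapes studied in the paper.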
|
43
|
Fontana W, Stadler PF, Bornberg-Bauer EG, Griesmacher T, Hofacker IL, Tacker M, Tarazona P, Weinberger ED, Schuster P. RNA folding and combinatory landscapes. PHYSICAL REVIEW. E, STATISTICAL PHYSICS, PLASMAS, FLUIDS, AND RELATED INTERDISCIPLINARY TOPICS 1993; 47:2083-2099. [PMID: 9960229 DOI: 10.1103/physreve.47.2083] [Citation(s) in RCA: 174] [Impact Index Per Article: 5.6] [Indexed: 11/07/2022]
|
44
|
|
45
|
Konings DA, Nash MA, Maizel JV, Arlinghaus RB. Novel GACG-hairpin pair motif in the 5' untranslated region of type C retroviruses related to murine leukemia virus. J Virol 1992; 66:632-40. [PMID: 1309906 PMCID: PMC240761 DOI: 10.1128/jvi.66.2.632-640.1992] [Citation(s) in RCA: 62] [Impact Index Per Article: 1.9] [Indexed: 12/26/2022] Open
Abstract
We searched for the presence of common RNA structural motifs in mammalian type C retroviruses related to murine leukemia viruses and the closely related avian spleen necrosis virus. A novel motif consisting of a pair of hairpins, called hairpin pair motif, was detected in the 5' untranslated regions of the genomes of these retroviruses. A combination of computational analyses that included the assessment of phylogenetic sequence conservation by multiple alignment, the search for regions with unusual RNA folding properties, and the analysis of RNA secondary structure by suboptimal free-energy calculations highlighted the significance of this hairpin pair motif. The hairpin pair motif encompasses 70 to 80 nucleotides between the splice donor site and the gag translational initiation codon of these viruses. The motif is composed of two adjacent hairpins both with a perfectly conserved GACG tetraloop. We propose that the novel GACG-hairpin pair motif described here constitutes an essential component of the regulatory machinery in these type C retroviruses.
Affiliation(s)
- D A Konings
- Laboratory of Mathematical Biology, National Cancer Institute, Frederick, Maryland 21702
|
46
|
Benedetti G, Morosetti S. Recognition of the folding consensus in RNA secondary structures by the topological-filtering method. EUROPEAN JOURNAL OF BIOCHEMISTRY 1991; 202:241-8. [PMID: 1722147 DOI: 10.1111/j.1432-1033.1991.tb16368.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 12/28/2022]
Abstract
Functionally homologous RNA sequences can diverge substantially in their primary sequences, but it can be reasonably assumed that they are related in their higher-degree structures. The problem of finding such structures while satisfying the free-energy-minimization criterion as far as possible is considered here in two aspects. First, a quantitative measure of the folding consensus among secondary structures is defined by translating each structure into a linear representation and using the correlation theorem to compare them. Second, an algorithm is presented for the parallel search for secondary structures according to the free-energy-minimization criterion, with a filtering action based on the folding consensus measure. The method is tested on groups of RNA sequences that differ in origin and function, for which proposals of homologous secondary structures based on experimental data exist. A comparison of the results with a blank, consisting of a search based on free energy minimization alone, is always performed. In these tests the method is able to obtain, from different sequences, secondary structures characterized by a high folding consensus measure even when structures of lower free energy that are not homologous are possible. Two applications are also shown. The first demonstrates the transfer of experimental data available for one sequence to a functionally related, and therefore homologous, one. The second is the use of a topological probe in the search for precise structural motifs.
Affiliation(s)
- G Benedetti
- Dipartimento di Chimica, Università di Roma La Sapienza, Italy
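The idea in the abstract above of comparing linear representations of structures through the correlation theorem can be sketched as follows. The numeric encoding (`(` as +1, `)` as -1, `.` as 0), the normalization, and the names `encode`, `dft`, `cross_correlation`, and `consensus` are assumptions for illustration only; the paper's actual linear representation and consensus measure are not given here. The sketch uses a naive discrete Fourier transform from the standard library so that the identity "the transform of a cross-correlation equals conj(A) times B" is visible; an FFT would be used in practice.

```python
import cmath
import math

# Assumed numeric encoding of a dot-bracket structure
CODE = {"(": 1.0, ")": -1.0, ".": 0.0}


def encode(dot_bracket):
    """Map a dot-bracket string to a list of numbers."""
    return [CODE[ch] for ch in dot_bracket]


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def cross_correlation(a, b):
    """Cross-correlation of two real signals via the correlation theorem.

    Both signals are zero-padded to len(a) + len(b) so that positive and
    negative lags do not overlap; entry k holds sum_n a[n] * b[n + k].
    """
    m = len(a) + len(b)
    pa = list(a) + [0.0] * (m - len(a))
    pb = list(b) + [0.0] * (m - len(b))
    fa, fb = dft(pa), dft(pb)
    product = [u.conjugate() * v for u, v in zip(fa, fb)]
    return [v.real for v in dft(product, inverse=True)]


def consensus(s, t):
    """Peak normalized cross-correlation between two encoded structures."""
    a, b = encode(s), encode(t)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0:
        return 0.0
    return max(cross_correlation(a, b)) / norm


if __name__ == "__main__":
    print(consensus("((..))..", "..((..))"))
    print(consensus("((..))", "(....)"))
```

Taking the maximum over all lags makes the score insensitive to a shift of the same motif along the chain, which is one plausible reason to prefer a correlation-based comparison over a position-by-position one.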
|
47
|
Statistics of landscapes based on free energies, replication and degradation rate constants of RNA secondary structures. MONATSHEFTE FUR CHEMIE 1991. [DOI: 10.1007/bf00815919] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
|
48
|
Saltarelli M, Querat G, Konings DA, Vigne R, Clements JE. Nucleotide sequence and transcriptional analysis of molecular clones of CAEV which generate infectious virus. Virology 1990; 179:347-64. [PMID: 2171210 DOI: 10.1016/0042-6822(90)90303-9] [Citation(s) in RCA: 219] [Impact Index Per Article: 6.4] [Indexed: 12/30/2022]
Abstract
The lentivirus caprine arthritis-encephalitis virus (CAEV) is closely related by nucleotide sequence homology to visna virus and other sheep lentiviruses and shows less similarity to the other animal and human lentiviruses. The genomic organization of CAEV is very similar to that of visna virus and the South African ovine maedi visna virus (SA-OMVV) as well as to those of other primate lentiviruses. The CAEV genome includes the small open reading frames (ORFs) between pol and env which are the hallmarks of the lentivirus genomes. The most striking difference in the organization of CAEV is in the env gene. The Env polyproteins of visna virus and the related SA-OMVV contain 20 amino acids between the translational start and the signal peptide that are not present in CAEV. In addition to nucleotide sequence analysis, the transcriptional products of CAEV were determined by Northern analysis. The viral mRNAs present in cells transfected with the infectious clone reveal a pattern characteristic of the mRNAs observed in other lentivirus infections. The putative tat ORF of CAEV could be identified by genomic location and amino acid homology to the visna virus tat gene. However, the CAEV rev gene could not be identified in a similar fashion. Thus, to determine the location of the rev ORF, cDNA clones were obtained by PCR amplification of the mRNA from infected cells. To determine whether a Rev response element was contained in the CAEV genome, secondary structural analysis of the viral RNA was performed. A stable stem-loop structure that is similar in location, stability, and configuration to that determined for the Rev response element of HIV was found.
MESH Headings
- Amino Acid Sequence
- Animals
- Arthritis-Encephalitis Virus, Caprine/genetics
- Arthritis-Encephalitis Virus, Caprine/pathogenicity
- Base Sequence
- Blotting, Northern
- Cells, Cultured
- Cloning, Molecular
- Gene Products, gag/genetics
- Gene Products, pol/genetics
- Gene Products, tat
- Genes, Regulator
- Genes, Viral
- Goats
- Models, Molecular
- Molecular Sequence Data
- Nucleic Acid Conformation
- Polymerase Chain Reaction
- RNA, Messenger/genetics
- RNA, Messenger/isolation & purification
- RNA, Viral/genetics
- RNA, Viral/isolation & purification
- Sequence Homology, Nucleic Acid
- Species Specificity
- Synovial Membrane/cytology
- Transcription, Genetic
- Transfection
- Viral Envelope Proteins/genetics
Affiliation(s)
- M Saltarelli
- Division of Comparative Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205
|
49
|
Beaumont C, Porcher C, Picat C, Nordmann Y, Grandchamp B. The Mouse Porphobilinogen Deaminase Gene. J Biol Chem 1989. [DOI: 10.1016/s0021-9258(18)63775-5] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022] Open
|
50
|
Konings DA, Hogeweg P. Pattern analysis of RNA secondary structure similarity and consensus of minimal-energy folding. J Mol Biol 1989; 207:597-614. [PMID: 2474658 DOI: 10.1016/0022-2836(89)90468-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.1] [Indexed: 01/01/2023]
Abstract
We describe an automated procedure to search for consensus structures or substructures in a set of homologous or related RNA molecules. The procedure is based on the calculation of optimal and sub-optimal secondary structures using thermodynamic rules for base-pairing by energy-minimization. A linear representation of the secondary structures of the related RNAs is used so that they can be compared and classified using standard alignment and clustering programs. We illustrate the method by means of two sets of homologous small RNAs, U2 and U3, and a set of alpha-globin mRNAs and show that biologically interesting consensus structures are obtained.
Affiliation(s)
- D A Konings
- European Molecular Biology Laboratory, Heidelberg, F.R.G
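To make concrete how a linear representation lets standard alignment tools compare structures, as described in the abstract above, here is a minimal Python sketch that scores a global (Needleman-Wunsch style) alignment of two dot-bracket strings and builds a pairwise score matrix that a clustering program could consume. Dot-bracket notation as the linear representation, the scoring values, and the names `align_score` and `score_matrix` are assumptions for this example; the authors' encoding and programs may differ.

```python
def align_score(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two structure strings."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def score_matrix(structures):
    """Symmetric matrix of pairwise alignment scores."""
    n = len(structures)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = align_score(structures[i], structures[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


if __name__ == "__main__":
    candidates = ["((..))", "((...))", "(....)", "......"]
    for row in score_matrix(candidates):
        print(row)
```

Aligning optimal and suboptimal foldings of several related sequences in this way, and then clustering on the resulting scores, is one simple route to the kind of consensus search the abstract describes.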
|