1. Steger G. Predicting the Structure of a Viroid: Structure, Structure Distribution, Consensus Structure, and Structure Drawing. Methods Mol Biol 2022; 2316:331-371. [PMID: 34845705 DOI: 10.1007/978-1-0716-1464-8_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Viroids are small non-coding RNAs that require a special sequence and structure to be replicated and transported by the host machinery. Many of these features can be predicted and later experimentally verified. Here, we will present workflows to predict viroid structures and draw the predicted structures in a pleasing and descriptive way using recently developed software.
Affiliation(s)
- Gerhard Steger
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.
2. Choudhary K, Deng F, Aviran S. Comparative and integrative analysis of RNA structural profiling data: current practices and emerging questions. Quant Biol 2017; 5:3-24. [PMID: 28717530 PMCID: PMC5510538 DOI: 10.1007/s40484-017-0093-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/02/2016] [Revised: 12/08/2016] [Accepted: 12/15/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND Structure profiling experiments provide single-nucleotide information on RNA structure. Recent advances in chemistry combined with application of high-throughput sequencing have enabled structure profiling at transcriptome scale and in living cells, creating unprecedented opportunities for RNA biology. Propelled by these experimental advances, massive data with ever-increasing diversity and complexity have been generated, which give rise to new challenges in interpreting and analyzing these data. RESULTS We review current practices in analysis of structure profiling data with emphasis on comparative and integrative analysis, and highlight emerging questions. Comparative analysis has revealed structural patterns across transcriptomes and has become an integral component of recent profiling studies. Additionally, profiling data can be integrated into traditional structure prediction algorithms to improve prediction accuracy. CONCLUSIONS To keep pace with experimental developments, methods to facilitate, enhance and refine such analyses are needed. Parallel advances in analysis methodology will complement profiling technologies and help them reach their full potential.
Affiliation(s)
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, University of California at Davis, Davis, CA 95616, USA
3. Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022]
Abstract
This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications, focusing on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with its impact on method development. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
4.
Abstract
It is well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is an important task in RNA bioinformatics; such methods are useful not only for further functional analyses of ncRNAs but also for improving the accuracy of secondary structure prediction and for finding novel functional RNAs in the genome. In this review, I focus on common secondary structure prediction from a given alignment of RNA sequences, in which one secondary structure whose length equals that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure prediction.
5. Andersen ES. The art of editing RNA structural alignments. Methods Mol Biol 2014; 1097:379-394. [PMID: 24639168 DOI: 10.1007/978-1-62703-709-9_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Manual editing of RNA structural alignments may be considered more art than science, since it still requires an expert biologist to take multiple levels of information into account and be slightly creative when constructing high-quality alignments. Even though the task is rather tedious, it is rewarded by great insight into the evolution of structure and function of your favorite RNA molecule. In this chapter I will review the methods and considerations that go into constructing RNA structural alignments at the secondary and tertiary structure level; introduce software, databases, and algorithms that have proven useful in semiautomating the work process; and suggest future directions towards full automation.
6. Bussotti G, Notredame C, Enright AJ. Detecting and comparing non-coding RNAs in the high-throughput era. Int J Mol Sci 2013; 14:15423-58. [PMID: 23887659 PMCID: PMC3759867 DOI: 10.3390/ijms140815423] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/31/2013] [Revised: 07/16/2013] [Accepted: 07/17/2013] [Indexed: 02/07/2023]
Abstract
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.
Affiliation(s)
- Giovanni Bussotti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cedric Notredame
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Aiguader, 88, 08003 Barcelona, Spain
- Anton J. Enright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
7. Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, Prohaska SJ, Stadler BMR, Stadler PF, Tanzer A, Washietl S, Witwer C. Evolutionary patterns of non-coding RNAs. Theory Biosci 2012; 123:301-69. [PMID: 18202870 DOI: 10.1016/j.thbio.2005.01.002] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 12/22/2004] [Accepted: 01/24/2005] [Indexed: 01/04/2023]
Abstract
A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs) with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of the transcription of ncRNAs suggests that small RNAs in general are genetically mobile in the sense that their association with a host gene (e.g. when transcribed from introns of an mRNA) can change on evolutionary time scales. The let-7 family demonstrates that even the mode of transcription (as intron or as exon) can change among paralogous ncRNAs.
8. Chen JL, Dishler AL, Kennedy SD, Yildirim I, Liu B, Turner DH, Serra MJ. Testing the nearest neighbor model for canonical RNA base pairs: revision of GU parameters. Biochemistry 2012; 51:3508-22. [PMID: 22490167 PMCID: PMC3335265 DOI: 10.1021/bi3002709] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 02/27/2012] [Indexed: 11/30/2022]
Abstract
Thermodynamic parameters for GU pairs are important for predicting the secondary structures of RNA and for finding genomic sequences that code for structured RNA. Optical melting curves were measured for 29 RNA duplexes with GU pairs to improve nearest neighbor parameters for predicting stabilities of helixes. The updated model eliminates a prior penalty assumed for terminal GU pairs. Six additional duplexes with the 5'GG/3'UU motif were added to the single representation in the previous database. This revises the ΔG°(37) for the 5'GG/3'UU motif from an unfavorable 0.5 kcal/mol to a favorable -0.2 kcal/mol. Similarly, the ΔG°(37) for the 5'UG/3'GU motif changes from 0.3 to -0.6 kcal/mol. The correlation coefficients between predicted and experimental ΔG°(37), ΔH°, and ΔS° for the expanded database are 0.95, 0.89, and 0.87, respectively. The results should improve predictions of RNA secondary structure.
Affiliation(s)
- Jonathan L. Chen
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Abigael L. Dishler
- Department of Chemistry, Allegheny College, Meadville, Pennsylvania 16335, United States
- Scott D. Kennedy
- Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, United States
- Ilyas Yildirim
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Biao Liu
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Douglas H. Turner
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
- Center for RNA Biology, University of Rochester, Rochester, New York 14627, United States
- Martin J. Serra
- Department of Chemistry, Allegheny College, Meadville, Pennsylvania 16335, United States
9. Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb Perspect Biol 2010; 2:a003665. [PMID: 20685845 DOI: 10.1101/cshperspect.a003665] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Indexed: 11/24/2022]
Abstract
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Affiliation(s)
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA
10. Bernhart SH, Hofacker IL. From consensus structure prediction to RNA gene finding. Brief Funct Genomic Proteomic 2009; 8:461-71. [PMID: 19833701 DOI: 10.1093/bfgp/elp043] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
Reliable structure prediction is a prerequisite for most types of bioinformatical analysis of RNA. Since the accuracy of structure prediction from single sequences is limited, one often resorts to computing the consensus structure for a set of related RNA sequences. Since functionally important RNA structures are expected to evolve much more slowly than the underlying sequences, the pattern of sequence (co-)variation can be exploited to dramatically improve structure prediction. Since a conserved common structure is only expected when the RNA structure is under selective pressure, consensus structure prediction also provides an ideal starting point for the de novo detection of structured non-coding RNAs. Here, we review different strategies for the prediction of consensus secondary structures, and show how these approaches can be used to predict non-coding RNA genes.
Affiliation(s)
- Stephan H Bernhart
- Department of Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria.
11. Cardioviral RNA structure logo analysis: entropy, correlations, and prediction. J Biol Phys 2009; 36:145-59. [PMID: 19728123 DOI: 10.1007/s10867-009-9154-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2008] [Accepted: 04/14/2009] [Indexed: 10/20/2022]
Abstract
In recent years, the growing number of sequenced RNAs has led to the development of new RNA databases. Predicting RNA structure from multiple alignments is therefore important for understanding RNA function. Since RNA secondary structures are often conserved in evolution, methods to identify covarying sites in an alignment can be essential for discovering structural elements. Structure Logo is a technique based on entropy and mutual information measurements for analyzing RNA sequences from an alignment. We proposed an efficient Structure Logo approach to analyze conservation and correlations in a set of Cardioviral RNA sequences. The entropy and mutual information content were measured to examine conservation and correlations, respectively. The conserved secondary structure motifs were predicted on the basis of the conservation and correlation analyses. Our predicted motifs were similar to the ones observed in the viral RNA structure database, and the correlations between bases also corresponded to the secondary structure in the database.
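The two quantities this abstract relies on can be illustrated with a short Python sketch: per-column Shannon entropy as a conservation measure and mutual information between two alignment columns as a covariation measure. This is not the authors' Structure Logo implementation; the helper names, the gapless toy alignment, base-2 logarithms, and the absence of any small-sample correction are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column(alignment: list[str], i: int) -> list[str]:
    """Return column i of a gapless alignment as a list of bases."""
    return [seq[i] for seq in alignment]


def entropy(col: list[str]) -> float:
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    n = len(col)
    return sum((c / n) * log2(n / c) for c in Counter(col).values())


def mutual_information(col_i: list[str], col_j: list[str]) -> float:
    """Mutual information in bits between two columns (a covariation measure)."""
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
        for (a, b), c in f_ij.items()
    )


# Hypothetical toy alignment: columns 0 and 3 covary as Watson-Crick pairs.
aln = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(entropy(column(aln, 1)))                             # 0.0
print(entropy(column(aln, 0)))                             # 2.0
print(mutual_information(column(aln, 0), column(aln, 3)))  # 2.0
```

In the toy alignment, column 1 is invariant (0 bits), while columns 0 and 3 each use all four bases (2 bits) and change in step as G-C, C-G, A-U and U-A pairs, which yields the maximal 2 bits of mutual information. Real alignments would additionally need gap handling and a correction for the small number of sequences.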
12. Advances in RNA structure prediction from sequence: new tools for generating hypotheses about viral RNA structure-function relationships. J Virol 2009; 83:6326-34. [PMID: 19369331 DOI: 10.1128/jvi.00251-09] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
13. Abraham M, Dror O, Nussinov R, Wolfson HJ. Analysis and classification of RNA tertiary structures. RNA 2008; 14:2274-89. [PMID: 18824509 PMCID: PMC2578864 DOI: 10.1261/rna.853208] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 10/10/2007] [Accepted: 07/05/2008] [Indexed: 05/19/2023]
Abstract
There is fast-growing interest in noncoding RNA transcripts. These transcripts are not translated into proteins, but play essential roles in many cellular and pathological processes. Recent efforts toward understanding their function have led to a substantial increase in both the number and the size of solved RNA structures. With the aim of addressing questions relating to RNA structural diversity, we examined RNA conservation at three structural levels: primary, secondary, and tertiary structure. Additionally, we developed an automated method for classifying RNA structures based on spatial (three-dimensional [3D]) similarity. Applying the method to all solved RNA structures resulted in a classified database of RNA tertiary structures (DARTS). DARTS embodies 1333 solved RNA structures classified into 94 clusters. The classification is hierarchical, reflecting the structural relationship between and within clusters. We also developed an application for searching DARTS with a new structure. The search is fast and its performance was successfully tested on all solved RNA structures since the creation of DARTS. A user-friendly interface for both the database and the search application is available online. We show intracluster and intercluster similarities in DARTS and demonstrate the usefulness of the search application. The analysis reveals the current structural repertoire of RNA and exposes common global folds and local tertiary motifs. Further study of these conserved substructures may suggest possible RNA domains and building blocks. This should be beneficial for structure prediction and for gaining insights into structure-function relationships.
Affiliation(s)
- Mira Abraham
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
14. Marz M, Mosig A, Stadler BMR, Stadler PF. U7 snRNAs: a computational survey. Genomics Proteomics Bioinformatics 2008; 5:187-95. [PMID: 18267300 PMCID: PMC5054213 DOI: 10.1016/s1672-0229(08)60006-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates including the upstream sequence elements characteristic for snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.
Affiliation(s)
- Manja Marz
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig D-04107, Germany
15. Mathews D. Predicting the secondary structure common to two RNA sequences with Dynalign. Curr Protoc Bioinformatics 2008; Chapter 12:Unit 12.4. [PMID: 18428718 DOI: 10.1002/0471250953.bi1204s08] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
Dynalign is a dynamic programming algorithm for the simultaneous prediction of the lowest-free-energy secondary structure common to two RNA sequences and the alignment of the two sequences. It has been shown that the average accuracy of secondary structure prediction is improved using Dynalign, as compared to free-energy minimization of a single sequence. This unit provides protocols for using Dynalign on a Microsoft Windows platform as part of the RNAstructure package, and for compiling and using Dynalign on Unix/Linux computers.
Affiliation(s)
- David Mathews
- Center for Human Genetics and Molecular Pediatric Disease, Aab Institute of Biomedical Sciences, University of Rochester Medical Center, Rochester, New York, USA
16. Wilm A, Linnenbrink K, Steger G. ConStruct: Improved construction of RNA consensus structures. BMC Bioinformatics 2008; 9:219. [PMID: 18442401 PMCID: PMC2408607 DOI: 10.1186/1471-2105-9-219] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 10/23/2007] [Accepted: 04/28/2008] [Indexed: 12/27/2022]
Abstract
BACKGROUND Aligning homologous non-coding RNAs (ncRNAs) correctly in terms of sequence and structure is an unresolved problem, due to both mathematical complexity and imperfect scoring functions. High-quality alignments, however, are a prerequisite for most consensus structure prediction approaches, homology searches, and tools for phylogeny inference. Automatically created ncRNA alignments often need manual corrections, yet this manual refinement is tedious and error-prone. RESULTS We present an extended version of CONSTRUCT, a semi-automatic, graphical tool suitable for creating RNA alignments correct in terms of both consensus sequence and consensus structure. To this end, CONSTRUCT combines sequence alignment, thermodynamic data and various measures of covariation. One important feature is that the user is guided during the alignment correction step by a consensus dotplot, which displays all thermodynamically optimal base pairs and the corresponding covariation. Once the initial alignment is corrected, optimal and suboptimal secondary structures as well as tertiary interactions can be predicted. We demonstrate CONSTRUCT's ability to guide the user in correcting an initial alignment, and show an example of optimal secondary consensus structure prediction on very-hard-to-align SECIS elements. Moreover, we use CONSTRUCT to predict tertiary interactions from sequences of the internal ribosome entry site of CrP-like viruses. In addition, we show that alignments specifically designed for benchmarking can easily be optimized using CONSTRUCT, although they share very little sequence identity. CONCLUSION CONSTRUCT's graphical interface allows for easy alignment correction based on and guided by predicted and known structural constraints. It combines several algorithms for prediction of secondary consensus structure and even tertiary interactions.
The CONSTRUCT package can be downloaded from the URL listed in the Availability and requirements section of this article.
Affiliation(s)
- Andreas Wilm
- Heinrich-Heine-Universität Düsseldorf, Institut für Physikalische Biologie, Universitätsstr. 1, D-40225 Düsseldorf, Germany.
17. Andersen ES, Lind-Thomsen A, Knudsen B, Kristensen SE, Havgaard JH, Torarinsson E, Larsen N, Zwieb C, Sestoft P, Kjems J, Gorodkin J. Semiautomated improvement of RNA alignments. RNA 2007; 13:1850-9. [PMID: 17804647 PMCID: PMC2040093 DOI: 10.1261/rna.215407] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 05/10/2023]
Abstract
We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor highlights different properties of the alignment by color, and its integrated analysis tools prevent the introduction of errors when doing alignment editing. SARSE readily connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster: the mir-399 RNA, vertebrate telomerase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general use of the method is illustrated by the ability to accommodate pseudoknots and handle even large and divergent RNA families. The open architecture of the SARSE editor makes it a flexible tool to improve all RNA alignments with relatively little human intervention. Online documentation and software are available at http://sarse.ku.dk.
Affiliation(s)
- Ebbe S Andersen
- Department of Molecular Biology, University of Aarhus, Arhus C, Denmark
18. Horesh Y, Doniger T, Michaeli S, Unger R. RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules. BMC Bioinformatics 2007; 8:366. [PMID: 17908318 PMCID: PMC2147038 DOI: 10.1186/1471-2105-8-366] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 06/07/2007] [Accepted: 10/01/2007] [Indexed: 12/27/2022]
Abstract
Background In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure. Results We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in time linear in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string edit-distance algorithm with a minimal loss of accuracy. We show that this approach allows us to explore the suboptimal structure space more deeply. Conclusion The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.
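A simplified sketch of the layered-graph idea, in Python and under stated assumptions: each molecule contributes a layer of candidate dot-bracket structures (standing in for RNAsubopt output), edges connect only consecutive layers, and the edge weight is the unit-cost Levenshtein distance between dot-bracket strings. The function names and toy structures are hypothetical, and RNAspa's actual graph construction and scoring may differ. Because the graph is a layered DAG, its shortest path can be found by dynamic programming in time linear in the number of layers for a fixed number of candidates per layer.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two dot-bracket strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def shortest_path_choice(layers: list[list[str]]) -> tuple[list[str], int]:
    """Choose one structure per layer so that the summed edit distance between
    consecutive layers is minimal (dynamic programming over a layered DAG)."""
    cost = [0] * len(layers[0])  # best path cost ending at each candidate
    back = []                    # back-pointers for every later layer
    for prev_layer, layer in zip(layers, layers[1:]):
        new_cost, ptr = [], []
        for s in layer:
            steps = [cost[k] + edit_distance(p, s) for k, p in enumerate(prev_layer)]
            best = min(range(len(steps)), key=steps.__getitem__)
            new_cost.append(steps[best])
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    # Trace the cheapest path back from the last layer.
    k = min(range(len(cost)), key=cost.__getitem__)
    total = cost[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [layer[i] for layer, i in zip(layers, path)], total


layers = [
    ["((..))", "......"],    # hypothetical candidates for molecule 1
    ["((...))", ".(...)."],  # molecule 2
    ["((..))", "(....)"],    # molecule 3
]
chosen, total = shortest_path_choice(layers)
print(chosen, total)  # ['((..))', '((...))', '((..))'] 2
```

Here the path through the first candidate of every layer costs 2 (one inserted and one deleted unpaired base), which is cheaper than any route through the alternative structures.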
Affiliation(s)
- Yair Horesh
- Department of Computer Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
- Tirza Doniger
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
- Shulamit Michaeli
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
- Ron Unger
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
19. Machado-Lima A, del Portillo HA, Durham AM. Computational methods in noncoding RNA research. J Math Biol 2007; 56:15-49. [PMID: 17786447 DOI: 10.1007/s00285-007-0122-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 01/01/2007] [Indexed: 11/26/2022]
Abstract
Non-protein-coding RNAs (ncRNAs) are a research hotspot in bioinformatics. Recent discoveries have revealed new ncRNA families performing a variety of roles, from gene expression regulation to catalytic activities. It is also believed that other families are still to be unveiled. Computational methods developed for protein-coding genes often fail when searching for ncRNAs. The functionality of noncoding RNAs often depends heavily on their secondary structure, which makes their gene discovery very different from that of protein-coding genes. This motivated the development of specific methods for ncRNA research. This article reviews the main approaches used to identify ncRNAs and predict secondary structure.
Affiliation(s)
- Ariane Machado-Lima
- Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil.
20. Matousek J, Orctová L, Ptácek J, Patzak J, Dedic P, Steger G, Riesner D. Experimental transmission of pospiviroid populations to weed species characteristic of potato and hop fields. J Virol 2007; 81:11891-9. [PMID: 17715233 PMCID: PMC2168794 DOI: 10.1128/jvi.01165-07] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 01/11/2023]
Abstract
Weed plants characteristic of potato and hop fields have not been considered in the past as potential hosts that could transmit and lead to spreading of potato spindle tuber (PSTVd) and hop stunt (HSVd) viroids, respectively. To gain insight into this problem, we biolistically inoculated these weed plants with viroid populations either as RNA or as cDNA. New potential viroid host species, collected in central Europe, were discovered. From 12 weed species characteristic of potato fields, high viroid levels, detectable by molecular hybridization, were maintained after both RNA and DNA transfers in Chamomilla recutita and Anthemis arvensis. Low viroid levels, detectable by reverse transcription-PCR (RT-PCR) only, were maintained after plant inoculations with cDNA in Veronica arvensis and Amaranthus retroflexus. In these two species PSTVd concentrations were 10^5 and 10^3 times lower, respectively, than in tomato as estimated by real-time PCR. From 14 weeds characteristic of hop fields, high HSVd levels were detected in Galinsoga ciliata after both RNA and DNA transfers. HSVd was found, however, not to be transmissible by seeds of this weed species. Traces of HSVd were detectable by RT-PCR in HSVd-cDNA-inoculated Amaranthus retroflexus. Characteristic monomeric (+)-circular and linear viroid RNAs were present in extracts from weed species propagating viroids to high levels, indicating regular replication, processing, and circularization of viroid RNA in these weed species. Sequence analyses of PSTVd progenies propagated in C. recutita and A. arvensis showed a wide spectrum of variants related to various strains, from mild to lethal variants; the sequence variants isolated from A. retroflexus and V. arvensis exhibited similarity or identity to the superlethal AS1 viroid variant. All HSVd clones from G. ciliata corresponded to a HSVdg variant, which is strongly pathogenic for European hops.
Affiliation(s)
- J Matousek
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstrasse 1, D-40225 Düsseldorf, Germany
21
Jambhekar A, Derisi JL. Cis-acting determinants of asymmetric, cytoplasmic RNA transport. RNA (NEW YORK, N.Y.) 2007; 13:625-42. [PMID: 17449729 PMCID: PMC1852811 DOI: 10.1261/rna.262607] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.8] [Indexed: 05/08/2023]
Abstract
Asymmetric subcellular distribution of RNA is used by many organisms to establish cell polarity, differences in cell fate, or to sequester protein activity. Accurate localization of RNA requires specific sequence and/or structural elements in the localized RNA, as well as proteins that recognize these elements and link the RNA to the appropriate molecular motors. Recent advances in biochemistry, molecular biology, and cell imaging have enabled the identification of many RNA localization elements, or "zipcodes," from a variety of systems. This review focuses on the mechanisms by which various zipcodes direct RNA transport and on the known sequence/structural requirements for their recognition by transport complexes. Computational and experimental methods for predicting and identifying zipcodes are also discussed.
Affiliation(s)
- Ashwini Jambhekar
- Department of Biochemistry and Biophysics, University of California, San Francisco, California 94158, USA.
22
Matousek J, Kozlová P, Orctová L, Schmitz A, Pesina K, Bannach O, Diermann N, Steger G, Riesner D. Accumulation of viroid-specific small RNAs and increase in nucleolytic activities linked to viroid-caused pathogenesis. Biol Chem 2007; 388:1-13. [PMID: 17214544 DOI: 10.1515/bc.2007.001] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 02/06/2023]
Abstract
Strong viroid-caused pathogenesis was achieved in tomato cv. Rutgers by biolistic transfer of severe or lethal potato spindle tuber viroid (PSTVd) strains, while other tomato genotypes (e.g., Moneymaker) were tolerant. With reciprocal hybrids between sensitive and tolerant genotypes, we show that plant depression dominates over tolerance. Biolistic transfer of the most pathogenic PSTVd strain AS1 to Nicotiana benthamiana, which is considered to be a symptomless PSTVd host, led to a strong pathogenesis reaction and stunting, suggesting the presence of specific viroid pathogenesis-promoting target(s) in this plant species. Total levels of small siRNA-like PSTVd-specific RNAs were enhanced in strongly symptomatic tomato and N. benthamiana plants after biolistic infection with AS1 in comparison to the mild QFA strain. This indicates association of elevated levels of viroid-specific small RNA with production of strong symptoms. In symptom-bearing tomato leaves in comparison to controls, an RNase of approximately 18 kDa was induced and the activity of a nuclease of 34 kDa was elevated by a factor of seven in the vascular system. Sequence analysis of the nuclease cDNA designated TBN1 showed high homology with plant apoptotic endonucleases. The vascular-specific pathogenesis action is supported by light microscopic observations demonstrating a certain lack of xylem tissue and an arrest of the establishment of new vascular bundles in collapsed plants.
MESH Headings
- Amino Acid Sequence
- Base Sequence
- Biolistics/methods
- Blotting, Northern
- Cloning, Molecular
- DNA, Complementary/chemistry
- DNA, Complementary/genetics
- Endonucleases/genetics
- Endonucleases/metabolism
- Genotype
- Solanum lycopersicum/genetics
- Solanum lycopersicum/metabolism
- Solanum lycopersicum/virology
- Molecular Sequence Data
- Nucleic Acid Conformation
- Plant Diseases/genetics
- Plant Diseases/virology
- Plant Leaves/genetics
- Plant Leaves/metabolism
- Plant Leaves/virology
- Plant Proteins/genetics
- Plant Proteins/metabolism
- Plant Viruses/genetics
- Plant Viruses/pathogenicity
- RNA, Small Interfering/genetics
- RNA, Small Interfering/metabolism
- RNA, Viral/chemistry
- RNA, Viral/genetics
- RNA, Viral/metabolism
- Reverse Transcriptase Polymerase Chain Reaction
- Sequence Analysis, DNA
- Sequence Homology, Amino Acid
- Solanum tuberosum/genetics
- Solanum tuberosum/metabolism
- Solanum tuberosum/virology
- Viroids/genetics
- Viroids/pathogenicity
Affiliation(s)
- Jaroslav Matousek
- Department of Molecular Genetics, Biological Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, Branisovská 31, CZ-37005 Ceské Budĕjovice, Czech Republic
23
Kiryu H, Kin T, Asai K. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics 2006; 23:434-41. [PMID: 17182698 DOI: 10.1093/bioinformatics/btl636] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. RESULTS We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. AVAILABILITY The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
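The averaging-then-MEA idea described in this abstract can be illustrated with a small sketch. It is not the authors' C++ implementation: it assumes the per-sequence base-pairing probability matrices have already been computed (for example with a McCaskill partition-function folder) and re-indexed to alignment columns as full symmetric matrices, and it uses a simple gain function in which each predicted pair (i, j) earns 2·γ·p_ij and each unpaired column i earns q_i = 1 − Σ_j p_ij. The function names `average_bpp` and `mea_structure` and the parameters `gamma` and `min_loop` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def average_bpp(matrices: List[Matrix]) -> Matrix:
    """Average base-pairing probability matrices that are already indexed by
    alignment column (an assumption of this sketch)."""
    k = len(matrices)
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)] for i in range(n)]


def mea_structure(p: Matrix, gamma: float = 1.0, min_loop: int = 3) -> str:
    """Nested structure maximising sum(2*gamma*p_ij over pairs) + sum(q_i over
    unpaired columns). A larger gamma favours predicting more pairs."""
    n = len(p)
    q = [1.0 - sum(row) for row in p]  # probability that column i is unpaired
    # S[i][j] is the best score on the half-open interval [i, j); S[i][i] = 0.
    S = [[0.0] * (n + 1) for _ in range(n + 1)]

    def pair_score(i: int, k: int, j: int) -> float:
        # i pairs with k; the inside [i+1, k) and the rest [k+1, j) are independent.
        return 2.0 * gamma * p[i][k] + S[i + 1][k] + S[k + 1][j]

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = S[i + 1][j] + q[i]  # column i left unpaired
            for k in range(i + min_loop + 1, j):
                best = max(best, pair_score(i, k, j))
            S[i][j] = best

    # Traceback: repeat the same comparisons to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 1:
            continue
        if S[i][j] == S[i + 1][j] + q[i]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j):
            if S[i][j] == pair_score(i, k, j):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k))
                stack.append((k + 1, j))
                break
    return "".join(structure)
```

In this sketch, lowering `gamma` makes the predictor more conservative (fewer, more confident pairs), which is one way to realise a parameter that trades sensitivity against specificity; the exact gain function used by Kiryu et al. may differ.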
Affiliation(s)
- Hisanori Kiryu
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-42 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
24
Marszalkowski M, Teune JH, Steger G, Hartmann RK, Willkomm DK. Thermostable RNase P RNAs lacking P18 identified in the Aquificales. RNA (NEW YORK, N.Y.) 2006; 12:1915-21. [PMID: 17005927 PMCID: PMC1624910 DOI: 10.1261/rna.242806] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 05/12/2023]
Abstract
The RNase P RNA (rnpB) and protein (rnpA) genes were identified in the two Aquificales Sulfurihydrogenibium azorense and Persephonella marina. In contrast, neither of the two genes has been found in the sequenced genome of their close relative, Aquifex aeolicus. As in most bacteria, the rnpA genes of S. azorense and P. marina are preceded by the rpmH gene coding for ribosomal protein L34. This genetic region, including several genes up- and downstream of rpmH, is uniquely conserved among all three Aquificales strains, except that rnpA is missing in A. aeolicus. The RNase P RNAs (P RNAs) of S. azorense and P. marina are active catalysts that can be activated by heterologous bacterial P proteins at low salt. Although the two P RNAs lack helix P18 and thus one of the three major interdomain tertiary contacts, they are more thermostable than Escherichia coli P RNA and require higher temperatures for proper folding. Related to their thermostability, both RNAs include a subset of structural idiosyncrasies in their S domains, which were recently demonstrated to determine the folding properties of the thermostable S domain of Thermus thermophilus P RNA. Unlike 16S rRNA phylogeny that has placed the Aquificales as the deepest lineage of the bacterial phylogenetic tree, RNase P RNA-based phylogeny groups S. azorense and P. marina with the green sulfur, cyanobacterial, and delta/epsilon proteobacterial branches.
Affiliation(s)
- Michal Marszalkowski
- Philipps-Universität Marburg, Institut für Pharmazeutische Chemie, D-35037 Marburg, Germany
25
Wilm A, Mainz I, Steger G. An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol Biol 2006; 1:19. [PMID: 17062125 PMCID: PMC1635699 DOI: 10.1186/1748-7188-1-19] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Received: 08/30/2006] [Accepted: 10/24/2006] [Indexed: 11/10/2022]
Abstract
Background The performance of alignment programs is traditionally tested on sets of protein sequences for which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated by the first RNA alignment benchmark published so far. For example, the twilight zone (the similarity range where alignment quality drops drastically) starts at 60 % for RNAs in comparison to 20 % for proteins. In this study we enhance the previous benchmark. Results The RNA sequence sets in the benchmark database are taken from an increased number of RNA families to avoid unintended impact by using only a few families. The size of sets varies from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures structural conservation. The performance order of parameters (such as nucleotide substitution matrices and gap costs) as well as of programs is rated by rank tests. Conclusion Most sequence alignment programs perform equally well on RNA sequence sets with high sequence identity, that is, with an average pairwise sequence identity (APSI) above 75 %. Gap-open and gap-extension parameters have a large influence on alignment quality at APSI ≤ 75 %; optimal parameter combinations are shown for several programs. The use of different 4 × 4 substitution matrices improved program performance only in some cases. The performance of iterative programs drastically increases with increasing sequence numbers and/or decreasing sequence identity, which makes them clearly superior to programs using a purely non-iterative, progressive approach. The best sequence alignment programs produce alignments of high quality down to APSI > 55 %; at lower APSI the use of sequence+structure alignment programs is recommended.
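Because the benchmark's conclusions are stated in terms of APSI thresholds (75 % and 55 %), a short sketch of how an APSI value might be computed for an alignment may help. Definitions of pairwise identity differ between tools; this sketch assumes that columns with a gap in either sequence are skipped and that identity is the fraction of identical residues among the remaining columns, which may not match the benchmark's exact definition. The names `pairwise_identity` and `apsi` are hypothetical.

```python
from itertools import combinations
from typing import List

GAP_CHARS = set("-.")


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues among columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # assumption: gapped columns do not count
        compared += 1
        same += x == y
    return same / compared if compared else 0.0


def apsi(alignment: List[str]) -> float:
    """Average pairwise sequence identity, in percent, over all sequence pairs."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return 100.0 * sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

For instance, the toy alignment `["ACGU", "ACGA", "ACCA"]` has pairwise identities of 75 %, 50 % and 75 %, so its APSI is about 66.7 %, inside the range where the abstract reports that parameter choice matters.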
Affiliation(s)
- Andreas Wilm
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Indra Mainz
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Gerhard Steger
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
26
Lindgreen S, Gardner PP, Krogh A. Measuring covariation in RNA alignments: physical realism improves information measures. Bioinformatics 2006; 22:2988-95. [PMID: 17038338 DOI: 10.1093/bioinformatics/btl514] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION The importance of non-coding RNAs is becoming increasingly evident, and often the function of these molecules depends on the structure. It is common to use alignments of related RNA sequences to deduce the consensus secondary structure by detecting patterns of co-evolution. A central part of such an analysis is to measure covariation between two positions in an alignment. Here, we rank various measures ranging from simple mutual information to more advanced covariation measures. RESULTS Mutual information is still used for secondary structure prediction, but the results of this study indicate which measures are useful. Incorporating more structural information by considering e.g. indels and stacking improves accuracy, suggesting that physically realistic measures yield improved predictions. This can be used to improve both current and future programs for secondary structure prediction. The best measure tested is the RNAalifold covariation measure modified to include stacking. AVAILABILITY Scripts, data and supplementary material can be found at http://www.binf.ku.dk/Stinus_covariation
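The simplest measure ranked in this study, mutual information between two alignment columns, can be sketched as follows. This is only the baseline measure, not the RNAalifold-with-stacking measure the authors found best; skipping sequences with a gap in either column is an assumption of this sketch, and `mutual_information` is a hypothetical name.

```python
from collections import Counter
from math import log2
from typing import List

GAP_CHARS = set("-.")


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j.
    Sequences with a gap in either column are skipped (an assumption)."""
    pairs = [(s[i].upper(), s[j].upper()) for s in alignment
             if s[i] not in GAP_CHARS and s[j] not in GAP_CHARS]
    n = len(pairs)
    if n == 0:
        return 0.0
    f_i = Counter(x for x, _ in pairs)   # single-column frequencies
    f_j = Counter(y for _, y in pairs)
    f_ij = Counter(pairs)                # joint frequencies
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi
```

For the alignment `["GAC", "CAG", "AAU", "UAA"]`, columns 0 and 2 covary perfectly through Watson-Crick pairs and give 2 bits, while the conserved column 1 gives 0 bits with either. Note that MI would also give 2 bits for a perfectly correlated but non-complementary column pair, which illustrates why measures that reward base pairing and stacking can be more physically realistic.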
Affiliation(s)
- S Lindgreen
- Bioinformatics Centre, Institute of Molecular Biology, University of Copenhagen Universitetsparken 15, 2100 Copenhagen Ø, Denmark.
27
Abstract
Knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast, and the characteristic property shared by members of the same class is mainly their structure. For correct characterization of such classes it is therefore of great importance to analyse their structural features in detail. In this manuscript I present RNAlishapes, which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which allows covariation of paired bases to be rewarded. Applied with shape abstraction to a set of bacterial trp-operon leaders, the algorithm was able to identify the two alternative conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at .
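Shape abstraction, mentioned above, maps many similar secondary structures onto one coarse "shape" string. The sketch below implements one simple abstraction that keeps only the nesting of helices, so that stacked pairs, bulges and interior loops collapse into a single helix; as far as we understand, this roughly corresponds to the most abstract shape level (level 5) of the RNAshapes family of tools, but it is an illustration, not RNAlishapes' code. The input is assumed to be a plain dot-bracket string with only `(`, `)` and `.`; all function names are hypothetical.

```python
from typing import Dict, List


def pair_table(db: str) -> Dict[int, int]:
    """Map each opening position of a dot-bracket string to its partner."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for k, c in enumerate(db):
        if c == "(":
            stack.append(k)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {k}")
            pairs[stack.pop()] = k
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def _outer_pairs(pairs: Dict[int, int], i: int, j: int) -> List[int]:
    """Opening positions of the outermost pairs inside the half-open interval [i, j)."""
    found, k = [], i
    while k < j:
        if k in pairs:
            found.append(k)
            k = pairs[k] + 1  # skip everything enclosed by this pair
        else:
            k += 1
    return found


def _helix_shape(pairs: Dict[int, int], i: int) -> str:
    # Walk inward while exactly one pair is enclosed: stacked pairs, bulges and
    # interior loops all belong to the same abstract helix.
    inner = _outer_pairs(pairs, i + 1, pairs[i])
    while len(inner) == 1:
        i = inner[0]
        inner = _outer_pairs(pairs, i + 1, pairs[i])
    # No enclosed pair means a hairpin; two or more means a multiloop.
    return "[" + "".join(_helix_shape(pairs, k) for k in inner) + "]"


def shape_level5(db: str) -> str:
    """Abstract shape that keeps only the nesting of helices."""
    pairs = pair_table(db)
    return "".join(_helix_shape(pairs, k) for k in _outer_pairs(pairs, 0, len(db)))
```

Under this abstraction, a hairpin interrupted by an interior loop such as `((..((...))..))` becomes `[]`, and a multiloop `((((...))((...))))` becomes `[[][]]`; two conformations of an attenuator that differ in helix arrangement would therefore fall into different shape classes.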
Affiliation(s)
- Björn Voss
- Experimental Bioinformatics, Institute of Biology II, Freiburg University, Schänzlestrasse 1, 79104 Freiburg, Germany.
28
Dowell RD, Eddy SR. Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 2006; 7:400. [PMID: 16952317 PMCID: PMC1579236 DOI: 10.1186/1471-2105-7-400] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.5] [Received: 03/04/2006] [Accepted: 09/04/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm. RESULTS We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. CONCLUSION Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm - this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN - have comparable overall performance with different strengths and weaknesses.
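The abstract says pins are chosen from the posterior probabilities of a probabilistic pairwise sequence alignment. One plausible way to do that is sketched below: keep aligned position pairs (i, j) whose posterior is above a threshold, taking the most confident first and rejecting any pair that would cross or reuse a position of an already accepted pin. This greedy rule and the threshold value are assumptions for illustration, not necessarily Consan's selection procedure; `select_pins` is a hypothetical name.

```python
from typing import List, Tuple

Pin = Tuple[int, int]


def _consistent(a: Pin, b: Pin) -> bool:
    """Two aligned position pairs can coexist in one alignment only if they are
    ordered the same way in both sequences (no crossing, no reused position)."""
    return (a[0] < b[0] and a[1] < b[1]) or (a[0] > b[0] and a[1] > b[1])


def select_pins(posterior: List[List[float]], threshold: float = 0.9) -> List[Pin]:
    """Greedily keep the most confident aligned pairs (i, j) whose posterior
    probability is at least `threshold` and that do not conflict."""
    candidates = [
        (prob, i, j)
        for i, row in enumerate(posterior)
        for j, prob in enumerate(row)
        if prob >= threshold
    ]
    candidates.sort(reverse=True)  # highest posterior first
    pins: List[Pin] = []
    for _, i, j in candidates:
        if all(_consistent((i, j), pin) for pin in pins):
            pins.append((i, j))
    return sorted(pins)
```

The structural alignment would then only consider paths passing through the selected (i, j) cells, which is what reduces runtime and memory; a higher threshold yields fewer but more reliable pins.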
Affiliation(s)
- Robin D Dowell
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, 4444 Forest Park Blvd. Box 8510, St. Louis, MO 63108, USA
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA
- Sean R Eddy
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, 4444 Forest Park Blvd. Box 8510, St. Louis, MO 63108, USA
29
Dalli D, Wilm A, Mainz I, Steger G. STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time. Bioinformatics 2006; 22:1593-9. [PMID: 16613908 DOI: 10.1093/bioinformatics/btl142] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem. RESULTS Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.
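The reduction to a two-dimensional problem can be pictured as follows: each nucleotide is annotated with the probability of pairing with a downstream partner, pairing with an upstream partner, or staying unpaired, and an ordinary Needleman-Wunsch alignment is run on these annotated sequences in time proportional to the product of the sequence lengths. The scoring function below (geometric means of matching pairing probabilities, plus a bonus of 1 for identical nucleotides, with a linear gap cost) is a hypothetical stand-in, not StrAl's published score, and it shows only the pairwise step, not the progressive multiple alignment. All names and default values are assumptions.

```python
from math import sqrt
from typing import List, Tuple

Vec = Tuple[float, float, float]  # (paired downstream, paired upstream, unpaired)


def pairing_vectors(p: List[List[float]]) -> List[Vec]:
    """Collapse a symmetric base-pairing probability matrix into per-nucleotide vectors."""
    n = len(p)
    vecs = []
    for i in range(n):
        down = sum(p[i][j] for j in range(i + 1, n))
        up = sum(p[i][j] for j in range(i))
        vecs.append((down, up, max(0.0, 1.0 - down - up)))
    return vecs


def column_score(a: str, b: str, va: Vec, vb: Vec, alpha: float = 1.0) -> float:
    # Hypothetical score: structural agreement plus a bonus for identical bases.
    structure = sqrt(va[0] * vb[0]) + sqrt(va[1] * vb[1])
    return alpha * structure + (1.0 if a == b else 0.0)


def align(s1: str, s2: str, v1: List[Vec], v2: List[Vec],
          gap: float = -0.5, alpha: float = 1.0) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment in O(len(s1) * len(s2)) time."""
    n, m = len(s1), len(s2)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + column_score(s1[i - 1], s2[j - 1], v1[i - 1], v2[j - 1], alpha)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring match, then gap in s2.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(
                s1[i - 1], s2[j - 1], v1[i - 1], v2[j - 1], alpha):
            top.append(s1[i - 1])
            bottom.append(s2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j] + gap):
            top.append(s1[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(s2[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))
```

Because every position carries its own pairing vector, two nucleotides that are both likely to open a helix score well even if they differ in sequence, which is the property that helps when compensatory mutations have eroded sequence similarity.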
Affiliation(s)
- Deniz Dalli
- Heinrich-Heine-Universität Düsseldorf, Institut für Physikalische Biologie D-40225 Düsseldorf, Germany
30
Abstract
Some experimental results for the thermodynamics of RNA folding cannot be explained by simple pairwise hydrogen-bonding models. Such effects include the stabilities of isoguanosine-isocytidine (iG-iC) base pairs and of various 2 × 2 nucleotide internal loops. Presumably, these results can be explained by base stacking effects, which can be partitioned into Coulombic and overlap effects. We review experimental measurements that provide benchmarks for testing the approximations and theories used for modeling nucleic acids. Quantitative agreement between experiment and theory will indicate understanding of the interactions determining RNA stability and structure.
Affiliation(s)
- Douglas H. Turner
- To whom correspondence should be addressed. Phone: (585) 275-3207. Fax: (585) 506-0205.
31
Quandt D, Stech M. Molecular evolution of the trnL(UAA) intron in bryophytes. Mol Phylogenet Evol 2005; 36:429-43. [PMID: 16005648 DOI: 10.1016/j.ympev.2005.03.014] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 01/08/2004] [Revised: 02/09/2005] [Accepted: 03/07/2005] [Indexed: 11/20/2022]
Abstract
Structure, variability, and molecular evolution of the trnL(UAA) intron in bryophytes (mosses and liverworts) are analyzed based on more than 1000 sequences representing all classes, including comparisons of lengths and GC-contents, sequence similarities, evolutionary rates and ti/tv ratios of the major lineages and selected genera. Secondary structure analyses of the more variable stem-loop regions facilitated recognition of sequence repeats and minute inversions that often occurred independently in non-related lineages, thus supporting alignment construction and homology assessment. The most length-variable stem-loop region P8 does not share a common evolutionary history across all major bryophyte lineages. Independent nucleotide additions such as internally repeated sequence segments resulted in non-homologous P8 sequences that cannot be folded into a common P8 secondary structure, whether for all bryophytes or for liverworts or mosses alone. To address evolutionary patterns, separate analyses of P6/P8 and the remaining intron (core) have to be performed, as overall values of the complete intron are misleading. It is argued that a transition bias observed above the genus level in the core structure is caused by structural constraints, not by its higher GC-content in comparison to the more AT-rich P6 and P8. Compensating base pair changes detected in highly conserved elements are often characteristic of the major bryophyte lineages (classes). Sequence divergence and evolutionary rates are generally higher in liverworts than in mosses, resulting in ambiguous alignments of P6 and P8 even within classes. In mosses, trends towards length reduction of P8 and lower evolutionary rates of the intron are observed. Average intraspecific variation is less than 1%, corresponding to 2-3 mutations in the complete intron.
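The ti/tv ratios mentioned above compare transitions (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T/U) with transversions (purine to pyrimidine or the reverse). A minimal sketch of raw counting between two aligned sequences is shown below; it assumes U and T are equivalent and that gaps and ambiguity codes are skipped, and it applies no correction for multiple substitutions, so published ti/tv estimates (which are often model-based) may differ. `ti_tv` is a hypothetical name.

```python
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv(a: str, b: str) -> Tuple[int, int, float]:
    """Count transitions and transversions between two aligned sequences.
    U is treated as T; gaps and ambiguity codes are skipped (assumptions)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    ti = tv = 0
    for x, y in zip(a.upper().replace("U", "T"), b.upper().replace("U", "T")):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        both_purines = x in PURINES and y in PURINES
        both_pyrimidines = x in PYRIMIDINES and y in PYRIMIDINES
        if both_purines or both_pyrimidines:
            ti += 1
        else:
            tv += 1
    ratio = ti / tv if tv else float("inf")
    return ti, tv, ratio
```

Counting paired and unpaired columns separately (for example, core helices versus P6 and P8) would be one way to look for the kind of structure-linked transition bias the abstract describes.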
Affiliation(s)
- Dietmar Quandt
- Nees Institut für Biodiversität der Pflanzen, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 170, D-53115 Bonn, Germany.
32
Gesell T, von Haeseler A. In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics 2005; 22:716-22. [PMID: 16332711 DOI: 10.1093/bioinformatics/bti812] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
MOTIVATION A biological sequence usually has many sites whose evolution depends on other positions of the sequence, but this is not accounted for by commonly used models of sequence evolution. Here we introduce a Markov model of nucleotide sequence evolution in which the instantaneous substitution rate at a site depends on the states of other sites. Based on the concept of neighbourhood systems, our model represents a universal description of arbitrarily complex dependencies among sites. RESULTS We show how to define complex models for some illustrative examples and demonstrate that our method provides a versatile resource for simulations of sequence evolution with site-specific interactions along a tree. For example, we are able to simulate the evolution of RNA taking into account secondary structure as well as pseudoknots and other tertiary interactions. To this end, we have developed a program, Simulating Site-Specific Interactions (SISSI), that simulates evolution of a nucleotide sequence along a phylogenetic tree incorporating user-defined site-specific interactions. Furthermore, our method allows simulation of more complex interactions among nucleotide and other character-based sequences. AVAILABILITY We implemented our method in an ANSI C program SISSI which runs on UNIX/Linux, Windows and Mac OS systems, including Mac OS X. SISSI is available at http://www.bi.uni-duesseldorf.de/software/sissi/
Affiliation(s)
- Tanja Gesell
- Heinrich-Heine University Duesseldorf, Universitaetsstrasse 1 40225 Duesseldorf, Germany
33
34
Phuphuakrat A, Auewarakul P. Functional variability of Rev response element in HIV-1 primary isolates. Virus Genes 2005; 30:23-9. [PMID: 15744559 DOI: 10.1007/s11262-004-4578-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Revised: 06/21/2004] [Accepted: 07/12/2004] [Indexed: 11/27/2022]
Abstract
We have previously studied sequence heterogeneity of the HIV-1 Rev response element (RRE) and showed uneven variation among its stem-loops in both primary sequence and secondary structure. Here we studied the functional variation of RRE clones from a set of 10 primary isolates and demonstrated variation in the effect of these RRE clones on the expression of Gag proteins from a truncated HIV-1 genome. The difference in Gag level resulted, in part if not exclusively, from differences in the efficiency of RNA transport and of translational enhancement. These data suggest that variation of HIV-1 RRE may play a role in regulating the viral replication rate in HIV-1 primary isolates.
Affiliation(s)
- Angsana Phuphuakrat
- Department of Microbiology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
35
Axmann IM, Kensche P, Vogel J, Kohl S, Herzel H, Hess WR. Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol 2005; 6:R73. [PMID: 16168080 PMCID: PMC1242208 DOI: 10.1186/gb-2005-6-9-r73] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.2] [Received: 03/30/2005] [Revised: 06/01/2005] [Accepted: 07/20/2005] [Indexed: 01/09/2023]
Abstract
Background Whole-genome sequencing of marine cyanobacteria has revealed an unprecedented degree of genomic variation and streamlining. With a size of 1.66 megabase-pairs, Prochlorococcus sp. MED4 has the most compact of these genomes, and it is enigmatic how the few identified regulatory proteins efficiently sustain the lifestyle of an ecologically successful marine microorganism. Small non-coding RNAs (ncRNAs) control a plethora of processes in eukaryotes as well as in bacteria; however, systematic searches for ncRNAs are still lacking for most eubacterial phyla outside the enterobacteria. Results Based on a computational prediction, we show the presence of several ncRNAs (cyanobacterial functional RNA or Yfr) in several different cyanobacteria of the Prochlorococcus-Synechococcus lineage. Some ncRNA genes are present only in two or three of the four strains investigated, whereas the RNAs Yfr2 through Yfr5 are structurally highly related and are encoded by a rapidly evolving gene family, as their genes exist in different copy numbers and at different sites in the four investigated genomes. One ncRNA, Yfr7, is present in at least seven other cyanobacteria. In addition, control elements for several ribosomal operons were predicted, as well as riboswitches for thiamine pyrophosphate and cobalamin. Conclusion This is the first genome-wide and systematic screen for ncRNAs in cyanobacteria. Several ncRNAs were computationally predicted and their presence was biochemically verified. These RNAs may have regulatory functions, and each shows a distinct phylogenetic distribution.
Our approach can be applied to any group of microorganisms for which more than one total genome sequence is available for comparative analysis.
Affiliation(s)
- Ilka M Axmann
- Humboldt-University, Department of Biology/Genetics, Chausseestrasse, D-Berlin, Germany
- Philip Kensche
- Humboldt-University, Department of Biology/Genetics, Chausseestrasse, D-Berlin, Germany
- Humboldt-University, Institute for Theoretical Biology, Invalidenstrasse, Berlin, Germany
- Jörg Vogel
- Max Planck Institute for Infection Biology, Schumannstrasse, Berlin, Germany
- Stefan Kohl
- Humboldt-University, Department of Biology/Genetics, Chausseestrasse, D-Berlin, Germany
- Hanspeter Herzel
- Humboldt-University, Institute for Theoretical Biology, Invalidenstrasse, Berlin, Germany
- Wolfgang R Hess
- Humboldt-University, Department of Biology/Genetics, Chausseestrasse, D-Berlin, Germany
- University Freiburg, Institute of Biology II/Experimental Bioinformatics, Schänzlestrasse, Freiburg, Germany
36
Gillespie JJ, Yoder MJ, Wharton RA. Predicted Secondary Structure for 28S and 18S rRNA from Ichneumonoidea (Insecta: Hymenoptera: Apocrita): Impact on Sequence Alignment and Phylogeny Estimation. J Mol Evol 2005; 61:114-37. [PMID: 16059751 DOI: 10.1007/s00239-004-0246-x] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.9] [Received: 08/05/2004] [Accepted: 03/08/2005] [Indexed: 11/27/2022]
Abstract
We utilize the secondary structural properties of the 28S rRNA D2-D10 expansion segments to hypothesize a multiple sequence alignment for major lineages of the hymenopteran superfamily Ichneumonoidea (Braconidae, Ichneumonidae). The alignment consists of 290 sequences (originally analyzed in Belshaw and Quicke, Syst Biol 51:450-477, 2002) and provides the first global alignment template for this diverse group of insects. Predicted structures for these expansion segments, as well as for over half of the 18S rRNA, are given, with highly variable regions characterized and isolated within conserved structures. We demonstrate several pitfalls of optimization alignment and illustrate how these can potentially be addressed with structure-based alignments. Our global alignment is available online at http://hymenoptera.tamu.edu/rna with summary statistics, such as base-pair frequency tables, along with novel tools for parsing structure-based alignments into input files for most commonly used phylogenetic software. These resources will be valuable for hymenopteran systematists, as well as for researchers using rRNA sequences for phylogeny estimation in any taxon. We explore the phylogenetic utility of our structure-based alignment by examining a subset of the data under a variety of optimality criteria, using results from Belshaw and Quicke (2002) as a benchmark.
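Base-pair frequency tables of the kind mentioned above can be tallied directly from a structure-annotated alignment. The following is a minimal sketch, assuming the alignment is a list of equal-length gapped strings and the consensus structure is given in dot-bracket notation; the function names and the toy alignment are hypothetical and are not the tools distributed with the published alignment.

```python
from collections import Counter

def pairs_from_dot_bracket(structure):
    """Return the sorted (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def base_pair_frequencies(alignment, structure):
    """Count nucleotide combinations observed at each consensus base pair.

    Sequences with a gap at either partner are skipped for that pair.
    """
    if any(len(seq) != len(structure) for seq in alignment):
        raise ValueError("all sequences must match the structure length")
    table = {}
    for i, j in pairs_from_dot_bracket(structure):
        table[(i, j)] = Counter(
            seq[i] + seq[j] for seq in alignment if seq[i] != "-" and seq[j] != "-"
        )
    return table

# Hypothetical three-sequence alignment with one consensus hairpin
alignment = ["GCAAAGC", "GUAAAAC", "AC-AAGU"]
structure = "((...))"
for (i, j), counts in base_pair_frequencies(alignment, structure).items():
    print(i, j, dict(counts))
```

Different but pairable combinations at the same consensus pair (here GC and AU at positions 0 and 6) are the covariation signal that supports a structure-based alignment.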
Affiliation(s)
- Joseph J Gillespie
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA.
37
Siebert S, Backofen R. MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics 2005; 21:3352-9. [PMID: 15972285 DOI: 10.1093/bioinformatics/bti550] [Citation(s) in RCA: 121] [Impact Index Per Article: 6.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION Because secondary structure must be considered when aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems remain for the multiple alignment step: first, how to combine pairwise sequence-structure alignments into a multiple alignment, and second, how to generate secondary structure information for sequences whose explicit structural information is missing. RESULTS We describe a novel approach for multiple alignment of RNAs (MARNA) that takes both the primary and the secondary structures into consideration. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect the sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low-energy conformations. The libraries are then processed by the T-Coffee system, a consistency-based multiple alignment method. Furthermore, we are able to extract a consensus sequence and structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database.
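The idea of turning pairwise alignments into a library of weighted alignment edges, which a consistency-based method then processes, can be illustrated with the library-extension step popularized by T-Coffee. This is a simplified sketch under stated assumptions, not MARNA's scoring: the edge weights are hypothetical conservation scores, and the extended weight of an edge between two residues adds, for every route through a residue of a third sequence, the weaker of the two edges on that route.

```python
from collections import defaultdict

def build_library(pairwise_alignments):
    """Collect weighted residue-to-residue edges from pairwise alignments.

    pairwise_alignments maps (seq_a, seq_b) to a list of (i, j, weight)
    tuples: residue i of seq_a is aligned to residue j of seq_b.
    """
    library = defaultdict(float)
    for (a, b), matches in pairwise_alignments.items():
        for i, j, weight in matches:
            library[((a, i), (b, j))] += weight
            library[((b, j), (a, i))] += weight  # keep the library symmetric
    return dict(library)

def extend_library(library):
    """Add, for each third sequence, the weaker edge of every u-mid-v route."""
    neighbours = defaultdict(dict)
    for (u, v), weight in library.items():
        neighbours[u][v] = weight
    extended = defaultdict(float, library)  # start from the direct weights
    for u, out_edges in neighbours.items():
        for mid, w_um in out_edges.items():
            for v, w_mv in neighbours.get(mid, {}).items():
                # the route must involve three different sequences
                if v[0] in (u[0], mid[0]):
                    continue
                extended[(u, v)] += min(w_um, w_mv)
    return dict(extended)

# Hypothetical pairwise alignments of three short sequences A, B and C
pairwise = {
    ("A", "B"): [(0, 0, 1.0), (1, 1, 1.0)],
    ("A", "C"): [(0, 0, 1.0), (1, 1, 0.5)],
    ("B", "C"): [(0, 0, 1.0), (1, 1, 0.8)],
}
extended = extend_library(build_library(pairwise))
print(extended[(("A", 0), ("B", 0))])
```

Edges that are confirmed through other sequences gain weight, so a subsequent progressive alignment favours matches that are consistent across the whole set.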
Affiliation(s)
- Sven Siebert
- Department of Bioinformatics, Institute of Computer Science, Friedrich-Schiller-University Jena, Ernst-Abbe Platz 2, 07743 Jena, Germany
38
Owens RA, Thompson SM. Mutational analysis does not support the existence of a putative tertiary structural element in the left terminal domain of Potato spindle tuber viroid. J Gen Virol 2005; 86:1835-1839. [PMID: 15914863 DOI: 10.1099/vir.0.80869-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Comparative sequence analysis suggests that the left terminal domain of Potato spindle tuber viroid (PSTVd) and other large pospiviroids may assume a branched tertiary structure containing two pseudoknots. To search for evidence of such a structure in vivo, the nucleotide sequences proposed to interact were mutagenized, tomato seedlings were inoculated with mixtures of potentially infectious PSTVd RNA transcripts and the resulting progeny were screened for compensatory sequence changes. Positions 6–11 and 330–335 tolerated only limited sequence variation, and compensatory changes consistent with formation of an intact pseudoknot were observed in only two of the plants examined. No variation was detected at positions 14–16 or 29–31. Passage of selected variants in Rutgers tomato led to an increase in virulence only upon reversion to wild-type PSTVd Intermediate. The ability of the left terminal domain to assume a branched conformation containing pseudoknots does not appear to be an important determinant of PSTVd fitness.
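Screening progeny for compensatory changes at two interacting segments (such as positions 6–11 and 330–335) comes down to asking whether both segments changed while still being able to pair. The sketch below assumes that Watson-Crick and G·U wobble pairs are acceptable and that both segments are written 5'→3'; the helper names and example sequences are hypothetical and are not PSTVd data.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch)
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def can_form_helix(segment_5p, segment_3p):
    """True if segment_5p pairs base by base with segment_3p read 3'->5'.

    Both segments are written 5'->3', so the first base of segment_5p
    pairs with the last base of segment_3p, and so on inward.
    """
    if len(segment_5p) != len(segment_3p):
        return False
    return all((a, b) in ALLOWED_PAIRS for a, b in zip(segment_5p, reversed(segment_3p)))

def is_compensatory(wt_5p, wt_3p, mut_5p, mut_3p):
    """Both segments changed, yet wild type and mutant can each pair."""
    changed_both = wt_5p != mut_5p and wt_3p != mut_3p
    return changed_both and can_form_helix(wt_5p, wt_3p) and can_form_helix(mut_5p, mut_3p)

# Hypothetical wild-type segments and a double mutant that restores pairing
print(is_compensatory("CCUGGA", "UCCAGG", "CCAGGA", "UCCUGG"))
```

A single change on one side (for example "CCAGGA" against the wild-type partner "UCCAGG") breaks the helix, which is why only double changes that restore pairing count as evidence for the interaction.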
Affiliation(s)
- Robert A Owens
- Molecular Plant Pathology Laboratory, USDA/ARS, Room 118 Building 004, 10300 Baltimore Avenue, Beltsville, MD 20705, USA
- Susan M Thompson
- Molecular Plant Pathology Laboratory, USDA/ARS, Room 118 Building 004, 10300 Baltimore Avenue, Beltsville, MD 20705, USA
39
Pang PS, Jankowsky E, Wadley LM, Pyle AM. Prediction of functional tertiary interactions and intermolecular interfaces from primary sequence data. J Exp Zool B Mol Dev Evol 2005; 304:50-63. [PMID: 15595717 DOI: 10.1002/jez.b.21024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022]
Abstract
Given the availability of sequence information for many species, one can examine how the sequence of a gene varies among different organisms. This is accomplished by aligning the sequences and observing patterns of conservation, mutation and counter-mutation at different positions in the gene. Embedded in these patterns is information on energetic coupling and macromolecular interactions, which can be deciphered by application of statistical algorithms. Here we report a robust approach for predicting interactions within (or between) any type of biopolymer, including proteins, RNAs and RNA-protein complexes. Rather than maximize the number of predictions, this approach is designed to detect a limited number of highly significant interactions, thereby providing accurate results from alignments that contain a modest number of sequences (20-60). The versatility and accuracy of the algorithm are demonstrated by the successful prediction of important intramolecular interactions within RNAs, modified RNAs, and proteins, as well as the prediction of RNA-protein and protein-protein interactions.
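One generic way to report only a few highly significant interactions is to compare each observed covariation score with a null distribution obtained by shuffling one of the columns. The sketch below uses mutual information and a column-permutation test; both are illustrative assumptions and not the statistic used by Pang et al. The 20-sequence example columns are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(col_x, col_y):
    """Mutual information (bits) between two alignment columns."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def permutation_p_value(col_x, col_y, n_perm=999, seed=0):
    """Fraction of shuffled columns whose MI reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(col_x, col_y)
    shuffled = list(col_y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any coupling while keeping composition
        if mutual_information(col_x, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Two hypothetical columns from 20 sequences that covary perfectly
col_x = "GCAU" * 5
col_y = "CGUA" * 5
mi = mutual_information(col_x, col_y)
p = permutation_p_value(col_x, col_y)
print(round(mi, 3), p)
```

Keeping only column pairs whose p-value survives a strict cutoff trades sensitivity for a short list of high-confidence interactions, in the spirit of the approach described above.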
Affiliation(s)
- Phillip S Pang
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10027, USA
40
Mathews DH. Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics 2005; 21:2246-53. [PMID: 15731207 DOI: 10.1093/bioinformatics/bti349] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION Function derives from structure; therefore, there is a need for methods to predict functional RNA structures. RESULTS The Dynalign algorithm, which predicts the lowest free energy secondary structure common to two unaligned RNA sequences, is extended to the prediction of a set of low-energy structures. Dot plots can be drawn to show all base pairs in structures within an energy increment. Dynalign predicts more well-defined structures than structure prediction using a single sequence; in 5S rRNA sequences, the average number of base pairs in structures with energy within 20% of the lowest energy structure is 317 using Dynalign, but 569 using a single sequence. Structure prediction with Dynalign can also be constrained according to experiment or comparative analysis. The accuracy of Dynalign, measured as sensitivity and positive predictive value, is greater than that of predictions with a single sequence. AVAILABILITY Dynalign can be downloaded at http://rna.urmc.rochester.edu
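Sensitivity is the fraction of reference base pairs that a prediction recovers, and positive predictive value (PPV) is the fraction of predicted pairs that occur in the reference. The sketch below computes both from dot-bracket strings using exact pair matching; some published benchmarks also accept pairs shifted by one nucleotide, which this sketch does not. The example structures are hypothetical.

```python
def base_pairs(structure):
    """Set of (i, j) base pairs in a dot-bracket string."""
    stack, result = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            result.add((stack.pop(), pos))
    return result

def sensitivity_ppv(predicted, reference):
    """Return (sensitivity, PPV) of a predicted structure against a reference."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    ppv = correct / len(pred) if pred else 0.0
    return sensitivity, ppv

reference = "((((....))))"
predicted = "(((......)))"
print(sensitivity_ppv(predicted, reference))
```

Here the prediction misses one reference pair but contains no false pairs, so sensitivity drops while PPV stays at its maximum; reporting both separates under-prediction from over-prediction.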
Affiliation(s)
- David H Mathews
- Center for Human Genetics and Molecular Pediatric Disease, University of Rochester Medical Center, 601 Elmwood Avenue, Box 703, Rochester, NY 14642, USA.
41
Abstract
BACKGROUND With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure is often conserved in evolution, the well known, but underused, mutual information measure for identifying covarying sites in an alignment can be useful for identifying structural elements. This article presents MIfold, a MATLAB toolbox that employs mutual information, or a related covariation measure, to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. RESULTS We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall performance of MIfold improves with the number of aligned sequences for certain types of RNA sequences. In addition, we show that, for these sequences, MIfold is more sensitive but less selective than the related RNAalifold structure prediction program and is comparable with the COVE structure prediction package. CONCLUSION MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam.
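Mutual information between two alignment columns is high when their nucleotides vary together, which is the signal used to flag candidate base pairs. The sketch below scores every column pair that could close a hairpin of at least three unpaired nucleotides and reports those above a threshold. It is a Python illustration under these assumptions (threshold, minimum loop, toy alignment), not the MIfold toolbox; note that plain mutual information does not check whether the covarying nucleotides are complementary.

```python
import math
from collections import Counter
from itertools import combinations

def column_mi(alignment, i, j):
    """Mutual information (bits) between alignment columns i and j."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    n = len(alignment)
    fi, fj, fij = Counter(col_i), Counter(col_j), Counter(zip(col_i, col_j))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (fi[a] * fj[b])
    return sum(
        (c / n) * math.log2(c * n / (fi[a] * fj[b]))
        for (a, b), c in fij.items()
    )

def covarying_pairs(alignment, threshold=0.5, min_loop=3):
    """Column pairs (i, j, MI) above the threshold, strongest first."""
    length = len(alignment[0])
    scored = [
        (column_mi(alignment, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i > min_loop
    ]
    return [(i, j, round(mi, 3)) for mi, i, j in sorted(scored, reverse=True) if mi > threshold]

# Hypothetical alignment: columns 0 and 5 covary, the rest are invariant
alignment = [
    "GAAAAC",
    "CAAAAG",
    "AAAAAU",
    "UAAAAA",
]
print(covarying_pairs(alignment))
```

Raising the threshold makes the output more selective and lowering it more sensitive, mirroring the adjustable trade-off described above.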
Affiliation(s)
- Eva Freyhult
- The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden.
42
Quandt D, Stech M. Molecular evolution of the trnT(UGU)-trnF(GAA) region in Bryophytes. Plant Biol (Stuttg) 2004; 6:545-54. [PMID: 15375725 DOI: 10.1055/s-2004-821144] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 05/12/2023]
Abstract
Structure, variability, and molecular evolution of the trnT-F region in the Bryophyta (mosses and liverworts) are analyzed based on about 200 sequences of the trnT-L spacer and trnL 5' exon, 1000 sequences of the trnL intron, and 800 sequences of the trnL 3' exon and trnL-F spacer, including comparisons of lengths, GC contents, sequence similarities, and functional elements. Mutations occurring in the trnL 5' and 3' exons, including compensatory base pair changes, and a transition in the trnL anticodon in Takakia lepidozioides, are discussed. All three non-coding regions display a mosaic structure of highly variable elements (V1-V3 in the trnT-L spacer, V4/V5 corresponding to stem-loop regions P6/P8 in the trnL intron, and V6/V7 in the trnL-F spacer) and more conserved elements. In the trnL intron this structure is a consequence of the defined secondary structure necessary for correct splicing, whereas in both spacers conserved regions are restricted to promoter elements. At least the highly variable regions in the trnT-L spacer and stem-loop region P8 of the trnL intron seem to evolve independently in the major bryophyte lineages and are therefore not suitable for phylogenetic reconstructions at high taxonomic levels. In mosses, a trend of length reduction towards the more derived lineages is observed in all three non-coding regions. GC contents are mostly linked to sequence variability, with the conserved regions being more GC rich and the more variable regions more AT rich. The lowest GC values (< 10%) are found in the trnT-L spacer of mosses. In addition to two putative sigma70-type promoters in the trnT-L spacer, a third putative promoter is present in the trnL-F spacer, although trnL and trnF are assumed to be co-transcribed. Consensus sequences are provided for the -35 and -10 sequences of the major bryophyte lineages. 
The third promoter is part of a hairpin secondary structure, whose loop region is highly homoplastic in mosses due to an inversion occurring independently in non-related taxa, even at the intraspecific level.
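GC content, as compared across regions above, is the percentage of G and C among the unambiguous nucleotides of a region. The sketch below computes it per named region of one sequence; the sequence, region names, and boundaries are hypothetical placeholders, not data from the study, and gaps or ambiguity codes are simply ignored.

```python
def gc_content(seq):
    """Percentage of G+C among A, C, G, T/U characters (gaps and Ns ignored)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)

def gc_by_region(seq, regions):
    """Map each region name to its GC percentage; regions are 0-based, half-open."""
    return {
        name: round(gc_content(seq[start:end]), 1)
        for name, (start, end) in regions.items()
    }

# Hypothetical sequence: AT-rich variable block, GC-rich conserved block, spacer
sequence = "ATATTAATATGCGGCCGCTAGTTATAAT"
regions = {"variable": (0, 10), "conserved": (10, 18), "spacer": (18, 28)}
print(gc_by_region(sequence, regions))
```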
MESH Headings
- Base Sequence
- Bryophyta/genetics
- Conserved Sequence
- DNA, Chloroplast/chemistry
- DNA, Chloroplast/genetics
- DNA, Intergenic/chemistry
- DNA, Intergenic/genetics
- Evolution, Molecular
- Exons
- Genes, Plant
- Introns
- Molecular Sequence Data
- Nucleic Acid Conformation
- RNA, Chloroplast/chemistry
- RNA, Chloroplast/genetics
- RNA, Transfer, Amino Acid-Specific/chemistry
- RNA, Transfer, Amino Acid-Specific/genetics
- RNA, Transfer, Leu/chemistry
- RNA, Transfer, Leu/genetics
Affiliation(s)
- D Quandt
- Nees-Institut für Biodiversität der Pflanzen, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 170, 53115 Bonn, Germany.
43
Knight R, Birmingham A, Yarus M. BayesFold: rational 2° folds that combine thermodynamic, covariation, and chemical data for aligned RNA sequences. RNA 2004; 10:1323-36. [PMID: 15317972 PMCID: PMC1370620 DOI: 10.1261/rna.5168504] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 05/03/2023]
Abstract
BayesFold is a Web application that folds an alignment of closely related sequences and evaluates hypotheses about their shared structure. It uses Bayes's Theorem to combine information from several sources, including chemical mapping (if available), thermodynamic folding, and observed sequence variations. Its method provides a rational basis for integrating results, even when these methods conflict. On a gapped alignment of 86 tRNAPhe sequences each 77 bases long, BayesFold takes 31 sec to perform the calculations; the best structure contained 95% of the base pairs in the true structure, and the true structure was ranked second. Notably, similar results come from random samples of only 10 sequences from the alignment (running time 3 sec), suggesting that remarkably few sequences are required for good results. In contrast, folding single sequences with BayesFold produced structures 9.6 bp different, or with the Vienna package, 13.4 bp different, from the true structure. Similar results were obtained for other families of tRNAs. We especially recommend BayesFold for alignments of 3-50 closely related sequences, such as the sequence families frequently found in SELEX. In addition to providing a convenient way to explore the effects of each of the criteria on the plausibility of different structures, BayesFold also makes it easy to produce publication-quality secondary-structure graphics. The Web interface, available at http://bayes.colorado.edu/fold/, includes the flexibility to thread any of the sequences (or the consensus sequence) through any of the structures, including the one judged most probable.
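Bayes' theorem in odds form shows how several lines of evidence can update belief in one structural hypothesis: the posterior odds equal the prior odds multiplied by the likelihood ratio of each piece of evidence. The sketch below assumes the evidence sources are conditionally independent and uses hypothetical likelihood ratios for a single candidate base pair; it illustrates the principle only and is not BayesFold's model.

```python
import math

def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with independent likelihood ratios.

    Each ratio is P(evidence | hypothesis) / P(evidence | not hypothesis).
    Summing logs avoids overflow when many ratios are multiplied.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence for one candidate base pair
evidence = {"thermodynamic": 4.0, "covariation": 9.0, "chemical mapping": 0.5}
p = posterior_probability(0.1, evidence.values())
print(round(p, 3))
```

A prior of 0.1 (odds 1:9) multiplied by 4 × 9 × 0.5 = 18 gives posterior odds of 2:1, that is, a probability of about 0.667; the ratio below 1 shows how conflicting evidence lowers rather than vetoes the hypothesis.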
Affiliation(s)
- Rob Knight
- Department of Molecular, Cellular, and Developmental Biology, Campus Box 347, University of Colorado, Boulder 80309, USA
44
Ruan J, Stormo GD, Zhang W. ILM: a web server for predicting RNA secondary structures with pseudoknots. Nucleic Acids Res 2004; 32:W146-9. [PMID: 15215368 PMCID: PMC441582 DOI: 10.1093/nar/gkh444] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 02/03/2023]
Abstract
The ILM web server provides a web interface to two algorithms, iterated loop matching and maximum weighted matching, for efficiently predicting RNA secondary structures with pseudoknots. The algorithms can utilize either thermodynamic or comparative information or both, and thus can work on both aligned and individual sequences. Predicted secondary structures are presented in several formats compatible with a variety of existing visualization tools. The service can be accessed at http://cic.cs.wustl.edu/RNA/.
Affiliation(s)
- Jianhua Ruan
- Department of Computer Science and Engineering, Washington University in St Louis, St Louis, MO 63130, USA
45
Thurner C, Witwer C, Hofacker IL, Stadler PF. Conserved RNA secondary structures in Flaviviridae genomes. J Gen Virol 2004; 85:1113-1124. [PMID: 15105528 DOI: 10.1099/vir.0.19462-0] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.4] [Indexed: 12/16/2022]
Abstract
Presented here is a comprehensive computational survey of evolutionarily conserved secondary structure motifs in the genomic RNAs of the family Flaviviridae. This virus family consists of the three genera Flavivirus, Pestivirus and Hepacivirus and the group of GB virus C/hepatitis G virus with a currently uncertain taxonomic classification. Based on the control of replication and translation, two subgroups were considered separately: the genus Flavivirus, with its type I cap structure at the 5' untranslated region (UTR) and a highly structured 3' UTR, and the remaining three groups, which exhibit translation control by means of an internal ribosomal entry site (IRES) in the 5' UTR and a much shorter, less structured 3' UTR. The main findings of this survey are strong hints for the possibility of genome cyclization in hepatitis C virus and GB virus C/hepatitis G virus in addition to the flaviviruses; a surprisingly large number of conserved RNA motifs in the coding regions; and a lower level of detailed structural conservation in the IRES and 3' UTR motifs than reported in the literature. An electronic atlas organizes the information on the more than 150 conserved, and therefore putatively functional, RNA secondary structure elements.
Affiliation(s)
- Caroline Thurner
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Christina Witwer
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Ivo L Hofacker
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
- Peter F Stadler
- The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstraße 7b, D-04103 Leipzig, Germany
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
46
Dowell RD, Eddy SR. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics 2004; 5:71. [PMID: 15180907 PMCID: PMC442121 DOI: 10.1186/1471-2105-5-71] [Citation(s) in RCA: 182] [Impact Index Per Article: 9.1] [Received: 04/19/2004] [Accepted: 06/04/2004] [Indexed: 11/10/2022]
Abstract
Background RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus a key question is, what small, simple SCFG designs perform best for RNA secondary structure prediction? Results Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single sequence structure prediction accuracy on a benchmark set of RNA secondary structures. Conclusions Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others.
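Predicting a structure with an SCFG amounts to finding the most probable parse of the sequence, usually with a CYK-style dynamic program. The sketch below uses a small unambiguous grammar, S → aS | aSâS | ε, in which each nucleotide is either unpaired or pairs with a later one. All rule and emission probabilities are hypothetical placeholders rather than trained parameters from the study, and the minimum hairpin loop of three nucleotides is an added assumption.

```python
import math
from functools import lru_cache

# Hypothetical parameters for the grammar  S -> a S | a S a' S | epsilon
T_UNPAIRED, T_PAIR, T_END = 0.4, 0.4, 0.2  # rule probabilities (sum to 1)
P_SINGLE = 0.25                            # uniform single-base emission

def pair_emission(a, b):
    """Hypothetical pair emissions; the 16 values sum to 1."""
    if a + b in ("AU", "UA", "GC", "CG"):
        return 0.2
    if a + b in ("GU", "UG"):
        return 0.08
    return 0.004

def cyk_fold(seq, min_loop=3):
    """Return (log-probability, dot-bracket) of the most probable parse."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best derivation of seq[i:j] from S
        if i == j:
            return math.log(T_END), ""
        # S -> a S : seq[i] is unpaired
        sub_score, sub_struct = best(i + 1, j)
        top = (math.log(T_UNPAIRED * P_SINGLE) + sub_score, "." + sub_struct)
        # S -> a S a' S : seq[i] pairs with seq[k]
        for k in range(i + min_loop + 1, j):
            inner_score, inner_struct = best(i + 1, k)
            rest_score, rest_struct = best(k + 1, j)
            score = (math.log(T_PAIR * pair_emission(seq[i], seq[k]))
                     + inner_score + rest_score)
            if score > top[0]:
                top = (score, "(" + inner_struct + ")" + rest_struct)
        return top

    return best(0, len(seq))

score, structure = cyk_fold("GGGAAACCC")
print(structure)
```

Because each rule carries only a few parameters, a grammar like this is cheap to train; the trade-off studied above is how much accuracy such small designs give up compared with richer models.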
Affiliation(s)
- Robin D Dowell
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, 4444 Forest Park Blvd. Box 8510, St. Louis, MO 63108 USA
- Sean R Eddy
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, 4444 Forest Park Blvd. Box 8510, St. Louis, MO 63108 USA
47
Matousek J, Orctová L, Steger G, Skopek J, Moors M, Dedic P, Riesner D. Analysis of thermal stress-mediated PSTVd variation and biolistic inoculation of progeny of viroid "thermomutants" to tomato and Brassica species. Virology 2004; 323:9-23. [PMID: 15165815 DOI: 10.1016/j.virol.2004.02.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 12/09/2003] [Revised: 01/06/2004] [Accepted: 02/10/2004] [Indexed: 10/26/2022]
Abstract
Thermal stress of PSTVd-infected Nicotiana benthamiana led to the appearance of a broad PSTVd sequence distribution, in which most mutations accumulated in the left half of the viroid's secondary structure, including the "pathogenicity" domain. A similar effect had been reported for hop latent viroid [Virology 287 (2001) 349]. The pool of viroid "thermomutant" progeny was reverse-transcribed into cDNA and used for biolistic inoculation of Raphanus sativus, in which PSTVd infection was detectable by reverse transcription and polymerase chain reaction (RT-PCR). Newly generated inoculum from R. sativus was used for biolistic transfer to Arabidopsis thaliana wild type and to silencing-deficient mutants bearing one of the sde1, sde2, or sde3 loci. Irrespective of the A. thaliana silencing mutants, viroid levels in Brassicaceae species infected with mutated PSTVd variants were approximately 300 times lower than expected for tomato. At the same time, no systemic infection of A. thaliana was achieved with wild-type PSTVd. In Arabidopsis, a population of PSTVd consisting of frequent and minor variants was present, and the sequence distribution differed from that of the original viroid "thermomutants"; that is, mutations were not predominantly restricted to the left half of the viroid's secondary structure. At least 65% of viroid sequences from the Arabidopsis library accumulated mutations in the upper conserved central region (UCCR). In addition, mutants with changes in the "hairpin II" domain (C→A substitution at position 229) and in the conserved internal loop element in the left part of the viroid structure (single insertion of G at position 39) were detected. All these mutants were biolistically inoculated into tomato and were infectious, especially after a prolonged period of plant cultivation (50-80 days post-inoculation), when infection reached 70-90%. However, the sequence variants were unstable and reverted to the wild type and to other sequence variants stable in tomato. 
Our results demonstrate that heat stress-mediated production of viroid quasi-species could be of significance for viroid adaptations.
Affiliation(s)
- Jaroslav Matousek
- Department of Molecular Genetics, Institute of Plant Molecular Biology, Czech Academy of Sciences, Branisovská 31, 37005 Ceské Budejovice, Czech Republic
48
Witwer C, Hofacker IL, Stadler PF. Prediction of consensus RNA secondary structures including pseudoknots. IEEE/ACM Trans Comput Biol Bioinform 2004; 1:66-77. [PMID: 17048382 DOI: 10.1109/tcbb.2004.22] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Indexed: 05/12/2023]
Abstract
Most functional RNA molecules have characteristic structures that are highly conserved in evolution. Many of them contain pseudoknots. Here, we present a method for computing consensus structures, including pseudoknots, based on alignments of a few sequences. The algorithm combines thermodynamic and covariation information to assign scores to all possible base pairs; the base pairs are then chosen with the help of the maximum weighted matching algorithm. We applied our algorithm to a number of different types of RNA known to contain pseudoknots. All pseudoknots were predicted correctly and more than 85 percent of the base pairs were identified.
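In the matching formulation, nucleotides are vertices and scored candidate base pairs are weighted edges; a matching only forbids a nucleotide from pairing twice, so crossing (pseudoknotted) pairs are allowed. Practical tools use polynomial-time matching algorithms, whereas the sketch below uses an exact exponential-time search that is only suitable for a handful of positions. The scores are hypothetical stand-ins for combined thermodynamic and covariation scores.

```python
from functools import lru_cache

def max_weight_matching(scores):
    """Exact maximum weighted matching for a small set of scored base pairs.

    scores maps (i, j) with i < j to a positive weight. Each position is
    used at most once, but pairs may cross, so pseudoknots are allowed.
    Runs in exponential time: suitable only for tiny illustrative inputs.
    """
    vertices = tuple(sorted({v for pair in scores for v in pair}))

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0.0, ()
        first, rest = remaining[0], remaining[1:]
        # Option 1: leave `first` unpaired
        best_total, best_pairs = best(rest)
        # Option 2: pair `first` with any later partner that has a score
        for k, partner in enumerate(rest):
            w = scores.get((first, partner))
            if w is None:
                continue
            sub_total, sub_pairs = best(rest[:k] + rest[k + 1:])
            if w + sub_total > best_total:
                best_total = w + sub_total
                best_pairs = ((first, partner),) + sub_pairs
        return best_total, best_pairs

    total, chosen = best(vertices)
    return total, sorted(chosen)

# Hypothetical scores: two crossing stems (an H-type pseudoknot) plus a
# weaker nested alternative (5, 9) that conflicts with both stems
scores = {(0, 10): 3.0, (1, 9): 3.0, (5, 15): 3.0, (6, 14): 3.0, (5, 9): 2.0}
print(max_weight_matching(scores))
```

The optimum keeps both crossing stems (total 12.0) and drops the conflicting pair, something a nested-only dynamic program could not return.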
Affiliation(s)
- Christina Witwer
- Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, A-1090 Wien, Austria.
49
Dingley AJ, Steger G, Esters B, Riesner D, Grzesiek S. Structural characterization of the 69 nucleotide potato spindle tuber viroid left-terminal domain by NMR and thermodynamic analysis. J Mol Biol 2004; 334:751-67. [PMID: 14636600 DOI: 10.1016/j.jmb.2003.10.015] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/19/2022]
Abstract
The 69-nucleotide left-terminal domain (T(L)) of the potato spindle tuber RNA viroid (PSTVd) constitutes one of its five structural elements. Due to a twofold complementary sequence repeat, two possible conformations have been proposed for the T(L) secondary structure: an elongated rod and a bifurcated form. In the present study, two T(L) mutants were designed that remove the symmetry of the sequence repeats and ensure that either the bifurcated or the elongated-rod conformation is thermodynamically favored. Imino 1H and 15N resonances were assigned for both mutants and the native T(L) domain based on 1H-1H NOESY and heteronuclear 1H-15N HSQC high-resolution NMR spectra. The NMR secondary structure analysis of all constructs establishes unambiguously the elongated-rod form as the secondary structure of the native T(L) domain. Temperature-gradient gel electrophoresis and UV melting experiments corroborate these results. A combined secondary structure and sequence analysis of T(L) domains of other Pospiviroidae family members indicates that the elongated-rod form is thermodynamically favored for the vast majority of these viroids.
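Whether one of two alternative conformations is "thermodynamically favored" can be quantified with a two-state Boltzmann relation: the equilibrium constant is K = exp(−ΔΔG/RT). The sketch below assumes a simple two-state equilibrium and uses hypothetical free energies, not values measured in the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def two_state_fractions(dg_a, dg_b, temperature_c=37.0):
    """Equilibrium fractions of conformations A and B.

    dg_a and dg_b are free energies of formation in kcal/mol; the more
    negative value belongs to the more stable conformation.
    """
    rt = R_KCAL * (temperature_c + 273.15)
    # K = [B]/[A] = exp(-(dG_B - dG_A) / RT)
    k = math.exp(-(dg_b - dg_a) / rt)
    frac_a = 1.0 / (1.0 + k)
    return frac_a, 1.0 - frac_a

# Hypothetical energies: rod 2 kcal/mol more stable than the bifurcated form
rod, bifurcated = two_state_fractions(dg_a=-20.0, dg_b=-18.0)
print(round(rod, 3), round(bifurcated, 3))
```

Under this two-state assumption, a 2 kcal/mol difference at 37 °C already places about 96% of molecules in the more stable conformation.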
Affiliation(s)
- Andrew J Dingley
- Institut für Physikalische Biologie, Heinrich-Heine-Universität, D-40225 Düsseldorf, Germany.
50
Stech M, Quandt D, Frey W. Molecular circumscription of the hornworts (Anthocerotophyta) based on the chloroplast DNA trnL-trnF region. J Plant Res 2003; 116:389-398. [PMID: 12955570 DOI: 10.1007/s10265-003-0118-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 01/06/2003] [Accepted: 07/16/2003] [Indexed: 05/24/2023]
Abstract
In phylogenetic trees generated from partial trnL(UAA) intron sequences, the hornworts (represented by nine species from the genera Anthoceros, Dendroceros, Megaceros, Notothylas and Phaeoceros) are resolved as a monophyletic group and are separated from the clades of mosses, liverworts and tracheophytes. A secondary structure of the trnL(UAA) intron of Anthoceros agrestis is presented, displaying the arrangement of the stem-loop regions P1-P9. Compensatory base-pair changes (coevolutionary sites) are detected in regions P4/5 and P9 within the hornwort sequences. The original homology of the most variable region, P8, cannot be detected anymore due to the extremely fast divergent evolution of this segment in the major land plant groups. Similarly, a high sequence divergence occurs in the trnL-trnF intergenic spacer. Apart from synapomorphic substitutions in the trnL(UAA) intron, the hornworts are characterised by a large P6 region consisting of many repetitive elements. The molecular data therefore support the hornworts as representing an independent land plant lineage (Anthocerotophyta). Although relationships between hornworts and the other land plant groups remain unresolved in the trnL(UAA) intron trees, it is rather unlikely that bryophytes are monophyletic in their traditional circumscription, i.e. comprising hornworts, mosses and liverworts.
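A compensatory base-pair change (CBC) is a base pair at which both nucleotides differ between two sequences while both versions remain able to pair, the kind of coevolutionary site reported above. The sketch below finds such positions between two aligned sequences that share a dot-bracket structure. Whether G·U wobble counts as valid pairing varies between studies; this sketch allows it, and the sequences are hypothetical.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # wobble allowed (assumption)

def dot_bracket_pairs(structure):
    """Sorted (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)

def compensatory_changes(seq_a, seq_b, structure):
    """Base pairs where both partners differ and both sequences can pair."""
    cbcs = []
    for i, j in dot_bracket_pairs(structure):
        pair_a, pair_b = seq_a[i] + seq_a[j], seq_b[i] + seq_b[j]
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        if both_changed and pair_a in CANONICAL and pair_b in CANONICAL:
            cbcs.append((i, j, pair_a, pair_b))
    return cbcs

structure = "(((....)))"
seq_a = "GCAUUCGUGC"
seq_b = "CCGUUCGCGG"
print(compensatory_changes(seq_a, seq_b, structure))
```

The unchanged middle pair is ignored, while the two pairs whose partners both changed (GC to CG, AU to GC) are reported as CBCs.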
Affiliation(s)
- Michael Stech
- Institut für Biologie-Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Altensteinstrasse 6, 14195 Berlin, Germany.