1
|
Kostenko DO, Korotkov EV. Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences. Int J Mol Sci 2022; 23:ijms23073764. [PMID: 35409125 PMCID: PMC8998981 DOI: 10.3390/ijms23073764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/10/2022] Open
Abstract
The aim of this work was to compare the multiple alignment methods MAHDS, T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK in their ability to align highly divergent amino acid sequences. To accomplish this, we created test amino acid sequences with an average number of substitutions per amino acid (x) from 0.6 to 5.6, a total of 81 sets. Comparison of the performance of sequence alignments constructed by MAHDS and previously developed algorithms using the CS and Z score criteria and the benchmark alignment database (BAliBASE) indicated that, although the quality of the alignments built with MAHDS was somewhat lower than that of the other algorithms, it was compensated by greater statistical significance. MAHDS could construct statistically significant alignments of artificial sequences with x ≤ 4.8, whereas the other algorithms (T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK) could not perform that at x > 2.4. The application of MAHDS to align 21 families of highly diverged proteins (identity < 20%) from Pfam and HOMSTRAD databases showed that it could calculate statistically significant alignments in cases when the other methods failed. Thus, MAHDS could be used to construct statistically significant multiple alignments of highly divergent protein sequences, which accumulated multiple mutations during evolution.
Collapse
|
2
|
|
3
|
Gilmore AR, Alderdice M, Savage KI, O'Reilly PG, Roddy AC, Dunne PD, Lawler M, McDade SS, Waugh DJ, McArt DG. ACE: A Workbench Using Evolutionary Genetic Algorithms for Analyzing Association in TCGA. Cancer Res 2019; 79:2072-2075. [PMID: 30760519 DOI: 10.1158/0008-5472.can-18-1976] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Revised: 10/10/2018] [Accepted: 02/07/2019] [Indexed: 11/16/2022]
Abstract
Modern methods of acquiring molecular data have improved rapidly in recent years, making it easier for researchers to collect large volumes of information. However, this has increased the challenge of recognizing interesting patterns within the data. Atlas Correlation Explorer (ACE) is a user-friendly workbench for seeking associations between attributes in The Cancer Genome Atlas (TCGA) database. It allows any combination of clinical and genomic data streams to be searched using an evolutionary algorithm approach. To showcase ACE, we assessed which RNA sequencing transcripts were associated with estrogen receptor (ESR1) in the TCGA breast cancer cohort. The analysis revealed already well-established associations with XBP1 and FOXA1, but also identified a strong association with CT62, a potential immunotherapeutic target with few previous associations with breast cancer. In conclusion, ACE can produce results for very large searches in a short time and will serve as an increasingly useful tool for biomarker discovery in the big data era. SIGNIFICANCE: ACE uses an evolutionary algorithm approach to perform large searches for associations between any combinations of data in the TCGA database.
Collapse
Affiliation(s)
- Alan R Gilmore
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom.
| | - Matthew Alderdice
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Kienan I Savage
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Paul G O'Reilly
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Aideen C Roddy
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Philip D Dunne
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Mark Lawler
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Simon S McDade
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - David J Waugh
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom
| | - Darragh G McArt
- Centre for Cancer Research and Cell Biology, Queen's University Belfast, Belfast, Northern Ireland, United Kingdom.
| |
Collapse
|
4
|
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codon could be misread by release factors and stop codons misread by tRNAs, which also contribute to codon usage in rare cases. This chapter outlines the conceptual framework on codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index for codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with codon adaptation index (CAI) which does not consider background mutation bias. They are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
Collapse
|
5
|
Broin PÓ, Smith TJ, Golden AA. Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach. BMC Bioinformatics 2015; 16:22. [PMID: 25627106 PMCID: PMC4384390 DOI: 10.1186/s12859-015-0450-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2014] [Accepted: 01/02/2015] [Indexed: 11/10/2022] Open
Abstract
Background Familial binding profiles (FBPs) represent the average binding specificity for a group of structurally related DNA-binding proteins. The construction of such profiles allows the classification of novel motifs based on similarity to known families, can help to reduce redundancy in motif databases and de novo prediction algorithms, and can provide valuable insights into the evolution of binding sites. Many current approaches to automated motif clustering rely on progressive tree-based techniques, and can suffer from so-called frozen sub-alignments, where motifs which are clustered early on in the process remain ‘locked’ in place despite the potential for better placement at a later stage. In order to avoid this scenario, we have developed a genetic-k-medoids approach which allows motifs to move freely between clusters at any point in the clustering process. Results We demonstrate the performance of our algorithm, GMACS, on multiple benchmark motif datasets, comparing results obtained with current leading approaches. The first dataset includes 355 position weight matrices from the TRANSFAC database and indicates that the k-mer frequency vector approach used in GMACS outperforms other motif comparison techniques. We then cluster a set of 79 motifs from the JASPAR database previously used in several motif clustering studies and demonstrate that GMACS can produce a higher number of structurally homogeneous clusters than other methods without the need for a large number of singletons. Finally, we show the robustness of our algorithm to noise on multiple synthetic datasets consisting of known motifs convolved with varying degrees of noise. Conclusions Our proposed algorithm is generally applicable to any DNA or protein motifs, can produce highly stable and biologically meaningful clusters, and, by avoiding the problem of frozen sub-alignments, can provide improved results when compared with existing techniques on benchmark datasets. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0450-2) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Pilib Ó Broin
- Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Bronx, New York, 10461, USA. .,National Centre for Biomedical Engineering Science, National University of Ireland, University Road, Galway, Ireland.
| | - Terry J Smith
- Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Bronx, New York, 10461, USA.
| | - Aaron Aj Golden
- Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Bronx, New York, 10461, USA. .,Department of Mathematical Sciences, Yeshiva University, New York, 10033, NY, USA.
| |
Collapse
|
6
|
Jantschi L, Bolboaca S, Sestras R. Hard Problems in Gene Sequence Analysis: Classical Approaches and Suitability of Genetic Algorithms. BIOTECHNOL BIOTEC EQ 2014. [DOI: 10.1080/13102818.2009.10817653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022] Open
|
7
|
Bussotti G, Notredame C, Enright AJ. Detecting and comparing non-coding RNAs in the high-throughput era. Int J Mol Sci 2013; 14:15423-58. [PMID: 23887659 PMCID: PMC3759867 DOI: 10.3390/ijms140815423] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2013] [Revised: 07/16/2013] [Accepted: 07/17/2013] [Indexed: 02/07/2023] Open
Abstract
In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.
Collapse
Affiliation(s)
- Giovanni Bussotti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; E-Mail:
| | - Cedric Notredame
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Aiguader, 88, 08003 Barcelona, Spain; E-Mail:
| | - Anton J. Enright
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; E-Mail:
| |
Collapse
|
8
|
DeRonne KW, Karypis G. Pareto optimal pairwise sequence alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:481-493. [PMID: 23929871 DOI: 10.1109/tcbb.2013.2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
Collapse
Affiliation(s)
- Kevin W DeRonne
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | | |
Collapse
|
9
|
Kemena C, Bussotti G, Capriotti E, Marti-Renom MA, Notredame C. Using tertiary structure for the computation of highly accurate multiple RNA alignments with the SARA-Coffee package. ACTA ACUST UNITED AC 2013; 29:1112-9. [PMID: 23449094 DOI: 10.1093/bioinformatics/btt096] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
MOTIVATION Aligning RNAs is useful to search for homologous genes, study evolutionary relationships, detect conserved regions and identify any patterns that may be of biological relevance. Poor levels of conservation among homologs, however, make it difficult to compare RNA sequences, even when considering closely evolutionary related sequences. RESULTS We describe SARA-Coffee, a tertiary structure-based multiple RNA aligner, which has been validated using BRAliDARTS, a new benchmark framework designed for evaluating tertiary structure-based multiple RNA aligners. We provide two methods to measure the capacity of alignments to match corresponding secondary and tertiary structure features. On this benchmark, SARA-Coffee outperforms both regular aligners and those using secondary structure information. Furthermore, we show that on sequences in which <60% of the nucleotides form base pairs, primary sequence methods usually perform better than secondary-structure aware aligners. AVAILABILITY AND IMPLEMENTATION The package and the datasets are available from http://www.tcoffee.org/Projects/saracoffee and http://structure.biofold.org/sara/.
Collapse
Affiliation(s)
- Carsten Kemena
- Bioinformatics and Genomics Program, Centre for Genomic Regulation, 08003 Barcelona, Spain
| | | | | | | | | |
Collapse
|
10
|
Bussotti G, Raineri E, Erb I, Zytnicki M, Wilm A, Beaudoing E, Bucher P, Notredame C. BlastR--fast and accurate database searches for non-coding RNAs. Nucleic Acids Res 2011; 39:6886-95. [PMID: 21624887 PMCID: PMC3167602 DOI: 10.1093/nar/gkr335] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.
Collapse
Affiliation(s)
- Giovanni Bussotti
- Bioinformatics and Genomics program, Center for Genomic Regulation (CRG) and UPF, Barcelona, C/ D. Aiguader, 88, 08003 Barcelona, Spain
| | | | | | | | | | | | | | | |
Collapse
|
11
|
Murienne J, Edgecombe GD, Giribet G. Including secondary structure, fossils and molecular dating in the centipede tree of life. Mol Phylogenet Evol 2010; 57:301-13. [DOI: 10.1016/j.ympev.2010.06.022] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2010] [Revised: 06/07/2010] [Accepted: 06/25/2010] [Indexed: 11/25/2022]
|
12
|
|
13
|
Using iterative methods for global multiple sequence alignment. Cold Spring Harb Protoc 2010; 2009:pdb.top44. [PMID: 20147225 DOI: 10.1101/pdb.top44] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Finding a global optimal alignment of more than two sequences that includes matches, mismatches, and gaps and that takes into account the degree of variation in all of the sequences at the same time is especially difficult. The dynamic programming algorithm used for optimal alignment of pairs of sequences can be extended to global alignment of three sequences, but for more than three sequences, only a small number of relatively short sequences may be analyzed. Thus, approximate methods are used for global alignment. One class of these is iterative global alignment, which makes an initial global alignment of groups of sequences and then revises the alignment to achieve a more reasonable result. This article discusses several iterative alignment methods. In particular, steps are provided for using the Sequence Alignment by Genetic Algorithm (SAGA).
Collapse
|
14
|
Xiang X, Zhang D, Qin J, Fu Y. Ant Colony with Genetic Algorithm Based on Planar Graph for Multiple Sequence Alignment. ACTA ACUST UNITED AC 2010. [DOI: 10.3923/itj.2010.274.281] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
15
|
|
16
|
Taneda A. An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast. BMC Bioinformatics 2008; 9:521. [PMID: 19061486 PMCID: PMC2630964 DOI: 10.1186/1471-2105-9-521] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2008] [Accepted: 12/05/2008] [Indexed: 11/30/2022] Open
Abstract
Background Aligning RNA sequences with low sequence identity has been a challenging problem since such a computation essentially needs an algorithm with high complexities for taking structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate its large-scale applications including non-coding RNA (ncRNA) discovery. Results We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery where we compared S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search to the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predict 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates is novel candidates. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates has supports for expression in cell, i.e. their genomic positions overlap those of the experimentally determined transcripts in literature. By manual inspection of the results, moreover, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.
Collapse
Affiliation(s)
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, Hirosaki, Japan.
| |
Collapse
|
17
|
Abraham M, Dror O, Nussinov R, Wolfson HJ. Analysis and classification of RNA tertiary structures. RNA (NEW YORK, N.Y.) 2008; 14:2274-89. [PMID: 18824509 PMCID: PMC2578864 DOI: 10.1261/rna.853208] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/10/2007] [Accepted: 07/05/2008] [Indexed: 05/19/2023]
Abstract
There is a fast growing interest in noncoding RNA transcripts. These transcripts are not translated into proteins, but play essential roles in many cellular and pathological processes. Recent efforts toward comprehension of their function has led to a substantial increase in both the number and the size of solved RNA structures. With the aim of addressing questions relating to RNA structural diversity, we examined RNA conservation at three structural levels: primary, secondary, and tertiary structure. Additionally, we developed an automated method for classifying RNA structures based on spatial (three-dimensional [3D]) similarity. Applying the method to all solved RNA structures resulted in a classified database of RNA tertiary structures (DARTS). DARTS embodies 1333 solved RNA structures classified into 94 clusters. The classification is hierarchical, reflecting the structural relationship between and within clusters. We also developed an application for searching DARTS with a new structure. The search is fast and its performance was successfully tested on all solved RNA structures since the creation of DARTS. A user-friendly interface for both the database and the search application is available online. We show intracluster and intercluster similarities in DARTS and demonstrate the usefulness of the search application. The analysis reveals the current structural repertoire of RNA and exposes common global folds and local tertiary motifs. Further study of these conserved substructures may suggest possible RNA domains and building blocks. This should be beneficial for structure prediction and for gaining insights into structure-function relationships.
Collapse
Affiliation(s)
- Mira Abraham
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | | | | | | |
Collapse
|
18
|
Khaladkar M, Patel V, Bellofatto V, Wilusz J, Wang JTL. Detecting conserved secondary structures in RNA molecules using constrained structural alignment. Comput Biol Chem 2008; 32:264-72. [PMID: 18472302 DOI: 10.1016/j.compbiolchem.2008.03.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2007] [Revised: 03/21/2008] [Accepted: 03/24/2008] [Indexed: 01/09/2023]
Abstract
Constrained sequence alignment has been studied extensively in the past. Different forms of constraints have been investigated, where a constraint can be a subsequence, a regular expression, or a probability matrix of symbols and positions. However, constrained structural alignment has been investigated to a much lesser extent. In this paper, we present an efficient method for constrained structural alignment and apply the method to detecting conserved secondary structures, or structural motifs, in a set of RNA molecules. The proposed method combines both sequence and structural information of RNAs to find an optimal local alignment between two RNA secondary structures, one of which is a query and the other is a subject structure in the given set. The method allows a biologist to annotate conserved regions, or constraints, in the query RNA structure and incorporates these regions into the alignment process to obtain biologically more meaningful alignment scores. A statistical measure is developed to assess the significance of the scores. Experimental results based on detecting internal ribosome entry sites in the RNA molecules of hepatitis C virus and Trypanosoma brucei demonstrate the effectiveness of the proposed method and its superiority over existing techniques.
Collapse
Affiliation(s)
- Mugdha Khaladkar
- Bioinformatics Program and Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
| | | | | | | | | |
Collapse
|
19
|
Lee ZJ, Su SF, Chuang CC, Liu KH. Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Appl Soft Comput 2008. [DOI: 10.1016/j.asoc.2006.10.012] [Citation(s) in RCA: 136] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
20
|
Machado-Lima A, del Portillo HA, Durham AM. Computational methods in noncoding RNA research. J Math Biol 2007; 56:15-49. [PMID: 17786447 DOI: 10.1007/s00285-007-0122-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2007] [Indexed: 11/26/2022]
Abstract
Non protein-coding RNAs (ncRNAs) are a research hotspot in bioinformatics. Recent discoveries have revealed new ncRNA families performing a variety of roles, from gene expression regulation to catalytic activities. It is also believed that other families are still to be unveiled. Computational methods developed for protein coding genes often fail when searching for ncRNAs. Noncoding RNAs functionality is often heavily dependent on their secondary structure, which makes gene discovery very different from protein coding RNA genes. This motivated the development of specific methods for ncRNA research. This article reviews the main approaches used to identify ncRNAs and predict secondary structure.
Collapse
Affiliation(s)
- Ariane Machado-Lima
- Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil.
| | | | | |
Collapse
|
21
|
Kjer KM, Gillespie JJ, Ober KA. Opinions on multiple sequence alignment, and an empirical comparison of repeatability and accuracy between POY and structural alignment. Syst Biol 2007; 56:133-46. [PMID: 17366144 DOI: 10.1080/10635150601156305] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open
Affiliation(s)
- Karl M Kjer
- Department of Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, New Jersey 08901, USA.
| | | | | |
Collapse
|
22
|
Pal S, Bandyopadhyay S, Ray S. Evolutionary computation in bioinformatics: a review. ACTA ACUST UNITED AC 2006. [DOI: 10.1109/tsmcc.2005.855515] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
23
|
Jansson J, Hieu NT, Sung WK. Local gapped subforest alignment and its application in finding RNA structural motifs. J Comput Biol 2006; 13:702-18. [PMID: 16706720 DOI: 10.1089/cmb.2006.13.702] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
RNA molecules whose secondary structures contain similar substructures often have similar functions. Therefore, an important task in the study of RNA is to develop methods for discovering substructures in RNA secondary structures that occur frequently (also referred to as motifs). In this paper, we consider the problem of computing an optimal local alignment of two given labeled ordered forests F1 and F2. This problem asks for a substructure of F1 and a substructure of F2 that exhibit a high similarity. Since an RNA molecule's secondary structure can be represented as a labeled ordered forest, the problem we study has a direct application to finding potential motifs. We generalize the previously studied concept of a closed subforest to a gapped subforest and present the first algorithm for computing the optimal local gapped subforest alignment of F1 and F2. We also show that our technique can improve the time and space complexity of the previously most efficient algorithm for optimal local closed subforest alignment. Furthermore, we prove that a special case of our local gapped subforest alignment problem is equivalent to a problem known in the literature as the local sequence-structure alignment problem (lssa) and modify our main algorithm to obtain a much faster algorithm for lssa than the one previously proposed. An implementation of our algorithm is available at www.comp.nus.edu.sg/~bioinfo/LGSFAligner/. Its running time is significantly faster than the original lssa program.
Collapse
Affiliation(s)
- Jesper Jansson
- School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543
| | | | | |
Collapse
|
24
|
Alkan C, Karakoç E, Nadeau JH, Sahinalp SC, Zhang K. RNA–RNA Interaction Prediction and Antisense RNA Target Search. J Comput Biol 2006; 13:267-82. [PMID: 16597239 DOI: 10.1089/cmb.2006.13.267] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
Recent studies demonstrating the existence of special noncoding "antisense" RNAs used in post transcriptional gene regulation have received considerable attention. These RNAs are synthesized naturally to control gene expression in C. elegans, Drosophila, and other organisms; they are known to regulate plasmid copy numbers in E. coli as well. Small RNAs have also been artificially constructed to knock out genes of interest in humans and other organisms for the purpose of finding out more about their functions. Although there are a number of algorithms for predicting the secondary structure of a single RNA molecule, no such algorithm exists for reliably predicting the joint secondary structure of two interacting RNA molecules or measuring the stability of such a joint structure. In this paper, we describe the RNA-RNA interaction prediction (RIP) problem between an antisense RNA and its target mRNA and develop efficient algorithms to solve it. Our algorithms minimize the joint free energy between the two RNA molecules under a number of energy models with growing complexity. Because the computational resources needed by our most accurate approach is prohibitive for long RNA molecules, we also describe how to speed up our techniques through a number of heuristic approaches while experimentally maintaining the original accuracy. Equipped with this fast approach, we apply our method to discover targets for any given antisense RNA in the associated genome sequence.
Collapse
Affiliation(s)
- Can Alkan
- Department of Genome Sciences, University of Washington, Seattle, 98195, USA
| | | | | | | | | |
Collapse
|
25
|
Hypsa V. Parasite histories and novel phylogenetic tools: Alternative approaches to inferring parasite evolution from molecular markers. Int J Parasitol 2006; 36:141-55. [PMID: 16387305 DOI: 10.1016/j.ijpara.2005.10.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2005] [Revised: 10/19/2005] [Accepted: 10/28/2005] [Indexed: 10/25/2022]
Abstract
Parasitological research is often contingent on the knowledge of the phylogeny/genealogy of the studied group. Although molecular phylogenetics has proved to be a powerful tool in such investigations, its application in the traditional fashion, based on a tree inference from the primary nucleotide sequences may, in many cases, be insufficient or even improper. These limitations are due to a number of factors, such as a scarcity/ambiguity of phylogenetic information in the sequences, an intricacy of gene relationships at low phylogenetic levels, or a lack of criteria when deciding among several competing coevolutionary scenarios. With respect to the importance of a precise and reliable phylogenetic background in many biological studies, attempts are being made to extend molecular phylogenetics with a variety of new data sources and methodologies. In this review, selected approaches potentially applicable to parasitological research are presented and their advantages as well as drawbacks are discussed. These issues include the usage of idiosyncratic markers (unique features with presumably low probability of homoplasy), such as insertion of mobile elements, gene rearrangements and secondary structure features; the problem of ancestral polymorphism and reticulate relationships at low phylogenetic levels; and the utility of a molecular clock to facilitate discrimination among alternative scenarios in host-parasite coevolution.
Collapse
Affiliation(s)
- Václav Hypsa
- Faculty of Biological Sciences, University of South Bohemia, and Institute of Parasitology, Academy of Sciences of the Czech Republic, Branisovská 31, 37005 Ceské Budejovice, Czech Republic.
| |
Collapse
|
26
|
Misof B, Niehuis O, Bischoff I, Rickert A, Erpenbeck D, Staniczek A. A Hexapod nuclear SSU rRNA secondary-structure model and catalog of taxon-specific structural variation. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2006; 306:70-88. [PMID: 16161065 DOI: 10.1002/jez.b.21040] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
RNA molecules and in particular the nuclear SSU RNA play an important role in molecular systematics. With the advent of increasingly parameterized substitution models in systematic research, the incorporation of secondary-structure information became a realistic option compensating interdependence of character variation. As a prerequisite, consensus structures of eukaryotic SSU RNA molecules have become available through extensive comparative analyses and crystallographic studies. Despite extensive research in hexapod phylogenetics, consensus SSU RNA secondary structures focusing on hexapods have not yet been explored. In this study, we compiled a representative hexapod SSU data set of 261 sequences and inferred a specific consensus SSU secondary-structure model. Our search for conserved structural motives relied on a combined approach of thermodynamic and covariation analyses. The hexapod consensus-structure model deviates from the canonical eukaryotic model in a number of helices. Additionally, in several helices the hexapod sequences did not support a single consensus structure. We provide consensus structures of these sections of single less-inclusive taxa, thus facilitating the adaptation of the consensus hexapod model to less-inclusive phylogenetic questions. The secondary-structure catalog will foster the application of RNA structure models in phylogenetic analyses using the SSU rRNA molecule, and it will improve the realism of substitution models and the reliability of reconstructions based on rRNA sequences.
Collapse
Affiliation(s)
- Bernhard Misof
- Zoologisches Forschungsinstitut und Museum A. Koenig Adenauerallee 160, 53113 Bonn, Germany.
| | | | | | | | | | | |
Collapse
|
27
|
Gillespie JJ, Yoder MJ, Wharton RA. Predicted Secondary Structure for 28S and 18S rRNA from Ichneumonoidea (Insecta: Hymenoptera: Apocrita): Impact on Sequence Alignment and Phylogeny Estimation. J Mol Evol 2005; 61:114-37. [PMID: 16059751 DOI: 10.1007/s00239-004-0246-x] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2004] [Accepted: 03/08/2005] [Indexed: 11/27/2022]
Abstract
We utilize the secondary structural properties of the 28S rRNA D2-D10 expansion segments to hypothesize a multiple sequence alignment for major lineages of the hymenopteran superfamily Ichneumonoidea (Braconidae, Ichneumonidae). The alignment consists of 290 sequences (originally analyzed in Belshaw and Quicke, Syst Biol 51:450-477, 2002) and provides the first global alignment template for this diverse group of insects. Predicted structures for these expansion segments as well as for over half of the 18S rRNA are given, with highly variable regions characterized and isolated within conserved structures. We demonstrate several pitfalls of optimization alignment and illustrate how these are potentially addressed with structure-based alignments. Our global alignment is presented online at (http://hymenoptera.tamu.edu/rna) with summary statistics, such as basepair frequency tables, along with novel tools for parsing structure-based alignments into input files for most commonly used phylogenetic software. These resources will be valuable for hymenopteran systematists, as well as researchers utilizing rRNA sequences for phylogeny estimation in any taxon. We explore the phylogenetic utility of our structure-based alignment by examining a subset of the data under a variety of optimality criteria using results from Belshaw and Quicke (2002) as a benchmark.
Collapse
Affiliation(s)
- Joseph J Gillespie
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA.
| | | | | |
Collapse
|
28
|
Gillespie JJ, Munro JB, Heraty JM, Yoder MJ, Owen AK, Carmichael AE. A Secondary Structural Model of the 28S rRNA Expansion Segments D2 and D3 for Chalcidoid Wasps (Hymenoptera: Chalcidoidea). Mol Biol Evol 2005; 22:1593-608. [PMID: 15843598 DOI: 10.1093/molbev/msi152] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
We analyze the secondary structure of two expansion segments (D2, D3) of the 28S ribosomal (rRNA)-encoding gene region from 527 chalcidoid wasp taxa (Hymenoptera: Chalcidoidea) representing 18 of the 19 extant families. The sequences are compared in a multiple sequence alignment, with secondary structure inferred primarily from the evidence of compensatory base changes in conserved helices of the rRNA molecules. This covariation analysis yielded 36 helices that are composed of base pairs exhibiting positional covariation. Several additional regions are also involved in hydrogen bonding, and they form highly variable base-pairing patterns across the alignment. These are identified as regions of expansion and contraction or regions of slipped-strand compensation. Additionally, 31 single-stranded locales are characterized as regions of ambiguous alignment based on the difficulty in assigning positional homology in the presence of multiple adjacent indels. Based on comparative analysis of these sequences, the largest genetic study on any hymenopteran group to date, we report an annotated secondary structural model for the D2, D3 expansion segments that will prove useful in assigning positional nucleotide homology for phylogeny reconstruction in these and closely related apocritan taxa.
Collapse
|
29
|
Liu J, Wang JTL, Hu J, Tian B. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics 2005; 6:89. [PMID: 15817128 PMCID: PMC1090556 DOI: 10.1186/1471-2105-6-89] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2004] [Accepted: 04/07/2005] [Indexed: 11/17/2022] Open
Abstract
Background Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or process very large RNA structure databases. Results We present here an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components that are further organized by a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn) where m is the size of the query structure and n that of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures, and is capable of conducting multiple structure alignment and iterative database search. Therefore it can be used to identify functional RNA motifs. The accuracy of RSmatch is tested by experiments using a number of known RNA structures, including simple stem-loops and complex structures containing junctions. Conclusion With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. This tool shall be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the size of the structure dataset is large.
Collapse
Affiliation(s)
- Jianghui Liu
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA
- Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
| | - Jason TL Wang
- Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
| | - Jun Hu
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA
| | - Bin Tian
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA
| |
Collapse
|
30
|
Taneda A. Cofolga: a genetic algorithm for finding the common folding of two RNAs. Comput Biol Chem 2005; 29:111-9. [PMID: 15833439 DOI: 10.1016/j.compbiolchem.2005.02.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2004] [Revised: 02/22/2005] [Accepted: 02/22/2005] [Indexed: 10/25/2022]
Abstract
In order to predict non-coding RNA genes and functions on the basis of genome sequences, accurate secondary structure prediction is useful. Although single-sequence folding programs such as mfold have been successful, it is of great importance to develop a novel approach for further improvement of the prediction performance. In the present paper, a secondary structure prediction method based on genetic algorithm, Cofolga, is proposed. The program developed performs folding and alignment of two homologous RNAs simultaneously. Cofolga was tested with a dataset composed of 13 tRNAs, seven 5S rRNAs, five RNase P RNAs, and five SRP RNAs; as a result, it turned out that the average prediction accuracies for the tRNAs, 5S rRNAs, RNase P RNAs, and SRP RNAs obtained by Cofolga with an optimal weight factor and default parameters were 83.6, 81.8, 73.5, and 67.7%, respectively. These results were superior to those obtained by a single-sequence folding based on free-energy minimization in which corresponding average prediction accuracies were 52.4, 47.4, 57.7, and 52.3%, respectively. Cofolga has a post-processing in which a single-sequence folding is performed after fixation of a predicted common structure; this post-processing enables Cofolga to predict a structure that is present in one of two RNAs alone. The executable files of Cofolga (for Windows/Unix/Mac) can be obtained by an e-mail request.
Collapse
Affiliation(s)
- Akito Taneda
- Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan.
| |
Collapse
|
31
|
Gillespie JJ. Characterizing regions of ambiguous alignment caused by the expansion and contraction of hairpin-stem loops in ribosomal RNA molecules. Mol Phylogenet Evol 2004; 33:936-43. [PMID: 15522814 DOI: 10.1016/j.ympev.2004.08.004] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2004] [Revised: 06/24/2004] [Indexed: 10/26/2022]
Affiliation(s)
- Joseph J Gillespie
- Department of Entomology, Texas A&M University, College Station, TX 77843-2475, USA.
| |
Collapse
|
32
|
Gillespie J, Cannone J, Gutell R, Cognato A. A secondary structural model of the 28S rRNA expansion segments D2 and D3 from rootworms and related leaf beetles (Coleoptera: Chrysomelidae; Galerucinae). INSECT MOLECULAR BIOLOGY 2004; 13:495-518. [PMID: 15373807 DOI: 10.1111/j.0962-1075.2004.00509.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
We analysed the secondary structure of two expansion segments (D2, D3) of the 28S rRNA gene from 229 leaf beetles (Coleoptera: Chrysomelidae), the majority of which are in the subfamily Galerucinae. The sequences were compared in a multiple sequence alignment, with secondary structure inferred primarily from the compensatory base changes in the conserved helices of the rRNA molecules. This comparative approach yielded thirty helices comprised of base pairs with positional covariation. Based on these leaf beetle sequences, we report an annotated secondary structural model for the D2 and D3 expansion segments that will prove useful in assigning positional nucleotide homology for phylogeny reconstruction in these and closely related beetle taxa. This predicted structure, consisting of seven major compound helices, is mostly consistent with previously proposed models for the D2 and D3 expansion segments in insects. Despite a lack of conservation in the primary structure of these regions of insect 28S rRNA, the evolution of the secondary structure of these seven major motifs may be informative above the nucleotide level for higher-order phylogeny reconstruction of major insect lineages.
Collapse
Affiliation(s)
- J Gillespie
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA.
| | | | | | | |
Collapse
|
33
|
Abstract
The nuclear small subunit rRNA (18S) has played a dominant role in the estimation of relationships among insect orders from molecular data. In previous studies, 18S sequences have been aligned by unadjusted automated approaches (computer alignments that are not manually readjusted), most recently with direct optimization (simultaneous alignment and tree building using a program called "POY"). Parsimony has been the principal optimality criterion. Given the problems associated with the alignment of rRNA, and the recent availability of the doublet model for the analysis of covarying sites using Bayesian MCMC analysis, a different approach is called for in the analysis of these data. In this paper, nucleotide sequence data from the 18S small subunit rRNA gene of insects are aligned manually with reference to secondary structure, and analyzed under Bayesian phylogenetic methods with both GTR+I+G and doublet models in MrBayes. A credible phylogeny of Insecta is recovered that is independent of the morphological data and (unlike many other analyses of 18S in insects) not contradictory to traditional ideas of insect ordinal relationships based on morphology. Hexapoda, including Collembola, are monophyletic. Paraneoptera are the sister taxon to a monophyletic Holometabola but weakly supported. Ephemeroptera are supported as the sister taxon of Neoptera, and this result is interpreted with respect to the evolution of direct sperm transfer and the evolution of flight. Many other relationships are well-supported but several taxa remain problematic, e.g., there is virtually no support for relationships among orthopteroid orders. A website is made available that provides aligned 18S data in formats that include structural symbols and Nexus formats.
Collapse
Affiliation(s)
- Karl M Kjer
- Department of Ecology Evolution and Natural Resources, 14 College Farm Road, Cook College, Rutgers University, New Brunswick, NJ 08901, USA.
| |
Collapse
|
34
|
|
35
|
Abstract
Using a computational model of string-like haploid genotypes, we verify the conjecture (J. Theor. Biol. 188 (1997) 153) that phenotypic plasticity can speed up evolution. The corresponding real-life situation was realized by Waddington in experiments carried out on the fruit fly Drosophila. Waddington found that after selecting for an environmentally induced trait over a number of generations, a new, true-breeding phenotype resulted that was absent in the starting population. The phenomenon, termed 'genetic assimilation', continues to attract interest because of the rapidity of the effect and because of its seemingly Lamarckian implications. By making use of a genetic algorithm-based approach developed previously, we show that conventional Darwinian selection acting on regulatory genes can account for genetic assimilation. The essential assumption in our model is that a structural gene can be in either of three allelic states. These correspond to its being (a) 'on' or 'off' constitutively or (b) in a plastic state in which the probability that it is 'on' or 'off' is influenced by regulatory loci in a dosage-dependent manner.
Collapse
Affiliation(s)
- Narayan Behera
- Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur, Bangalore 560004, India
| | | |
Collapse
|
36
|
Misof B, Fleck G. Comparative analysis of mt LSU rRNA secondary structures of Odonates: structural variability and phylogenetic signal. INSECT MOLECULAR BIOLOGY 2003; 12:535-547. [PMID: 14986915 DOI: 10.1046/j.1365-2583.2003.00432.x] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
Secondary structures of the most conserved part of the mt 16S rRNA gene, domains IV and V, have been recently analysed in a comparative study. However, full secondary structures of the mt LSU rRNA molecule are published for only a few insect species. The present study presents full secondary structures of domains I, II, IV and V of Odonates and one representative of mayflies, Ephemera sp. The reconstructions are based on a comparative approach and minimal consensus structures derived from sequence alignments. The inferred structures exhibit remarkable similarities to the published Drosophila melanogaster model, which increases confidence in these structures. Structural variance within Odonates is homoplastic, and neighbour-joining trees based on tree edit distances do not correspond to any of the phylogenetically expected patterns. However, despite homoplastic quantitative structural variation, many similarities between Odonates and Ephemera sp. suggest promising character sets for higher order insect systematics that merit further investigations.
Collapse
Affiliation(s)
- B Misof
- Department of Entomology, Researchinstitute Alexander Koenig and Museum of Zoology, Bonn, Germany.
| | | |
Collapse
|
37
|
Page RDM, Cruickshank R, Johnson KP. Louse (Insecta: Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable. INSECT MOLECULAR BIOLOGY 2002; 11:361-369. [PMID: 12144702 DOI: 10.1046/j.1365-2583.2002.00346.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects.
Collapse
Affiliation(s)
- R D M Page
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow, UK.
| | | | | |
Collapse
|
38
|
Eddy SR. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics 2002; 3:18. [PMID: 12095421 PMCID: PMC119854 DOI: 10.1186/1471-2105-3-18] [Citation(s) in RCA: 145] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2002] [Accepted: 07/02/2002] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Covariance models (CMs) are probabilistic models of RNA secondary structure, analogous to profile hidden Markov models of linear sequence. The dynamic programming algorithm for aligning a CM to an RNA sequence of length N is O(N3) in memory. This is only practical for small RNAs. RESULTS I describe a divide and conquer variant of the alignment algorithm that is analogous to memory-efficient Myers/Miller dynamic programming algorithms for linear sequence alignment. The new algorithm has an O(N2 log N) memory complexity, at the expense of a small constant factor in time. CONCLUSIONS Optimal ribosomal RNA structural alignments that previously required up to 150 GB of memory now require less than 270 MB.
Collapse
Affiliation(s)
- Sean R Eddy
- Howard Hughes Medical Institute & Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110 USA.
| |
Collapse
|
39
|
Mathews DH, Turner DH. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol 2002; 317:191-203. [PMID: 11902836 DOI: 10.1006/jmbi.2001.5351] [Citation(s) in RCA: 255] [Impact Index Per Article: 11.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73 % accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is simplified to O(M(3)N(3)), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5 S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1 % of known base-pairs in the tRNAs, as compared to 59.7 % for free energy minimization alone. For the 5 S rRNAs, the average accuracy improves from 47.8 % to 86.4 %. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
Collapse
Affiliation(s)
- David H Mathews
- Department of Chemistry, University of Rochester, NY 14627-0216, USA
| | | |
Collapse
|
40
|
Abstract
The assembly of a multiple sequence alignment (MSA) has become one of the most common tasks when dealing with sequence analysis. Unfortunately, the wide range of available methods and the differences in the results given by these methods makes it hard for a non-specialist to decide which program is best suited for a given purpose. In this review we briefly describe existing techniques and expose the potential strengths and weaknesses of the most widely used multiple alignment packages.
Collapse
Affiliation(s)
- Cédric Notredame
- Information Génétique et Structurale, UMR 1889, 31 Chemin Joseph Aiguier, 13 006 Marseille, France.
| |
Collapse
|
41
|
Dawson W, Suzuki K, Yamamoto K. A physical origin for functional domain structure in nucleic acids as evidenced by cross-linking entropy: II. J Theor Biol 2001; 213:387-412. [PMID: 11735287 DOI: 10.1006/jtbi.2001.2437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
In Part I, cross-linking entropy (CLE) was proposed as a mechanism that limits the size of functional domains of RNA. To test this hypothesis, the theory is developed into an RNA secondary structure prediction filter which is applied to nearest-neighbor secondary structure (NNSS) algorithms that utilize a free energy (FE) minimization strategy. (The NNSS strategies are also referred to as the dynamic programming algorithm in the literature.) The cross-linking entropy for RNA is derived from a generalized Gaussian polymer chain model where the entropic contributions caused by the formation of base pairs (stacking) in RNA are analysed globally. Local entropic contributions are associated with the freezing out of degrees of freedom in the links. Both global and local entropic effects are strongly influenced by the persistence length. The cross-linking entropy provides a physical origin for the size of functional domains in long nucleic acid sequences and may go further to explain as to why the majority of the domain regions in typical sequences tend to be less than 600 nucleotides in length. In addition, improvements were observed in the "best guess" predictive capacity over NNSS prediction strategies. The thermodynamic distribution is more representative of the expected structures and is strongly governed by such physical parameters as the persistence length and the excluded volume. The CLE appears to generalize the tabulated penalties used in NNSS algorithms. The principal parameter influencing this entropy is the persistence length. The model is shown to accomodate a variable persistence length and is capable of describing the folding dynamics of RNA. A two-state kinetic model based on the CLE principle is used to help elucidate the folding kinetics of functional domains in the group I introns.
Collapse
Affiliation(s)
- W Dawson
- Department of Bioactive Molecules, National Institute of Infectious Diseases, 1-23-1 Toyama, Shinjuku-ku, Tokyo, 162-8640, Japan.
| | | | | |
Collapse
|
42
|
Page RD. Comparative analysis of secondary structure of insect mitochondrial small subunit ribosomal RNA using maximum weighted matching. Nucleic Acids Res 2000; 28:3839-45. [PMID: 11024161 PMCID: PMC110784 DOI: 10.1093/nar/28.20.3839] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2000] [Revised: 08/21/2000] [Accepted: 08/21/2000] [Indexed: 11/13/2022] Open
Abstract
Comparative analysis is the preferred method of inferring RNA secondary structure, but its use requires considerable expertise and manual effort. As the importance of secondary structure for accurate sequence alignment and phylogenetic analysis becomes increasingly realised, the need for secondary structure models for diverse taxonomic groups becomes more pressing. The number of available structures bears little relation to the relative diversity or importance of the different taxonomic groups. Insects, for example, comprise the largest group of animals and yet are very poorly represented in secondary structure databases. This paper explores the utility of maximum weighted matching (MWM) to help automate the process of comparative analysis by inferring secondary structure for insect mitochondrial small subunit (12S) rRNA sequences. By combining information on correlated changes in substitutions and helix dot plots, MWM can rapidly generate plausible models of secondary structure. These models can be further refined using standard comparative techniques. This paper presents a secondary structure model for insect 12S rRNA based on an alignment of 225 insect sequences and an alignment for 16 exemplar insect sequences. This alignment is used as a template for a web server that automatically generates secondary structures for insect sequences.
Collapse
Affiliation(s)
- R D Page
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, Graham Kerr Building, University of Glasgow, Glasgow G12 8QQ, UK
| |
Collapse
|
43
|
Abstract
New results for calculating nucleic acid secondary structure by free energy minimization and phylogenetic comparisons have recently been reported. A complete set of DNA energy parameters is now available and the RNA parameters have been improved. Although databases of RNA secondary structures are still derived and expanded using computer-assisted, ad hoc comparative analysis, a number of new computer algorithms combine covariation analysis with energy methods.
Collapse
Affiliation(s)
- M Zuker
- Department of Biochemistry and Molecular Biophysics, Washington University, St Louis, 63110, USA.
| |
Collapse
|
44
|
Cunningham CO, Aliesky H, Collins CM. Sequence and secondary structure variation in the Gyrodactylus (Platyhelminthes: Monogenea) ribosomal RNA gene array. J Parasitol 2000; 86:567-76. [PMID: 10864256 DOI: 10.1645/0022-3395(2000)086[0567:sassvi]2.0.co;2] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Nucleotide sequences were determined for the rRNA internal transcribed spacers 1 and 2 (ITS1 and 2) and the 5' terminus of the large subunit rRNA in selected Gyrodactylus species. Examination of primary sequence variation and secondary structure models in ITS2 and variable region V4 of the small subunit rRNA revealed that structure was largely conserved despite significant variation in sequence. ITS1 sequences were highly variable, and models of structure were unreliable but, despite this, show some resemblance to structures predicted in Digenea. ITS2 models demonstrated binding of the 3' end of 5.8S rRNA to the 5' end of the large subunit rRNA and enabled the termini of these genes to be defined with greater confidence than previously. The structure model shown here may prove useful in future phylogenetic analyses.
Collapse
|
45
|
Chen JH, Le SY, Maizel JV. Prediction of common secondary structures of RNAs: a genetic algorithm approach. Nucleic Acids Res 2000; 28:991-9. [PMID: 10648793 PMCID: PMC102574 DOI: 10.1093/nar/28.4.991] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
In this study we apply a genetic algorithm to a set of RNA sequences to find common RNA secondary structures. Our method is a three-step procedure. At the first stage of the procedure for each sequence, a genetic algorithm is used to optimize the structures in a population to a certain degree of stability. In this step, the free energy of a structure is the fitness criterion for the algorithm. Next, for each structure, we define a measure of structural conservation with respect to those in other sequences. We use this measure in a genetic algorithm to improve the structural similarity among sequences for the structures in the population of a sequence. Finally, we select those structures satisfying certain conditions of structural stability and similarity as predicted common structures for a set of RNA sequences. We have obtained satisfactory results from a set of tRNA, 5S rRNA, rev response elements (RRE) of HIV-1 and RRE of HIV-2/SIV, respectively.
Collapse
Affiliation(s)
- J H Chen
- Advanced Biomedical Computing Center, SAIC, NCI/FCRDC, Frederick, MD 21702, USA.
| | | | | |
Collapse
|
46
|
Taylor WR, Sælensminde G, Eidhammer I. Multiple protein sequence alignment using double-dynamic programming. ACTA ACUST UNITED AC 2000. [DOI: 10.1016/s0097-8485(00)80003-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
47
|
Abstract
Elucidation of interrelationships among sequence, structure, function, and evolution (FESS relationships) of a family of genes or gene products is a central theme of modern molecular biology. Multiple sequence alignment has been proven to be a powerful tool for many fields of studies such as phylogenetic reconstruction, illumination of functionally important regions, and prediction of higher order structures of proteins and RNAs. However, it is far too trivial to automatically construct a multiple alignment from a set of related sequences. A variety of methods for solving this computationally difficult problem are reviewed. Several important applications of multiple alignment for elucidation of the FESS relationships are also discussed. For a long period, progressive methods have been the only practical means to solve a multiple alignment problem of appreciable size. This situation is now changing with the development of new techniques including several classes of iterative methods. Today's progress in multiple sequence alignment methods has been made by the multidisciplinary endeavors of mathematicians, computer scientists, and biologists in various fields including biophysicists in particular. The ideas are also originated from various backgrounds, pure algorithmics, statistics, thermodynamics, and others. The outcomes are now enjoyed by researchers in many fields of biological sciences. In the near future, generalized multiple alignment may play a central role in studies of FESS relationships. The organized mixture of knowledge from multiple fields will ferment to develop fruitful results which would be hard to obtain within each area. I hope this review provides a useful information resource for future development of theory and practice in this rapidly expanding area of bioinformatics.
Collapse
Affiliation(s)
- O Gotoh
- Saitama Cancer Center Research Institute, Japan
| |
Collapse
|
48
|
Rivas E, Eddy SR. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol 1999; 285:2053-68. [PMID: 9925784 DOI: 10.1006/jmbi.1998.2436] [Citation(s) in RCA: 369] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
We describe a dynamic programming algorithm for predicting optimal RNA secondary structure, including pseudoknots. The algorithm has a worst case complexity of O(N6) in time and O(N4) in storage. The description of the algorithm is complex, which led us to adopt a useful graphical representation (Feynman diagrams) borrowed from quantum field theory. We present an implementation of the algorithm that generates the optimal minimum energy structure for a single RNA sequence, using standard RNA folding thermodynamic parameters augmented by a few parameters describing the thermodynamic stability of pseudoknots. We demonstrate the properties of the algorithm by using it to predict structures for several small pseudoknotted and non-pseudoknotted RNAs. Although the time and memory demands of the algorithm are steep, we believe this is the first algorithm to be able to fold optimal (minimum energy) pseudoknotted RNAs with the accepted RNA thermodynamic model.
Collapse
Affiliation(s)
- E Rivas
- Department of Genetics, Washington University, St. Louis, MO, 63110, USA
| | | |
Collapse
|