301
Kachouri R, Stribinskis V, Zhu Y, Ramos KS, Westhof E, Li Y. A surprisingly large RNase P RNA in Candida glabrata. RNA (New York, N.Y.) 2005; 11:1064-72. [PMID: 15987816 PMCID: PMC1370791 DOI: 10.1261/rna.2130705] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 05/03/2023]
Abstract
We have found an extremely large ribonuclease P (RNase P) RNA (RPR1) in the human pathogen Candida glabrata and verified that this molecule is expressed and present in the active enzyme complex of this hemiascomycete yeast. A structural alignment of the C. glabrata sequence with 36 other hemiascomycete RNase P RNAs (abbreviated as P RNAs) allows us to characterize the types of insertions. In addition, 15 P RNA sequences were newly characterized by searching in the recently sequenced genomes Candida albicans, C. glabrata, Debaryomyces hansenii, Eremothecium gossypii, Kluyveromyces lactis, Kluyveromyces waltii, Naumovia castellii, Saccharomyces kudriavzevii, Saccharomyces mikatae, and Yarrowia lipolytica; and by PCR amplification for other Candida species (Candida guilliermondii, Candida krusei, Candida parapsilosis, Candida stellatoidea, and Candida tropicalis). The phylogenetic comparative analysis identifies a hemiascomycete secondary structure consensus that presents a conserved core in all species with variable insertions or deletions. The most significant variability is found in C. glabrata P RNA in which three insertions exceeding in total 700 nt are present in the Specificity domain. This P RNA is more than twice the length of any other homologous P RNAs known in the three domains of life and is eight times the size of the smallest. RNase P RNA, therefore, represents one of the most diversified noncoding RNAs in terms of size variation and structural diversity.
MESH Headings
- Ascomycota/classification
- Ascomycota/genetics
- Base Sequence
- Candida glabrata/chemistry
- Candida glabrata/enzymology
- Candida glabrata/genetics
- Candida glabrata/metabolism
- Conserved Sequence
- DNA, Fungal
- Databases, Genetic
- Genes, Fungal
- Genetic Variation
- Genome, Fungal
- Models, Chemical
- Molecular Sequence Data
- Mutation
- Nucleic Acid Conformation
- Phylogeny
- RNA, Fungal/chemistry
- RNA, Fungal/genetics
- RNA, Fungal/isolation & purification
- RNA, Fungal/metabolism
- Ribonuclease P/chemistry
- Ribonuclease P/genetics
- Ribonuclease P/metabolism
- Sequence Homology, Nucleic Acid
Affiliation(s)
- Rym Kachouri
- Department of Biochemistry and Molecular Biology, and Center for Genetics and Molecular Medicine, School of Medicine, University of Louisville, 319 Abraham Flexner Way, Louisville, KY 40202, USA
302
Harper RW, Xu C, Soucek K, Setiadi H, Eiserich JP. A reappraisal of the genomic organization of human Nox1 and its splice variants. Arch Biochem Biophys 2005; 435:323-30. [PMID: 15708375 DOI: 10.1016/j.abb.2004.12.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 11/14/2004] [Revised: 12/21/2004] [Indexed: 11/19/2022]
Abstract
The recent discovery of non-phagocytic NAD(P)H oxidases belonging to the Nox family of enzymes sharing extensive homology to the leukocyte NAD(P)H oxidase has revolutionized our understanding of oxidative signaling related to fundamental biological processes and disease states. One form of this enzyme, Nox1, is a growth factor-responsive enzyme that catalyzes formation of the reactive oxygen species superoxide (O2−) and hydrogen peroxide (H2O2). Its expression is linked to a number of biological responses including cellular proliferation, angiogenesis, and activation of cellular signaling pathways. Whereas early published studies have described three distinct isoforms of Nox1, the current body of literature fails to adequately recognize this notion. Also, functional differences between isoforms remain relatively unexplored. Herein, we report that expression of human Nox1 is restricted to two distinct isoforms derived from a single gene; that is, the full-length gene product and a shorter spliced variant which lacks one of the NAD(P)H binding domains. We have developed PCR primer sets that distinguish between the two forms of Nox1 in several human cell lines. We could not find evidence for expression of the shortest reported form of Nox1 (NOH-1S), previously identified as a proton channel, and the absence of paired splice sites in the gene suggests that it represents a reverse transcriptase artifact. A survey of the scientific literature reveals that the majority of studies related to Nox1 do not utilize molecular strategies that would adequately discern between the two Nox1 variants. The current literature suggests that the two identified isoforms of human Nox1 (which we have named Nox1-L and Nox1-S) may be functionally distinct. Future studies related to Nox1 will benefit from establishing the identity of the Nox1 isoform expressed and the functions attributed to each variant.
Affiliation(s)
- Richart W Harper
- Department of Internal Medicine, School of Medicine, University of California, Davis, CA 95616, USA.
303
Ghazal G, Ge D, Gervais-Bird J, Gagnon J, Abou Elela S. Genome-wide prediction and analysis of yeast RNase III-dependent snoRNA processing signals. Mol Cell Biol 2005; 25:2981-94. [PMID: 15798187 PMCID: PMC1069626 DOI: 10.1128/mcb.25.8.2981-2994.2005] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022] Open
Abstract
In Saccharomyces cerevisiae, the maturation of both pre-rRNA and pre-small nucleolar RNAs (pre-snoRNAs) involves common factors, thereby providing a potential mechanism for the coregulation of snoRNA and rRNA synthesis. In this study, we examined the global impact of the double-stranded-RNA-specific RNase Rnt1p, which is required for pre-rRNA processing, on the maturation of all known snoRNAs. In silico searches for Rnt1p cleavage signals, and genome-wide analysis of the Rnt1p-dependent expression profile, identified seven new Rnt1p substrates. Interestingly, two of the newly identified Rnt1p-dependent snoRNAs, snR39 and snR59, are located in the introns of the ribosomal protein genes RPL7A and RPL7B. In vitro and in vivo experiments indicated that snR39 is normally processed from the lariat of RPL7A, suggesting that the expressions of RPL7A and snR39 are linked. In contrast, snR59 is produced by a direct cleavage of the RPL7B pre-mRNA, indicating that a single pre-mRNA transcript cannot be spliced to produce a mature RPL7B mRNA and processed by Rnt1p to produce a mature snR59 simultaneously. The results presented here reveal a new role of yeast RNase III in the processing of intron-encoded snoRNAs that permits independent regulation of the host mRNA and its associated snoRNA.
Affiliation(s)
- Ghada Ghazal
- Université de Sherbrooke, Département de Microbiologie et d'Infectiologie, 3001 12e Ave nord, Sherbrooke, Québec J1H 5N4, Canada
304
Plant EP, Pérez-Alvarado GC, Jacobs JL, Mukhopadhyay B, Hennig M, Dinman JD. A three-stemmed mRNA pseudoknot in the SARS coronavirus frameshift signal. PLoS Biol 2005; 3:e172. [PMID: 15884978 PMCID: PMC1110908 DOI: 10.1371/journal.pbio.0030172] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.2] [Received: 01/26/2005] [Accepted: 03/14/2005] [Indexed: 12/16/2022] Open
Abstract
A wide range of RNA viruses use programmed -1 ribosomal frameshifting for the production of viral fusion proteins. Inspection of the overlap regions between ORF1a and ORF1b of the SARS-CoV genome revealed that, similar to all coronaviruses, a programmed -1 ribosomal frameshift could be used by the virus to produce a fusion protein. Computational analyses of the frameshift signal predicted the presence of an mRNA pseudoknot containing three double-stranded RNA stem structures rather than two. Phylogenetic analyses showed the conservation of potential three-stemmed pseudoknots in the frameshift signals of all other coronaviruses in the GenBank database. Though the presence of the three-stemmed structure is supported by nuclease mapping and two-dimensional nuclear magnetic resonance studies, our findings suggest that interactions between the stem structures may result in local distortions in the A-form RNA. These distortions are particularly evident in the vicinity of predicted A-bulges in stems 2 and 3. In vitro and in vivo frameshifting assays showed that the SARS-CoV frameshift signal is functionally similar to other viral frameshift signals: it promotes efficient frameshifting in all of the standard assay systems, and it is sensitive to a drug and a genetic mutation that are known to affect frameshifting efficiency of a yeast virus. Mutagenesis studies reveal that both the specific sequences and structures of stems 2 and 3 are important for efficient frameshifting. We have identified a new RNA structural motif that is capable of promoting efficient programmed ribosomal frameshifting. The high degree of conservation of three-stemmed mRNA pseudoknot structures among the coronaviruses suggests that this presents a novel target for antiviral therapeutics.
Affiliation(s)
- Ewan P Plant
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- Gabriela C Pérez-Alvarado
- Department of Molecular Biology and the Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California, United States of America
- Jonathan L Jacobs
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- Bani Mukhopadhyay
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- Mirko Hennig
- Department of Molecular Biology and the Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California, United States of America
- Jonathan D Dinman
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
305
Lambert A, Legendre M, Fontaine JF, Gautheret D. Computing expectation values for RNA motifs using discrete convolutions. BMC Bioinformatics 2005; 6:118. [PMID: 15892887 PMCID: PMC1168889 DOI: 10.1186/1471-2105-6-118] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/08/2005] [Accepted: 05/13/2005] [Indexed: 11/22/2022] Open
Abstract
Background: Computational biologists use expectation values (E-values) to estimate the number of solutions that can be expected by chance during a database scan. Here we focus on computing E-values for RNA motifs defined by single-strand and helix lod-score profiles with variable helix spans. Such E-values cannot be computed assuming a normal score distribution, and their estimation previously required lengthy simulations. Results: We introduce discrete convolutions as an accurate and fast means of estimating score distributions of lod-score profiles. This method provides excellent score estimations for all single-strand or helical elements tested and also applies to the combination of elements into larger, complex motifs. Further, the estimated distributions remain accurate even when pseudocounts are introduced into the lod-score profiles. Estimated score distributions are then easily converted into E-values. Conclusion: Good agreement was observed between computed E-values and simulations for a number of complete RNA motifs. This method is now implemented in the ERPIN software, but it can be applied equally well to any search procedure based on ungapped profiles with statistically independent columns.
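The convolution idea in this abstract can be illustrated with a minimal Python sketch. It is not the ERPIN implementation: it assumes an ungapped single-strand profile whose lod scores have been rounded to integers, a fixed background base distribution, and independent columns. The function names and the `n_positions` parameter are hypothetical.

```python
from collections import defaultdict


def column_distribution(column_scores, background):
    """Score distribution of one profile column under the background model.

    column_scores maps each base to an integer lod score (assumed discretized);
    background maps each base to its probability.
    """
    dist = defaultdict(float)
    for base, score in column_scores.items():
        dist[score] += background[base]
    return dict(dist)


def convolve(dist_a, dist_b):
    """Discrete convolution: distribution of the sum of two independent scores."""
    out = defaultdict(float)
    for score_a, prob_a in dist_a.items():
        for score_b, prob_b in dist_b.items():
            out[score_a + score_b] += prob_a * prob_b
    return dict(out)


def profile_distribution(profile, background):
    """Distribution of the total profile score, built column by column."""
    total = {0: 1.0}
    for column_scores in profile:
        total = convolve(total, column_distribution(column_scores, background))
    return total


def e_value(profile, background, threshold, n_positions):
    """Expected number of chance hits scoring >= threshold in n_positions windows."""
    dist = profile_distribution(profile, background)
    tail = sum(prob for score, prob in dist.items() if score >= threshold)
    return n_positions * tail
```

With a two-column profile that scores A as 2 and every other base as -1, and a uniform background, only the window "AA" reaches a score of 4, so the tail probability is 1/16 and `e_value(profile, background, 4, 1600)` evaluates to 100.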
Affiliation(s)
- André Lambert
- CNRS UMR 6207, Université de la Méditerranée, Luminy Case 907, 13288 Marseille cedex 9, France
- Matthieu Legendre
- INSERM ERM 206, Université de la Méditerranée, Luminy Case 928, 13288 Marseille Cedex 9, France
- Jean-Fred Fontaine
- INSERM ERM 206, Université de la Méditerranée, Luminy Case 928, 13288 Marseille Cedex 9, France
- INSERM EMI U 00.18, CHU d'Angers, 49033 Angers, France
- Daniel Gautheret
- INSERM ERM 206, Université de la Méditerranée, Luminy Case 928, 13288 Marseille Cedex 9, France
306
Clote P, Ferré F, Kranakis E, Krizanc D. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA (New York, N.Y.) 2005; 11:578-91. [PMID: 15840812 PMCID: PMC1370746 DOI: 10.1261/rna.7220505] [Citation(s) in RCA: 135] [Impact Index Per Article: 7.1] [Indexed: 05/05/2023]
Abstract
We present results of computer experiments that indicate that several RNAs for which the native state (minimum free energy secondary structure) is functionally important (type III hammerhead ribozymes, signal recognition particle RNAs, U2 small nucleolar spliceosomal RNAs, certain riboswitches, etc.) all have lower folding energy than random RNAs of the same length and dinucleotide frequency. Additionally, we find that whole mRNA as well as 5'-UTR, 3'-UTR, and cds regions of mRNA have folding energies comparable to that of random RNA, although there may be a statistically insignificant trace signal in 3'-UTR and cds regions. Various authors have used nucleotide (approximate) pattern matching and the computation of minimum free energy as filters to detect potential RNAs in ESTs and genomes. We introduce a new concept of the asymptotic Z-score and describe a fast, whole-genome scanning algorithm to compute asymptotic minimum free energy Z-scores of moving-window contents. Asymptotic Z-score computations offer another filter, to be used along with nucleotide pattern matching and minimum free energy computations, to detect potential functional RNAs in ESTs and genomic regions.
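The comparison against random RNA described above rests on an ordinary Z-score of folding energies, which the following Python sketch shows. It does not implement the paper's asymptotic Z-score or its scanning algorithm. It assumes the minimum free energies of the native sequence and of dinucleotide-preserving shuffles were already computed by an external folding program; the function name is hypothetical.

```python
from statistics import mean, stdev


def energy_z_score(native_energy, random_energies):
    """Z-score of a native folding energy against energies of randomized sequences.

    Energies (e.g. in kcal/mol) are assumed to come from an external folding
    program. A negative result means the native sequence folds with lower
    energy than the randomized controls.
    """
    if len(random_energies) < 2:
        raise ValueError("need at least two randomized energies")
    mu = mean(random_energies)
    sigma = stdev(random_energies)  # sample standard deviation
    if sigma == 0:
        raise ValueError("randomized energies have zero variance")
    return (native_energy - mu) / sigma
```

For example, a native energy of -30.0 against controls of -20, -22, -18 and -20 gives a Z-score of roughly -6.12, i.e. a markedly lower folding energy than the controls.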
Affiliation(s)
- Peter Clote
- Department of Biology, Higgins 416, Boston College, Chestnut Hill, MA 02467, USA.
307
El-Mabrouk N, Raffinot M, Duchesne JE, Lajoie M, Luc N. Approximate matching of structured motifs in DNA sequences. J Bioinform Comput Biol 2005; 3:317-42. [PMID: 15852508 DOI: 10.1142/s0219720005001065] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/29/2004] [Revised: 02/04/2004] [Accepted: 08/02/2004] [Indexed: 11/18/2022]
Abstract
Several methods have been developed for identifying more or less complex RNA structures in a genome. All these methods are based on the search for conserved primary and secondary sub-structures. In this paper, we present a simple formal representation of a helix, which is a combination of sequence and folding constraints, as a constrained regular expression. This representation allows us to develop a well-founded algorithm that searches for all approximate matches of a helix in a genome. The algorithm is based on an alignment graph constructed from several copies of a pushdown automaton, arranged one on top of another. This is a first attempt to take advantage of the possibilities of pushdown automata in the context of approximate matching. The worst-case time complexity is O(krpn), where k is the error threshold, n the size of the genome, p the size of the secondary expression, and r its number of union symbols. We then extend the algorithm to search for pseudo-knots and secondary structures containing an arbitrary number of helices.
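To make the notion of an approximate helix match concrete, here is a naive brute-force Python sketch. It is not the pushdown-automaton algorithm of the paper and does not achieve its O(krpn) bound. It assumes a helix of fixed stem length with a bounded loop, counts only non-pairing positions as errors (no insertions or deletions), and accepts G-U wobble pairs. All names and parameters are hypothetical.

```python
# Watson-Crick pairs plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem_len, loop_min, loop_max, max_mismatches):
    """Return (start, end, loop_len, mismatches) for every hairpin candidate.

    start is inclusive, end is exclusive. A candidate is reported when at most
    max_mismatches of its stem_len stacked positions fail to base-pair.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input as well
    hits = []
    n = len(seq)
    for start in range(n):
        for loop_len in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop_len
            if end > n:
                break
            mismatches = 0
            for i in range(stem_len):
                if (seq[start + i], seq[end - 1 - i]) not in PAIRS:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
            if mismatches <= max_mismatches:
                hits.append((start, end, loop_len, mismatches))
    return hits
```

Calling `find_hairpins("GGGAAAACCC", 3, 4, 4, 0)` returns `[(0, 10, 4, 0)]`: three G-C pairs closing a four-base loop with no mismatches. The scan costs time proportional to the genome length times the loop range times the stem length, which is why a dedicated algorithm is needed at genome scale.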
Affiliation(s)
- Nadia El-Mabrouk
- Département d'informatique et de recherche opérationnelle, Université de Montréal, CP 6128 Succursale Centre-ville, Montréal, Québec H3C 3J7.
308
Liu J, Wang JTL, Hu J, Tian B. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics 2005; 6:89. [PMID: 15817128 PMCID: PMC1090556 DOI: 10.1186/1471-2105-6-89] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 12/04/2004] [Accepted: 04/07/2005] [Indexed: 11/17/2022] Open
Abstract
Background: Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or process very large RNA structure databases. Results: We present here an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components that are further organized by a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn) where m is the size of the query structure and n that of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures, and is capable of conducting multiple structure alignment and iterative database search. Therefore it can be used to identify functional RNA motifs. The accuracy of RSmatch is tested by experiments using a number of known RNA structures, including simple stem-loops and complex structures containing junctions. Conclusion: With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. This tool shall be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the size of the structure dataset is large.
Affiliation(s)
- Jianghui Liu
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA
- Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- Jason TL Wang
- Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- Jun Hu
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA
- Bin Tian
- Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA
309
Havgaard JH, Lyngsø RB, Stormo GD, Gorodkin J. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics 2005; 21:1815-24. [PMID: 15657094 DOI: 10.1093/bioinformatics/bti279] [Citation(s) in RCA: 118] [Impact Index Per Article: 6.2] [Indexed: 11/13/2022] Open
Abstract
Motivation: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) is a major challenge in gene finding today, as these are often conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to detect pairs of genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on foldalign and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new foldalign implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and where the ncRNAs and eleRNAs are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign, and the performances are very similar, but foldalign is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability: The program is available online at http://foldalign.kvl.dk/
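The three performance measures quoted in this abstract follow standard definitions, sketched below in Python. How the authors counted true and false positives (for instance per base pair or per genomic hit) is not stated here, so the counts passed in are assumed to be given; the function names are hypothetical.

```python
from math import sqrt


def sensitivity(tp, fn):
    """Fraction of real positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def positive_predictive_value(tp, fp):
    """Fraction of predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def matthews_correlation(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal count is zero
    return (tp * tn - fp * fn) / denom
```

As an illustration with invented counts, 72 true positives, 18 false negatives and 8 false positives give a sensitivity of 0.8 and a positive predictive value of 0.9, the same magnitudes as the averages reported above.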
Affiliation(s)
- Jakob Hull Havgaard
- Center for Bioinformatics and Division of Genetics, IBHV, The Royal Veterinary and Agricultural University, Frederiksberg, Denmark
310
Profiling and Searching for RNA Pseudoknot Structures in Genomes. Transactions on Computational Systems Biology II 2005. [PMCID: PMC7120494 DOI: 10.1007/11567752_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
We developed a new method that can profile and efficiently search for pseudoknot structures in noncoding RNA genes. It profiles interleaving stems in pseudoknot structures with independent Covariance Model (CM) components. The statistical alignment score for searching is obtained by combining the alignment scores from all CM components. Our experiments show that the model can achieve excellent accuracy on both random and biological data. The efficiency achieved by the method makes it possible to search for structures that contain pseudoknot in genomes of a variety of organisms.
311
Profiling and Searching for RNA Pseudoknot Structures in Genomes. Lecture Notes in Computer Science 2005. [PMCID: PMC7122704 DOI: 10.1007/11428848_123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
A new method is developed that can profile and efficiently search for pseudoknot structures in noncoding RNA genes. It profiles interleaving stems in pseudoknot structures with independent Covariance Model (CM) components. The statistical alignment score for searching is obtained by combining the alignment scores from all CM components. Our experiments show that the model can achieve excellent accuracy on both random and biological data. The efficiency achieved by the method makes it possible to search for the pseudoknot structures in genomes of a variety of organisms.
312
Lesnik EA, Fogel GB, Weekes D, Henderson TJ, Levene HB, Sampath R, Ecker DJ. Identification of conserved regulatory RNA structures in prokaryotic metabolic pathway genes. Biosystems 2004; 80:145-54. [PMID: 15823413 DOI: 10.1016/j.biosystems.2004.11.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 09/08/2004] [Revised: 11/04/2004] [Accepted: 11/05/2004] [Indexed: 11/24/2022]
Abstract
A combination of algorithms that search RNA sequences for potential secondary structure formation and search large numbers of sequences for structural similarity was used to scan the 5' UTRs of annotated genes in the Escherichia coli genome for regulatory RNA structures. Using this approach, similar RNA structures that regulate genes in the thiamin metabolic pathway were identified. In addition, several putative regulatory structures were discovered upstream of genes involved in other metabolic pathways, including glycerol metabolism and ethanol fermentation. The results demonstrate that this computational approach is a powerful tool for the discovery of important RNA structures within prokaryotic organisms.
313
Li Y, Altman S. In search of RNase P RNA from microbial genomes. RNA (New York, N.Y.) 2004; 10:1533-40. [PMID: 15337843 PMCID: PMC1370640 DOI: 10.1261/rna.7970404] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 06/01/2004] [Accepted: 07/06/2004] [Indexed: 05/19/2023]
Abstract
A simple procedure has been developed to quickly retrieve and validate the DNA sequence encoding the RNA subunit of ribonuclease P (RNase P RNA) from microbial genomes. RNase P RNA sequences were identified from 94% of bacterial and archaeal complete genomes where previously no RNase P RNA was annotated. A sequence was found in camelpox virus, highly conserved in all orthopoxviruses (including smallpox virus), which could fold into a putative RNase P RNA in terms of conserved primary features and secondary structure. New structure features of RNase P RNA that enable one to distinguish bacteria from archaea and eukarya were found. This RNA is yet another RNA that can be a molecular criterion to divide the living world into three domains (bacteria, archaea, and eukarya). The catalytic center of this RNA, and its detection from some environmental whole genome shotgun sequences, is also discussed.
Affiliation(s)
- Yong Li
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA
314
Lambert A, Fontaine JF, Legendre M, Leclerc F, Permal E, Major F, Putzer H, Delfour O, Michot B, Gautheret D. The ERPIN server: an interface to profile-based RNA motif identification. Nucleic Acids Res 2004; 32:W160-5. [PMID: 15215371 PMCID: PMC441556 DOI: 10.1093/nar/gkh418] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022] Open
Abstract
ERPIN is an RNA motif identification program that takes an RNA sequence alignment as an input and identifies related sequences using a profile-based dynamic programming algorithm. ERPIN differs from other RNA motif search programs in its ability to capture subtle biases in the training set and produce highly specific and sensitive searches, while keeping CPU requirements at a practical level. In its latest version, ERPIN also computes E-values, which tell biologists how likely they are to encounter a specific sequence match by chance, a useful indication of biological significance. We present here the ERPIN online search interface (http://tagc.univ-mrs.fr/erpin/). This web server automatically performs ERPIN searches for different RNA genes or motifs, using predefined training sets and search parameters. With a couple of clicks, users can analyze an entire bacterial genome or a genomic segment of up to 5 Mb for the presence of tRNAs, 5S rRNAs, SRP RNA, C/D box snoRNAs, hammerhead motifs, miRNAs and other motifs. Search results are displayed with sequence, score, position, E-value and secondary structure graphics. An example of a complete genome scan is provided, as well as an evaluation of run times and specificity/sensitivity information for all available motifs.
Affiliation(s)
- André Lambert
- CNRS UMR 6207, Université de la Méditerranée, Luminy Case 906, 13288 Marseille, Cedex 09, France
315
Schattner P, Decatur WA, Davis CA, Ares M, Fournier MJ, Lowe TM. Genome-wide searching for pseudouridylation guide snoRNAs: analysis of the Saccharomyces cerevisiae genome. Nucleic Acids Res 2004; 32:4281-96. [PMID: 15306656 PMCID: PMC514388 DOI: 10.1093/nar/gkh768] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.5] [Received: 06/15/2004] [Revised: 07/15/2004] [Accepted: 07/26/2004] [Indexed: 12/21/2022] Open
Abstract
One of the largest families of small RNAs in eukaryotes is the H/ACA small nucleolar RNAs (snoRNAs), most of which guide RNA pseudouridine formation. So far, an effective computational method specifically for identifying H/ACA snoRNA gene sequences has not been established. We have developed snoGPS, a program for computationally screening genomic sequences for H/ACA guide snoRNAs. The program implements a deterministic screening algorithm combined with a probabilistic model to score gene candidates. We report here the results of testing snoGPS on the budding yeast Saccharomyces cerevisiae. Six candidate snoRNAs were verified as novel RNA transcripts, and five of these were verified as guides for pseudouridine formation at specific sites in ribosomal RNA. We also predicted 14 new base-pairings between snoRNAs and known pseudouridine sites in S. cerevisiae rRNA, 12 of which were verified by gene disruption and loss of the cognate pseudouridine site. Our findings include the first prediction and verification of snoRNAs that guide pseudouridine modification at more than two sites. With this work, 41 of the 44 known pseudouridine modifications in S. cerevisiae rRNA have been linked with a verified snoRNA, providing the most complete accounting of the H/ACA snoRNAs that guide pseudouridylation in any species.
MESH Headings
- Algorithms
- Base Sequence
- Computational Biology/methods
- Genome, Fungal
- Genomics/methods
- Molecular Sequence Data
- Phylogeny
- Pseudouridine/chemistry
- Pseudouridine/metabolism
- RNA, Fungal/chemistry
- RNA, Fungal/genetics
- RNA, Fungal/metabolism
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/metabolism
- RNA, Small Nucleolar/chemistry
- RNA, Small Nucleolar/genetics
- RNA, Small Nucleolar/physiology
- Saccharomyces cerevisiae/genetics
- Saccharomyces cerevisiae/metabolism
- Software
- RNA, Small Untranslated
Affiliation(s)
- Peter Schattner
- Department of Biomolecular Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
316
Reeder J, Giegerich R. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics 2004; 5:104. [PMID: 15294028 PMCID: PMC514697 DOI: 10.1186/1471-2105-5-104] [Citation(s) in RCA: 215] [Impact Index Per Article: 10.8] [Received: 02/27/2004] [Accepted: 08/04/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program. RESULTS We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.
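As a small illustration of what makes a structure pseudoknotted, the sketch below checks a list of base pairs for crossing pairs, i.e. two pairs (i, j) and (k, l) with i < k < j < l. It does not implement the folding algorithm described in the abstract; the function name and the example pair lists are hypothetical.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalise each pair so the 5' index comes first, then sort by it.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


# Hypothetical example structures given as 0-based index pairs.
nested = [(0, 9), (1, 8), (2, 7)]            # a plain hairpin stem
knotted = [(0, 6), (1, 5), (3, 10), (4, 9)]  # two stems that interleave

print(has_pseudoknot(nested))
print(has_pseudoknot(knotted))
```

The quadratic pair comparison is only meant for short examples; it says nothing about the O(n^4) time bound quoted above.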
Affiliation(s)
- Jens Reeder
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
- Robert Giegerich
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
|
317
|
Pavesi G, Mauri G, Stefani M, Pesole G. RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Res 2004; 32:3258-69. [PMID: 15199174 PMCID: PMC434454 DOI: 10.1093/nar/gkh650] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Received: 12/23/2004] [Revised: 04/05/2004] [Accepted: 05/21/2004] [Indexed: 11/13/2022] Open
Abstract
The recent interest sparked by the discovery of a variety of functions for non-coding RNA molecules has highlighted the need for suitable tools for the analysis and the comparison of RNA sequences. Many trans-acting non-coding RNA genes and cis-acting RNA regulatory elements present motifs, conserved both in structure and sequence, that can hardly be detected by primary sequence analysis alone. We present an algorithm that takes as input a set of unaligned RNA sequences expected to share a common motif, and outputs the regions that are most conserved throughout the sequences, according to a similarity measure that takes into account both the sequence of the regions and the secondary structure they can form according to base-pairing and thermodynamic rules. Only a single parameter is needed as input, which denotes the number of distinct hairpins the motif has to contain. No further constraints on the size, number and position of the single elements comprising the motif are required. The algorithm can be split into two parts: first, it extracts from each input sequence a set of candidate regions whose predicted optimal secondary structure contains the number of hairpins given as input. Then, the regions selected are compared with each other to find the groups of most similar ones, formed by a region taken from each sequence. To avoid exhaustive enumeration of the search space and to reduce the execution time, a greedy heuristic is introduced for this task. We present different experiments, which show that the algorithm is capable of characterizing and discovering known regulatory motifs in mRNA like the iron responsive element (IRE) and selenocysteine insertion sequence (SECIS) stem-loop structures. We also show how it can be applied to corrupted datasets in which a motif does not appear in all the input sequences, as well as to the discovery of more complex motifs in the non-coding RNA.
Affiliation(s)
- Giulio Pavesi
- Department of Computer Science and Communication-(D.I.Co.), University of Milan, Via Comelico 39, 20135 Milan, Italy
|
318
|
Paredes CJ, Rigoutsos I, Papoutsakis ET. Transcriptional organization of the Clostridium acetobutylicum genome. Nucleic Acids Res 2004; 32:1973-81. [PMID: 15060177 PMCID: PMC390361 DOI: 10.1093/nar/gkh509] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022] Open
Abstract
Prokaryotic genes are frequently organized in multicistronic operons (or transcriptional units, TUs), and usually the regulatory motifs for the whole TU are located upstream of the first TU gene. Although the number of sequenced genomes has increased dramatically, experimental information on TU organization is extremely limited. Even for organisms as extensively studied as Escherichia coli and Bacillus subtilis, TU annotation is far from complete. It therefore becomes imperative to rely on computational approaches to complement experimental information. Here we present a TU map for the obligate anaerobe Clostridium acetobutylicum ATCC 824. This map is largely based on the distance between pairs of consecutive genes but enhanced and refined by predictions of several types of promoters (sigmaA, sigmaE and sigmaF/G) and rho-independent terminator structures. Based on the set of known C. acetobutylicum TUs, the presented TU map offers an 88% prediction accuracy.
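The distance-based part of such a TU map can be sketched as follows: consecutive genes on the same strand are placed in one TU when the gap between them is small. This is only a minimal illustration under stated assumptions; the `Gene` record, the `max_gap` value of 50 nt and the example coordinates are hypothetical and not taken from the paper, and the promoter and terminator refinement is omitted.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate
    strand: str  # "+" or "-"


def group_into_tus(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group consecutive same-strand genes whose intergenic gap is <= max_gap."""
    tus: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        prev = tus[-1][-1] if tus else None
        same_unit = (
            prev is not None
            and prev.strand == gene.strand
            and gene.start - prev.end <= max_gap
        )
        if same_unit:
            tus[-1].append(gene)
        else:
            tus.append([gene])
    return tus


# Hypothetical gene coordinates for illustration only.
genes = [
    Gene("a", 1, 300, "+"),
    Gene("b", 320, 900, "+"),
    Gene("c", 1200, 1800, "+"),
    Gene("d", 1810, 2400, "-"),
]
tu_names = [[g.name for g in tu] for tu in group_into_tus(genes)]
print(tu_names)  # [['a', 'b'], ['c'], ['d']]
```

A fixed cutoff is the simplest choice; a real map would need the threshold calibrated against known TUs.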
Affiliation(s)
- Carlos J Paredes
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
|
319
|
Zorn J, Gan HH, Shiffeldrim N, Schlick T. Structural motifs in ribosomal RNAs: implications for RNA design and genomics. Biopolymers 2004; 73:340-7. [PMID: 14755570 DOI: 10.1002/bip.10525] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The various motifs of RNA molecules are closely related to their structural and functional properties. To better understand the nature and distributions of such structural motifs (i.e., paired and unpaired bases in stems, junctions, hairpin loops, bulges, and internal loops) and uncover characteristic features, we analyze the large 16S and 23S ribosomal RNAs of Escherichia coli. We find that the paired and unpaired bases in structural motifs have characteristic distribution shapes and ranges; for example, the frequency distribution of paired bases in stems declines linearly with the number of bases, whereas that for unpaired bases in junctions has a pronounced peak. Significantly, our survey reveals that the ratio of total (over the entire molecule) unpaired to paired bases (0.75) and the fraction of bases in stems (0.6), junctions (0.16), hairpin loops (0.12), and bulges/internal loops (0.12) are shared by 16S and 23S ribosomal RNAs, suggesting that natural RNAs may maintain certain proportions of bases in various motifs to ensure structural integrity. These findings may help in the design of novel RNAs and in the search (via constraints) for RNA-coding motifs in genomes, problems of intense current focus.
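The unpaired-to-paired ratio mentioned above is easy to compute once a secondary structure is written in dot-bracket notation. The sketch below assumes that notation ("(" and ")" for paired bases, "." for unpaired ones); the function name and the toy structure are hypothetical and are not the authors' data or code.

```python
def unpaired_to_paired_ratio(dot_bracket: str) -> float:
    """Ratio of unpaired bases ('.') to paired bases ('(' or ')')."""
    paired = sum(ch in "()" for ch in dot_bracket)
    unpaired = dot_bracket.count(".")
    if paired == 0:
        raise ValueError("structure contains no paired bases")
    return unpaired / paired


# Toy hairpin: 8 paired bases, 5 unpaired bases.
toy_structure = "((((...)))).."
ratio = unpaired_to_paired_ratio(toy_structure)
print(ratio)  # 0.625
```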
MESH Headings
- Base Sequence
- Drug Design
- Escherichia coli/chemistry
- Escherichia coli/genetics
- Genomics
- Models, Molecular
- Molecular Sequence Data
- Nucleic Acid Conformation
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/genetics
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 23S/chemistry
- RNA, Ribosomal, 23S/genetics
- RNA, Transfer/chemistry
Affiliation(s)
- Julie Zorn
- Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA
|
320
|
Bornes S, Boulard M, Hieblot C, Zanibellato C, Iacovoni JS, Prats H, Touriol C. Control of the vascular endothelial growth factor internal ribosome entry site (IRES) activity and translation initiation by alternatively spliced coding sequences. J Biol Chem 2004; 279:18717-26. [PMID: 14764596 DOI: 10.1074/jbc.m308410200] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022] Open
Abstract
The vascular endothelial growth factor-A (VEGF) gene locus contains eight exons that span 14 kb. Alternative splicing generates multiple, different mRNAs that in turn translate into at least five protein isoforms. While the canonical AUG start codon is located at position 1039 in exon 1, there also exists an upstream, in-frame CUG initiation codon that drives expression of L-VEGF, containing an additional 180 amino acids. Two separate internal ribosome entry sites (IRES) regulate the activity of each initiation codon. Thus the 5'-UTR of VEGF, which comprises the majority of exon 1, consists of IRES B, the CUG, IRES A, and the AUG, from 5' to 3'. Previously, it has been shown that IRES B regulates initiation at the CUG and IRES A regulates AUG usage. In this study, we have found evidence that the exon content of the VEGF mRNA, determined through alternative splicing, controls IRES A activity. While the CUG is most efficient at initiating translation, transcripts that lack both exons 6 and 7 and therefore contain an exon 5/8 junction lack AUG-initiated translation. The process of splicing is not responsible for this start codon selection since transfection of genomic and cDNA VEGF sequences gives the same expression pattern. We hypothesize that long-range tertiary interactions in the VEGF mRNA regulate IRES activity and thus control start codon selection. This is the first report describing the influence of alternatively spliced coding sequences on codon selection by modulating IRES activity.
Affiliation(s)
- Stéphanie Bornes
- Institut National de la Santé et de la Recherche Médicale INSERM U589, Hormones, Facteurs de Croissance et Physiopathologie Vasculaire, Institut Fédératif de Recherche Louis Bugnard, C. H. U. Rangueil, 31403 Toulouse Cedex 04, France
|
321
|
Badidi E, De Sousa C, Lang BF, Burger G. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis. BMC Bioinformatics 2003; 4:63. [PMID: 14678565 PMCID: PMC328086 DOI: 10.1186/1471-2105-4-63] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 07/24/2003] [Accepted: 12/16/2003] [Indexed: 11/26/2022] Open
Abstract
Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere.
Affiliation(s)
- Elarbi Badidi
- The Canadian Institute for Advanced Research, Program in Evolutionary Biology. Département de Biochimie, Université de Montréal, 2900, Boul. Édouard Montpetit, Montréal, QC H3T 1J4, Canada
- Cristina De Sousa
- The Canadian Institute for Advanced Research, Program in Evolutionary Biology. Département de Biochimie, Université de Montréal, 2900, Boul. Édouard Montpetit, Montréal, QC H3T 1J4, Canada
- B Franz Lang
- The Canadian Institute for Advanced Research, Program in Evolutionary Biology. Département de Biochimie, Université de Montréal, 2900, Boul. Édouard Montpetit, Montréal, QC H3T 1J4, Canada
- Gertraud Burger
- The Canadian Institute for Advanced Research, Program in Evolutionary Biology. Département de Biochimie, Université de Montréal, 2900, Boul. Édouard Montpetit, Montréal, QC H3T 1J4, Canada
|
322
|
Klein RJ, Eddy SR. RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics 2003; 4:44. [PMID: 14499004 PMCID: PMC239859 DOI: 10.1186/1471-2105-4-44] [Citation(s) in RCA: 160] [Impact Index Per Article: 7.6] [Received: 06/25/2003] [Accepted: 09/22/2003] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure. RESULTS We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website. CONCLUSION RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.
MESH Headings
- Animals
- Arabidopsis/genetics
- Archaeoglobus fulgidus/genetics
- Base Composition/genetics
- Computational Biology/methods
- Computational Biology/standards
- Computational Biology/statistics & numerical data
- Databases, Genetic
- MicroRNAs/chemistry
- MicroRNAs/genetics
- Models, Genetic
- Nucleic Acid Conformation
- Pyrococcus horikoshii/genetics
- RNA/chemistry
- RNA/genetics
- RNA, Archaeal/chemistry
- RNA, Archaeal/genetics
- RNA, Fungal/chemistry
- RNA, Fungal/genetics
- RNA, Helminth/chemistry
- RNA, Helminth/classification
- RNA, Helminth/genetics
- RNA, Plant/chemistry
- RNA, Plant/genetics
- RNA, Transfer, Ala/chemistry
- RNA, Transfer, Ala/genetics
- RNA, Transfer, Asn/chemistry
- RNA, Transfer, Asn/genetics
- Ribonuclease P/chemistry
- Ribonuclease P/genetics
- Saccharomyces cerevisiae/genetics
- Sequence Homology, Nucleic Acid
- Signal Recognition Particle/genetics
- Software/standards
- Software/statistics & numerical data
- Software Validation
Affiliation(s)
- Robert J Klein
- Howard Hughes Medical Institute & Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110 USA
- Sean R Eddy
- Howard Hughes Medical Institute & Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110 USA
|
323
|
Grillo G, Licciulli F, Liuni S, Sbisà E, Pesole G. PatSearch: A program for the detection of patterns and structural motifs in nucleotide sequences. Nucleic Acids Res 2003; 31:3608-12. [PMID: 12824377 PMCID: PMC168955 DOI: 10.1093/nar/gkg548] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022] Open
Abstract
Regulation of gene expression at transcriptional and post-transcriptional level involves the interaction between short DNA or RNA tracts and the corresponding trans-acting protein factors. Detection of such cis-acting elements in genome-wide screenings may significantly contribute to genome annotation and comparative analysis as well as to target functional characterization experiments. We present here PatSearch, a flexible and fast pattern matcher able to search for specific combinations of oligonucleotide consensus sequences, secondary structure elements and position-weight matrices. It can also allow for mismatches/mispairings below a user fixed threshold. We report three different applications of the program in the search of complex patterns such as those of the iron responsive element hairpin-loop structure, the p53 responsive element and a promoter module containing CAAT-, TATA- and cap-boxes. PatSearch is available on the web at http://bighost.area.ba.cnr.it/BIG/PatSearch/.
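To make the idea of consensus matching with a mismatch threshold concrete, the sketch below scans a sequence for windows that match an IUPAC consensus with at most a given number of mismatches. It is not PatSearch and does not use its pattern syntax; the function name, the example sequence and the TATAWAW consensus are chosen here for illustration only.

```python
# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_matches(sequence, consensus, max_mismatches=0):
    """Return (start, window, mismatches) for every window within the threshold."""
    seq = sequence.upper()
    width = len(consensus)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # A position mismatches when the base is not allowed by the IUPAC symbol.
        mismatches = sum(
            base not in IUPAC[symbol] for base, symbol in zip(window, consensus)
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


example_seq = "CCTATAAATGG"
exact_hits = find_matches(example_seq, "TATAWAW")
relaxed_hits = find_matches(example_seq, "TATAWAW", max_mismatches=2)
print(exact_hits)    # [(2, 'TATAAAT', 0)]
print(relaxed_hits)  # [(0, 'CCTATAA', 2), (2, 'TATAAAT', 0)]
```

Relaxing the threshold quickly adds weaker hits, which is why such a cutoff is left to the user.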
Affiliation(s)
- Giorgio Grillo
- Sezione di Bioinformatica e Genomica di Bari, Istituto Tecnologie Biomediche CNR, via Amendola 168/5, 70125 Bari, Italy
|
324
|
Delihas N. Annotation and evolutionary relationships of a small regulatory RNA gene micF and its target ompF in Yersinia species. BMC Microbiol 2003; 3:13. [PMID: 12834539 PMCID: PMC166144 DOI: 10.1186/1471-2180-3-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 04/21/2003] [Accepted: 06/30/2003] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND micF RNA, a small regulatory RNA found in bacteria, post-transcriptionally regulates expression of outer membrane protein F (OmpF) by interaction with the ompF mRNA 5' UTR. Phylogenetic data can be useful for RNA/RNA duplex structure analyses and aid in elucidating the mechanism of regulation. However, micF and the associated genes ompF and ompC are difficult to annotate because of either similarities or divergences in nucleotide sequence. We report that by using sequences that represent "gene signatures" as probes, e.g., mRNA 5' UTR sequences, closely related genes can be accurately located in genomic sequences. RESULTS Alignment and search methods using NCBI BLAST programs have been used to identify micF, ompF and ompC in Yersinia pestis and Yersinia enterocolitica. By alignment with DNA sequences from other bacterial species, 5' start sites of genes and upstream transcriptional regulatory sites in promoter regions were predicted. Annotated genes from Yersinia species provide phylogenetic information on the micF regulatory system. High sequence conservation in binding sites of transcriptional regulatory factors is found in the promoter region upstream of micF, and conservation in blocks of sequences as well as marked sequence variation is seen in segments of the micF RNA gene. Unexpectedly large differences in rates of evolution were found between the interacting RNA transcripts, micF RNA and the 5' UTR of the ompF mRNA. micF RNA/ompF mRNA 5' UTR duplex structures were modeled by the mfold program. Functional domains such as RNA/RNA interacting sites appear to display a minimum of evolutionary drift in sequence with the exception of a significant change in Y. enterocolitica micF RNA. CONCLUSIONS Newly annotated Yersinia micF and ompF genes and the resultant RNA/RNA duplex structures add strong phylogenetic support for a generalized duplex model. The alignment and search approach using 5' UTR signatures may be a model to help define other genes and their start sites when annotated genes are available in well-defined reference organisms.
MESH Headings
- 5' Untranslated Regions/chemistry
- Base Sequence
- Evolution, Molecular
- Genes, Bacterial
- Molecular Sequence Data
- Nucleic Acid Conformation
- Phylogeny
- Porins/classification
- Porins/genetics
- RNA, Bacterial/chemistry
- RNA, Bacterial/classification
- RNA, Bacterial/genetics
- RNA, Messenger/chemistry
- RNA, Untranslated/chemistry
- RNA, Untranslated/classification
- RNA, Untranslated/genetics
- Regulatory Sequences, Nucleic Acid
- Sequence Alignment
- Transcription Initiation Site
- Yersinia/genetics
- Yersinia enterocolitica/genetics
- Yersinia pestis/genetics
Affiliation(s)
- Nicholas Delihas
- Department of Molecular Genetics and Microbiology, School of Medicine, SUNY Stony Brook, NY 11794-5222, USA.
|
325
|
Abstract
We describe a novel procedure for generating and optimizing pattern descriptors that can be used to find structural motifs in DNA or RNA sequences. This combines a pattern-description language (based primarily on secondary structure alignment and conservation of some key nucleotides) with a scoring function that relies heavily on estimated folding free energies for the secondary structure of interest. For the cloverleaf secondary structure characteristic of tRNA, we show that a fairly simple pattern descriptor can find almost all known tRNA genes in both bacterial and eukaryotic genomes, and that false positives (sequences that match the pattern but that are probably not tRNAs) can be recognized by their high estimated folding free energies. A general procedure for optimizing descriptors (and hence for finding new structural motifs) is also described. For six bacterial, four eukaryotic, and four archaeal genome sequences, our results compare favorably with those of the more complex and specialized tRNAscan-SE algorithm. Prospects for using this general approach to find other RNA structural motifs are discussed.
Affiliation(s)
- Vickie Tsui
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
326
|
Dai L, Zimmerly S. ORF-less and reverse-transcriptase-encoding group II introns in archaebacteria, with a pattern of homing into related group II intron ORFs. RNA (NEW YORK, N.Y.) 2003; 9:14-19. [PMID: 12554871 PMCID: PMC1370365 DOI: 10.1261/rna.2126203] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Received: 08/26/2002] [Accepted: 10/07/2002] [Indexed: 05/24/2023]
Abstract
Although group II intron retroelements are prevalent in eubacteria, they have not been identified in archaebacteria in the first 10 genomes sequenced. However, the recently sequenced archaeal genome of Methanosarcina acetivorans contains 21 group II introns, including 7 introns that do not encode reverse transcriptase ORFs. To our knowledge, these are the first retroelements identified in archaebacteria, and the first ORF-less group II introns in bacteria. Furthermore, the insertion pattern of the introns is highly unusual. The introns appear to insert site-specifically into ORFs of other group II introns, forming nested clusters of up to four introns, but there are no flanking exons that could encode a functional protein after the introns have been spliced out.
|
327
|
Abstract
In recent years, noncoding RNAs (ncRNAs) have been shown to constitute key elements implicated in a number of regulatory mechanisms in the cell. They are present in bacteria and eukaryotes. The ncRNAs are involved in regulation of expression at both transcriptional and posttranscriptional levels, by mediating chromatin modifications, modulating transcription factor activity, and influencing mRNA stability, processing, and translation. Noncoding RNAs play a key role in genetic imprinting, dosage compensation of X-chromosome-linked genes, and many processes of differentiation and development.
Affiliation(s)
- Maciej Szymański
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznań, Poland
|
328
|
Fogel GB, Porto VW, Weekes DG, Fogel DB, Griffey RH, McNeil JA, Lesnik E, Ecker DJ, Sampath R. Discovery of RNA structural elements using evolutionary computation. Nucleic Acids Res 2002; 30:5310-7. [PMID: 12466557 PMCID: PMC137967 DOI: 10.1093/nar/gkf653] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022] Open
Abstract
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures, or certain structural motifs within them, are thought to recur in multiple genes within a single organism or across the same gene in several organisms and provide a common regulatory mechanism. Search algorithms, such as RNAMotif, can be used to mine nucleotide sequence databases for these repeating motifs. RNAMotif allows users to capture essential features of known structures in detailed descriptors and can be used to identify, with high specificity, other similar motifs within the nucleotide database. However, when the descriptor constraints are relaxed to provide more flexibility, or when there is very little a priori information about hypothesized RNA structures, the number of motif 'hits' may become very large. Exhaustive methods to search for similar RNA structures over these large search spaces are likely to be computationally intractable. Here we describe a powerful new algorithm based on evolutionary computation to solve this problem. A series of experiments using ferritin IRE and SRP RNA stem-loop motifs were used to verify the method. We demonstrate that even when searching extremely large search spaces, of the order of 10^23 potential solutions, we could find the correct solution in a fraction of the time it would have taken for exhaustive comparisons.
Affiliation(s)
- Gary B Fogel
- Natural Selection Inc., 3333 North Torrey Pines Court, Suite 200, La Jolla, CA 92037, USA
|
329
|
Lesnik EA, Sampath R, Ecker DJ. Rev response elements (RRE) in lentiviruses: an RNAMotif algorithm-based strategy for RRE prediction. Med Res Rev 2002; 22:617-36. [PMID: 12369091 DOI: 10.1002/med.10027] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Lentiviruses (a sub-family of the retroviridae family) include primate and non-primate viruses associated with chronic diseases of the immune system and the central nervous system. All lentiviruses encode a regulatory protein Rev that is essential for post-transcriptional transport of the unspliced and incompletely spliced viral mRNAs from nuclei to cytoplasm. The Rev protein acts via binding to an RNA structural element known as the Rev responsive element (RRE). The RRE location and structure and the mechanism of the Rev-RRE interaction in primate and non-primate lentiviruses have been analyzed and compared. Based on structural data available for RRE of HIV-1, a two-step computational strategy for prediction of putative RRE regions in lentivirus genomes has been developed. First, the RNAMotif algorithm was used to search genomic sequence for highly structured regions (HSR). Then the program RNAstructure, version 3.6, was used to calculate the structure and thermodynamic stability of the region of approximately 350 nucleotides encompassing the HSR. Our strategy correctly predicted the locations of all previously reported lentivirus RREs. We were also able to predict the locations and structures of potential RREs in four additional lentiviruses.
Affiliation(s)
- Elena A Lesnik
- IBIS Therapeutics, 2292 Faraday Ave, Carlsbad, California 92008, USA
|
330
|
Leontis NB, Stombaugh J, Westhof E. The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Res 2002; 30:3497-531. [PMID: 12177293 PMCID: PMC134247 DOI: 10.1093/nar/gkf481] [Citation(s) in RCA: 585] [Impact Index Per Article: 26.6] [Indexed: 12/22/2022] Open
Abstract
RNA molecules exhibit complex structures in which a large fraction of the bases engage in non-Watson-Crick base pairing, forming motifs that mediate long-range RNA-RNA interactions and create binding sites for proteins and small molecule ligands. The rapidly growing number of three-dimensional RNA structures at atomic resolution requires that databases contain the annotation of such base pairs. An unambiguous and descriptive nomenclature was proposed recently in which RNA base pairs were classified by the base edges participating in the interaction (Watson-Crick, Hoogsteen/CH or sugar edge) and the orientation of the glycosidic bonds relative to the hydrogen bonds (cis or trans). Twelve basic geometric families were identified and all 12 have been observed in crystal structures. For each base pairing family, we present here the 4 x 4 'isostericity matrices' summarizing the geometric relationships between the 16 pairwise combinations of the four standard bases, A, C, G and U. Whenever available, a representative example of each observed base pair from X-ray crystal structures (3.0 Å resolution or better) is provided or, otherwise, theoretically plausible models. This format makes apparent the recurrent geometric patterns that are observed and helps identify isosteric pairs that co-vary or interchange in sequences of homologous molecules while maintaining conserved three-dimensional motifs.
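The counts in the abstract can be reproduced by simple enumeration: three interacting edges taken as unordered pairs give six edge combinations, and two glycosidic bond orientations double that to twelve families; four bases give sixteen ordered base combinations per family. The sketch below only counts and labels these combinations; the label strings are an informal assumption, not the paper's official family names.

```python
from itertools import combinations_with_replacement, product

EDGES = ("Watson-Crick", "Hoogsteen", "Sugar edge")
ORIENTATIONS = ("cis", "trans")
BASES = "ACGU"

# Unordered edge pairs (6) combined with cis/trans (2) give 12 families.
families = [
    f"{orientation} {edge_a}/{edge_b}"
    for orientation, (edge_a, edge_b) in product(
        ORIENTATIONS, combinations_with_replacement(EDGES, 2)
    )
]

# Each family's 4 x 4 matrix has one cell per ordered pair of bases.
base_combinations = ["".join(pair) for pair in product(BASES, repeat=2)]

print(len(families))           # 12
print(len(base_combinations))  # 16
```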
Affiliation(s)
- Neocles B Leontis
- Chemistry Department and Center for Biomolecular Sciences, Overman Hall, Bowling Green State University, Bowling Green, OH 43403, USA.
|
331
|
Chen S, Lesnik EA, Hall TA, Sampath R, Griffey RH, Ecker DJ, Blyn LB. A bioinformatics based approach to discover small RNA genes in the Escherichia coli genome. Biosystems 2002; 65:157-77. [PMID: 12069726 DOI: 10.1016/s0303-2647(02)00013-8] [Citation(s) in RCA: 182] [Impact Index Per Article: 8.3] [Indexed: 10/27/2022]
Abstract
The recent explosion in available bacterial genome sequences has created the need to improve our ability to annotate important sequence and structural elements in a fast, efficient and accurate manner. In particular, small non-coding RNAs (sRNAs) have been difficult to predict. The sRNAs play a number of important structural, catalytic and regulatory roles in the cell. Although a few groups have recently published prediction methods for annotating sRNAs in bacterial genomes, much remains to be done in this field. Toward the goal of developing an efficient method for predicting unknown sRNA genes in the completed Escherichia coli genome, we adopted a bioinformatics approach to search for DNA regions that contain a sigma70 promoter within a short distance of a rho-independent terminator. Among a total of 227 candidate sRNA genes initially identified, 32 were previously described sRNAs, orphan tRNAs, and partial tRNA and rRNA operons. Fifty-one are mRNA genes encoding annotated extremely small open reading frames (ORFs) following an acceptable ribosome binding site. One hundred forty-four are potentially novel non-translatable sRNA genes. Using total RNA isolated from E. coli MG1655 cells grown under four different conditions, we verified transcripts of some of the genes by Northern hybridization. Here we summarize our data and discuss the rules and advantages/disadvantages of using this approach in annotating sRNA genes on bacterial genomes.
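The core pairing step, a promoter followed within a short distance by a terminator, can be sketched as below. This is a simplified illustration, not the authors' pipeline: the positions are assumed to be precomputed forward-strand coordinates, and the function name, the `max_length` value of 400 nt and the example positions are hypothetical.

```python
def candidate_srna_regions(promoters, terminators, max_length=400):
    """Pair each promoter with the nearest downstream terminator within max_length."""
    sorted_terminators = sorted(terminators)
    candidates = []
    for promoter in sorted(promoters):
        downstream = [
            t for t in sorted_terminators if 0 < t - promoter <= max_length
        ]
        if downstream:
            # Keep only the closest terminator for this promoter.
            candidates.append((promoter, downstream[0]))
    return candidates


# Hypothetical forward-strand positions for illustration only.
promoter_positions = [100, 2000, 5000]
terminator_positions = [350, 2900, 5200]
regions = candidate_srna_regions(promoter_positions, terminator_positions)
print(regions)  # [(100, 350), (5000, 5200)]
```

Candidates found this way would still need filtering against annotated genes and ORFs, as the abstract describes.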
Affiliation(s)
- Shuo Chen
- Ibis Therapeutics, Isis Pharmaceuticals, Inc, 2292 Faraday Ave, Carlsbad, CA 92008, USA.
|
332
|
Lesnik EA, Sampath R, Levene HB, Henderson TJ, McNeil JA, Ecker DJ. Prediction of rho-independent transcriptional terminators in Escherichia coli. Nucleic Acids Res 2001; 29:3583-94. [PMID: 11522828 PMCID: PMC55870 DOI: 10.1093/nar/29.17.3583] [Citation(s) in RCA: 180] [Impact Index Per Article: 7.8] [Indexed: 11/13/2022] Open
Abstract
A new algorithm called RNAMotif containing RNA structure and sequence constraints and a thermodynamic scoring system was used to search for intrinsic rho-independent terminators in the Escherichia coli K-12 genome. We identified all 135 reported terminators and 940 putative terminator sequences beginning no more than 60 nt away from the 3'-end of the annotated transcription units (TU). Putative and reported terminators with the scores above our chosen threshold were found for 37 of the 53 non-coding RNA TU and for almost 50% of the 2592 annotated protein-encoding TU, which correlates well with the number of TU expected to contain rho-independent terminators. We also identified 439 terminators that could function in a bi-directional fashion, servicing one gene on the positive strand and a different gene on the negative strand. Approximately 700 additional termination signals in non-coding regions (NCR) far away from the nearest annotated gene were predicted. This number correlates well with the excess number of predicted 'orphan' promoters in the NCR, and these promoters and terminators may be associated with as yet unidentified TU. The significant number of high scoring hits that occurred within the reading frame of annotated genes suggests that either an additional component of rho-independent terminators exists or that a suppressive mechanism to prevent unwanted termination remains to be discovered.
Affiliation(s)
- E A Lesnik
- IBIS Therapeutics, 2292 Faraday Avenue, Carlsbad, CA 92008, USA
|