251
Considerations in the identification of functional RNA structural elements in genomic alignments. BMC Bioinformatics 2007; 8:33. [PMID: 17263882 PMCID: PMC1803800 DOI: 10.1186/1471-2105-8-33] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 08/14/2006] [Accepted: 01/30/2007] [Indexed: 11/25/2022]
Abstract
Background Accurate identification of novel, functional noncoding (nc) RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs have been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries. Results We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, EvoFold, and several variants of simple thermodynamic stability) on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preservation of dinucleotide content in shuffling the alignments resulted in a drastic increase in estimated false-positive detection rates for ncRNA elements, precluding evaluation of higher order alignments, which cannot be adequately shuffled while maintaining both dinucleotides and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments.
The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component) was significantly higher for real than shuffled sequence, while the distribution for coding sequences was lower than that of corresponding shuffles. Conclusion Accurate prediction of novel RNA structural elements in genome sequence remains a difficult problem, and development of an appropriate negative-control strategy for multiple alignments is an important practical challenge. Nonetheless, the general trends we observed for the distributions of predicted ncRNAs across genomic features are biologically meaningful, supporting the presence of secondary structural elements in many 3' UTRs, and providing evidence for evolutionary selection against secondary structures in coding regions.
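The point about genomic scale in the abstract above can be made concrete with a small calculation. The sketch below is illustrative only: the function name and all numbers are hypothetical and are not taken from the paper. It shows how a 1% false-positive rate can still leave most predictions false when true elements are rare among the scanned windows.

```python
def expected_precision(n_windows, n_true, recall, false_positive_rate):
    """Expected fraction of predictions that are real elements.

    n_windows: total number of windows scanned (hypothetical)
    n_true: number of windows that contain a real element (hypothetical)
    recall: fraction of real elements that are detected
    false_positive_rate: fraction of non-element windows that are called
    """
    true_positives = recall * n_true
    false_positives = false_positive_rate * (n_windows - n_true)
    return true_positives / (true_positives + false_positives)


# Hypothetical scan: one million windows, one thousand real elements.
p = expected_precision(
    n_windows=1_000_000, n_true=1_000, recall=0.8, false_positive_rate=0.01
)
print(f"{p:.3f}")  # prints 0.074
```

Under these assumed numbers, roughly 93% of the calls would be false discoveries, which is why the choice of shuffled negative control matters so much for the estimated false-positive rate.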
252
Tolbert BS, Kennedy SD, Schroeder SJ, Krugh TR, Turner DH. NMR structures of (rGCUGAGGCU)2 and (rGCGGAUGCU)2: probing the structural features that shape the thermodynamic stability of GA pairs. Biochemistry 2007; 46:1511-22. [PMID: 17279616 PMCID: PMC4032317 DOI: 10.1021/bi061350m] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
The NMR structures of [see text] and [see text] are reported. The internal loop, [see text], is about 2 kcal/mol more stable than [see text] at 37 °C. The duplexes assemble into similar global folds characterized by the formation of tandem sheared GA pairs. The different stabilities of the loops are accompanied by differences in the local structure of the closing GU pairs. In the [see text] internal loop, the GU pairs form canonical wobble configurations with two hydrogen bonds, whereas in [see text], the GU pairs form a single hydrogen bond involving the amino group, GH22, and the carbonyl group, UO4. This pairing is similar to the GU closing pair of the 690 hairpin loop found in E. coli 16S rRNA. The [see text] and [see text] structures reveal how the subtle interplay between stacking and hydrogen bonding determines sequence-dependent conformation and thermodynamic stability. Thus, this work provides structural and thermodynamic benchmarks for theoreticians in the ongoing effort to understand the sequence dependence of RNA physicochemical properties.
Affiliation(s)
- Blanton S. Tolbert
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642
- Scott D. Kennedy
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642
- Susan J. Schroeder
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019-3051
- Thomas R. Krugh
  - Department of Chemistry, University of Rochester, Rochester, NY 14627-0216
- Douglas H. Turner
  - Department of Chemistry, University of Rochester, Rochester, NY 14627-0216
  - Center for Pediatric Biomedical Research and Department of Pediatrics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642
  - To whom correspondence should be addressed. Phone: 585-275-3207; Fax: 585-276-0205
253
Prakash A, Tompa M. Measuring the accuracy of genome-size multiple alignments. Genome Biol 2007; 8:R124. [PMID: 17594489 PMCID: PMC2394773 DOI: 10.1186/gb-2007-8-6-r124] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 09/05/2006] [Revised: 12/07/2006] [Accepted: 06/26/2007] [Indexed: 02/07/2023]
Abstract
Whole-genome alignments are invaluable for comparative genomics. Before doing any comparative analysis on a region of interest, one must have confidence in that region's alignment. We provide a methodology to measure the accuracy of arbitrary regions of these alignments, and apply it to the UCSC Genome Browser's 17-vertebrate alignment. We identify 9.7% (21 Mbp) of the human chromosome 1 alignment as suspiciously aligned. We present independent evidence that many of these suspicious regions represent misalignments.
Affiliation(s)
- Amol Prakash
  - Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, USA
  - Thermo BRIMS Center, Memorial Drive, Cambridge, MA 02139, USA
- Martin Tompa
  - Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195-2350, USA
254
Kiryu H, Kin T, Asai K. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics 2006; 23:434-41. [PMID: 17182698 DOI: 10.1093/bioinformatics/btl636] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. RESULTS We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. AVAILABILITY The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
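The two steps described in the abstract above (average the per-sequence base pairing probability matrices, then pick the structure with maximal expected accuracy) can be sketched as follows. This is a simplified illustration, not the authors' C++ implementation. It assumes the matrices are already mapped to alignment columns and are symmetric, and it uses one common form of the expected-accuracy objective (each pair scores `2 * gamma * p[i][k]`, each unpaired column scores its unpaired probability), which may differ from the exact objective in the paper. The names `matrix`, `average_matrices`, `mea_structure`, `gamma` and `min_loop` are hypothetical.

```python
def matrix(length, pairs):
    """Build a symmetric pairing-probability matrix from {(i, j): prob}."""
    m = [[0.0] * length for _ in range(length)]
    for (i, j), prob in pairs.items():
        m[i][j] = m[j][i] = prob
    return m


def average_matrices(mats):
    """Average per-sequence matrices that share alignment coordinates."""
    n, length = len(mats), len(mats[0])
    return [[sum(m[i][j] for m in mats) / n for j in range(length)]
            for i in range(length)]


def mea_structure(p, gamma=1.0, min_loop=3):
    """Nussinov-style DP maximising an assumed expected-accuracy score."""
    length = len(p)
    # Probability that each column is unpaired (clamped against rounding).
    q = [max(0.0, 1.0 - sum(row)) for row in p]
    # best[i][j] is the best score for the half-open column range [i, j).
    best = [[0.0] * (length + 1) for _ in range(length + 1)]
    choice = [[None] * (length + 1) for _ in range(length + 1)]
    for span in range(1, length + 1):
        for i in range(length - span + 1):
            j = i + span
            score, pick = q[i] + best[i + 1][j], None  # column i unpaired
            for k in range(i + min_loop + 1, j):       # column i pairs with k
                s = 2.0 * gamma * p[i][k] + best[i + 1][k] + best[k + 1][j]
                if s > score:
                    score, pick = s, k
            best[i][j], choice[i][j] = score, pick
    # Trace back the chosen pairs into dot-bracket notation.
    structure = ["."] * length
    stack = [(0, length)]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return "".join(structure)


# Toy matrices for two aligned sequences (hypothetical values).
seq_a = matrix(8, {(0, 7): 0.9, (1, 6): 0.8})
seq_b = matrix(8, {(0, 7): 0.7, (1, 6): 0.6, (2, 6): 0.2})
avg = average_matrices([seq_a, seq_b])
print(mea_structure(avg, gamma=1.0))  # prints ((....))
print(mea_structure(avg, gamma=0.1))  # prints ........
```

Lowering `gamma` in this sketch makes the predictor more conservative, which mirrors the abstract's remark that a single parameter trades sensitivity against specificity.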
Affiliation(s)
- Hisanori Kiryu
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-42 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
255
Obernosterer G, Meister G, Poy MN, Kuras A. The impact of small RNAs. Microsymposium on small RNAs. EMBO Rep 2006; 8:23-7. [PMID: 17170758 PMCID: PMC1796758 DOI: 10.1038/sj.embor.7400874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2006] [Accepted: 11/02/2006] [Indexed: 11/08/2022]
Affiliation(s)
- Gregor Obernosterer
  - Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gunter Meister
  - Max-Planck-Institut für Biochemie, Am Klopferspitz 18, 82152 Martinsried, Germany
- Matthew N Poy
  - Laboratory of Metabolic Diseases, The Rockefeller University, 1230 York Avenue, New York, New York 10021, USA
- Annerose Kuras
  - Institute of Molecular Biotechnology of the Austrian Academy of Sciences (IMBA), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
  - Tel: +43 1 79044 4668; Fax: +43 1 79044 110
256
Abstract
The noncoding RNA database (ncRNAdb) was created as a source of information on RNA molecules, which do not possess protein-coding capacity. It is now widely accepted that, in addition to constitutively expressed, housekeeping or infrastructural RNAs, there is a wide variety of RNAs participating in mechanisms involved in regulation of gene expression at all levels of transmission of genetic information from DNA to proteins. Noncoding RNAs' activities include chromatin structure remodeling, transcriptional and translational regulation of gene expression, modulation of protein function and regulation of subcellular distribution of RNAs as well as proteins. Noncoding transcripts have been identified in organisms belonging to all domains of life. Currently, the ncRNAdb contains >30 000 ncRNA sequences from Eukaryotes, Eubacteria and Archaea, but does not include housekeeping transcripts or microRNAs and snoRNAs for which more specialized databases are available. The contents of the database can be accessed via the WWW at .
Affiliation(s)
- Maciej Szymanski
  - Institute of Bioorganic Chemistry of the Polish Academy of Sciences, Noskowskiego 12, 61-704 Poznan, Poland.
257
Freyhult EK, Bollback JP, Gardner PP. Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res 2006; 17:117-25. [PMID: 17151342 PMCID: PMC1716261 DOI: 10.1101/gr.5890907] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Indexed: 11/25/2022]
Abstract
Homology search is one of the most ubiquitous bioinformatic tasks, yet it is unknown how effective the currently available tools are for identifying noncoding RNAs (ncRNAs). In this work, we use reliable ncRNA data sets to assess the effectiveness of methods such as BLAST, FASTA, HMMer, and Infernal. Surprisingly, the most popular homology search methods are often the least accurate. As a result, many studies have used inappropriate tools for their analyses. On the basis of our results, we suggest homology search strategies using the currently available tools and some directions for future development.
Affiliation(s)
- Eva K. Freyhult
  - The Linnaeus Centre for Bioinformatics, Uppsala University, 75124 Uppsala, Sweden
- Jonathan P. Bollback
  - Evolution Department, Biological Institute, University of Copenhagen, 2100 Copenhagen, Denmark
- Paul P. Gardner
  - Molecular Evolution Group, Institute of Molecular Biology and Physiology, University of Copenhagen, 2100 Copenhagen, Denmark
  - Corresponding author. Fax: 45-35321300
258
Pang KC, Stephen S, Dinger ME, Engström PG, Lenhard B, Mattick JS. RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucleic Acids Res 2006; 35:D178-82. [PMID: 17145715 PMCID: PMC1751534 DOI: 10.1093/nar/gkl926] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.4] [Indexed: 01/29/2023]
Abstract
RNAdb is a comprehensive database of mammalian non-protein-coding RNAs (ncRNAs). There is increasing recognition that ncRNAs play important regulatory roles in multicellular organisms, and there is an expanding rate of discovery of novel ncRNAs as well as an increasing allocation of function. In this update to RNAdb, we provide nucleotide sequences and annotations for tens of thousands of non-housekeeping ncRNAs, including a wide range of mammalian microRNAs, small nucleolar RNAs and larger mRNA-like ncRNAs. Some of these have documented functions and/or expression patterns, but the majority remain of unclear significance, and include PIWI-interacting RNAs, ncRNAs identified from the latest rounds of large-scale cDNA sequencing projects, putative antisense transcripts, as well as ncRNAs predicted on the basis of structural features and alignments. Improvements to the database comprise not only new and updated ncRNA datasets, but also provision of microarray-based expression data and closer interface with more specialized ncRNA resources such as miRBase and snoRNA-LBME-db. To access RNAdb, visit http://research.imb.uq.edu.au/RNAdb.
Affiliation(s)
- Ken C. Pang
  - ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
  - T cell Laboratory, Ludwig Institute for Cancer Research, Melbourne Centre for Clinical Sciences, Austin Hospital, Heidelberg, Victoria 3084, Australia
- Stuart Stephen
  - ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
- Marcel E. Dinger
  - ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
- Pär G. Engström
  - Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Bergen, Norway
  - Programme for Genomics and Bioinformatics, Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Boris Lenhard
  - Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Bergen, Norway
- John S. Mattick
  - ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
  - To whom correspondence should be addressed. Tel: +61 7 3346 2079; Fax: +1 61 7 3346 2111
259
Kin T, Yamada K, Terai G, Okida H, Yoshinari Y, Ono Y, Kojima A, Kimura Y, Komori T, Asai K. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res 2006; 35:D145-8. [PMID: 17099231 PMCID: PMC1669753 DOI: 10.1093/nar/gkl837] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.9] [Indexed: 01/30/2023]
Abstract
There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions while others might be junk transcripts. Unfortunately, the experimental validation of such transcripts to find functional non-coding RNA candidates is very costly. Therefore, our primary interest is to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb: a novel database service that hosts a large collection of non-coding transcripts including annotated/non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search, protein homology search, etc. fRNAdb provides an efficient interface to help users filter out particular transcripts under their own criteria to sort out functional RNA candidates. fRNAdb is available at
Affiliation(s)
- Taishin Kin
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan.
260
Costa FF. Non-coding RNAs: lost in translation? Gene 2006; 386:1-10. [PMID: 17113247 DOI: 10.1016/j.gene.2006.09.028] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Received: 05/16/2006] [Revised: 08/15/2006] [Accepted: 09/13/2006] [Indexed: 01/07/2023]
Abstract
In the last ten years, several RNAs with no protein-coding potential have been accumulating in RNA databases and are in need of further molecular characterization. At the same time, examples of non-coding RNAs (ncRNAs) such as microRNAs, small RNAs, small interfering RNAs (siRNAs) and medium/large RNAs with various functions have been described in the literature. Recent evidence points to a widespread role of these molecules in eukaryotic cells, suggesting that the majority of the new ncRNA examples might have specific functions. The aim of this review is to describe several new functional ncRNAs that have been recently identified and characterized, providing some clues that these molecules might not be produced by chance or as by-products of transcription as has been speculated.
Affiliation(s)
- Fabrício F Costa
  - Cancer Biology and Epigenomics Program, Children's Memorial Research Center and Northwestern University's Feinberg School of Medicine, 2300 Children's Plaza, Box 220, Chicago, IL 60614, USA
261
Abstract
Knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast, and structure is the main characteristic shared by members of the same class. For correct characterization of such classes it is therefore of great importance to analyse the structural features in great detail. In this manuscript I present RNAlishapes, which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which rewards covariation of paired bases. Applied with shape abstraction to a set of bacterial trp-operon leaders, the algorithm identified the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at .
Affiliation(s)
- Björn Voss
  - Experimental Bioinformatics, Institute of Biology II, Freiburg University, Schänzlestrasse 1, 79104 Freiburg, Germany.
262
Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH. snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res 2006; 34:5112-23. [PMID: 16990247 PMCID: PMC1636440 DOI: 10.1093/nar/gkl672] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.6] [Received: 07/06/2006] [Revised: 08/28/2006] [Accepted: 08/28/2006] [Indexed: 11/23/2022]
Abstract
Small nucleolar RNAs (snoRNAs) represent an abundant group of non-coding RNAs in eukaryotes. They can be divided into guide and orphan snoRNAs according to the presence or absence of antisense sequence to rRNAs or snRNAs. Current snoRNA-searching programs, which are essentially based on sequence complementarity to rRNAs or snRNAs, exist only for the screening of guide snoRNAs. In this study, we have developed an advanced computational package, snoSeeker, which includes CDseeker and ACAseeker programs, for the highly efficient and specific screening of both guide and orphan snoRNA genes in mammalian genomes. By using these programs, we have systematically scanned four human-mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes. Eighteen novel snoRNAs were further experimentally confirmed with four snoRNAs exhibiting a tissue-specific or restricted expression pattern. The results of this study provide the most comprehensive listing of two families of snoRNA genes in the human genome to date.
Affiliation(s)
- Jian-Hua Yang
  - Key Laboratory of Gene Engineering of the Ministry of Education, Zhongshan University, Guangzhou 510275, PR China
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
- Xiao-Chen Zhang
  - Key Laboratory of Gene Engineering of the Ministry of Education, Zhongshan University, Guangzhou 510275, PR China
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
- Zhan-Peng Huang
  - Key Laboratory of Gene Engineering of the Ministry of Education, Zhongshan University, Guangzhou 510275, PR China
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
- Hui Zhou
  - Key Laboratory of Gene Engineering of the Ministry of Education, Zhongshan University, Guangzhou 510275, PR China
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
- Mian-Bo Huang
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
- Shu Zhang
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
- Yue-Qin Chen
  - To whom correspondence should be addressed at Biotechnology Research Center, Zhongshan University, Guangzhou 510275, PR China. Tel: +86 20 84112399; Fax: +86 20 84036551
- Liang-Hu Qu
  - State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou 510275, PR China
263
Cao X, Yeo G, Muotri AR, Kuwabara T, Gage FH. Noncoding RNAs in the mammalian central nervous system. Annu Rev Neurosci 2006; 29:77-103. [PMID: 16776580 DOI: 10.1146/annurev.neuro.29.051605.112839] [Citation(s) in RCA: 334] [Impact Index Per Article: 18.6] [Indexed: 11/09/2022]
Abstract
The central nervous system (CNS) is arguably one of the most complex systems in the universe. To understand the CNS, scientists have investigated a variety of molecules, including proteins, lipids, and various small molecules. However, one large class of molecules, noncoding RNAs (ncRNAs), has been relatively unexplored. ncRNAs function directly as structural, catalytic, or regulatory molecules rather than serving as templates for protein synthesis. The increasing variety of ncRNAs being identified in the CNS suggests a strong connection between the biogenesis, dynamics of action, and combinatorial regulatory potential of ncRNAs and the complexity of the CNS. In this review, we give an overview of the diversity and abundance of ncRNAs before delving into specific examples that illustrate their importance in the CNS. In particular, we cover recent evidence for the roles of microRNAs, small nucleolar RNAs, retrotransposons, the NRSE small modulatory RNA, and BC1/BC200 in the CNS. Finally, we speculate why ncRNAs are well adapted to improving organism-environment interactions.
Affiliation(s)
- Xinwei Cao
  - Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, California 92037, USA.
264
Abstract
MicroRNAs (miRNAs) are noncoding RNAs that can regulate gene expression. Several hundred genes encoding miRNAs have been experimentally identified in animals, and many more are predicted by computational methods. How can new miRNAs be discovered and distinguished from other types of small RNA? Here we summarize current methods for identifying and validating miRNAs and discuss criteria used to define an miRNA.
Affiliation(s)
- Eugene Berezikov
  - Hubrecht Laboratory, Uppsalalaan 8, 3584CT Utrecht, The Netherlands
265
Biemar F, Nix DA, Piel J, Peterson B, Ronshaugen M, Sementchenko V, Bell I, Manak JR, Levine MS. Comprehensive identification of Drosophila dorsal-ventral patterning genes using a whole-genome tiling array. Proc Natl Acad Sci U S A 2006; 103:12763-8. [PMID: 16908844 PMCID: PMC1636694 DOI: 10.1073/pnas.0604484103] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 12/25/2022]
Abstract
Dorsal-ventral (DV) patterning of the Drosophila embryo is initiated by Dorsal, a sequence-specific transcription factor distributed in a broad nuclear gradient in the precellular embryo. Previous studies have identified as many as 70 protein-coding genes and one microRNA (miRNA) gene that are directly or indirectly regulated by this gradient. A gene regulation network, or circuit diagram, including the functional interconnections among 40 Dorsal target genes and 20 associated tissue-specific enhancers, has been determined for the initial stages of gastrulation. Here, we attempt to extend this analysis by identifying additional DV patterning genes using a recently developed whole-genome tiling array. This analysis led to the identification of another 30 protein-coding genes, including the Drosophila homolog of Idax, an inhibitor of Wnt signaling. In addition, remote 5' exons were identified for at least 10 of the approximately 100 protein-coding genes that were missed in earlier annotations. As many as nine intergenic uncharacterized transcription units were identified, including two that contain known microRNAs, miR-1 and -9a. We discuss the potential functions of these recently identified genes and suggest that intronic enhancers are a common feature of the DV gene network.
Affiliation(s)
- Frédéric Biemar
  - Division of Genetics and Development, Department of Molecular Cell Biology, Center for Integrative Genomics, University of California, Berkeley, CA 94720
- Jessica Piel
  - Division of Genetics and Development, Department of Molecular Cell Biology, Center for Integrative Genomics, University of California, Berkeley, CA 94720
- Brant Peterson
  - Division of Genetics and Development, Department of Molecular Cell Biology, Center for Integrative Genomics, University of California, Berkeley, CA 94720
- Matthew Ronshaugen
  - Division of Genetics and Development, Department of Molecular Cell Biology, Center for Integrative Genomics, University of California, Berkeley, CA 94720
- Ian Bell
  - Affymetrix, Inc., Santa Clara, CA 95951
- Michael S. Levine
  - Division of Genetics and Development, Department of Molecular Cell Biology, Center for Integrative Genomics, University of California, Berkeley, CA 94720
266
Hamada M, Tsuda K, Kudo T, Kin T, Asai K. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics 2006; 22:2480-7. [PMID: 16908501 DOI: 10.1093/bioinformatics/btl431] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION In detection of non-coding RNAs, it is often necessary to identify the secondary structure motifs from a set of putative RNA sequences. Most of the existing algorithms aim to provide the best motif or few good motifs, but biologists often need to inspect all the possible motifs thoroughly. RESULTS Our method RNAmine employs a graph theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method performed favorably both in accuracy and in efficiency with the state-of-the-art methods such as CMFinder. AVAILABILITY The software is available upon request.
Affiliation(s)
- Michiaki Hamada
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) 2-43 Aomi, Koto-ku, Tokyo, Japan.
267
Missal K, Zhu X, Rose D, Deng W, Skogerbø G, Chen R, Stadler PF. Prediction of structured non-coding RNAs in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. J Exp Zool B Mol Dev Evol 2006; 306:379-92. [PMID: 16425273 DOI: 10.1002/jez.b.21086] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/07/2022]
Abstract
We present a survey for non-coding RNAs and other structured RNA motifs in the genomes of Caenorhabditis elegans and Caenorhabditis briggsae using the RNAz program. This approach explicitly evaluates comparative sequence information to detect stabilizing selection acting on RNA secondary structure. We detect 3,672 structured RNA motifs, of which only 678 are known non-translated RNAs (ncRNAs) or clear homologs of known C. elegans ncRNAs. Most of these signals are located in introns or at a distance from known protein-coding genes. With an estimated false positive rate of about 50% and a sensitivity on the order of 50%, we estimate that the nematode genomes contain between 3,000 and 4,000 RNAs with evolutionarily conserved secondary structures. Only a small fraction of these belongs to the known RNA classes, including tRNAs, snoRNAs, snRNAs, or microRNAs. A relatively small class of ncRNA candidates is associated with previously observed RNA-specific upstream elements.
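The estimate of 3,000 to 4,000 structured RNAs can be reproduced as a back-of-the-envelope calculation from the figures in the abstract. The sketch below is only an illustration: it assumes that the "false positive rate of about 50%" means that about half of the 3,672 predictions are false, which is an interpretation and not stated explicitly in the abstract. The function name is hypothetical.

```python
def estimate_true_elements(n_hits, false_fraction, sensitivity):
    """Estimate the number of real elements in the genome.

    n_hits: number of predicted motifs
    false_fraction: assumed fraction of predictions that are false
    sensitivity: fraction of real elements that the screen detects
    """
    true_hits = n_hits * (1.0 - false_fraction)  # predictions that are real
    return true_hits / sensitivity               # scale up for missed ones


estimate = estimate_true_elements(3672, false_fraction=0.5, sensitivity=0.5)
print(round(estimate))  # prints 3672
```

Under this reading the two 50% corrections cancel, so the point estimate equals the number of hits and falls inside the 3,000 to 4,000 range quoted in the abstract.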
Affiliation(s)
- Kristin Missal
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany.
|
268
|
Torarinsson E, Sawera M, Havgaard JH, Fredholm M, Gorodkin J. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res 2006; 16:885-9. [PMID: 16751343 PMCID: PMC1484455 DOI: 10.1101/gr.5226606] [Citation(s) in RCA: 138] [Impact Index Per Article: 7.7] [Indexed: 11/24/2022]
Abstract
Human and mouse genome sequences contain roughly 100,000 regions that are unalignable in primary sequence and neighbor corresponding alignable regions between both organisms. These pairs are generally assumed to be nonconserved, although the level of structural conservation between these has never been investigated. Owing to the limitations in computational methods, comparative genomics has been lacking the ability to compare such nonconserved sequence regions for conserved structural RNA elements. We have investigated the presence of structural RNA elements by conducting a local structural alignment, using FOLDALIGN, on a subset of these 100,000 corresponding regions and estimate that 1800 contain common RNA structures. Comparing our results with the recent mapping of transcribed fragments (transfrags) in human, we find that high-scoring candidates are twice as likely to be found in regions overlapped by transfrags than regions that are not overlapped by transfrags. To verify the coexpression between predicted candidates in human and mouse, we conducted expression studies by RT-PCR and Northern blotting on mouse candidates, which overlap with transfrags on human chromosome 20. RT-PCR results confirmed expression of 32 out of 36 candidates, whereas Northern blots confirmed four out of 12 candidates. Furthermore, many RT-PCR results indicate differential expression in different tissues. Hence, our findings suggest that there are corresponding regions between human and mouse, which contain expressed non-coding RNA sequences not alignable in primary sequence.
Affiliation(s)
- Elfar Torarinsson
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Department of Natural Sciences, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Milena Sawera
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Jakob H. Havgaard
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Merete Fredholm
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Jan Gorodkin
- Division of Genetics and Bioinformatics, IBHV, The Royal Veterinary and Agricultural University, 1870 Frederiksberg C, Denmark
- Corresponding author. Fax 45-3528-3042
|
269
|
|
270
|
Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol 2006; 2:e33. [PMID: 16628248 PMCID: PMC1440920 DOI: 10.1371/journal.pcbi.0020033] [Citation(s) in RCA: 406] [Impact Index Per Article: 22.6] [Received: 09/08/2005] [Accepted: 03/06/2006] [Indexed: 12/28/2022] Open
Abstract
The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
Structurally functional RNA is a versatile component of the cell that comprises both independent molecules and regulatory elements of mRNA transcripts. The many recent discoveries of functional RNAs, most notably miRNAs, suggest that many more are yet to be found. Computational identification of functional RNAs has traditionally been hampered by the lack of strong sequence signals. However, structural conservation over long evolutionary times creates a characteristic substitution pattern, which can be exploited with the advent of comparative genomics. The authors have devised a method for identification of functional RNA structures based on phylogenetic analysis of multiple alignments. This method has been used to screen the regions of the human genome that are under strong selective constraints. The result is a set of 48,479 candidate RNA structures. For some classes of known functional RNAs, such as miRNAs and histone 3′UTR stem loops, this set includes nearly all deeply conserved members. The initial large candidate set has been partitioned by size, shape, and genomic location and ranked by score to produce specific lists of top candidates for miRNAs, selenocysteine insertion sites, RNA editing hairpins, and RNAs involved in transcript auto regulation.
Affiliation(s)
- Jakob Skou Pedersen
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California, USA.
|
271
|
Abstract
The term non-coding RNA (ncRNA) is commonly employed for RNA that does not encode a protein, but this does not mean that such RNAs do not contain information nor have function. Although it has been generally assumed that most genetic information is transacted by proteins, recent evidence suggests that the majority of the genomes of mammals and other complex organisms is in fact transcribed into ncRNAs, many of which are alternatively spliced and/or processed into smaller products. These ncRNAs include microRNAs and snoRNAs (many if not most of which remain to be identified), as well as likely other classes of yet-to-be-discovered small regulatory RNAs, and tens of thousands of longer transcripts (including complex patterns of interlacing and overlapping sense and antisense transcripts), most of whose functions are unknown. These RNAs (including those derived from introns) appear to comprise a hidden layer of internal signals that control various levels of gene expression in physiology and development, including chromatin architecture/epigenetic memory, transcription, RNA splicing, editing, translation and turnover. RNA regulatory networks may determine most of our complex characteristics, play a significant role in disease and constitute an unexplored world of genetic variation both within and between species.
Affiliation(s)
- John S Mattick
- Australian Research Council Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4072, Australia.
|
272
|
Uzilov AV, Keegan JM, Mathews DH. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics 2006; 7:173. [PMID: 16566836 PMCID: PMC1570369 DOI: 10.1186/1471-2105-7-173] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Received: 11/30/2005] [Accepted: 03/27/2006] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Non-coding RNAs (ncRNAs) have a multitude of roles in the cell, many of which remain to be discovered. However, it is difficult to detect novel ncRNAs in biochemical screens. To advance biological knowledge, computational methods that can accurately detect ncRNAs in sequenced genomes are therefore desirable. The increasing number of genomic sequences provides a rich dataset for computational comparative sequence analysis and detection of novel ncRNAs. RESULTS Here, Dynalign, a program for predicting secondary structures common to two RNA sequences on the basis of minimizing folding free energy change, is utilized as a computational ncRNA detection tool. The Dynalign-computed optimal total free energy change, which scores the structural alignment and the free energy change of folding into a common structure for two RNA sequences, is shown to be an effective measure for distinguishing ncRNA from randomized sequences. To make the classification as a ncRNA, the total free energy change of an input sequence pair can either be compared with the total free energy changes of a set of control sequence pairs, or be used in combination with sequence length and nucleotide frequencies as input to a classification support vector machine. The latter method is much faster, but slightly less sensitive at a given specificity. Additionally, the classification support vector machine method is shown to be sensitive and specific on genomic ncRNA screens of two different Escherichia coli and Salmonella typhi genome alignments, in which many ncRNAs are known. The Dynalign computational experiments are also compared with two other ncRNA detection programs, RNAz and QRNA. CONCLUSION The Dynalign-based support vector machine method is more sensitive for known ncRNAs in the test genomic screens than RNAz and QRNA. Additionally, both Dynalign-based methods are more sensitive than RNAz and QRNA at low sequence pair identities. 
Dynalign can be used as a comparable or more accurate tool than RNAz or QRNA in genomic screens, especially for low-identity regions. Dynalign provides a method for discovering ncRNAs in sequenced genomes that other methods may not identify. Significant improvements in Dynalign runtime have also been achieved.
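The first classification route described in this abstract compares the total free energy change of a sequence pair with the values obtained for control pairs. One common way to make such a comparison is a z-score; the abstract does not say which statistic the authors used, so the sketch below is only an illustration of the idea. The function name, the energies, and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def energy_z_score(observed, controls):
    """Z-score of an observed free energy change relative to control pairs."""
    if len(controls) < 2:
        raise ValueError("need at least two control values")
    spread = stdev(controls)  # sample standard deviation of the controls
    if spread == 0:
        raise ValueError("control values have zero variance")
    return (observed - mean(controls)) / spread


# Hypothetical free energy changes in kcal/mol, for illustration only.
controls = [-20.0, -22.0, -18.0, -21.0, -19.0]
z = energy_z_score(-30.0, controls)

# A strongly negative z-score means the pair folds into a common structure
# that is much more stable than the controls. The cutoff is an assumption.
is_candidate = z < -3.0
```

The support vector machine route mentioned in the abstract avoids generating and folding control pairs for every input, which is consistent with the statement that it is much faster.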
Affiliation(s)
- Andrew V Uzilov
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Center for Pediatric Biomedical Research, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Joshua M Keegan
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Center for Pediatric Biomedical Research, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
- Center for Pediatric Biomedical Research, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA
|
273
|
Bernhart SH, Tafer H, Mückstein U, Flamm C, Stadler PF, Hofacker IL. Partition function and base pairing probabilities of RNA heterodimers. Algorithms Mol Biol 2006; 1:3. [PMID: 16722605 PMCID: PMC1459172 DOI: 10.1186/1748-7188-1-3] [Citation(s) in RCA: 189] [Impact Index Per Article: 10.5] [Received: 02/16/2006] [Accepted: 03/16/2006] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms. METHODS The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs. RESULTS We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures.
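To make the link between an interaction energy and an equilibrium duplex concentration concrete, the following is a deliberately simplified sketch. It assumes a single reaction A + B <-> AB with a given binding free energy, and ignores homodimers and everything else a full treatment would include, so it is not a reimplementation of RNAcofold. The function name and default temperature are illustrative choices.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def duplex_concentration(delta_g, a_total, b_total, temp_k=310.15):
    """Equilibrium [AB] in mol/L for A + B <-> AB.

    delta_g  -- binding free energy in kcal/mol (negative favours the duplex)
    a_total  -- total concentration of A in mol/L
    b_total  -- total concentration of B in mol/L
    """
    k_assoc = math.exp(-delta_g / (R_KCAL * temp_k))  # association constant
    # Mass action: K = x / ((a - x)(b - x)), which is a quadratic in x = [AB].
    s = a_total + b_total + 1.0 / k_assoc
    # The smaller root is the physical one (x cannot exceed a or b).
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


# Hypothetical example: -10 kcal/mol binding energy, 1 micromolar of each strand.
ab = duplex_concentration(-10.0, 1e-6, 1e-6)
```

The more negative the binding free energy, the closer the duplex concentration comes to the concentration of the scarcer strand.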
Affiliation(s)
- Stephan H Bernhart
- Theoretical Biochemistry Group, Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
- Hakim Tafer
- Theoretical Biochemistry Group, Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
- Ulrike Mückstein
- Theoretical Biochemistry Group, Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
- Christoph Flamm
- Theoretical Biochemistry Group, Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04170 Leipzig, Germany
- Peter F Stadler
- Theoretical Biochemistry Group, Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04170 Leipzig, Germany
- The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, New Mexico
- Ivo L Hofacker
- Theoretical Biochemistry Group, Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria
|
274
|
Kamal M, Xie X, Lander ES. A large family of ancient repeat elements in the human genome is under strong selection. Proc Natl Acad Sci U S A 2006; 103:2740-5. [PMID: 16477033 PMCID: PMC1413850 DOI: 10.1073/pnas.0511238103] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Indexed: 01/03/2023] Open
Abstract
Although conserved noncoding elements (CNEs) constitute the majority of sequences under purifying selection in the human genome, they remain poorly understood. CNEs seem to be largely unique, with no large families of similar elements reported to date. Here, we search for CNEs among the ancestral repeat classes in the human genome and report the discovery of a large CNE family containing >900 members. This family belongs to the MER121 class of repeats. Although the MER121 family members show considerable sequence variation among one another, the individual copies show striking conservation in orthologous locations across the human, dog, mouse, and rat genomes. The element is also present and conserved in orthologous locations in the marsupial, but its genome-wide dispersal postdates the divergence from birds. The comparative genomic data indicate that MER121 does not encode a family of either protein-coding or RNA genes. Although the precise function of these elements remains unknown, the evidence suggests that this unusual family may play a cis-regulatory or structural role in mammalian genomes.
Affiliation(s)
- Michael Kamal
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Xiaohui Xie
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Eric S. Lander
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115
|
275
|
Hüttenhofer A, Vogel J. Experimental approaches to identify non-coding RNAs. Nucleic Acids Res 2006; 34:635-46. [PMID: 16436800 PMCID: PMC1351373 DOI: 10.1093/nar/gkj469] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.8] [Received: 12/21/2005] [Revised: 01/10/2006] [Accepted: 01/10/2006] [Indexed: 12/12/2022] Open
Abstract
Cellular RNAs that do not function as messenger RNAs (mRNAs), transfer RNAs (tRNAs) or ribosomal RNAs (rRNAs) comprise a diverse class of molecules that are commonly referred to as non-protein-coding RNAs (ncRNAs). These molecules have been known for quite a while, but their importance was not fully appreciated until recent genome-wide searches discovered thousands of these molecules and their genes in a variety of model organisms. Some of these screens were based on biocomputational prediction of ncRNA candidates within entire genomes of model organisms. Alternatively, direct biochemical isolation of expressed ncRNAs from cells, tissues or entire organisms has been shown to be a powerful approach to identify ncRNAs both at the level of individual molecules and at a global scale. In this review, we will survey several such wet-lab strategies, i.e. direct sequencing of ncRNAs, shotgun cloning of small-sized ncRNAs (cDNA libraries), microarray analysis and genomic SELEX to identify novel ncRNAs, and discuss the advantages and limits of these approaches.
Affiliation(s)
- Alexander Hüttenhofer
- Innsbruck Biocenter, Division of Genomics and RNomics, Innsbruck Medical University, Fritz-Pregl-Str. 3, 6020 Innsbruck, Austria.
|
276
|
|
277
|
Sempere LF, Cole CN, McPeek MA, Peterson KJ. The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zool B Mol Dev Evol 2006; 306:575-88. [PMID: 16838302 DOI: 10.1002/jez.b.21118] [Citation(s) in RCA: 247] [Impact Index Per Article: 13.7] [Indexed: 12/26/2022]
Abstract
How complex body plans evolved in animals such as fruit flies and vertebrates, as compared to the relatively simple jellyfish and sponges, is not known, given the similarity of developmental genetic repertoires shared by all these taxa. Here, we show that a core set of 18 microRNAs (miRNAs), non-coding RNA molecules that negatively regulate the expression of protein-coding genes, are found only in protostomes and deuterostomes and not in sponges or cnidarians. Because many of these miRNAs are expressed in specific tissues and/or organs, miRNA-mediated regulation could have played a fundamental evolutionary role in the origins of organs such as brain and heart--structures not found in cnidarians or sponges--and thus contributed greatly to the evolution of complex body plans. Furthermore, the continuous acquisition and fixation of miRNAs in various animal groups strongly correlates both with the hierarchy of metazoan relationships and with the non-random origination of metazoan morphological innovations through geologic time.
Affiliation(s)
- Lorenzo F Sempere
- Department of Biochemistry, Dartmouth Medical School, Hanover, NH 03755, USA
|
278
|
Hüttenhofer A. RNomics: identification and function of small non-protein-coding RNAs in model organisms. Cold Spring Harb Symp Quant Biol 2006; 71:135-40. [PMID: 17381289 DOI: 10.1101/sqb.2006.71.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/23/2023]
Abstract
In the recent past, our knowledge of small non-protein-coding RNAs (ncRNAs) has grown exponentially. Different approaches to identify novel ncRNAs, including computational and experimental RNomics, have led to a plethora of novel ncRNAs. A picture emerges in which ncRNAs have a variety of roles in the regulation of gene expression. Many of these ncRNAs appear to function in guiding specific protein complexes to target nucleic acids. The concept of RNA guiding seems to be a widespread and very effective regulatory mechanism. In addition to guide RNAs, numerous RNAs lacking known sequence and structure motifs were identified by RNomics screens; hence no function could be assigned to them as yet. Future challenges in the field of RNomics will include elucidation of their biological roles in the cell.
Affiliation(s)
- A Hüttenhofer
- Innsbruck Biocenter, Division of Genomics and RNomics, Innsbruck Medical University, Innsbruck, Austria
|
279
|
Willingham AT, Dike S, Cheng J, Manak JR, Bell I, Cheung E, Drenkow J, Dumais E, Duttagupta R, Ganesh M, Ghosh S, Helt G, Nix D, Piccolboni A, Sementchenko V, Tammana H, Kapranov P, Gingeras TR. Transcriptional landscape of the human and fly genomes: nonlinear and multifunctional modular model of transcriptomes. Cold Spring Harb Symp Quant Biol 2006; 71:101-10. [PMID: 17480199 DOI: 10.1101/sqb.2006.71.068] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 05/15/2023]
Abstract
Regions of the genome not coding for proteins or not involved in cis-acting regulatory activities are frequently viewed as lacking in functional value. However, a number of recent large-scale studies have revealed significant regulated transcription of unannotated portions of a variety of plant and animal genomes, allowing a new appreciation of the widespread transcription of large portions of the genome. High-resolution mapping of the sites of transcription of the human and fly genomes has provided an alternative picture of the extent and organization of transcription and has offered insights for biological functions of some of the newly identified unannotated transcripts. Considerable portions of the unannotated transcription observed are developmental or cell-type-specific parts of protein-coding transcripts, often serving as novel, alternative 5' transcriptional start sites. These distal 5' portions are often situated at significant distances from the annotated gene and alternatively join with or ignore portions of other intervening genes to comprise novel unannotated protein-coding transcripts. These data support an interlaced model of the genome in which many regions serve multifunctional purposes and are highly modular in their utilization. This model illustrates the underappreciated organizational complexity of the genome and one of the functional roles of transcription from unannotated portions of the genome.
|