1
Leonetti P, Consiglio A, Arendt D, Golbik RP, Rubino L, Gursinsky T, Behrens SE, Pantaleo V. Exogenous and endogenous dsRNAs perceived by plant Dicer-like 4 protein in the RNAi-depleted cellular context. Cell Mol Biol Lett 2023; 28:64. [PMID: 37550627 PMCID: PMC10405411 DOI: 10.1186/s11658-023-00469-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/02/2023] [Accepted: 06/24/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND In plants, RNase III Dicer-like proteins (DCLs) act as sensors of dsRNAs and process them into small RNAs (sRNAs) of 21 to 24 nucleotides (nt). Plant DCL4 is involved in the biogenesis of functional short interfering RNAs (siRNAs) of either endogenous or exogenous (i.e. viral) origin, and thus plays crucial antiviral roles. METHODS In this study we expressed plant DCL4 in Saccharomyces cerevisiae, an RNAi-depleted organism that lacks both Argonautes and RNA-dependent RNA polymerases, which allowed us to highlight the role of dicing itself. We tested DCL4 functionality in processing exogenous dsRNA-like substrates, such as a replicase-assisted viral replicon defective-interfering RNA and RNA hairpin substrates, as well as endogenous antisense transcripts. RESULTS DCL4 processed dsRNA-like molecules into 21- and 22-nt sRNAs both in vitro and in vivo. Conversely, DCL4 did not efficiently process a replicase-assisted viral replicon in vivo, providing evidence that viral RNAs are not accessible to DCL4 within membranes associated with active replication. Notably, in yeast cells expressing DCL4, 21- and 22-nt sRNAs are associated with endogenous loci. CONCLUSIONS We provide new keys to interpret previous studies of antiviral DCL4 in the host system. Taken together, the results confirm the role of sense/antisense RNA-based regulation of gene expression and expand the sense/antisense atlas of S. cerevisiae. They also show that S. cerevisiae can provide insights into the functionality of plant dicers and extend the S. cerevisiae toolbox to new biotechnological applications.
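A common first check on whether sequenced small RNAs look like Dicer products is to tally read lengths and ask what share falls at 21 and 22 nt. The sketch below assumes reads are already adapter-trimmed strings; the 18 to 26 nt window, the function names and the toy reads are hypothetical and are not the authors' pipeline.

```python
from collections import Counter


def length_profile(reads, min_len=18, max_len=26):
    """Return the fraction of size-selected reads at each length in [min_len, max_len]."""
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in range(min_len, max_len + 1)}


def dicer_like_fraction(profile, lengths=(21, 22)):
    """Share of size-selected reads whose length is in `lengths` (21/22 nt by default)."""
    return sum(profile.get(n, 0.0) for n in lengths)


# Hypothetical toy reads, not data from the study
reads = ["A" * 21, "C" * 21, "G" * 22, "U" * 24, "A" * 19]
profile = length_profile(reads)
print(round(dicer_like_fraction(profile), 2))  # 0.6
```

Here three of the five reads are 21 or 22 nt long, so the 21/22-nt share is 0.6; a real library would be compared against a control lacking DCL4.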
Affiliation(s)
- Paola Leonetti
- Department of Biology, Agricultural and Food Sciences, National Research Council, Institute for Sustainable Plant Protection, Bari Unit, Bari, Italy
- Arianna Consiglio
- Department of Biomedical Sciences, National Research Council, Institute for Biomedical Technologies, Bari Unit, Bari, Italy
- Dennis Arendt
- Institute of Biochemistry and Biotechnology, Section Microbial Biotechnology, Martin Luther University Halle-Wittenberg, Halle Saale, Germany
- Ralph Peter Golbik
- Institute of Biochemistry and Biotechnology, Section Microbial Biotechnology, Martin Luther University Halle-Wittenberg, Halle Saale, Germany
- Luisa Rubino
- Department of Biology, Agricultural and Food Sciences, National Research Council, Institute for Sustainable Plant Protection, Bari Unit, Bari, Italy
- Torsten Gursinsky
- Institute of Biochemistry and Biotechnology, Section Microbial Biotechnology, Martin Luther University Halle-Wittenberg, Halle Saale, Germany
- Sven-Erik Behrens
- Institute of Biochemistry and Biotechnology, Section Microbial Biotechnology, Martin Luther University Halle-Wittenberg, Halle Saale, Germany
- Vitantonio Pantaleo
- Department of Biology, Agricultural and Food Sciences, National Research Council, Institute for Sustainable Plant Protection, Bari Unit, Bari, Italy

2
Xu W, Liu C, Deng B, Lin P, Sun Z, Liu A, Xuan J, Li Y, Zhou K, Zhang X, Huang Q, Zhou H, He Q, Li B, Qu L, Yang J. TP53-inducible putative long noncoding RNAs encode functional polypeptides that suppress cell proliferation. Genome Res 2022; 32:1026-1041. [PMID: 35609991 DOI: 10.1101/gr.275831.121] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/03/2021] [Accepted: 05/06/2022] [Indexed: 01/10/2023]
Abstract
Polypeptides encoded by long non-coding RNAs (lncRNAs) are a novel class of functional molecules. However, whether these hidden polypeptides participate in the TP53 pathway and play significant biological roles remains unclear. Here, we found that TP53-regulated lncRNAs encode peptides, two of which are functional in various human cell lines. Using ribosome profiling and RNA-seq in HepG2 cells, we systematically identified more than 300 novel TP53-regulated lncRNAs and further confirmed that 15 of them encode peptides. Several peptides were also validated by multiple mass spectrometry measures. Ten of the novel translated lncRNAs were directly inducible by TP53 in response to DNA damage. Notably, the TP53-inducible peptides TP53LC02 and TP53LC04, but not their lncRNAs, suppressed cell proliferation. The TP53LC04 peptide also affected cell proliferation by regulating the cell cycle in response to DNA damage. This study demonstrates that TP53-inducible lncRNAs encode new functional peptides, expanding the components of the TP53 tumor suppressor network and providing novel potential targets for cancer therapy.
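Ribosome profiling evidence that an lncRNA open reading frame (ORF) is translated is often judged by the 3-nt periodicity of footprints: the P-sites of actively translating ribosomes fall mostly in the ORF's own reading frame. The sketch below computes per-frame fractions and applies a simple threshold. It assumes P-site positions have already been inferred from read ends and expressed relative to the start codon; the thresholds and function names are hypothetical, not the criteria used in the study.

```python
def frame_fractions(psite_offsets):
    """Fraction of P-sites in frames 0, 1 and 2 of a candidate ORF.

    Offsets are 0-based positions relative to the first nucleotide of the
    start codon; negative offsets (upstream of the ORF) are ignored.
    """
    counts = [0, 0, 0]
    for pos in psite_offsets:
        if pos < 0:
            continue
        counts[pos % 3] += 1
    total = sum(counts)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / total for c in counts)


def looks_translated(psite_offsets, min_reads=10, min_frame0=0.5):
    """Hypothetical heuristic: enough footprints and most of them in frame 0."""
    in_orf = [p for p in psite_offsets if p >= 0]
    return len(in_orf) >= min_reads and frame_fractions(in_orf)[0] >= min_frame0


periodic = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 1, 2]  # 10 of 12 in frame 0
uniform = list(range(12))                              # 4 of 12 in frame 0
print(looks_translated(periodic), looks_translated(uniform))  # True False
```

Published pipelines typically replace the fixed cutoff with a statistical test of periodicity, but the per-frame counting step is the same idea.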
Affiliation(s)
- Wenli Xu
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, The Third Affiliated Hospital, Sun Yat-sen University
- Chang Liu
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Bing Deng
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Penghui Lin
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Zhenghua Sun
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, Jinan University
- Anrui Liu
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Jiajia Xuan
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Yuying Li
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, Jinan University
- Keren Zhou
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Qiaojuan Huang
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Hui Zhou
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Qingyu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, Jinan University
- Bin Li
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Lianghu Qu
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, Sun Yat-sen University
- Jianhua Yang
- MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory for Biocontrol, The Fifth Affiliated Hospital, Sun Yat-sen University

3
Chen M, Yan C, Zhao X. Research Progress on Circular RNA in Glioma. Front Oncol 2021; 11:705059. [PMID: 34745938 PMCID: PMC8568300 DOI: 10.3389/fonc.2021.705059] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/04/2021] [Accepted: 10/06/2021] [Indexed: 12/12/2022]
Abstract
The discovery of circular RNA (circRNA) greatly complements traditional gene expression theory. CircRNAs are a class of non-coding RNAs with a stable circular structure. They are highly expressed, spatiotemporally specific and conserved across species. Importantly, circRNAs participate in the initiation of many kinds of tumors and regulate tumor development. Glioma is characterized by limited therapeutic options and a poor prognosis. In glioma, cancer-associated circRNAs can lose their original functions or acquire new effects, thereby contributing to oncogenesis. This article therefore reviews the biogenesis, metabolism, functions and properties of circRNAs as novel potential biomarkers for gliomas. We describe their expression characteristics and their interactions with other molecules, aiming to identify new targets for the early diagnosis and treatment of gliomas.
Affiliation(s)
- Mengyu Chen
- Department of Clinical Oncology, Shengjing Hospital of China Medical University, Shenyang, China
- Chunyan Yan
- Department of Clinical Oncology, Shengjing Hospital of China Medical University, Shenyang, China
- Xihe Zhao
- Department of Clinical Oncology, Shengjing Hospital of China Medical University, Shenyang, China

4
Epigenetic and non-coding regulation of alcohol abuse and addiction. Int Rev Neurobiol 2020; 156:63-86. [PMID: 33461665 DOI: 10.1016/bs.irn.2020.08.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Alcohol use disorder is a chronic, debilitating condition that adversely affects the lives of millions of individuals throughout the modern world. Individuals diagnosed with alcohol use disorder frequently have serious co-occurring conditions, which often further exacerbate problematic drinking behavior. Understanding the biochemical processes underlying the progression and perpetuation of the disease is essential for mitigating maladaptive behavior and restoring both physiological and psychological health. The range of cellular and biological systems that contribute to, and are affected by, alcohol use disorder and comorbid disorders requires a fundamental grasp of the intricate functional relationships that govern molecular biology. Epigenetic factors are recognized as essential mediators of cellular behavior, orchestrating coordinated gene expression changes within multicellular environments that ultimately direct human behavior. Understanding the epigenetic and transcriptional regulatory mechanisms involved in the pathogenesis of the disease is important for improving available pharmacotherapies and reducing the incidence of alcohol abuse and co-occurring conditions.

5
Mahdinloo S, Kiaie SH, Amiri A, Hemmati S, Valizadeh H, Zakeri-Milani P. Efficient drug and gene delivery to liver fibrosis: rationale, recent advances, and perspectives. Acta Pharm Sin B 2020; 10:1279-1293. [PMID: 32874828 PMCID: PMC7451940 DOI: 10.1016/j.apsb.2020.03.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/15/2019] [Revised: 02/22/2020] [Accepted: 02/28/2020] [Indexed: 12/17/2022]
Abstract
Liver fibrosis results from chronic damage accompanied by an accumulation of extracellular matrix, and no specific medical therapy has been approved for it to date. Because of the liver's metabolic capacity for drugs, the fragility of drugs, and the insurmountable physiological obstacles to targeting, developing efficient drug delivery systems for anti-fibrotics seems vital. We explored articles on liver fibrosis from different perspectives published over the past two decades, then collected and summarized the information together with corresponding in vitro and in vivo cases. We discuss the mechanism of hepatic fibrogenesis and different ways of inducing fibrosis in animals. Furthermore, we cover the critical chemical and herbal anti-fibrotics and biological molecules such as microRNAs, siRNAs and growth factors, which can affect cell division and differentiation. Drug and gene delivery and therapeutic systems tested in in vitro and in vivo models are summarized in data tables. This review highlights recent advances in emerging drugs and nanocarriers and presents perspectives on targeting strategies employed in liver fibrosis treatment.
Affiliation(s)
- Somayeh Mahdinloo
- Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz 5166616471, Iran
- Seyed Hossein Kiaie
- Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz 5166616471, Iran
- Nano Drug Delivery Research Center, Kermanshah University of Medical Sciences, Kermanshah 6715847141, Iran
- Ala Amiri
- Faculty of Basic Sciences, Islamic Azad University, Science and Research Branch, Tehran 1477893855, Iran
- Salar Hemmati
- Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz 5166616471, Iran
- Hadi Valizadeh
- Drug Applied Research Center and Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz 5166616471, Iran
- Parvin Zakeri-Milani
- Liver and Gastrointestinal Diseases Research Center and Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz 5166616471, Iran

6
Xu W, Deng B, Lin P, Liu C, Li B, Huang Q, Zhou H, Yang J, Qu L. Ribosome profiling analysis identified a KRAS-interacting microprotein that represses oncogenic signaling in hepatocellular carcinoma cells. Sci China Life Sci 2019; 63:529-542. [DOI: 10.1007/s11427-019-9580-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 05/16/2019] [Accepted: 05/28/2019] [Indexed: 12/13/2022]

7
Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, Barnes I, Berry A, Bignell A, Carbonell Sala S, Chrast J, Cunningham F, Di Domenico T, Donaldson S, Fiddes IT, García Girón C, Gonzalez JM, Grego T, Hardy M, Hourlier T, Hunt T, Izuogu OG, Lagarde J, Martin FJ, Martínez L, Mohanan S, Muir P, Navarro FC, Parker A, Pei B, Pozo F, Ruffier M, Schmitt BM, Stapleton E, Suner MM, Sycheva I, Uszczynska-Ratajczak B, Xu J, Yates A, Zerbino D, Zhang Y, Aken B, Choudhary JS, Gerstein M, Guigó R, Hubbard TJ, Kellis M, Paten B, Reymond A, Tress ML, Flicek P. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 2019; 47:D766-D773. [PMID: 30357393 PMCID: PMC6323946 DOI: 10.1093/nar/gky955] [Citation(s) in RCA: 2005] [Impact Index Per Article: 334.2] [Received: 08/15/2018] [Revised: 09/20/2018] [Accepted: 10/08/2018] [Indexed: 02/06/2023]
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
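GENCODE annotations are distributed as GTF files (among other formats), where the ninth column holds key-value attributes such as `gene_id`, `gene_type` and `gene_name`. The sketch below parses one such line with only the Python standard library. The example line and its identifiers are made up for illustration, and the parser assumes attribute values contain no semicolons.

```python
def parse_gtf_line(line):
    """Parse one GTF data line into its fixed columns plus an attribute dict.

    Assumes attribute values contain no ';' (typical of GENCODE files, but
    not something this simple splitter can handle in general).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue  # the trailing ';' leaves an empty item
        key, _, value = item.partition(" ")
        value = value.strip().strip('"')
        if key in attributes:
            # Repeated keys (e.g. several 'tag' entries) are collected into a list
            previous = attributes[key]
            if isinstance(previous, list):
                previous.append(value)
            else:
                attributes[key] = [previous, value]
        else:
            attributes[key] = value
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# Hypothetical example line in GENCODE-style GTF layout
example = ('chr1\tHAVANA\tgene\t1000\t2000\t.\t+\t.\t'
           'gene_id "ENSG00000000000.1"; gene_type "lncRNA"; gene_name "EXAMPLE1"; '
           'level 2; tag "basic"; tag "Ensembl_canonical";')
rec = parse_gtf_line(example)
print(rec["attributes"]["gene_name"], rec["end"] - rec["start"] + 1)  # EXAMPLE1 1001
```

Because coordinates are 1-based and inclusive, a feature's length is `end - start + 1`. For whole files, an established GTF/GFF reader is usually preferable to a hand-written splitter.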
Affiliation(s)
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Anne-Maud Ferreira
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Rory Johnson
- Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
- Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jane Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
- James Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
- Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvia Carbonell Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Jacqueline Chrast
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tomás Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Carlos García Girón
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Osagie G Izuogu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Laura Martínez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Shamika Mohanan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Muir
- Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
- Systems Biology Institute, Yale University, West Haven, CT 06516, USA
- Fabio C P Navarro
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Baikang Pei
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eloise Stapleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jinuri Xu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Yan Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Bronwen Aken
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
- Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guy's Hospital, Great Maze Pond, London SE1 9RT, UK
- Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

8
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and, because of the size and complexity of whole genomes, is typically more challenging than either of these tasks. Despite the difficulty of this problem, numerous methods have been developed to solve it, because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
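Many whole-genome aligners first find short matches ("anchors") between two genomes and then chain the anchors that are colinear in both. The sketch below is a generic O(n²) dynamic-programming version of that chaining step, not a method taken from the chapter. It assumes all anchors are on the same strand, given as hypothetical `(start_in_A, start_in_B, length)` tuples, and scores a chain by total anchor length.

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # (start in genome A, start in genome B, length)


def best_colinear_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain of non-overlapping, colinear anchors.

    Anchor j may precede anchor i only if j ends before i starts in BOTH
    genomes; the chain score is the summed anchor length.
    """
    if not anchors:
        return []
    a = sorted(anchors)  # sort by start position in genome A
    n = len(a)
    score = [length for _, _, length in a]
    prev = [-1] * n
    for i in range(n):
        xi, yi, li = a[i]
        for j in range(i):
            xj, yj, lj = a[j]
            if xj + lj <= xi and yj + lj <= yi and score[j] + li > score[i]:
                score[i] = score[j] + li
                prev[i] = j
    # Trace back from the best-scoring chain end
    i = max(range(n), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(a[i])
        i = prev[i]
    return chain[::-1]


anchors = [(0, 0, 10), (15, 12, 8), (30, 5, 20), (40, 30, 5)]
print(best_colinear_chain(anchors))  # [(0, 0, 10), (15, 12, 8), (40, 30, 5)]
```

The long anchor at (30, 5) is dropped because it would go backwards in genome B; in a real aligner such out-of-order matches are where rearrangements would be considered, and faster chaining algorithms replace the quadratic loop.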
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.

9
Stormo GD. An Overview of RNA Sequence Analyses: Structure Prediction, ncRNA Gene Identification, and RNAi Design. Curr Protoc Bioinformatics 2018; 43:12.1.1-12.1.3. [DOI: 10.1002/0471250953.bi1201s43] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Affiliation(s)
- Gary D. Stormo
- Washington University School of Medicine, Saint Louis, Missouri

10
Ward M, Datta A, Wise M, Mathews DH. Advanced multi-loop algorithms for RNA secondary structure prediction reveal that the simplest model is best. Nucleic Acids Res 2017; 45:8541-8550. [PMID: 28586479 PMCID: PMC5737859 DOI: 10.1093/nar/gkx512] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 01/16/2017] [Accepted: 05/31/2017] [Indexed: 01/08/2023]
Abstract
Algorithmic prediction of RNA secondary structure has been an area of active inquiry since the 1970s. Despite many innovations since then, our best techniques are not yet perfect. The workhorses of RNA secondary structure prediction are the recursions first described by Zuker and Stiegler in 1981. These have well-understood caveats; a notable flaw, which persists today, is the ad hoc treatment of multi-loops, also called helical junctions. Although several advanced models for multi-loops have been proposed, it seems to have been assumed that incorporating them into the recursions would lead to intractability, and so no algorithms for these models existed. These models include the classical model based on Jacobson–Stockmayer polymer theory and another by Aalberts and Nandagopal that incorporates two-length-scale polymer physics. We have realized practical, tractable algorithms for each of these models. However, after implementing these algorithms, we found that no advanced model was better than the original ad hoc model used for multi-loops. While this is unexpected, it supports the continued use of the current model.
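In Zuker-style recursions, the ad hoc multi-loop model is conventionally an affine penalty, a + b × (unpaired nucleotides) + c × (branching helices), which is what lets multi-loops be scored inside the cubic-time dynamic program. A minimal sketch of that penalty follows; the values passed for a, b and c are hypothetical placeholders, not published nearest-neighbor parameters.

```python
def multiloop_penalty(unpaired, branches, a, b, c):
    """Linear ("ad hoc") multi-loop initiation penalty: a + b*unpaired + c*branches.

    unpaired: unpaired nucleotides inside the multi-loop
    branches: helices leaving the loop, including the closing helix
    a, b, c:  model parameters (the values used below are hypothetical)
    """
    if branches < 3:
        raise ValueError("a multi-loop has at least three branching helices")
    if unpaired < 0:
        raise ValueError("unpaired count must be non-negative")
    return a + b * unpaired + c * branches


# Hypothetical parameter values, for illustration only
A, B, C = 3.0, 0.1, 0.4
print(round(multiloop_penalty(6, 3, A, B, C), 2))   # 4.8
print(round(multiloop_penalty(12, 3, A, B, C), 2))  # 5.4
```

Doubling the unpaired count from 6 to 12 adds exactly 6 × b to the penalty; this linear growth with loop size is the simplification that the polymer-theory models replace.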
Affiliation(s)
- Max Ward
- Computer Science & Software Engineering, The University of Western Australia, Australia
- Amitava Datta
- Computer Science & Software Engineering, The University of Western Australia, Australia
- Michael Wise
- Computer Science & Software Engineering, The University of Western Australia, Australia
- The Marshall Centre for Infectious Diseases Research and Training, The University of Western Australia, Australia
- David H Mathews
- Department of Biochemistry & Biophysics, Department of Biostatistics & Computational Biology, and Center for RNA Biology, University of Rochester, NY, USA

11
Abstract
One of the benefits of the genomics revolution for animal production will be knowledge of genes that can be used to select more profitable livestock. Although genetic markers linked to genes of economic importance can be used, tests for the genes themselves will be much more successful. Consequently, finding genes of economic importance to livestock will be a major research aim for the future. Most traits of economic importance are quantitative traits affected by many genes. Mutations at many genes (e.g. 500) and at many positions within a gene (e.g. 1000 coding and non-coding bases) can affect a typical quantitative trait. The effect of these mutations on phenotype is usually small (e.g. 0.1 standard deviation) but occasionally large. Many mutations are lost from the population through genetic drift and selection, so that polymorphisms exist at only a subset of the relevant genes (e.g. 100 genes). Finding these genes, which have relatively small effects, is more difficult than finding genes for a classical Mendelian trait, but as genomic tools become more powerful it is becoming feasible, and some successes have already occurred. The standard approach is to map quantitative trait loci (QTL) to a chromosome region using linkage and linkage disequilibrium, and then to test polymorphisms in positional candidate genes for an effect on the trait. Tools such as genomic sequence, EST collections and comparative maps make this approach feasible. Candidate genes can be selected based on functional data such as gene expression obtained from microarrays. At present, the gain in the rate of genetic improvement from using DNA-based tests for QTL is small, because selection without them is already quite accurate, not enough QTL have been identified and genotyping is too expensive. However, in the future, with many QTL identified and inexpensive genotyping combined with decreased generation intervals, large gains are possible.
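The example numbers above can be put in perspective with the standard quantitative-genetics result that, for one biallelic locus in Hardy-Weinberg proportions with purely additive effects, the additive variance is V_A = 2pqα², where p and q are the allele frequencies and α is the allele substitution effect. Assuming the 0.1 standard deviation quoted above refers to α (the abstract does not say so explicitly), the sketch below computes that locus's share of phenotypic variance.

```python
def additive_variance_share(p, alpha_sd):
    """Share of phenotypic variance from one biallelic, purely additive locus.

    Assumes Hardy-Weinberg proportions and no dominance, with alpha_sd the
    allele substitution effect in phenotypic SD units, so that
    V_A / V_P = 2 * p * q * alpha**2 (with V_P = 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    q = 1.0 - p
    return 2.0 * p * q * alpha_sd ** 2


for p in (0.5, 0.1):
    print(p, round(additive_variance_share(p, 0.1), 4))
# 0.5 0.005
# 0.1 0.0018
```

Even at the most favorable allele frequency (p = 0.5), such a locus explains only about 0.5% of the phenotypic variance, which illustrates why small-effect QTL are much harder to detect than genes for a Mendelian trait.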

12
Huang Y, Chen SY, Deng F. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction. Comput Struct Biotechnol J 2016; 14:298-303. [PMID: 27536341 PMCID: PMC4975701 DOI: 10.1016/j.csbj.2016.07.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/23/2016] [Revised: 07/06/2016] [Accepted: 07/12/2016] [Indexed: 12/31/2022]
Abstract
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although computational prediction of protein-coding genes is well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA genes, remains challenging. Ab initio gene prediction has two main components, the computed values that describe sequence features and the algorithm used to train the discriminant function, and various bioinformatic tools employ different combinations of the two. Herein, we briefly review these well-characterized sequence features of eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
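Two of the simplest sequence features that can feed a discriminant function are GC content and k-mer frequencies (codon-length trimers or hexamers, for example). A minimal sketch of both follows; the function names and the choice of k are illustrative assumptions, not taken from the review.

```python
from itertools import product


def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def kmer_frequencies(seq, k=3):
    """Relative frequency of every DNA k-mer, as a vector in a fixed k-mer order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        idx = index.get(seq[i:i + k])
        if idx is not None:  # skip windows containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    freqs = [c / total for c in counts] if total else [0.0] * len(kmers)
    return kmers, freqs


seq = "ATGGCCATGN"
kmers, freqs = kmer_frequencies(seq, k=3)
print(len(kmers), round(freqs[kmers.index("ATG")], 3), round(gc_content(seq), 3))
# 64 0.286 0.556
```

The fixed k-mer order matters: it makes vectors from different sequences directly comparable, so they can be stacked into a feature matrix for whatever classifier is trained downstream.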
Affiliation(s)
- Ying Huang
- College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China
| | - Shi-Yi Chen
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Corresponding author at: Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, 211# Huimin Road, Wenjiang 611130, Sichuan, China.Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan ProvinceSichuan Agricultural University211# Huimin RoadWenjiangSichuan611130China
| | - Feilong Deng
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
| |
Collapse
|
13
|
Chen Z, Li L, Shan Z, Huang H, Chen H, Ding X, Guo J, Liu L. Transcriptome sequencing analysis of novel sRNAs of Kineococcus radiotolerans in response to ionizing radiation. Microbiol Res 2016; 192:122-129. [PMID: 27664730 DOI: 10.1016/j.micres.2016.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/13/2015] [Revised: 05/30/2016] [Accepted: 06/01/2016] [Indexed: 11/29/2022]
Abstract
Kineococcus radiotolerans is a Gram-positive, radio-resistant bacterium isolated from a radioactive environment. Bacterial small noncoding RNAs (sRNAs) have been reported to play roles in the immediate response to stress and/or in recovery from stress. Such sRNAs can be identified genome-wide by analyzing K. radiotolerans transcriptome data obtained by deep RNA sequencing (RNA-seq). In this study, raw data for radiation-exposed samples (RS) and control samples (CS) were acquired separately from the sequencing platform. Genome-wide bioinformatic screening identified 217 sRNA candidates common to the two samples. Forty-three sRNA candidates were differentially expressed, 28 up-regulated and 15 down-regulated. The down-regulated sRNAs were selected for target prediction, and 12 of them, which may modulate genes related to transcriptional regulation and DNA repair, were considered candidates involved in the radio-resistance regulatory system.
Affiliation(s)
- Zhouwei Chen
- College of Life Sciences, Zhejiang Sci-Tech University, No. 2 Road, Xiasha, Hangzhou, Zhejiang, PR China; Zhejiang Institute of Microbiology, Hangzhou, Zhejiang, PR China
- Lufeng Li
- College of Life Sciences, Zhejiang Sci-Tech University, No. 2 Road, Xiasha, Hangzhou, Zhejiang, PR China
- Zhan Shan
- College of Life Sciences, Zhejiang Sci-Tech University, No. 2 Road, Xiasha, Hangzhou, Zhejiang, PR China
- Hannian Huang
- Department of Applied Engineering, Zhejiang Economic & Trade Polytechnic, Hangzhou, Zhejiang, PR China
- Huan Chen
- Zhejiang Institute of Microbiology, Hangzhou, Zhejiang, PR China
- Xianfeng Ding
- College of Life Sciences, Zhejiang Sci-Tech University, No. 2 Road, Xiasha, Hangzhou, Zhejiang, PR China
- Jiangfeng Guo
- College of Life Sciences, Zhejiang Sci-Tech University, No. 2 Road, Xiasha, Hangzhou, Zhejiang, PR China
- Lili Liu
- College of Life Sciences, Zhejiang Sci-Tech University, No. 2 Road, Xiasha, Hangzhou, Zhejiang, PR China
14
Pignatelli M, Vilella AJ, Muffato M, Gordon L, White S, Flicek P, Herrero J. ncRNA orthologies in the vertebrate lineage. Database (Oxford) 2016; 2016:bav127. [PMID: 26980512 PMCID: PMC4792531 DOI: 10.1093/database/bav127] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/02/2015] [Revised: 12/23/2015] [Accepted: 12/23/2015] [Indexed: 01/07/2023]
Abstract
Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes, and the evolutionary models used by these methods normally assume that positions in the protein evolve independently. However, as our appreciation of the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families represented in Rfam for which a specific covariance model is provided. For each family, ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics ('Compara') API and on our FTP site, and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org.
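The abstract infers orthologues and paralogues from the final reconciled tree using Fitch's definition. That definition reduces to a check on the lowest common ancestor (LCA) of two genes. They are orthologues if the LCA is a speciation node and paralogues if it is a duplication node. The Python sketch below illustrates the rule on a toy tree. The `Node` class, its event labels and the helper names are hypothetical. The sketch does not reproduce TreeBeST, the Ensembl Compara pipeline or its API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing, so nodes can go in sets
class Node:
    """Node of a reconciled gene tree (hypothetical structure).

    Leaves carry a gene name; internal nodes carry the event inferred by
    reconciliation with the species tree: "speciation" or "duplication".
    """
    name: str | None = None
    event: str | None = None
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach child below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Node) -> list[Node]:
    """Nodes from `node` up to the root, inclusive."""
    path = []
    current: Node | None = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def lowest_common_ancestor(a: Node, b: Node) -> Node:
    """Deepest node that is an ancestor of both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("the two genes are not in the same tree")


def fitch_relationship(a: Node, b: Node) -> str:
    """Classify two genes by the event at their lowest common ancestor."""
    event = lowest_common_ancestor(a, b).event
    if event == "speciation":
        return "orthologue"
    if event == "duplication":
        return "paralogue"
    raise ValueError(f"lowest common ancestor has no event label: {event!r}")


if __name__ == "__main__":
    # Toy tree: an ancient duplication followed by speciation in both copies.
    root = Node(event="duplication")
    copy_a = root.add(Node(event="speciation"))
    copy_b = root.add(Node(event="speciation"))
    human_a = copy_a.add(Node(name="human_A"))
    mouse_a = copy_a.add(Node(name="mouse_A"))
    human_b = copy_b.add(Node(name="human_B"))
    print(fitch_relationship(human_a, mouse_a))  # orthologue
    print(fitch_relationship(human_a, human_b))  # paralogue
    print(fitch_relationship(mouse_a, human_b))  # paralogue
```

Because the decision depends only on the event label at the LCA, the quality of the orthology calls rests entirely on the upstream reconciliation that labels each internal node.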
Affiliation(s)
- Miguel Pignatelli
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Albert J Vilella
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Leo Gordon
- European Molecular Biology Laboratory, European Bioinformatics Institute
- Simon White
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute; UCL Cancer Institute, University College London, London WC1E 6BT, UK
15
Gore-Panter SR, Hsu J, Barnard J, Moravec CS, Van Wagoner DR, Chung MK, Smith JD. PANCR, the PITX2 Adjacent Noncoding RNA, Is Expressed in Human Left Atria and Regulates PITX2c Expression. Circ Arrhythm Electrophysiol 2016; 9:e003197. [PMID: 26783232 PMCID: PMC4719779 DOI: 10.1161/circep.115.003197] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 12/24/2022]
Abstract
BACKGROUND Genome-wide studies reveal that genetic variants at chromosome 4q25 constitute the strongest locus associated with atrial fibrillation, the most frequent arrhythmia. However, the mechanisms underlying this association are unknown. Our goal is to find and characterize left atrial-expressed transcripts in the chromosome 4q25 atrial fibrillation risk locus that may play a role in atrial fibrillation pathogenesis. METHODS AND RESULTS RNA sequencing performed on human left/right pairs identified an intergenic long noncoding RNA adjacent to the PITX2 gene, which we have named PANCR (PITX2 adjacent noncoding RNA). In a human tissue screen, PANCR was expressed specifically in the left atria and eye and in no other chambers of the heart. The levels of PANCR and PITX2c RNAs were highly correlated in 233 human left atrial appendage samples. PANCR levels were not associated with either atrial rhythm status or the genotypes of the chromosome 4q25 atrial fibrillation risk variants. Both PANCR and PITX2c RNAs were induced early during differentiation of human embryonic stem cells into cardiomyocytes. Because long noncoding RNAs often control gene expression, we performed siRNA-mediated knockdown of PANCR, and this treatment repressed PITX2c expression and mimicked the effects of PITX2c knockdown on global mRNA and miRNA expression. Cell fractionation studies demonstrate that PANCR is primarily localized in the cytoplasm. CONCLUSIONS PANCR and PITX2c are coordinately expressed early during cardiomyocyte differentiation from stem cells. PANCR knockdown decreased PITX2c expression in differentiated cardiomyocytes, altering the transcriptome in a manner similar to PITX2c knockdown.
Affiliation(s)
- Shamone R Gore-Panter
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
- Jeffrey Hsu
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
- John Barnard
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
- Christine S Moravec
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
- David R Van Wagoner
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
- Mina K Chung
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
- Jonathan D Smith
- From the Department of Molecular Cardiology, Lerner Research Institute (S.R.G.-P., C.S.M., D.R.V.W., M.K.C.), Department of Cellular & Molecular Medicine, Lerner Research Institute (S.R.G.-P., J.H., J.D.S.), Department of Quantitative Health Sciences (J.B.), and Department of Cardiovascular Medicine (C.S.M., D.R.V.W., M.K.C., J.D.S.), Cleveland Clinic, OH
16
Anandakumar S, Vijayakumar S, Arumugam N, Gromiha MM. Mammalian Mitochondrial ncRNA Database. Bioinformation 2015; 11:512-3. [PMID: 26912953 PMCID: PMC4748022 DOI: 10.6026/97320630011512] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/20/2015] [Accepted: 11/21/2015] [Indexed: 12/18/2022]
Abstract
Mammalian Mitochondrial ncRNA is a web-based database that provides specific information on non-coding RNAs in mammals. The database supports easy searching, comparison with BLAST, and retrieval of information on the predicted structures and functions of mammalian ncRNAs.
Affiliation(s)
- Shanmugam Anandakumar
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- Saravanan Vijayakumar
- Centre for Advanced Study in Crystallography and Biophysics, University of Madras, Chennai 600005, Tamil Nadu, India
- Nagarajan Arumugam
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
17
Abstract
Over the past decade, our understanding of genomic complexity in eukaryotes has grown, ushered in by immense technological advances in high-throughput sequencing of DNA and its corresponding RNA transcripts. This has led to the realization that, beyond protein-coding genes, there are a large number of transcripts that do not encode proteins and may therefore perform their functions through their RNA sequences and/or through secondary and tertiary structural determinants. This review focuses on the latest findings on a class of noncoding RNAs that are relatively large (>200 nucleotides), display nuclear localization, and use different strategies to regulate transcription. These are exciting times for discovering the biological scope and mechanisms of action of these RNA molecules, which have roles in dosage compensation, imprinting, enhancer function, and transcriptional regulation, with a great impact on development and disease.
Affiliation(s)
- Roberto Bonasio
- Department of Cell and Developmental Biology and Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
18
Goodarzi H, Zhang S, Buss CG, Fish L, Tavazoie S, Tavazoie SF. Metastasis-suppressor transcript destabilization through TARBP2 binding of mRNA hairpins. Nature 2014; 513:256-60. [PMID: 25043050 DOI: 10.1038/nature13466] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 09/16/2013] [Accepted: 05/12/2014] [Indexed: 01/30/2023]
Abstract
Aberrant regulation of RNA stability has an important role in many disease states. Deregulated post-transcriptional modulation, such as that governed by microRNAs targeting linear sequence elements in messenger RNAs, has been implicated in the progression of many cancer types. A defining feature of RNA is its ability to fold into structures. However, the roles of structural mRNA elements in cancer progression remain unexplored. Here we performed an unbiased search for post-transcriptional modulators of mRNA stability in breast cancer by conducting whole-genome transcript stability measurements in poorly and highly metastatic isogenic human breast cancer lines. Using a computational framework that searches RNA sequence and structure space, we discovered a family of GC-rich structural cis-regulatory RNA elements, termed sRSEs for structural RNA stability elements, which are significantly overrepresented in transcripts displaying reduced stability in highly metastatic cells. By integrating computational and biochemical approaches, we identified TARBP2, a double-stranded RNA-binding protein implicated in microRNA processing, as the trans factor that binds the sRSE family and similar structural elements--collectively termed TARBP2-binding structural elements (TBSEs)--in transcripts. TARBP2 is overexpressed in metastatic cells and metastatic human breast tumours and destabilizes transcripts containing TBSEs. Endogenous TARBP2 promotes metastatic cell invasion and colonization by destabilizing amyloid precursor protein (APP) and ZNF395 transcripts, two genes previously associated with Alzheimer's and Huntington's disease, respectively. We reveal these genes to be novel metastasis suppressor genes in breast cancer. The cleavage product of APP, extracellular amyloid-α peptide, directly suppresses invasion while ZNF395 transcriptionally represses a pro-metastatic gene expression program. 
The expression levels of TARBP2, APP and ZNF395 in human breast carcinomas support their experimentally uncovered roles in metastasis. Our findings establish a non-canonical and direct role for TARBP2 in mammalian gene expression regulation and reveal that regulated RNA destabilization through protein-mediated binding of mRNA structural elements can govern cancer progression.
Affiliation(s)
- Hani Goodarzi
- Laboratory of Systems Cancer Biology, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
- Steven Zhang
- Laboratory of Systems Cancer Biology, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
- Colin G Buss
- Laboratory of Systems Cancer Biology, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
- Lisa Fish
- Laboratory of Systems Cancer Biology, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
- Saeed Tavazoie
- Department of Biochemistry and Molecular Biophysics, and Department of Systems Biology, Columbia University, New York, New York 10032, USA
- Sohail F Tavazoie
- Laboratory of Systems Cancer Biology, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
19
Babski J, Maier LK, Heyer R, Jaschinski K, Prasse D, Jäger D, Randau L, Schmitz RA, Marchfelder A, Soppa J. Small regulatory RNAs in Archaea. RNA Biol 2014; 11:484-93. [PMID: 24755959 PMCID: PMC4152357 DOI: 10.4161/rna.28452] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Indexed: 12/13/2022]
Abstract
Small regulatory RNAs (sRNAs) are universally distributed in all three domains of life: Archaea, Bacteria, and Eukaryotes. In bacteria, sRNAs typically function by binding near the translation start site of their target mRNAs and thereby inhibit or activate translation. In eukaryotes, miRNAs and siRNAs typically bind to the 3′-untranslated region (3′-UTR) of their target mRNAs and influence translation efficiency and/or mRNA stability. In archaea, sRNAs have been identified in all species investigated using bioinformatic approaches, RNomics, and RNA-Seq. Their size varies widely, from less than 50 to more than 500 nucleotides. Differential expression of sRNA genes has been studied using northern blot analysis, microarrays, and RNA-Seq. In addition, biological functions have been unraveled by genetic approaches, i.e., by characterization of designed mutants. As in bacteria, archaeal sRNAs were found to be involved in many biological processes, including metabolic regulation, adaptation to extreme conditions, stress responses, and even regulation of morphology and cellular behavior. Recently, the first target mRNAs were identified in archaea, including one sRNA that binds to the 5′-region of two mRNAs in Methanosarcina mazei Gö1 and a few sRNAs that bind to 3′-UTRs in Sulfolobus solfataricus, three Pyrobaculum species, and Haloferax volcanii, indicating that archaeal sRNAs can target either the 5′-UTR or the 3′-UTR of their respective target mRNAs. In addition, archaea contain tRNA-derived fragments (tRFs), and one tRF has been identified as a major ribosome-binding sRNA in H. volcanii that downregulates translation in response to stress. Besides regulatory sRNAs, archaea contain further classes of sRNAs, e.g., CRISPR RNAs (crRNAs) and snoRNAs.
Affiliation(s)
- Julia Babski
- Institute for Molecular Biosciences; Biocentre; Goethe University; Frankfurt, Germany
- Ruth Heyer
- Biology II; Ulm University; Ulm, Germany
- Katharina Jaschinski
- Institute for Molecular Biosciences; Biocentre; Goethe University; Frankfurt, Germany
- Daniela Prasse
- Institute for Microbiology; Christian-Albrechts-University; Kiel, Germany
- Dominik Jäger
- Institute for Microbiology; Christian-Albrechts-University; Kiel, Germany
- Lennart Randau
- Prokaryotic Small RNA Biology Group; Max Planck Institute for Terrestrial Microbiology; Marburg, Germany
- Ruth A Schmitz
- Institute for Microbiology; Christian-Albrechts-University; Kiel, Germany
- Jörg Soppa
- Institute for Molecular Biosciences; Biocentre; Goethe University; Frankfurt, Germany
20
Generation and phenotyping of a collection of sRNA gene deletion mutants of the haloarchaeon Haloferax volcanii. PLoS One 2014; 9:e90763. [PMID: 24637842 PMCID: PMC3956466 DOI: 10.1371/journal.pone.0090763] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 10/01/2012] [Accepted: 02/04/2014] [Indexed: 11/19/2022]
Abstract
The haloarchaeon Haloferax volcanii was shown to contain 145 intergenic and 45 antisense sRNAs. In a comprehensive approach to unravel the biological roles of haloarchaeal sRNAs in vivo, 27 sRNA genes were selected and deletion mutants were generated. The phenotypes of these mutants were compared with that of the parent strain under ten different conditions, i.e. growth on four different carbon sources, growth at three different salt concentrations, and application of four different stress conditions. In addition, cell morphologies in exponential and stationary phase were observed. Furthermore, swarming of 17 mutants was analyzed. Twenty-four of the 27 mutants differed from the parent strain under at least one condition, revealing that haloarchaeal sRNAs are involved in metabolic regulation, growth under extreme conditions, regulation of morphology and behavior, and stress adaptation. Notably, seven deletion mutants showed a gain-of-function phenotype, which has not yet been described for any other prokaryotic sRNA gene deletion mutant. Comparison of the transcriptomes of one sRNA gene deletion mutant and the parent strain led to the identification of differentially expressed genes. Genes for flagellins and chemotaxis were up-regulated in the mutant, in accordance with its gain-of-function swarming phenotype. While the deletion mutant analysis underscored that haloarchaeal sRNAs are involved in many biological functions, their degree of conservation is extremely low: only three of the 27 genes are conserved in more than 10 haloarchaeal species, and 22 are confined to H. volcanii, indicating fast evolution of haloarchaeal sRNA genes.
21
Wang C, Wei L, Guo M, Zou Q. Computational approaches in detecting non-coding RNA. Curr Genomics 2014; 14:371-7. [PMID: 24396270 PMCID: PMC3861888 DOI: 10.2174/13892029113149990005] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 04/26/2013] [Revised: 07/18/2013] [Accepted: 07/18/2013] [Indexed: 12/21/2022]
Abstract
The important role of non-coding RNAs (ncRNAs) in the cell has made their identification a critical issue in biological research. However, traditional approaches such as RT-PCR and Northern blotting are costly. With recent progress in bioinformatics and computational prediction technology, large-scale discovery of ncRNAs has become feasible. This paper introduces the major computational approaches to ncRNA identification, including homology search, de novo prediction and mining of deep-sequencing data. Related software tools are also compared and reviewed, along with a discussion of future improvements.
Affiliation(s)
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Leyi Wei
- School of Information Science and Technology, Xiamen University, Xiamen 361005, China
- Maozu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Quan Zou
- School of Information Science and Technology, Xiamen University, Xiamen 361005, China
22
Abstract
De novo discovery of "motifs" capturing the commonalities among related structured noncoding RNAs (ncRNAs) is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches toward solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.
Affiliation(s)
- Walter L Ruzzo
- Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
23
Abstract
A key step toward understanding a metagenomics data set is the identification of functional sequence elements within it, such as protein-coding genes and structural RNAs. Relative to protein-coding genes, structural RNAs are more difficult to identify because of their reduced alphabet size, lack of open reading frames, and short length. Infernal is a software package that implements "covariance models" (CMs) for RNA homology search, which harness both sequence and structural conservation when searching for RNA homologs. Thanks to the added statistical signal inherent in the secondary structure conservation of many RNA families, Infernal is more powerful than sequence-only methods such as BLAST and profile HMMs. Together with the Rfam database of CMs, Infernal is a useful tool for identifying RNAs in metagenomics data sets.
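Much of the "added statistical signal" from conserved secondary structure comes from covariation. Compensatory substitutions change both bases of a pair while preserving pairing. Infernal captures this inside a full probabilistic covariance model. The Python sketch below is a much simpler stand-in, not what Infernal computes: it measures covariation between two alignment columns with mutual information. A perfectly covarying pair of columns scores high even when neither column is conserved on its own, which a model that scores columns independently, such as a profile HMM, does not capture. The function name and gap conventions are assumptions made for illustration.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two aligned columns.

    Each column is a string with one character per sequence, in the same
    sequence order. Sequences with a gap ('-' or '.') in either column are
    skipped.
    """
    pairs = [
        (a, b) for a, b in zip(col_i, col_j)
        if a not in "-." and b not in "-."
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Compare the observed pair frequency with independence of the columns.
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # Four sequences whose paired columns change together (G-C, C-G, A-U, U-A).
    print(column_mutual_information("GCAU", "CGUA"))
    # Fully conserved columns carry no covariation signal.
    print(column_mutual_information("GGGG", "CCCC"))
```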
24
Micevski D, Dougan DA. Proteolytic regulation of stress response pathways in Escherichia coli. Subcell Biochem 2013; 66:105-28. [PMID: 23479439 DOI: 10.1007/978-94-007-5940-4_5] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
Maintaining correct cellular function is a fundamental biological process for all forms of life. A critical aspect of this process is the maintenance of protein homeostasis (proteostasis) in the cell, which is largely performed by a group of proteins referred to as the protein quality control (PQC) network. This network, composed of chaperones and proteases, is critical for maintaining proteostasis not only during favourable growth conditions but also in response to stress. Indeed, proteases play a crucial role in the clearance of unwanted proteins that accumulate during stress and, more importantly, in the activation of various stress response pathways. In bacteria, the cell's response to stress is usually orchestrated by a specific transcription factor (sigma factor). In Escherichia coli there are seven different sigma factors, each of which responds to a particular stress, resulting in the rapid expression of a specific set of genes. The cellular concentration of each of these transcription factors is tightly controlled at the levels of transcription, translation and protein stability. Here, we focus on the proteolytic regulation of two sigma factors (σ(32) and σ(S)), which control the heat and general stress response pathways, respectively. This review also briefly discusses the role proteolytic systems play in the clearance of unwanted proteins that accumulate during stress.
Affiliation(s)
- Dimce Micevski
- Department of Biochemistry, La Trobe Institute for Molecular Science (LIMS), La Trobe University, Melbourne, 3086, Australia
25
Li W, Ying X, Lu Q, Chen L. Predicting sRNAs and their targets in bacteria. Genomics Proteomics Bioinformatics 2012. [PMID: 23200137 PMCID: PMC5054197 DOI: 10.1016/j.gpb.2012.09.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 11/25/2022]
Abstract
Bacterial small RNAs (sRNAs) are an emerging class of regulatory RNAs, about 40–500 nucleotides in length, that bind to their target mRNAs or proteins and are thereby involved in many biological processes, such as sensing environmental changes and regulating gene expression. Identification of bacterial sRNAs and their targets has therefore become an important part of sRNA biology. Current strategies for discovering sRNAs and their targets usually involve bioinformatic prediction followed by experimental validation, which gives bioinformatic prediction a key role. Here, we provide an overview of prediction methods, focusing on the merits and limitations of each class of model. Finally, we present our thoughts on developing related bioinformatic models in the future.
Affiliation(s)
- Wuju Li
- Beijing Institute of Basic Medical Sciences, Beijing 100850, China
26
Abstract
Many aspects of gene regulation are mediated by RNA molecules. However, regulatory RNAs have remained elusive until very recently. At least three types of small regulatory RNAs have been characterized in Drosophila: microRNAs (miRNAs), piwi-interacting RNAs and endogenous siRNAs. A fourth class of regulatory RNAs includes known long non-coding RNAs such as roX1 or bxd. The initial sequencing of the Drosophila melanogaster genome has served as a scaffold to study the transcriptional profile of an animal, revealing the complexities of the function and biogenesis of regulatory RNAs. The comparative analysis of 12 Drosophila genomes has been crucial for the study of microRNA evolution. However, comparative genomics of other RNA regulators is confounded by technical problems: genomic loci are poorly conserved and frequently encoded in the heterochromatin. Future developments in genome sequencing and population genomics in Drosophila will continue to shed light on the conservation, evolution and function of regulatory RNAs.
Affiliation(s)
- Antonio Marco
- University of Manchester, Michael Smith Building, Manchester, UK
27
28
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA.
29
Wang Y, Manzour A, Shareghi P, Shaw TI, Li YW, Malmberg RL, Cai L. Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds. BMC Bioinformatics 2012; 13 Suppl 5:S1. [PMID: 22537005 PMCID: PMC3358654 DOI: 10.1186/1471-2105-13-s5-s1] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022]
Abstract
Background The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator for RNA secondary structure fold certainty in detection of structural, non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all the alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. Developing novel methods to improve the entropy measure performance may result in more effective ncRNA gene finding based on structure detection. Results This paper shows that the measuring performance of base pairing entropy can be significantly improved with a constrained secondary structure ensemble in which only canonical base pairs are assumed to occur in energetically stable stems in a fold. This constraint actually reduces the space of the secondary structure and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model demonstrate substantially narrowed gaps of Z-scores between ncRNAs, as well as drastic increases in the Z-score for all 13 tested ncRNA sets, compared to shuffled sequences. Conclusions These results suggest the viability of developing effective structure-based ncRNA gene finding methods by investigating secondary structure ensembles of ncRNAs.
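The base pairing entropy and the Z-score comparison against shuffled sequences can be sketched as follows. This is a minimal illustration, assuming a symmetric matrix `P` of base-pair probabilities already produced by some partition-function folding program (not shown) and one common per-nucleotide formulation in which the probability of staying unpaired counts as one more outcome; the exact normalization in the paper may differ, and the stable-stem-constrained ensemble itself is not implemented here. The names `base_pairing_entropy` and `z_score` are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def base_pairing_entropy(P: Sequence[Sequence[float]]) -> float:
    """Average per-nucleotide Shannon entropy (natural log) of a pair-probability matrix.

    P[i][j] is the probability that nucleotides i and j pair; P is assumed
    symmetric and its diagonal is ignored. For each position i, the
    probability of being unpaired, 1 - sum_j P[i][j], is one more outcome.
    """
    n = len(P)
    if n == 0:
        return 0.0
    total = 0.0
    for i in range(n):
        outcomes = [P[i][j] for j in range(n) if j != i]
        outcomes.append(1.0 - sum(outcomes))  # probability that i is unpaired
        # Skip zero (or slightly negative, from rounding) probabilities.
        total -= sum(p * math.log(p) for p in outcomes if p > 0.0)
    return total / n


def z_score(value: float, background: Sequence[float]) -> float:
    """Standard score of `value` against a background sample (needs >= 2 values)."""
    return (value - mean(background)) / stdev(background)
```

A lower entropy indicates a more certain fold, so a structured ncRNA would be expected to have a clearly negative Z-score when its entropy is compared with the entropies of shuffled versions of the same sequence.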
Affiliation(s)
- Yingfeng Wang
- Department of Computer Science, University of Georgia, Athens, Georgia 30602, USA.
30
Abstract
Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center's production genome annotation environment.
Affiliation(s)
- Brian J Haas
- Genome Sequencing and Analysis Program, Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, U.S.A
31
Abstract
In organisms of all three domains of life, a plethora of sRNAs (small regulatory RNAs) exists in addition to the well-known RNAs such as rRNAs, tRNAs and mRNAs. Although sRNAs have been well studied in eukaryotes and in bacteria, the sRNA population in archaea has just recently been identified and only in a few archaeal species. In the present paper, we summarize our current knowledge about sRNAs and their function in the halophilic archaeon Haloferax volcanii. Using two different experimental approaches, 111 intergenic and 38 antisense sRNAs were identified, as well as 42 tRFs (tRNA-derived fragments). Observation of differential expression under various conditions suggests that these sRNAs might be active as regulators in gene expression like their bacterial and eukaryotic counterparts. The severe phenotypes observed upon deletion and overexpression of sRNA genes revealed that sRNAs are involved in, and important for, a variety of biological functions in H. volcanii and possibly other archaea. Investigation of the Haloferax Lsm protein suggests that this protein is involved in the archaeal sRNA pathway.
32
Harmanci AO, Sharma G, Mathews DH. TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequences. BMC Bioinformatics 2011; 12:108. [PMID: 21507242 PMCID: PMC3120699 DOI: 10.1186/1471-2105-12-108] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Received: 09/03/2010] [Accepted: 04/20/2011] [Indexed: 01/07/2023]
Abstract
Background The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. 
Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
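A maximum expected accuracy (MEA) structure, as used by TurboFold-MEA, can be computed from any base-pair probability matrix by dynamic programming. The sketch below is a simplified, pseudoknot-free version written for illustration, not RNAstructure's code. It assumes a symmetric probability matrix `P`, a weight `gamma` on paired nucleotides (each predicted pair scores 2·gamma·P[i][j], each unpaired nucleotide scores its unpaired probability) and a minimum hairpin loop of three nucleotides; the function name `mea_structure` is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def mea_structure(P: Sequence[Sequence[float]], gamma: float = 1.0,
                  min_loop: int = 3) -> List[Tuple[int, int]]:
    """Pseudoknot-free structure maximizing expected accuracy.

    Objective: sum of 2*gamma*P[i][j] over predicted pairs plus the unpaired
    probability of every nucleotide left single-stranded. P must be symmetric.
    """
    n = len(P)
    unpaired = [1.0 - sum(P[i][j] for j in range(n) if j != i) for i in range(n)]
    M = [[0.0] * (n + 1) for _ in range(n + 1)]  # M[i][j]: best score on [i, j]

    def best(i: int, j: int) -> float:
        # An empty interval (i > j) contributes nothing.
        return M[i][j] if i <= j else 0.0

    def options(i: int, j: int) -> Iterator[Tuple[float, tuple]]:
        # Candidate (score, move) choices for interval [i, j].
        yield unpaired[i] + best(i + 1, j), ("skip_i",)
        yield unpaired[j] + best(i, j - 1), ("skip_j",)
        if j - i > min_loop:  # leave at least min_loop nucleotides in the hairpin
            yield 2 * gamma * P[i][j] + best(i + 1, j - 1), ("pair",)
        for k in range(i, j):  # bifurcation into [i, k] and [k+1, j]
            yield M[i][k] + M[k + 1][j], ("split", k)

    # Fill intervals in order of increasing length.
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            M[i][j] = max(score for score, _ in options(i, j))

    # Traceback: re-evaluate the options and follow the best move.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        _, move = max(options(i, j), key=lambda c: c[0])
        if move[0] == "skip_i":
            stack.append((i + 1, j))
        elif move[0] == "skip_j":
            stack.append((i, j - 1))
        elif move[0] == "pair":
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            k = move[1]
            stack.extend([(i, k), (k + 1, j)])
    return sorted(pairs)
```

Raising `gamma` favors pairing: with a single weak pair of probability 0.3 between the ends of an 8-nt sequence, the pair is omitted at `gamma=1.0` but predicted at `gamma=3.0`.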
Affiliation(s)
- Arif O Harmanci
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
33
Yang F, Wang J, Ji Y, Cheng H, Wan J, Xiao Z, Zhou G. Amplification of unknown RNAs and RNA mixtures based on unique restriction enzyme cleavage in vitro. Acta Biochim Biophys Sin (Shanghai) 2010; 42:873-82. [PMID: 21106769 DOI: 10.1093/abbs/gmq098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Small RNAs are generally expressed at low levels, which makes it difficult to obtain usable amounts of them from limited material. In this study, we have developed a novel method to amplify target RNA. The amplification procedure consists of sequential RT-PCR, effective separation, restriction enzyme cleavage of the cDNA strand, and in vitro run-off transcription of the target RNA from its cDNA. The key step is the introduction of a unique stem-loop linker into the cDNA strand, which forms a unique restriction enzyme recognition sequence that is absent from the cDNA sequence of the target RNA. This method can be used to amplify RNA samples of various origins and has many advantages for amplifying unknown small RNAs and small RNA mixtures. The amplified RNA has the full sequence of the original RNA except for an extra 5' G and an additional 3' A or C. The method worked well for amplifying a microRNA, a piwi-interacting RNA and two small RNA mixtures.
Affiliation(s)
- Fangyi Yang
- Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, China
34
Saito Y, Sato K, Sakakibara Y. Robust and accurate prediction of noncoding RNAs from aligned sequences. BMC Bioinformatics 2010; 11 Suppl 7:S3. [PMID: 21106125 PMCID: PMC2957686 DOI: 10.1186/1471-2105-11-s7-s3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
Abstract
BACKGROUND Computational prediction of noncoding RNAs (ncRNAs) is an important task in the post-genomic era. One common approach is to utilize the profile information contained in alignment data rather than single sequences. However, this strategy involves the possibility that the quality of input alignments can influence the performance of prediction methods. Therefore, the evaluation of the robustness against alignment errors is necessary as well as the development of accurate prediction methods. RESULTS We describe a new method, called Profile BPLA kernel, which predicts ncRNAs from alignment data in combination with support vector machines (SVMs). Profile BPLA kernel is an extension of base-pairing profile local alignment (BPLA) kernel which we previously developed for the prediction from single sequences. By utilizing the profile information of alignment data, the proposed kernel can achieve better accuracy than the original BPLA kernel. We show that Profile BPLA kernel outperforms the existing prediction methods which also utilize the profile information using the high-quality structural alignment dataset. In addition to these standard benchmark tests, we extensively evaluate the robustness of Profile BPLA kernel against errors in input alignments. We consider two different types of error: first, that all sequences in an alignment are actually ncRNAs but are aligned ignoring their secondary structures; second, that an alignment contains unrelated sequences which are not ncRNAs but still aligned. In both cases, the effects on the performance of Profile BPLA kernel are surprisingly small. Especially for the latter case, we demonstrate that Profile BPLA kernel is more robust compared to the existing prediction methods. CONCLUSIONS Profile BPLA kernel provides a promising way for identifying ncRNAs from alignment data. 
It is more accurate than the existing prediction methods, and can keep its performance under the practical situations in which the quality of input alignments is not necessarily high.
Affiliation(s)
- Yutaka Saito
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
35
Sridhar J, Sowmiya G, Sekar K, Rafi ZA. PsRNA: a computing engine for the comparative identification of putative small RNA locations within intergenic regions. Genomics Proteomics Bioinformatics 2010; 8:127-34. [PMID: 20691398 PMCID: PMC5054453 DOI: 10.1016/s1672-0229(10)60014-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Small RNAs (sRNAs) are non-coding transcripts that exert their functions directly in the cell. Identifying sRNAs is difficult because they lack clear sequence and structural biases. Most sRNAs are identified within genus-specific intergenic regions in related genomes. However, several of these regions remain unannotated owing to a lack of sequence homology and/or of powerful statistical identification tools. A computational engine has been built to search intergenic regions and to identify and roughly annotate new putative sRNA regions in Enterobacteriaceae genomes. It uses experimentally known sRNA data and their flanking genes/KEGG Orthology (KO) numbers as templates to identify similar sRNA regions in related query genomes. The search engine can not only locate putative intergenic regions for specific sRNAs but also locate conserved, shuffled or deleted gene clusters in query genomes. Because it uses KO terms to locate functionally important regions such as sRNAs, any further assignment of KO numbers to additional genes will increase its sensitivity. The PsRNA server identifies putative sRNA regions using the information retrieved from the sRNA of interest. The computing engine is available online at http://bioserver1.physics.iisc.ernet.in/psrna/ and http://bicmku.in:8081/psrna/.
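The template idea described above, locating an intergenic region by the orthology of its flanking genes, can be sketched as below. This is a hypothetical illustration, not the PsRNA server's code: it assumes each genome is a list of genes on one replicon, sorted by start coordinate and annotated with a KO identifier, and it ignores strand and larger rearrangements. `Gene` and `find_flanked_igr` are illustrative names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    ko: str     # KEGG Orthology identifier assigned to the gene
    start: int  # 1-based inclusive start coordinate
    end: int    # 1-based inclusive end coordinate


def find_flanked_igr(genes: List[Gene], left_ko: str,
                     right_ko: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first intergenic region lying between two
    adjacent genes whose KO identifiers match the template flanking pair,
    in either order. Returns None if no such adjacent pair exists."""
    for a, b in zip(genes, genes[1:]):  # consecutive gene pairs
        if {a.ko, b.ko} == {left_ko, right_ko} and b.start > a.end + 1:
            return (a.end + 1, b.start - 1)
    return None
```

In use, the flanking KO pair would be taken from a known sRNA in a reference genome and searched for in each query genome; as the abstract notes, the approach can only find regions whose flanking genes carry KO assignments.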
Affiliation(s)
- Jayavel Sridhar
- Centre of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai, India
36
Zimmermann B, Bilusic I, Lorenz C, Schroeder R. Genomic SELEX: a discovery tool for genomic aptamers. Methods 2010; 52:125-32. [PMID: 20541015 PMCID: PMC2954320 DOI: 10.1016/j.ymeth.2010.06.004] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 05/01/2010] [Accepted: 06/03/2010] [Indexed: 11/29/2022]
Abstract
Genomic SELEX is a discovery tool for genomic aptamers, which are genomically encoded functional domains in nucleic acid molecules that recognize and bind specific ligands. When combined with genomic libraries and using RNA-binding proteins as baits, Genomic SELEX used with high-throughput sequencing enables the discovery of genomic RNA aptamers and the identification of RNA-protein interaction networks. Here we describe how to construct and analyze genomic libraries, how to choose baits for selections, how to perform the selection procedure and finally how to analyze the enriched sequences derived from deep sequencing. As a control procedure, we recommend performing a "Neutral" SELEX experiment in parallel to the selection, omitting the selection step. This control experiment provides a background signal for comparison with the positively selected pool. We also recommend deep sequencing the initial library in order to facilitate the final in silico analysis of enrichment with respect to the initial levels. Counter selection procedures, using modified or inactive baits, allow strengthening the binding specificity of the winning selected sequences.
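The comparison against the neutral pool and the initial library recommended above can be sketched as a normalized fold-change calculation. This is an assumed, simplified analysis rather than the authors' pipeline: read counts per unique sequence (or per genomic locus) are normalized to reads per million after adding a pseudocount, enrichment is a log2 ratio, and the cutoff `min_log2` is illustrative. Function names are hypothetical.

```python
import math
from typing import Dict, List


def log2_enrichment(selected: Dict[str, int], reference: Dict[str, int],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of reads per million in the selected pool versus a reference
    pool (e.g. the neutral SELEX pool or the initial library), for every
    sequence seen in either pool."""
    sel_total = sum(selected.values())
    ref_total = sum(reference.values())
    if sel_total == 0 or ref_total == 0:
        raise ValueError("both pools need at least one read")
    scores = {}
    for seq in set(selected) | set(reference):
        # The pseudocount keeps sequences absent from one pool finite.
        sel_rpm = (selected.get(seq, 0) + pseudocount) / sel_total * 1e6
        ref_rpm = (reference.get(seq, 0) + pseudocount) / ref_total * 1e6
        scores[seq] = math.log2(sel_rpm / ref_rpm)
    return scores


def candidates(selected: Dict[str, int], neutral: Dict[str, int],
               initial: Dict[str, int], min_log2: float = 2.0) -> List[str]:
    """Sequences enriched at least min_log2 over BOTH the neutral pool and the initial library."""
    vs_neutral = log2_enrichment(selected, neutral)
    vs_initial = log2_enrichment(selected, initial)
    return sorted(s for s in selected
                  if vs_neutral[s] >= min_log2 and vs_initial[s] >= min_log2)
```

Requiring enrichment against both references separates sequences favored by the bait from those that merely amplify well during the cycling steps, which the neutral control is meant to expose.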
Affiliation(s)
- Renée Schroeder
- Department of Biochemistry and Cell Biology, Max F. Perutz Laboratories, University of Vienna, Austria
37
Wiebe NJP, Meyer IM. TRANSAT-- method for detecting the conserved helices of functional RNA structures, including transient, pseudo-knotted and alternative structures. PLoS Comput Biol 2010; 6:e1000823. [PMID: 20589081 PMCID: PMC2891591 DOI: 10.1371/journal.pcbi.1000823] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 11/28/2009] [Accepted: 05/19/2010] [Indexed: 12/20/2022]
Abstract
The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate whether evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way.
This has the disadvantage of not being able to predict features as a function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment. Many non-coding genes exert their function via an RNA structure which starts emerging while the RNA sequence is being transcribed from the genome. The resulting folding pathway is known to depend on a variety of features such as the transcription speed, the concentration of various ions and the binding of proteins and other molecules. Not all of these influences can be adequately captured by the existing computational methods which try to replicate what happens in vivo. So far, it has been challenging to experimentally investigate co-transcriptional folding pathways in vivo, and only limited data from in vitro experiments exist. In order to investigate whether functionally similar RNA sequences from different organisms fold in a similar way, we have developed a new computational method, called Transat, which does not require the detailed computational modeling of the cellular environment. We show in a comprehensive analysis that our method is capable of detecting known structural features and provide evidence that structural features of the in vivo folding pathways have been conserved for several biologically interesting classes of RNA sequences.
Affiliation(s)
- Nicholas J. P. Wiebe
- Centre for High-Throughput Biology & Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Irmtraud M. Meyer
- Centre for High-Throughput Biology & Department of Computer Science and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
38
Geissmann T, Chevalier C, Cros MJ, Boisset S, Fechter P, Noirot C, Schrenzel J, François P, Vandenesch F, Gaspin C, Romby P. A search for small noncoding RNAs in Staphylococcus aureus reveals a conserved sequence motif for regulation. Nucleic Acids Res 2010; 37:7239-57. [PMID: 19786493 PMCID: PMC2790875 DOI: 10.1093/nar/gkp668] [Citation(s) in RCA: 169] [Impact Index Per Article: 11.3] [Indexed: 12/20/2022]
Abstract
Bioinformatic analysis of the intergenic regions of Staphylococcus aureus predicted multiple regulatory regions. From this analysis, we characterized 11 novel noncoding RNAs (RsaA-K) that are expressed in several S. aureus strains under different experimental conditions. Many of them accumulate in the late-exponential phase of growth. All ncRNAs are stable and their expression is Hfq-independent. The transcription of several of them is regulated by the alternative sigma B factor (RsaA, D and F) while the expression of RsaE is agrA-dependent. Six of these ncRNAs are specific to S. aureus, four are conserved in other Staphylococci, and RsaE is also present in Bacillaceae. Transcriptomic and proteomic analysis indicated that RsaE regulates the synthesis of proteins involved in various metabolic pathways. Phylogenetic analysis combined with RNA structure probing, searches for RsaE-mRNA base pairing, and toeprinting assays indicate that a conserved and unpaired UCCC sequence motif of RsaE binds to target mRNAs and prevents the formation of the ribosomal initiation complex. This study unexpectedly shows that most of the novel ncRNAs carry the conserved C-rich motif, suggesting that they are members of a class of ncRNAs that target mRNAs by a shared mechanism.
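The proposed mechanism, a C-rich UCCC motif pairing with target mRNAs near the ribosome binding site, can be illustrated with a plain string search. This sketch is not the authors' pipeline: it assumes RNA sequences written 5' to 3' with U, a caller-supplied 0-based start codon position, and an illustrative 30-nt upstream window. It ignores secondary structure, so it cannot check that the motif is unpaired, which the study established by structure probing. All names are hypothetical.

```python
from typing import List

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Antiparallel Watson-Crick complement of an RNA string (both read 5'->3')."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna.upper()))


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    hits, pos = [], seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def candidate_target_sites(srna: str, mrna: str, start_codon: int,
                           motif: str = "UCCC", window: int = 30) -> List[int]:
    """Positions in mrna, within `window` nt upstream of start_codon, of the
    sequence complementary to `motif`; empty if the sRNA lacks the motif."""
    if not find_all(srna.upper(), motif):
        return []
    site = reverse_complement(motif)  # "GGGA" for the default "UCCC"
    lo = max(0, start_codon - window)
    return [p for p in find_all(mrna.upper(), site) if lo <= p < start_codon]
```

Because the site is the reverse complement, 5'-UCCC-3' in the sRNA is matched against 5'-GGGA-3' in the mRNA, the G-rich stretch typical of Shine-Dalgarno sequences.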
Affiliation(s)
- Thomas Geissmann
- Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, IBMC, 15 rue René Descartes, F-67084 Strasbourg, France
39
Gorodkin J, Hofacker IL, Torarinsson E, Yao Z, Havgaard JH, Ruzzo WL. De novo prediction of structured RNAs from genomic sequences. Trends Biotechnol 2009; 28:9-19. [PMID: 19942311 DOI: 10.1016/j.tibtech.2009.09.006] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 05/29/2009] [Revised: 08/31/2009] [Accepted: 09/22/2009] [Indexed: 12/29/2022]
Abstract
Growing recognition of the numerous, diverse and important roles played by non-coding RNA in all organisms motivates better elucidation of these cellular components. Comparative genomics is a powerful tool for this task and is arguably preferable to any high-throughput experimental technology currently available, because evolutionary conservation highlights functionally important regions. Conserved secondary structure, rather than primary sequence, is the hallmark of many functionally important RNAs, because compensatory substitutions in base-paired regions preserve structure. Unfortunately, such substitutions also obscure sequence identity and confound alignment algorithms, which complicates analysis greatly. This paper surveys recent computational advances in this difficult arena, which have enabled genome-scale prediction of cross-species conserved RNA elements. These predictions suggest that a wealth of these elements indeed exist.
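The point about compensatory substitutions can be made concrete. For two aligned sequences that share a structure, a base pair counts as compensatory when both of its nucleotides differ between the sequences while each sequence still forms a canonical pair there. The sketch below assumes a pseudoknot-free dot-bracket structure, equal-length aligned sequences, and canonical pairs including G-U wobble; the function names are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick pairs plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """0-based (i, j) base pairs encoded by a pseudoknot-free dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def compensatory_pairs(structure: str, seq_a: str, seq_b: str) -> List[Tuple[int, int]]:
    """Pairs where both positions changed between seq_a and seq_b but both
    sequences still form a canonical pair (a covariation signal)."""
    if not len(structure) == len(seq_a) == len(seq_b):
        raise ValueError("structure and sequences must have equal length")
    a, b = seq_a.upper(), seq_b.upper()
    result = []
    for i, j in dot_bracket_pairs(structure):
        both_changed = a[i] != b[i] and a[j] != b[j]
        both_pair = (a[i], a[j]) in CANONICAL and (b[i], b[j]) in CANONICAL
        if both_changed and both_pair:
            result.append((i, j))
    return result
```

Such pairs are strong evidence for a conserved structure, yet at the sequence level they look like two mismatches, which is why sequence-only alignment scores penalize exactly the positions that support the structure.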
Affiliation(s)
- Jan Gorodkin
- Section for Genetics and Bioinformatics, IBHV and Center for Applied Bioinformatics, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark.
40
Sridhar J, Kumar SS, Rafi ZA. Small RNA identification in Enterobacteriaceae using synteny and genomic backbone retention II. OMICS 2009; 13:261-84. [PMID: 19445646 DOI: 10.1089/omi.2008.0067] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Small RNAs are bacterial counterparts of noncoding RNAs. Growing evidence in the literature indicates that these small RNAs play major roles in prokaryotes at both the transcriptome and proteome levels. Based on comparative genomic studies, we present manually curated small RNA regions in 25 recently completed genomes from Enterobacteriaceae. The study continues our earlier work, which uses the presence of small RNAs sandwiched between specific conserved flanking genes that retain genomic backbone and gene synteny. In total, this study reports 931 identified sRNAs/sRNA regions: 498 small RNA homologs, 80 putative small RNA regions containing partial stretches of homologous sequences, and 353 putative nonhomologous sRNA regions. The homologs and partial homologs include 84 putative small RNA homologous regions that retain at least one of the conserved flanking gene pairs, which may act as hotspots for insertion/deletion of genetic material in genomes. The nonhomologous CsrB sRNA region we reported in Yersinia pseudotuberculosis IP32953 has been experimentally confirmed by Kulkarni's group, and the sraH and ryeE sRNAs from Erwinia carotovora subsp. atroseptica SCRI1043, recently added to the Rfam database, further support our approach.
Affiliation(s)
- Jayavel Sridhar
- Centre of Excellence in Bioinformatics, Department of Genetic Engineering, School of Biotechnology, Madurai Kamaraj University, Madurai 625021, India
41
Lu ZJ, Gloor JW, Mathews DH. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA 2009; 15:1805-13. [PMID: 19703939 PMCID: PMC2743040 DOI: 10.1261/rna.1643609] [Citation(s) in RCA: 154] [Impact Index Per Article: 9.6] [Indexed: 05/07/2023]
Abstract
Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was first pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher accuracy than minimum free energy structures. Accuracy is measured by sensitivity, the percentage of known base pairs correctly predicted, and positive predictive value (PPV), the percentage of predicted pairs that are in the known structure. By favoring double-strandedness or single-strandedness, a higher sensitivity or PPV of prediction can be favored, respectively. Using MaxExpect, the average PPV of optimal structure is improved from 66% to 68% at the same sensitivity level (73%) compared with free energy minimization.
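The two accuracy measures defined above reduce to set arithmetic on base pairs. A minimal sketch, assuming each structure is given as a collection of (i, j) index pairs; some published benchmarks also accept slipped pairs (shifted by one nucleotide), which this exact-match version does not. The function name is hypothetical.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]


def sensitivity_ppv(predicted: Iterable[Pair], known: Iterable[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) as percentages.

    sensitivity = correctly predicted pairs / pairs in the known structure
    PPV         = correctly predicted pairs / all predicted pairs
    """
    pred = {tuple(sorted(p)) for p in predicted}  # (j, i) and (i, j) are the same pair
    ref = {tuple(sorted(p)) for p in known}
    correct = len(pred & ref)
    sens = 100.0 * correct / len(ref) if ref else 0.0
    ppv = 100.0 * correct / len(pred) if pred else 0.0
    return sens, ppv
```

Predicting fewer, more confident pairs tends to raise PPV at the cost of sensitivity, which is the trade-off the abstract describes when single- or double-strandedness is favored.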
Affiliation(s)
- Zhi John Lu
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, New York 14642, USA
42
Zhang Y, Wang J, Huang S, Zhu X, Liu J, Yang N, Song D, Wu R, Deng W, Skogerbø G, Wang XJ, Chen R, Zhu D. Systematic identification and characterization of chicken (Gallus gallus) ncRNAs. Nucleic Acids Res 2009; 37:6562-74. [PMID: 19720738 PMCID: PMC2770669 DOI: 10.1093/nar/gkp704] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that non-coding RNAs (ncRNAs) play important roles during development and evolution. Chicken, the first genome-sequenced non-mammalian amniote, possesses unique features for developmental and evolutionary studies. However, apart from microRNAs, information on chicken ncRNAs has mainly been obtained from computational predictions without experimental validation. In the present study, we performed a systematic identification of intermediate size ncRNAs (50–500 nt) by ncRNA library construction and identified 125 chicken ncRNAs. Importantly, through bioinformatics and expression analysis, we found that the chicken ncRNAs have several novel features: (i) comparative genomic analysis against 18 sequenced vertebrate genomes revealed that the majority of the newly identified ncRNA candidates are not conserved and most are potentially bird/chicken specific, suggesting that ncRNAs play roles in lineage/species specification during evolution. (ii) The expression pattern analysis of intronic snoRNAs and their host genes suggested coordinated expression of snoRNAs and their host genes. (iii) Several spatio-temporal specific expression patterns suggest involvement of ncRNAs in tissue development. Together, these findings provide new clues for future functional study of ncRNAs during development and evolution.
Affiliation(s)
- Yong Zhang
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
43
Parallel algorithm research on several important open problems in bioinformatics. Interdiscip Sci 2009; 1:187-95. [DOI: 10.1007/s12539-009-0004-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/30/2008] [Revised: 08/28/2008] [Accepted: 08/28/2008] [Indexed: 10/20/2022]
44
Smith JA. RNA search with decision trees and partial covariance models. IEEE/ACM Trans Comput Biol Bioinform 2009; 6:517-27. [PMID: 19644178 PMCID: PMC3646588 DOI: 10.1109/tcbb.2008.120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the tree can be quite complex and there is no obvious method to build the tree in these cases. Experimental results from seven RNA families show execution times of 0.066-0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for five others. Since the full covariance model is run on all sequences accepted by the partial model decision tree, the false alarm rate is at least as low as that of the full model alone.
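The false-alarm guarantee stated above follows from the filter structure: partial models can only discard sequences, and anything they keep is still scored by the full model. The sketch below shows that structure as a linear cascade, a special case of a decision tree, with the partial and full covariance models replaced by hypothetical scoring functions. It implements neither covariance models nor the paper's tree-selection procedure.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str], float]


def cascade_search(seqs: Sequence[str],
                   filters: List[Tuple[Scorer, float]],
                   full_model: Scorer,
                   full_threshold: float) -> Tuple[List[str], int]:
    """Apply cheap (score, threshold) filters in order; only sequences passing
    every filter are scored by the expensive full model.

    Returns (hits, number of full-model evaluations). Every hit also passes the
    full model, so false alarms can never exceed those of the full model alone;
    a filter that is too strict can, however, lose true members.
    """
    hits: List[str] = []
    full_calls = 0
    for s in seqs:
        # all() short-circuits, so later (costlier) filters run only if earlier ones pass.
        if all(score(s) >= threshold for score, threshold in filters):
            full_calls += 1
            if full_model(s) >= full_threshold:
                hits.append(s)
    return hits, full_calls
```

The speed-up reported in the abstract corresponds to `full_calls` being much smaller than `len(seqs)`, while the training-set constraint corresponds to choosing thresholds so that no known family member is rejected by a filter.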
Affiliation(s)
- Jennifer A Smith
- Electrical and Computer Engineering Department, Boise State University, 1910 University Ave., Boise, ID 83725-2075, USA.
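The filtering idea in the Smith abstract can be illustrated with the simplest possible decision tree: a linear cascade in which each stage either rejects a sequence or passes it on, and every survivor is scored by the full model. This is a minimal sketch, not the paper's implementation; the scoring functions, thresholds and the `cascade_search` name are hypothetical placeholders for partial and full covariance models.

```python
from typing import Callable, Sequence

# Hypothetical stage: a cheap scoring function standing in for one partial
# covariance model, paired with the score threshold a sequence must reach.
Stage = tuple[Callable[[str], float], float]


def cascade_search(
    seqs: Sequence[str],
    stages: Sequence[Stage],
    full_model: Callable[[str], float],
    full_threshold: float,
) -> list[str]:
    """Return the sequences that pass every filter stage and the full model."""
    hits = []
    for seq in seqs:
        # all() short-circuits, so a sequence is dropped at the first failed
        # stage and later (possibly costlier) stages are never scored.
        if all(score(seq) >= threshold for score, threshold in stages):
            # Only survivors are scored by the expensive full model, which
            # makes the final accept/reject decision.
            if full_model(seq) >= full_threshold:
                hits.append(seq)
    return hits


# Toy usage: a GC-count filter (hypothetical) in front of a stand-in "full model".
stages = [(lambda s: s.count("G") + s.count("C"), 2)]
print(cascade_search(["GGCA", "AAAA", "GCGCGC"], stages, len, 5))  # ['GCGCGC']
```

Because the full model always has the final say, the cascade can only lose true hits (if a filter threshold is set too high), never add false alarms beyond those of the full model, which mirrors the argument at the end of the abstract.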
45
Meyer MM, Ames TD, Smith DP, Weinberg Z, Schwalbach MS, Giovannoni SJ, Breaker RR. Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique'. BMC Genomics 2009; 10:268. [PMID: 19531245 PMCID: PMC2704228 DOI: 10.1186/1471-2164-10-268] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 01/06/2009] [Accepted: 06/16/2009] [Indexed: 02/04/2023]
Abstract
Background Metagenomic sequence data are proving to be a vast resource for the discovery of biological components. Yet analysis of these data to identify functional RNAs lags behind efforts to characterize protein diversity. The genome of 'Candidatus Pelagibacter ubique' HTCC 1062 is the closest match for approximately 20% of marine metagenomic sequence reads. It is also small, contains little non-coding DNA, and has strikingly low GC content. Results To aid the discovery of RNA motifs within the marine metagenome, we exploited the genomic properties of 'Cand. P. ubique' by targeting our search to long intergenic regions (IGRs) with relatively high GC content. Analysis of known RNAs (rRNA, tRNA, riboswitches, etc.) shows that structured RNAs are significantly enriched in such IGRs. To identify additional candidate structured RNAs, we examined other IGRs with similar characteristics from 'Cand. P. ubique' using comparative genomics approaches in conjunction with marine metagenomic data. Employing this strategy, we discovered four candidate structured RNAs, including a new riboswitch class as well as three additional likely cis-regulatory elements that precede genes encoding ribosomal proteins S2 and S12 and the cytoplasmic protein component of the signal recognition particle. We also describe four additional potential RNA motifs with few or no examples occurring outside the metagenomic data. Conclusion This work begins the process of identifying functional RNA motifs present in the metagenomic data and illustrates how existing completed genomes may be used to aid in this task.
Affiliation(s)
- Michelle M Meyer
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA.
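The IGR-targeting step in the Meyer abstract (long intergenic regions whose GC content is high relative to a GC-poor genome) can be sketched as a simple length-and-GC filter. This is an illustrative sketch only: the `Region` type, the function names and every threshold (`min_len`, `genome_gc`, `gc_margin`) are hypothetical, and the abstract does not state the cutoffs that were actually used.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """An intergenic region (IGR); the name and sequence are illustrative."""
    name: str
    seq: str


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, case-insensitive; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def select_candidate_igrs(igrs, min_len, genome_gc, gc_margin=0.0):
    """Keep IGRs that are at least min_len nt long and whose GC fraction
    exceeds the genome-wide GC fraction by more than gc_margin.
    All three thresholds are hypothetical tuning parameters."""
    return [
        r for r in igrs
        if len(r.seq) >= min_len and gc_content(r.seq) > genome_gc + gc_margin
    ]


igrs = [Region("a", "ATATATATGC"), Region("b", "GCGCATGCAT"), Region("c", "GCG")]
print([r.name for r in select_candidate_igrs(igrs, min_len=5, genome_gc=0.3)])  # ['b']
```

Comparing against the genome-wide GC fraction, rather than a fixed absolute cutoff, reflects the abstract's wording of "relatively high" GC content in a genome with strikingly low GC content.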
46
Hiller M, Findeiss S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, Prohaska SJ, Reuter G, Stadler PF. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res 2009; 19:1289-300. [PMID: 19458021 DOI: 10.1101/gr.090050.108] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 01/17/2023]
Abstract
Noncoding RNAs that, like mRNAs, are spliced, capped, and polyadenylated have important functions in cellular processes. The inventory of these mRNA-like noncoding RNAs (mlncRNAs), however, is incomplete even in well-studied organisms, and so far no computational methods exist to predict such RNAs from genomic sequences alone. The subclass of these transcripts that is evolutionarily conserved usually has conserved intron positions. We demonstrate here that a genome-wide comparative genomics approach searching for short conserved introns is capable of identifying conserved transcripts with high specificity. Our approach requires neither an open reading frame nor substantial sequence or secondary structure conservation in the surrounding exons. Thus it identifies spliced transcripts in an unbiased way. After applying our approach to insect genomes, we predict 369 introns outside annotated coding transcripts, of which 131 are confirmed by expressed sequence tags (ESTs) and/or noncoding FlyBase transcripts. Of the remaining 238 novel introns, about half are associated with protein-coding genes, either extending coding or untranslated regions or likely belonging to unannotated coding genes. The remaining 129 introns belong to novel mlncRNAs that are largely unstructured. Using RT-PCR, we verified seven of 12 tested introns in novel mlncRNAs and 11 of 17 introns in novel coding genes. The expression level of all verified mlncRNA transcripts is low but varies during development, which suggests regulation. As conserved introns indicate both purifying selection on the exon-intron structure and conserved expression of the transcript in related species, the novel mlncRNAs are good candidates for functional transcripts.
Affiliation(s)
- Michael Hiller
- Bioinformatics Group, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany.
47
Abstract
It has been proved that noncoding RNA (ncRNA) genes are much more numerous than expected. However, it remains a difficult task to identify ncRNAs with either computational algorithms or biological experiments. Recent reports have suggested that ncRNAs may also appear in the expressed sequence tag (EST) database. Nevertheless, intergenic ESTs have received little attention and are poorly annotated owing to their low abundance. Here, we have developed a computational strategy for discovering ncRNA genes from human ESTs. We first collected ESTs that are located in intergenic regions and do not have detailed annotations. The intergenic regions were divided into non-overlapping 50-nt windows, and PhastCons scores obtained from the UCSC database were assigned to these windows. We kept as seeds the conserved windows that had PhastCons scores of over 0.8 and at least three supporting ESTs. Each cluster of ESTs corresponding to a seed was assembled into a long contig. We used two criteria to screen for ncRNA transcripts among these contigs: first, the longest predicted open reading frame was shorter than 300 nt; second, a likely Pol II promoter existed within 2,000 nt upstream or downstream of the contig. As a result, 118 novel ncRNA genes were identified from human low-abundance ESTs. Of seven randomly selected candidates, six were shown by RT-PCR to be transcribed in human 2BS cells. Our work proves that ESTs are a 'hidden treasure' for detecting novel ncRNA genes.
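Two of the screening steps in the abstract above lend themselves to a short sketch: picking seed windows (non-overlapping 50-nt windows, PhastCons score over 0.8, at least three supporting ESTs) and the ORF-length criterion (longest ORF under 300 nt). Several details are assumptions not given in the abstract: a window's score is taken as the mean of per-base PhastCons values, an EST "supports" a window if its alignment interval overlaps it, and an ORF is read as ATG up to the first in-frame stop on either strand. The promoter criterion is not sketched. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_windows(phastcons, ests, window=50, min_score=0.8, min_ests=3):
    """Return start offsets of seed windows in one intergenic region.

    phastcons: per-base conservation scores (assumption: averaged per window).
    ests: half-open (start, end) EST alignment intervals in region coordinates.
    """
    seeds = []
    # Non-overlapping windows; a trailing partial window is ignored (assumption).
    for start in range(0, len(phastcons) - window + 1, window):
        end = start + window
        mean_score = sum(phastcons[start:end]) / window
        # Count ESTs whose interval overlaps [start, end).
        support = sum(1 for s, e in ests if s < end and e > start)
        if mean_score > min_score and support >= min_ests:
            seeds.append(start)
    return seeds


def longest_orf_len(seq):
    """Length in nt (stop codon included) of the longest ATG-initiated ORF on
    either strand; an ORF that runs off the end without a stop is not counted."""
    seq = seq.upper()
    best = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def passes_orf_filter(contig, max_orf=300):
    """ORF criterion: the longest predicted ORF is shorter than max_orf nt."""
    return longest_orf_len(contig) < max_orf


print(longest_orf_len("ATGAAATAA"))  # 9
```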
48
Do CB, Foo CS, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics 2008; 24:i68-76. [PMID: 18586747 PMCID: PMC2718655 DOI: 10.1093/bioinformatics/btn177] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. RESULTS In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. Moreover, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. AVAILABILITY Source code for RAF is available at: http://contra.stanford.edu/contrafold/.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA 94305, USA.
49
Keith JM, Adams P, Stephen S, Mattick JS. Delineating slowly and rapidly evolving fractions of the Drosophila genome. J Comput Biol 2008; 15:407-30. [PMID: 18435570 DOI: 10.1089/cmb.2007.0173] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 01/23/2023]
Abstract
Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting that much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component were species dependent, and its size varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results are available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes are available in BED format.
Affiliation(s)
- Jonathan M Keith
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia.
50
Eddy SR. A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol 2008; 4:e1000069. [PMID: 18516236 PMCID: PMC2396288 DOI: 10.1371/journal.pcbi.1000069] [Citation(s) in RCA: 234] [Impact Index Per Article: 13.8] [Received: 12/05/2007] [Accepted: 03/26/2008] [Indexed: 11/19/2022]
Abstract
Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments. Sequence database searches are a fundamental tool of molecular biology, enabling researchers to identify related sequences in other organisms, which often provides invaluable clues to the function and evolutionary history of genes. The power of database searches to detect more and more remote evolutionary relationships – essentially, to look back deeper in time – has improved steadily, with the adoption of more complex and realistic models. However, database searches require not just a realistic scoring model, but also the ability to distinguish good scores from bad ones – the ability to calculate the statistical significance of scores. For many models and scoring schemes, accurate statistical significance calculations have either involved expensive computational simulations, or not been feasible at all. 
Here, I introduce a probabilistic model of local sequence alignment that has readily predictable score statistics for position-specific profile scoring systems, and not just for traditional optimal alignment scores, but also for more powerful log-likelihood ratio scores derived in a full probabilistic inference framework. These results remove one of the main obstacles that have impeded the use of more powerful and biologically realistic statistical inference methods in sequence homology searches.
Affiliation(s)
- Sean R Eddy
- Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, United States of America.
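The conjectures in the Eddy abstract translate directly into P-value and E-value formulas: Viterbi bit scores follow a Gumbel distribution with λ = log 2 (natural log), and the high-scoring tail of Forward scores is exponential with the same λ. The sketch below assumes the location parameters `mu` (Gumbel) and `tau` (start of the exponential tail) are already known; how they are estimated is outside this sketch. It also assumes the usual conversion of a P-value to an E-value by multiplying by the number of comparisons. The function names are hypothetical and are not taken from HMMER or any other library.

```python
import math

# Conjectured constant for bit scores in the probabilistic local alignment model.
LAMBDA = math.log(2)


def viterbi_pvalue(score, mu, lam=LAMBDA):
    """P(S >= score) for a Gumbel (extreme value) distribution with location mu.

    Computed as 1 - exp(-exp(-lam * (score - mu))), using expm1 so that
    tiny P-values for high scores keep their precision.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))


def forward_pvalue(score, tau, lam=LAMBDA):
    """P(S >= score) for an exponential high-scoring tail that starts at tau.

    Only meaningful in the tail, i.e. for score >= tau.
    """
    return math.exp(-lam * (score - tau))


def evalue(pvalue, n_targets):
    """Expected number of chance hits scoring at least this well in n_targets comparisons."""
    return pvalue * n_targets


print(round(viterbi_pvalue(0.0, 0.0), 4))  # 0.6321, i.e. 1 - 1/e at score == mu
```

With λ = log 2, each additional bit of score halves the tail probability, which is why bit scores are convenient here: far in the tail, both formulas reduce to roughly 2 raised to the power of minus (score minus location).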