51
The Small Toxic Salmonella Protein TimP Targets the Cytoplasmic Membrane and Is Repressed by the Small RNA TimR. mBio 2020; 11:mBio.01659-20. [PMID: 33172998 PMCID: PMC7667032 DOI: 10.1128/mbio.01659-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/12/2022]
Abstract
Next-generation sequencing (NGS) has revealed a vast number of genomes from organisms spanning all domains of life. To reduce complexity when new genome sequences are annotated, open reading frames (ORFs) shorter than 50 codons are generally omitted. However, it has recently become evident that this procedure discards ORFs encoding small proteins of high biological significance. For instance, tailored small protein identification approaches have shown that bacteria encode numerous small proteins with important physiological functions. As the number of predicted small ORFs increases, it becomes important to characterize the corresponding proteins. In this study, we discovered a conserved but previously overlooked small enterobacterial protein. We show that this protein, which we dubbed TimP, is a potent toxin that inhibits bacterial growth by targeting the cell membrane. Toxicity is relieved by a small regulatory RNA, which binds the toxin mRNA to inhibit toxin synthesis.
Small proteins are gaining increased attention due to their important functions in major biological processes throughout the domains of life. However, their small size and low sequence conservation make them difficult to identify. It is therefore not surprising that enterobacterial ryfA has escaped identification as a small protein coding gene for nearly 2 decades. Since its identification in 2001, ryfA has been thought to encode a noncoding RNA and has been implicated in biofilm formation in Escherichia coli and pathogenesis in Shigella dysenteriae. Although a recent ribosome profiling study suggested that ryfA is translated, the corresponding protein product was not detected. In this study, we provide evidence that ryfA encodes a small toxic inner membrane protein, TimP, overexpression of which causes cytoplasmic membrane leakage. TimP carries an N-terminal signal sequence, indicating that its membrane localization is Sec-dependent. Expression of TimP is repressed by the small RNA (sRNA) TimR, which base pairs with the timP mRNA to inhibit its translation. In contrast to overexpression, endogenous expression of TimP upon timR deletion permits cell growth, possibly indicating a toxicity-independent function in the bacterial membrane.
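The effect of the 50-codon annotation cutoff mentioned in this abstract can be illustrated with a minimal sketch. This is not the annotation pipeline used by the authors or by any particular tool; the function name `find_orfs`, the restriction to the forward strand, and the use of ATG as the only start codon are simplifying assumptions made here for illustration.

```python
# Minimal forward-strand ORF scan (illustrative assumptions: ATG is the only
# start codon, the reverse strand is ignored, and ORF length is counted in
# sense codons, i.e. the stop codon is excluded).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=50):
    """Return sorted (start, end, n_codons) tuples for ORFs with >= min_codons."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)

# A hypothetical 9-codon small ORF: kept with a permissive cutoff,
# silently dropped with the conventional 50-codon cutoff.
small_orf = "ATG" + "GCT" * 8 + "TAA"
kept = find_orfs(small_orf, min_codons=5)
dropped = find_orfs(small_orf, min_codons=50)
```

With the permissive cutoff the small ORF is reported; with the default it disappears from the result, which is the kind of omission the abstract describes.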
52
Zhou B, Yang H, Yang C, Bao YL, Yang SM, Liu J, Xiao YF. Translation of noncoding RNAs and cancer. Cancer Lett 2020; 497:89-99. [PMID: 33038492 DOI: 10.1016/j.canlet.2020.10.002] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 04/24/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 02/07/2023]
Abstract
The human genome contains thousands of noncoding RNAs (ncRNAs), which are thought to lack open reading frames (ORFs) and therefore not to be translated. Some ncRNAs reportedly have important functions, including epigenetic regulation, chromatin remodeling, protein modification, and RNA degradation, but the functions of most ncRNAs remain elusive. Through the application and development of ribosome profiling and sequencing technologies, an increasing number of studies have discovered the translation of ncRNAs. Although ncRNAs were initially defined as noncoding RNAs, a number of ncRNAs actually contain ORFs that are translated into peptides. Here, we summarize the available methods, tools, and databases for identifying and validating ncRNA-encoded peptides/proteins, and compile and synthesize recent findings regarding ncRNA-encoded small peptides/proteins in cancer. Importantly, ncRNA-encoded peptides/proteins have application prospects in cancer research, but some potential challenges remain unresolved. The aim of this review is to provide a theoretical basis that might promote the discovery of more peptides/proteins encoded by ncRNAs and aid the further development of novel diagnostic and prognostic cancer markers and therapeutic targets.
Affiliation(s)
- Bo Zhou
  - Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
- Huan Yang
  - Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
- Chuan Yang
  - Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
- Yu-Lu Bao
  - Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
- Shi-Ming Yang
  - Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
- Jiao Liu
  - Department of Endoscope, General Hospital of Northern Theater Command, Shenyang, 110016, Liaoning, China
- Yu-Feng Xiao
  - Department of Gastroenterology, Xinqiao Hospital, Chongqing, 400037, China
53
Choi SW, Kim HW, Nam JW. The small peptide world in long noncoding RNAs. Brief Bioinform 2020; 20:1853-1864. [PMID: 30010717 PMCID: PMC6917221 DOI: 10.1093/bib/bby055] [Citation(s) in RCA: 200] [Impact Index Per Article: 40.0] [Received: 04/09/2018] [Revised: 05/08/2018] [Indexed: 02/07/2023]
Abstract
Long noncoding RNAs (lncRNAs) are a group of transcripts that are longer than 200 nucleotides (nt) without coding potential. Over the past decade, tens of thousands of novel lncRNAs have been annotated in animal and plant genomes because of advanced high-throughput RNA sequencing technologies and with the aid of coding transcript classifiers. Further, a considerable number of reports have revealed the existence of stable, functional small peptides (also known as micropeptides), translated from lncRNAs. In this review, we discuss the methods of lncRNA classification, the investigations regarding their coding potential and the functional significance of the peptides they encode.
Affiliation(s)
- Seo-Won Choi
  - Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Hyun-Woo Kim
  - Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Jin-Wu Nam
  - Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
54
Müller T, Miladi M, Hutter F, Hofacker I, Will S, Backofen R. The locality dilemma of Sankoff-like RNA alignments. Bioinformatics 2020; 36:i242-i250. [PMID: 32657398 PMCID: PMC7355259 DOI: 10.1093/bioinformatics/btaa431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Motivation: Elucidating the functions of non-coding RNAs by homology has been strongly limited due to fundamental computational and modeling issues. While existing simultaneous alignment and folding (SA&F) algorithms successfully align homologous RNAs with precisely known boundaries (global SA&F), the more pressing problem of identifying new classes of homologous RNAs in the genome (local SA&F) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted. We hypothesize that local SA&F approaches are compromised this way due to a score bias, which is caused by the contribution of RNA structure similarity to their overall alignment score.
Results: In the light of this hypothesis, we study pairwise local SA&F for the first time systematically, based on a novel local RNA alignment benchmark set and quality measure. First, we vary the relative influence of structure similarity compared to sequence similarity. Putting more emphasis on the structure component leads to overestimating the length of local alignments. This clearly shows the bias of current scores and strongly hints at the structure component as its origin. Second, we study the interplay of several important scoring parameters by learning parameters for local and global SA&F. The divergence of these optimized parameter sets underlines the fundamental obstacles for local SA&F. Third, by introducing a position-wise correction term in local SA&F, we constructively solve its principal issues.
Availability and implementation: The benchmark data, detailed results and scripts are available at https://github.com/BackofenLab/local_alignment. The RNA alignment tool LocARNA, including the modifications proposed in this work, is available at https://github.com/s-will/LocARNA/releases/tag/v2.0.0RC6.
Supplementary information: Supplementary data are available at Bioinformatics online.
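The score bias described in this abstract can be illustrated with a toy example. The sketch below is plain sequence-only Smith-Waterman local alignment, not Sankoff-style SA&F and not LocARNA's scoring scheme; the `column_bonus` parameter is a hypothetical stand-in for any positive per-column contribution (such as a structure-similarity term) added to every aligned pair, and a negative value plays the role of a position-wise correction.

```python
# Toy Smith-Waterman local alignment with a uniform per-column bonus.
# Assumption: `column_bonus` is a hypothetical stand-in for an extra
# per-aligned-pair score contribution; it is not LocARNA's actual scoring.
def local_alignment(a, b, match=2.0, mismatch=-3.0, gap=-4.0, column_bonus=0.0):
    """Return (best_score, number_of_alignment_columns) of the best local alignment."""
    STOP, DIAG, UP, LEFT = 0, 1, 2, 3
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    move = [[STOP] * cols for _ in range(rows)]
    best, best_cell = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + pair + column_bonus, DIAG),
                (score[i - 1][j] + gap, UP),
                (score[i][j - 1] + gap, LEFT),
            ]
            cell_score, cell_move = 0.0, STOP  # local alignment floor at zero
            for value, direction in candidates:
                if value > cell_score:
                    cell_score, cell_move = value, direction
            score[i][j], move[i][j] = cell_score, cell_move
            if cell_score > best:
                best, best_cell = cell_score, (i, j)
    # Trace back from the best cell and count alignment columns.
    i, j = best_cell
    n_columns = 0
    while move[i][j] != STOP:
        n_columns += 1
        if move[i][j] == DIAG:
            i, j = i - 1, j - 1
        elif move[i][j] == UP:
            i -= 1
        else:
            j -= 1
    return best, n_columns

# Two 4-nt conserved blocks separated by 3 unrelated positions (made-up data).
seq_a = "ACGTTTTACGT"
seq_b = "ACGTAAAACGT"
plain = local_alignment(seq_a, seq_b)                       # one conserved block
inflated = local_alignment(seq_a, seq_b, column_bonus=2.0)  # spans the unrelated stretch
```

In this toy case the unbiased scoring reports a single 4-column block, while the added per-column bonus makes the alignment run across the unrelated middle region, which mirrors the length overestimation the abstract attributes to the structure component.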
Affiliation(s)
- Teresa Müller
  - Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany
- Milad Miladi
  - Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany
- Frank Hutter
  - Machine Learning Lab, Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Ivo Hofacker
  - Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Vienna, Wien 1090, Austria
- Sebastian Will
  - Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Vienna, Wien 1090, Austria; Bioinformatics Group AMIBio, LIX-Laboratoire d'Informatique d'École Polytechnique, IPP, Palaiseau 91120, France
- Rolf Backofen
  - Bioinformatics Group, University of Freiburg, Freiburg 79110, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg 79104, Germany
55
Abstract
No method exists to measure large-scale translation of genes in uncultured organisms in microbiomes. To overcome this limitation, we develop MetaRibo-Seq, a method for simultaneous ribosome profiling of tens to hundreds of organisms in microbiome samples. MetaRibo-Seq was benchmarked against gold-standard Ribo-Seq in a mock microbial community and applied to five different human fecal samples. Unlike RNA-Seq, Ribo-Seq signal of a predicted gene suggests it encodes a translated protein. We demonstrate two applications of this technique: First, MetaRibo-Seq identifies small genes, whose identification until now has been challenging. For example, MetaRibo-Seq identifies 2,091 translated, previously unannotated small protein families from five fecal samples, more than doubling the number of small proteins predicted to exist in this niche. Second, the combined application of RNA-Seq and MetaRibo-Seq identifies differences in the translation of transcripts. In summary, MetaRibo-Seq enables comprehensive translational profiling in microbiomes and identifies previously unannotated small proteins. Defining the functions of individual organisms or communities within microbiomes is a challenging task. Here, the authors develop MetaRibo-Seq, a method for simultaneous high-throughput ribosome profiling of organisms in uncultured microbiome samples.
56
de Alvarenga LV, Hess WR, Hagemann M. AcnSP - A Novel Small Protein Regulator of Aconitase Activity in the Cyanobacterium Synechocystis sp. PCC 6803. Front Microbiol 2020; 11:1445. [PMID: 32695088 PMCID: PMC7336809 DOI: 10.3389/fmicb.2020.01445] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/27/2020] [Accepted: 06/04/2020] [Indexed: 12/28/2022]
Abstract
Synechocystis sp. PCC 6803 is a widely used model cyanobacterium whose genome has been well annotated. However, several additional small protein coding sequences (sORFs) have been recently identified, which might play important roles, for example in the regulation of cellular metabolism. Here, we analyzed the function of an sORF encoding a 44 amino acid peptide showing high similarity to the N-terminal part of aconitase (AcnB). The expression of the gene, which probably originated from a partial gene duplication of chromosomal acnB into the plasmid pSYSA, was verified and it was designated as acnSP. The protein-coding part of acnSP was inactivated by interposon mutagenesis. The obtained mutant displayed slower growth under photoautotrophic conditions with light exceeding 100 μmol photons m⁻² s⁻¹ and showed significant changes in the metabolome compared to wild type, including alterations in many metabolites associated with the tricarboxylic acid (TCA) cycle. To analyze a possible direct impact of AcnSP on aconitase, the recombinant Synechocystis enzyme was generated and biochemically characterized. Biochemical analysis revealed that addition of equimolar amounts of AcnSP resulted in an improved substrate affinity (lower Km) and lowered Vmax of aconitase. These results imply that AcnSP can regulate aconitase activity, thereby impacting the carbon flow into the oxidative branch of the cyanobacterial TCA cycle, which is mainly responsible for the synthesis of carbon skeletons needed for ammonia assimilation.
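What a simultaneously lower Km and lower Vmax means for enzyme rate can be sketched with the standard Michaelis-Menten equation, v = Vmax·[S] / (Km + [S]). The kinetic constants below are hypothetical values chosen only to show the qualitative effect; they are not the measurements reported for Synechocystis aconitase.

```python
# Michaelis-Menten sketch. All parameter values are hypothetical and serve
# only to illustrate the qualitative consequence of lower Km plus lower Vmax.
def michaelis_menten_rate(substrate, vmax, km):
    """Reaction rate v = vmax * [S] / (km + [S])."""
    return vmax * substrate / (km + substrate)

ENZYME_ALONE = {"vmax": 10.0, "km": 1.0}       # assumed baseline
ENZYME_WITH_ACNSP = {"vmax": 8.0, "km": 0.5}   # assumed: lower Km, lower Vmax

def crossover_concentration(p, q):
    """Substrate concentration at which two Michaelis-Menten curves give equal rates.

    Derived from vp*s/(kp+s) = vq*s/(kq+s)  =>  s = (vq*kp - vp*kq) / (vp - vq).
    """
    return (q["vmax"] * p["km"] - p["vmax"] * q["km"]) / (p["vmax"] - q["vmax"])

s_equal = crossover_concentration(ENZYME_ALONE, ENZYME_WITH_ACNSP)
low_s = 0.1
high_s = 100.0
faster_at_low_s = (michaelis_menten_rate(low_s, **ENZYME_WITH_ACNSP)
                   > michaelis_menten_rate(low_s, **ENZYME_ALONE))
slower_at_high_s = (michaelis_menten_rate(high_s, **ENZYME_WITH_ACNSP)
                    < michaelis_menten_rate(high_s, **ENZYME_ALONE))
```

Under these assumed values the modified enzyme is faster below the crossover concentration and slower above it, so the net effect on flux depends on the substrate level in the cell.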
Affiliation(s)
- Luna V de Alvarenga
  - Department of Plant Physiology, Institute of Biosciences, University of Rostock, Rostock, Germany
- Wolfgang R Hess
  - Genetics & Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg im Breisgau, Germany
- Martin Hagemann
  - Department of Plant Physiology, Institute of Biosciences, University of Rostock, Rostock, Germany; Department Life, Light and Matter, University of Rostock, Rostock, Germany
57
Castandet B, Germain A, Hotto AM, Stern DB. Systematic sequencing of chloroplast transcript termini from Arabidopsis thaliana reveals >200 transcription initiation sites and the extensive imprints of RNA-binding proteins and secondary structures. Nucleic Acids Res 2020; 47:11889-11905. [PMID: 31732725 PMCID: PMC7145512 DOI: 10.1093/nar/gkz1059] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Revised: 10/02/2019] [Accepted: 11/05/2019] [Indexed: 12/23/2022]
Abstract
Chloroplast transcription requires numerous quality control steps to generate the complex but selective mixture of accumulating RNAs. To gain insight into how this RNA diversity is achieved and regulated, we systematically mapped transcript ends by developing a protocol called Terminome-seq. Using Arabidopsis thaliana as a model, we catalogued >215 primary 5′ ends corresponding to transcription start sites (TSS), as well as 1628 processed 5′ ends and 1299 3′ ends. While most termini were found in intergenic regions, numerous abundant termini were also found within coding regions and introns, including several major TSS at unexpected locations. A consistent feature was the clustering of both 5′ and 3′ ends, contrasting with the prevailing description of discrete 5′ termini, suggesting an imprecision of the transcription and/or RNA processing machinery. Numerous termini correlated with the extremities of small RNA footprints or predicted stem-loop structures, in agreement with the model of passive RNA protection. Terminome-seq was also implemented for pnp1–1, a mutant lacking the processing enzyme polynucleotide phosphorylase. Nearly 2000 termini were altered in pnp1–1, revealing a dominant role in shaping the transcriptome. In summary, Terminome-seq permits precise delineation of the roles and regulation of the many factors involved in organellar transcriptome quality control.
Affiliation(s)
- Benoît Castandet
  - Boyce Thompson Institute, Ithaca, NY 14853, USA; Institut des Sciences des Plantes de Paris Saclay (IPS2), UEVE, INRA, CNRS, Univ. Paris Sud, Université Paris-Saclay, F-91192 Gif sur Yvette, France; Université de Paris, IPS2, F-91192 Gif sur Yvette, France
58
Parra-Rivero O, Pardo-Medina J, Gutiérrez G, Limón MC, Avalos J. A novel lncRNA as a positive regulator of carotenoid biosynthesis in Fusarium. Sci Rep 2020; 10:678. [PMID: 31959816 PMCID: PMC6971296 DOI: 10.1038/s41598-020-57529-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/13/2019] [Accepted: 11/07/2019] [Indexed: 01/28/2023]
Abstract
The fungi Fusarium oxysporum and Fusarium fujikuroi produce carotenoids, lipophilic terpenoid pigments of biotechnological interest, with the xanthophyll neurosporaxanthin as the main end product. Their carotenoid biosynthesis is activated by light and negatively regulated by the RING-finger protein CarS. Global transcriptomic analysis identified in both species a putative 1-kb lncRNA that we call carP, referred to as Fo-carP and Ff-carP in each species, upstream of the gene carS and transcribed from the same DNA strand. Fo-carP and Ff-carP are poorly transcribed, but their RNA levels increase in carS mutants. The deletion of Fo-carP or Ff-carP in the respective species results in albino phenotypes, with strong reductions in mRNA levels of structural genes for carotenoid biosynthesis and higher mRNA content of the carS gene, which could explain the low accumulation of carotenoids. Upon alignment, Fo-carP and Ff-carP show 75-80% identity, with short insertions or deletions resulting in a lack of coincident ORFs. Moreover, none of the ORFs found in their sequences shows indications of possible coding functions. We conclude that Fo-carP and Ff-carP are regulatory lncRNAs necessary for the active expression of the carotenoid genes in Fusarium through an unknown molecular mechanism, probably related to the control of carS function or expression.
Affiliation(s)
- Obdulia Parra-Rivero
  - Department of Genetics, Faculty of Biology, University of Seville, E-41012, Seville, Spain
- Javier Pardo-Medina
  - Department of Genetics, Faculty of Biology, University of Seville, E-41012, Seville, Spain
- Gabriel Gutiérrez
  - Department of Genetics, Faculty of Biology, University of Seville, E-41012, Seville, Spain
- M Carmen Limón
  - Department of Genetics, Faculty of Biology, University of Seville, E-41012, Seville, Spain
- Javier Avalos
  - Department of Genetics, Faculty of Biology, University of Seville, E-41012, Seville, Spain
59
Zhou B, Yang Y, Zhan J, Dou X, Wang J, Zhou Y. Predicting functional long non-coding RNAs validated by low throughput experiments. RNA Biol 2019; 16:1555-1564. [PMID: 31345106 PMCID: PMC6779387 DOI: 10.1080/15476286.2019.1644590] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/08/2018] [Revised: 06/17/2019] [Accepted: 07/10/2019] [Indexed: 01/05/2023]
Abstract
High-throughput techniques have uncovered hundreds and thousands of long non-coding RNAs (lncRNAs). Among them, only a tiny fraction has functions that were experimentally validated by low-throughput methods (EVlncRNAs). What fraction of lncRNAs from high-throughput experiments (HTlncRNAs) is truly functional is an active subject of debate. Here, we developed the first method to distinguish EVlncRNAs from HTlncRNAs and mRNAs by using Support Vector Machines and found that EVlncRNAs can be well separated from HTlncRNAs and mRNAs, with a Matthews correlation coefficient of 0.6, a sensitivity of 64%, and a precision of 81% on the independent human test set. The most useful features for classification are related to sequence conservation at the RNA level (for separating from HTlncRNAs) and the protein level (for separating from mRNAs). The method is found to be robust, as the human-RNA-trained model is applicable to independent mouse RNAs with similar accuracy and, to a lesser extent, to plant RNAs. The method can recover newly discovered EVlncRNAs with high sensitivity. Its application to 2000 randomly selected human HTlncRNAs indicates that most HTlncRNAs are probably non-functional but that a large portion (nearly 30%) is likely functional. In other words, there is an ample number of lncRNAs whose specific biological roles are yet to be discovered. The method developed here is expected to speed up and reduce the cost of discovery by prioritizing potentially functional lncRNAs prior to experimental validation. EVlncRNA-pred is available as a web server at http://biophy.dzu.edu.cn/lncrnapred/index.html. All datasets used in this study can be obtained from the same website.
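The three performance measures quoted in this abstract follow directly from a binary confusion matrix. The sketch below shows the standard definitions; the counts in the example are invented so that sensitivity and precision land near the reported 64% and 81%, and they are not the study's actual test-set counts (so the resulting Matthews correlation coefficient differs from the reported 0.6).

```python
import math

# Standard binary-classification metrics from confusion-matrix counts.
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, mcc) for the given counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, precision, mcc

# Hypothetical counts (not from the paper): 100 functional lncRNAs,
# 900 negatives, 64 true positives, 15 false positives.
sensitivity, precision, mcc = classification_metrics(tp=64, fp=15, tn=885, fn=36)
```

Unlike sensitivity and precision, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it informative when, as here, the positive class is much smaller than the negative class.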
Affiliation(s)
- Bailing Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
  - College of Physics and Electronic Information, Dezhou University, Dezhou, China
- Yuedong Yang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Jian Zhan
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
- Xianghua Dou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
  - College of Physics and Electronic Information, Dezhou University, Dezhou, China
- Jihua Wang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
  - College of Physics and Electronic Information, Dezhou University, Dezhou, China
- Yaoqi Zhou
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
60
Monteiro JP, Bennett M, Rodor J, Caudrillier A, Ulitsky I, Baker AH. Endothelial function and dysfunction in the cardiovascular system: the long non-coding road. Cardiovasc Res 2019; 115:1692-1704. [PMID: 31214683 PMCID: PMC6755355 DOI: 10.1093/cvr/cvz154] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/29/2019] [Revised: 04/23/2019] [Accepted: 06/05/2019] [Indexed: 12/18/2022]
Abstract
Present throughout the vasculature, endothelial cells (ECs) are essential for blood vessel function and play a central role in the pathogenesis of diverse cardiovascular diseases. Understanding the intricate molecular determinants governing endothelial function and dysfunction is essential to develop novel clinical breakthroughs and improve knowledge. An increasing body of evidence demonstrates that long non-coding RNAs (lncRNAs) are active regulators of the endothelial transcriptome and function, providing emerging insights into core questions surrounding EC contributions to pathology, and perhaps the emergence of novel therapeutic opportunities. In this review, we discuss this class of non-coding transcripts and their role in endothelial biology during cardiovascular development, homeostasis, and disease, highlighting challenges during discovery and characterization and how these have been overcome to date. We further discuss the translational therapeutic implications and the challenges within the field, highlighting lncRNAs that support endothelial phenotypes prevalent in cardiovascular disease.
Affiliation(s)
- João P Monteiro
  - Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, 47 Little France Crescent, Edinburgh, UK
- Matthew Bennett
  - Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, 47 Little France Crescent, Edinburgh, UK
- Julie Rodor
  - Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, 47 Little France Crescent, Edinburgh, UK
- Axelle Caudrillier
  - Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, 47 Little France Crescent, Edinburgh, UK
- Igor Ulitsky
  - Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel
- Andrew H Baker
  - Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, 47 Little France Crescent, Edinburgh, UK
61
Sberro H, Fremin BJ, Zlitni S, Edfors F, Greenfield N, Snyder MP, Pavlopoulos GA, Kyrpides NC, Bhatt AS. Large-Scale Analyses of Human Microbiomes Reveal Thousands of Small, Novel Genes. Cell 2019; 178:1245-1259.e14. [PMID: 31402174 PMCID: PMC6764417 DOI: 10.1016/j.cell.2019.07.016] [Citation(s) in RCA: 155] [Impact Index Per Article: 25.8] [Received: 01/11/2019] [Revised: 05/06/2019] [Accepted: 07/11/2019] [Indexed: 12/12/2022]
Abstract
Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; ∼30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, defense-related, and protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.
Affiliation(s)
- Hila Sberro
  - Department of Medicine (Hematology; Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
- Brayon J Fremin
  - Department of Medicine (Hematology; Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA
- Soumaya Zlitni
  - Department of Medicine (Hematology; Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA
- Fredrik Edfors
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Georgios A Pavlopoulos
  - Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA; Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center Alexander Fleming, Vari, Greece
- Nikos C Kyrpides
  - Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ami S Bhatt
  - Department of Medicine (Hematology; Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
62
Developmental dynamics of lncRNAs across mammalian organs and species. Nature 2019; 571:510-514. [PMID: 31243368 PMCID: PMC6660317 DOI: 10.1038/s41586-019-1341-x] [Citation(s) in RCA: 207] [Impact Index Per Article: 34.5] [Received: 10/21/2018] [Accepted: 05/31/2019] [Indexed: 12/17/2022]
Abstract
While many long noncoding RNAs (lncRNAs) have been identified in human and other mammalian genomes, there has been limited systematic functional characterization. In particular, the contribution of lncRNAs to organ development remains largely unexplored. Here we analyze the expression patterns of lncRNAs across developmental timepoints in seven major organs, from early organogenesis to adulthood, across seven species (human, macaque, mouse, rat, rabbit, opossum, and chicken). Our analyses identified ~15,000-35,000 candidate lncRNAs in each species, most of which show species specificity. We characterized expression patterns of lncRNAs across developmental stages, and found many with dynamic expression patterns across time that show signatures of enrichment for functionality. During development, there is a transition from broadly expressed and conserved lncRNAs towards an increasing number of lineage- and organ-specific lncRNAs. Our study provides a resource of candidate lncRNAs and their patterns of expression and evolutionary conservation across mammalian organ development.
63
Saghafi T, Taheri RA, Parkkila S, Emameh RZ. Phytochemicals as Modulators of Long Non-Coding RNAs and Inhibitors of Cancer-Related Carbonic Anhydrases. Int J Mol Sci 2019; 20:E2939. [PMID: 31208095 PMCID: PMC6627131 DOI: 10.3390/ijms20122939] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 05/28/2019] [Revised: 05/29/2019] [Accepted: 05/30/2019] [Indexed: 01/17/2023]
Abstract
Long non-coding RNAs (lncRNAs) are a group of transcripts that regulate various biological processes, such as RNA processing, epigenetic control, and signaling pathways. According to recent studies, lncRNAs are dysregulated in cancer and play an important role in cancer incidence and spreading. There is also an association between lncRNAs and the overexpression of some tumor-associated proteins, including carbonic anhydrases II, IX, and XII (CA II, CA IX, and CA XII). Therefore, not only CA inhibition, but also lncRNA modulation, could represent an attractive strategy for cancer prevention and therapy. Experimental studies have suggested that herbal compounds regulate the expression of many lncRNAs involved in cancer, such as HOTAIR (HOX transcript antisense RNA), H19, MALAT1 (metastasis-associated lung adenocarcinoma transcript 1), PCGEM1 (prostate cancer gene expression marker 1), and PVT1. These plant-derived drugs or phytochemicals include resveratrol, curcumin, genistein, quercetin, epigallocatechin-3-gallate, camptothecin, and 3,3'-diindolylmethane. More comprehensive information about lncRNA modulation via phytochemicals would be helpful for the administration of new herbal derivatives in cancer therapy. In this review, we describe the state of the art and the potential of phytochemicals as modulators of lncRNAs in different types of cancers.
Affiliation(s)
- Tayebeh Saghafi
- Department of Energy and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology (NIGEB), 14965/161, Tehran, Iran.
| | - Ramezan Ali Taheri
- Nanobiotechnology Research Center, Baqiyatallah University of Medical Sciences, P.O.Box 14965/161 Tehran, Iran.
| | - Seppo Parkkila
- Faculty of Medicine and Health Technology, Tampere University, FI-33520 Tampere, Finland.
- Fimlab Laboratories Ltd. and Tampere University Hospital, FI-33520 Tampere, Finland.
- Reza Zolfaghari Emameh
- Department of Energy and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology (NIGEB), 14965/161, Tehran, Iran.
|
64
|
Kang YJ, Yang DC, Kong L, Hou M, Meng YQ, Wei L, Gao G. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res 2017; 45:W12-W16. [PMID: 28521017 PMCID: PMC5793834 DOI: 10.1093/nar/gkx428] [Citation(s) in RCA: 960] [Impact Index Per Article: 160.0] [Received: 03/01/2017] [Accepted: 05/03/2017] [Indexed: 12/19/2022]
Abstract
With advances in next-generation sequencing technologies, numerous novel transcripts in a large number of organisms have been identified. With the goal of fast, accurate assessment of the coding ability of RNA transcripts, we upgraded the coding potential calculator CPC1 to CPC2. CPC2 runs ∼1000 times faster than CPC1 and exhibits superior accuracy compared with CPC1, especially for long non-coding transcripts. Moreover, the model of CPC2 is species-neutral, making it feasible for ever-growing non-model organism transcriptomes. A mobile-friendly web server, as well as a downloadable standalone package, is freely available at http://cpc2.cbi.pku.edu.cn.
Affiliation(s)
- Yu-Jian Kang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- De-Chang Yang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Lei Kong
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Mei Hou
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Yu-Qi Meng
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Liping Wei
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
|
65
|
Kong Y, Lu Z, Liu P, Liu Y, Wang F, Liang EY, Hou FF, Liang M. Long Noncoding RNA: Genomics and Relevance to Physiology. Compr Physiol 2019; 9:933-946. [PMID: 31187897 DOI: 10.1002/cphy.c180032] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/16/2022]
Abstract
The mammalian cell expresses thousands of long noncoding RNAs (lncRNAs) that are longer than 200 nucleotides but do not encode any protein. lncRNAs can change the expression of protein-coding genes through both cis and trans mechanisms, including imprinting and other types of transcriptional regulation, and posttranscriptional regulation including serving as molecular sponges. Deep sequencing, coupled with analysis of sequence characteristics, is the primary method used to identify lncRNAs. Physiological roles of specific lncRNAs can be examined using genetic targeting or knockdown with modified oligonucleotides. Identification of nucleic acids or proteins with which an lncRNA interacts is essential for understanding the molecular mechanism underlying its physiological role. lncRNAs have been reported to contribute to the regulation of physiological functions and disease development in several organ systems, including the cardiovascular, renal, muscular, endocrine, digestive, nervous, respiratory, and reproductive systems. The physiological role of the majority of lncRNAs, many of which are species and tissue specific, remains to be determined. © 2019 American Physiological Society. Compr Physiol 9:933-946, 2019.
Affiliation(s)
- Yiwei Kong
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Zeyuan Lu
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Pengyuan Liu
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Sir Run Run Shaw Hospital, Institute of Translational Medicine, Zhejiang University, Zhejiang, China
- Yong Liu
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Feng Wang
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA; Department of Nephrology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Eugene Y Liang
- Center for Advancing Population Science, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Fan Fan Hou
- National Clinical Research Center for Kidney Disease, State Key Laboratory of Organ Failure Research, Guangzhou Regenerative Medicine and Health - Guangdong Laboratory, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Mingyu Liang
- Center of Systems Molecular Medicine, Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
|
66
|
Stav S, Atilho RM, Mirihana Arachchilage G, Nguyen G, Higgs G, Breaker RR. Genome-wide discovery of structured noncoding RNAs in bacteria. BMC Microbiol 2019; 19:66. [PMID: 30902049 PMCID: PMC6429828 DOI: 10.1186/s12866-019-1433-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 05/25/2018] [Accepted: 03/07/2019] [Indexed: 12/15/2022]
Abstract
Background: Structured noncoding RNAs (ncRNAs) play essential roles in many biological processes such as gene regulation, signaling, RNA processing, and protein synthesis. Among the most common groups of ncRNAs in bacteria are riboswitches. These cis-regulatory, metabolite-binding RNAs are present in many species where they regulate various metabolic and signaling pathways. Collectively, there are likely to be hundreds of novel riboswitch classes that remain hidden in the bacterial genomes that have already been sequenced, and potentially thousands of classes distributed among various other species in the biosphere. The vast majority of these undiscovered classes are proposed to be exceedingly rare, and so current bioinformatics search techniques are reaching their limits for differentiating between true riboswitch candidates and false positives. Results: Herein, we exploit a computational search pipeline that can efficiently identify intergenic regions most likely to encode structured ncRNAs. Application of this method to five bacterial genomes yielded nearly 70 novel genetic elements including 30 novel candidate ncRNA motifs. Among the riboswitch candidates identified is an RNA motif involved in the regulation of thiamin biosynthesis. Conclusions: Analysis of other genomes will undoubtedly lead to the discovery of many additional novel structured ncRNAs, and provide insight into the range of riboswitches and other kinds of ncRNAs remaining to be discovered in bacteria and archaea.
Affiliation(s)
- Shira Stav
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, USA
- Ruben M Atilho
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Giahoa Nguyen
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, USA
- Gadareth Higgs
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, USA
- Ronald R Breaker
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA; Howard Hughes Medical Institute, Yale University, New Haven, CT, 06520, USA.
|
67
|
Karlik E, Ari S, Gozukirmizi N. LncRNAs: genetic and epigenetic effects in plants. BIOTECHNOL BIOTEC EQ 2019. [DOI: 10.1080/13102818.2019.1581085] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
Affiliation(s)
- Elif Karlik
- Department of Biotechnology Institute of Graduate Studies in Science and Engineering, Istanbul University, Istanbul, Turkey
- Department of Molecular Biology and Genetics Faculty of Science, Istinye University, Istanbul, Turkey
- Sule Ari
- Department of Molecular Biology and Genetics Faculty of Science, Istanbul University, Istanbul, Turkey
- Nermin Gozukirmizi
- Department of Molecular Biology and Genetics Faculty of Science, Istanbul University, Istanbul, Turkey
- Department of Molecular Biology and Genetics Faculty of Science, Istinye University, Istanbul, Turkey
|
68
|
Abstract
Computational methods can often facilitate the functional characterization of individual sRNAs and furthermore allow high-throughput analysis on large numbers of sRNA candidates. This chapter outlines a potential workflow for computational sRNA analyses and describes in detail methods for homolog detection, target prediction, and functional characterization based on enrichment analysis. The cyanobacterial sRNA IsaR1 is used as a specific example. All methods are available as webservers and easily accessible for nonexpert users.
|
69
|
Wang G, Yin H, Li B, Yu C, Wang F, Xu X, Cao J, Bao Y, Wang L, Abbasi AA, Bajic VB, Ma L, Zhang Z. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics 2019; 35:2949-2956. [DOI: 10.1093/bioinformatics/btz008] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 05/15/2018] [Revised: 12/05/2018] [Accepted: 01/07/2019] [Indexed: 01/24/2023]
Abstract
Motivation
The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but in reality only a limited number of species have high-quality sequences and annotations.
Results
Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between open reading frame length and guanine-cytosine (GC) content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on the feature relationship, accordingly, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy, well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
Availability and implementation
LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004.
Supplementary information
Supplementary data are available at Bioinformatics online.
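The two features named in the Results above, open reading frame length and GC content, can be computed from a transcript sequence as in the minimal Python sketch below. This is not the LGC algorithm itself: the abstract does not describe how LGC combines the two features, so the sketch stops at feature extraction. The function names `gc_content` and `longest_orf_length` are hypothetical, and the ORF definition (ATG to the first in-frame stop, forward strand only, stop codon included in the length) is an assumption made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG..stop ORF on the forward strand.

    Assumption for this sketch: all three forward frames are scanned, an ORF
    starts at the first ATG after the previous stop, and the stop codon is
    counted in the length. ORFs without an in-frame stop are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATTTGGGTAACC"
    print(longest_orf_length(transcript), gc_content(transcript))
```

A classifier built on these features would still need a decision rule relating the two values, which is the part the abstract attributes to LGC and which is not reproduced here.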
Affiliation(s)
- Guangyu Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Hongyan Yin
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Boyang Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Chunlei Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Fan Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Xingjian Xu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jiabao Cao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yiming Bao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Liguo Wang
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
- Amir A Abbasi
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Kingdom of Saudi Arabia
- Lina Ma
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
|
70
|
Lott SC, Schäfer RA, Mann M, Backofen R, Hess WR, Voß B, Georg J. GLASSgo - Automated and Reliable Detection of sRNA Homologs From a Single Input Sequence. Front Genet 2018; 9:124. [PMID: 29719549 PMCID: PMC5913331 DOI: 10.3389/fgene.2018.00124] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 01/16/2018] [Accepted: 03/26/2018] [Indexed: 11/24/2022]
Abstract
Bacterial small RNAs (sRNAs) are important post-transcriptional regulators of gene expression. The functional and evolutionary characterization of sRNAs requires the identification of homologs, which is frequently challenging due to their heterogeneity, short length and, in part, low sequence conservation. We developed the GLobal Automatic Small RNA Search go (GLASSgo) algorithm to identify sRNA homologs in complex genomic databases starting from a single sequence. GLASSgo combines an iterative BLAST strategy with pairwise identity filtering and a graph-based clustering method that utilizes RNA secondary structure information. We tested the specificity, sensitivity and runtime of GLASSgo, BLAST and the combination RNAlien/cmsearch in a typical use case scenario on 40 bacterial sRNA families. The sensitivity of the tested methods was similar, while the specificity of GLASSgo and RNAlien/cmsearch was significantly higher than that of BLAST. GLASSgo was on average ∼87 times faster than RNAlien/cmsearch, and only ∼7.5 times slower than BLAST, which shows that GLASSgo optimizes the trade-off between speed and accuracy in the task of finding sRNA homologs. GLASSgo is fully automated, whereas BLAST often recovers only parts of homologs and RNAlien/cmsearch requires extensive additional bioinformatic work to get a comprehensive set of homologs. GLASSgo is available as an easy-to-use web server to find homologous sRNAs in large databases.
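As an illustration of the pairwise identity filtering step mentioned in the abstract above, the following Python sketch scores already-aligned query/hit pairs and keeps those above a cutoff. It is a sketch under stated assumptions, not GLASSgo's implementation: the helper names `pairwise_identity` and `filter_hits`, the input layout (name, aligned query, aligned hit), and the 0.6 default cutoff are all hypothetical, and the abstract does not specify how GLASSgo defines or thresholds identity.

```python
from typing import Iterable, List, Tuple


def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical columns in a pairwise alignment.

    Both strings must come from the same alignment (equal length, '-' for
    gaps). Columns where both sequences have a gap are skipped; a gap
    against a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    return matches / columns if columns else 0.0


def filter_hits(
    hits: Iterable[Tuple[str, str, str]], min_identity: float = 0.6
) -> List[str]:
    """Return the names of hits whose identity to the query meets the cutoff.

    Each hit is (name, aligned_query, aligned_hit). The cutoff value is an
    assumption chosen for this example only.
    """
    return [
        name
        for name, aligned_query, aligned_hit in hits
        if pairwise_identity(aligned_query, aligned_hit) >= min_identity
    ]


if __name__ == "__main__":
    candidates = [("hit1", "ACGU", "ACGU"), ("hit2", "ACGU", "UUUU")]
    print(filter_hits(candidates))
```

In an iterative search, the hits that pass such a filter would typically seed the next search round; that loop and the structure-based clustering are outside the scope of this sketch.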
Affiliation(s)
- Steffen C Lott
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Richard A Schäfer
- Institute of Biochemical Engineering, University of Stuttgart, Stuttgart, Germany
- Martin Mann
- Bioinformatics Group, Faculty of Computer Science, University of Freiburg, Freiburg, Germany; Forest Growth and Dendroecology, Institute of Forest Sciences, University of Freiburg, Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Faculty of Computer Science, University of Freiburg, Freiburg, Germany; ZBSA Center for Biological Systems Analysis, University of Freiburg, Freiburg, Germany; BIOSS Centre for Biological Signalling Studies, Cluster of Excellence, University of Freiburg, Freiburg, Germany; Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Wolfgang R Hess
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany; Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany
- Björn Voß
- Institute of Biochemical Engineering, University of Stuttgart, Stuttgart, Germany
- Jens Georg
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany
|
71
|
Lokits AD, Indrischek H, Meiler J, Hamm HE, Stadler PF. Tracing the evolution of the heterotrimeric G protein α subunit in Metazoa. BMC Evol Biol 2018; 18:51. [PMID: 29642851 PMCID: PMC5896119 DOI: 10.1186/s12862-018-1147-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/14/2017] [Accepted: 03/06/2018] [Indexed: 01/07/2023]
Abstract
BACKGROUND Heterotrimeric G proteins are fundamental signaling proteins composed of three subunits, Gα and a Gβγ dimer. The role of Gα as a molecular switch is critical for transmitting and amplifying intracellular signaling cascades initiated by an activated G protein Coupled Receptor (GPCR). Despite their biochemical and therapeutic importance, the study of G protein evolution has been limited to the scope of a few model organisms. Furthermore, of the five primary Gα subfamilies, the underlying gene structure of only two families has been thoroughly investigated outside of Mammalia evolution. Therefore our understanding of Gα emergence and evolution across phylogeny remains incomplete. RESULTS We have computationally identified the presence and absence of every Gα gene (GNA-) across all major branches of Deuterostomia and evaluated the conservation of the underlying exon-intron structures across these phylogenetic groups. We provide evidence of mutually exclusive exon inclusion through alternative splicing in specific lineages. Variations of splice site conservation and isoforms were found for several paralogs which coincide with conserved, putative motifs of DNA-/RNA-binding proteins. In addition to our curated gene annotations, within Primates, we identified 15 retrotranspositions, many of which have undergone pseudogenization. Most importantly, we find numerous deviations from previous findings regarding the presence and absence of individual GNA- genes, nuanced differences in phyla-specific gene copy numbers, novel paralog duplications and subsequent intron gain and loss events. CONCLUSIONS Our curated annotations allow us to draw more accurate inferences regarding the emergence of all Gα family members across Metazoa and to present a new, updated theory of Gα evolution. 
Leveraging this, our results are critical for gaining new insights into the co-evolution of the Gα subunit and its many protein binding partners, especially therapeutically relevant G protein - GPCR signaling pathways which radiated in Vertebrata evolution.
Affiliation(s)
- A. D. Lokits
- Neuroscience Program, Vanderbilt University, Nashville, TN USA; Center for Structural Biology, Vanderbilt University, Nashville, TN USA
- H. Indrischek
- Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany; Computational EvoDevo Group, Bioinformatics Department, Leipzig University, Leipzig, Germany
- J. Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, TN USA; Chemistry Department, Vanderbilt University, Nashville, TN USA
- H. E. Hamm
- Pharmacology Department, Vanderbilt University Medical Center, Nashville, TN USA
- P. F. Stadler
- Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany; Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Wien, Austria; IZBI-Interdisciplinary Center for Bioinformatics and LIFE-Leipzig Research Center for Civilization Diseases and Competence Center for Scalable Data Services and Solutions, University Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Santa Fe Institute, Santa Fe, NM USA
|
72
|
Yuan J, Li J, Yang Y, Tan C, Zhu Y, Hu L, Qi Y, Lu ZJ. Stress-responsive regulation of long non-coding RNA polyadenylation in Oryza sativa. The Plant Journal: for Cell and Molecular Biology 2018; 93:814-827. [PMID: 29265542 DOI: 10.1111/tpj.13804] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 06/20/2017] [Revised: 11/18/2017] [Accepted: 11/28/2017] [Indexed: 05/22/2023]
Abstract
Recently, long non-coding RNAs (lncRNAs) have been demonstrated to be involved in many biological processes of plants; however, a systematic study on transcriptional and, in particular, post-transcriptional regulation of stress-responsive lncRNAs in Oryza sativa (rice) is lacking. We sequenced three types of RNA libraries (poly(A)+, poly(A)- and nuclear RNAs) under four abiotic stresses (cold, heat, drought and salt). Based on an integrative bioinformatics approach and ~200 high-throughput data sets, ~170 of which have been published, we revealed over 7000 lncRNAs, nearly half of which were identified for the first time. Notably, we found that the majority of the ~500 poly(A) lncRNAs that were differentially expressed under stress were significantly downregulated, but approximately 25% were found to have upregulated non-poly(A) forms. Moreover, hundreds of lncRNAs with downregulated polyadenylation (DPA) tend to be highly conserved, show significant nuclear retention and are co-expressed with protein-coding genes that function under stress. Remarkably, these DPA lncRNAs are significantly enriched in quantitative trait loci (QTLs) for stress tolerance or development, suggesting their potential important roles in rice growth under various stresses. In particular, we observed substantially accumulated DPA lncRNAs in plants exposed to drought and salt, which is consistent with the severe reduction of RNA 3'-end processing factors under these conditions. Taken together, the results of this study reveal that polyadenylation and subcellular localization of many rice lncRNAs are likely to be regulated at the post-transcriptional level. Our findings strongly suggest that many upregulated/downregulated lncRNAs previously identified by traditional RNA-seq analyses need to be carefully reviewed to assess the influence of post-transcriptional modification.
Affiliation(s)
- Jiapei Yuan
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Jingrui Li
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yang Yang
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Chang Tan
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yumin Zhu
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Long Hu
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yijun Qi
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
|
73
|
Discovering viral genomes in human metagenomic data by predicting unknown protein families. Sci Rep 2018; 8:28. [PMID: 29311716 PMCID: PMC5758519 DOI: 10.1038/s41598-017-18341-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/10/2017] [Accepted: 11/28/2017] [Indexed: 01/15/2023]
Abstract
Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows little or no homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family, hence it was named bacteriophage HFM.
|
74
|
Weinberg Z, Lünse CE, Corbino KA, Ames TD, Nelson JW, Roth A, Perkins KR, Sherlock ME, Breaker RR. Detection of 224 candidate structured RNAs by comparative analysis of specific subsets of intergenic regions. Nucleic Acids Res 2017; 45:10811-10823. [PMID: 28977401 PMCID: PMC5737381 DOI: 10.1093/nar/gkx699] [Citation(s) in RCA: 109] [Impact Index Per Article: 13.6] [Received: 04/11/2017] [Accepted: 08/02/2017] [Indexed: 11/29/2022]
Abstract
The discovery of structured non-coding RNAs (ncRNAs) in bacteria can reveal new facets of biology and biochemistry. Comparative genomics analyses executed by powerful computer algorithms have successfully been used to uncover many novel bacterial ncRNA classes in recent years. However, this general search strategy favors the discovery of more common ncRNA classes, whereas progressively rarer classes are correspondingly more difficult to identify. In the current study, we confront this problem by devising several methods to select subsets of intergenic regions that can concentrate these rare RNA classes, thereby increasing the probability that comparative sequence analysis approaches will reveal their existence. By implementing these methods, we discovered 224 novel ncRNA classes, which include ROOL RNA, an RNA class averaging 581 nt and present in multiple phyla, several highly conserved and widespread ncRNA classes with properties that suggest sophisticated biochemical functions and a multitude of putative cis-regulatory RNA classes involved in a variety of biological processes. We expect that further research on these newly found RNA classes will reveal additional aspects of novel biology, and allow for greater insights into the biochemistry performed by ncRNAs.
Affiliation(s)
- Zasha Weinberg
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA
- Christina E Lünse
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
- Keith A Corbino
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA
- Tyler D Ames
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - James W Nelson
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Adam Roth
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Kevin R Perkins
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Madeline E Sherlock
- Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA
| | - Ronald R Breaker
- HHMI, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA
|
75
|
Li S, Breaker RR. Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics. BMC Genomics 2017; 18:785. [PMID: 29029611 PMCID: PMC5640933 DOI: 10.1186/s12864-017-4171-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/06/2017] [Accepted: 10/05/2017] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. RESULTS We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. CONCLUSIONS Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. 
Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.
Affiliation(s)
- Sanshu Li
- Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, 361021 China
- Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520-8103 USA
| | - Ronald R. Breaker
- Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520-8103 USA
- Department of Molecular, Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520-8103 USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103 USA
|
76
|
Blasi B, Tafer H, Kustor C, Poyntner C, Lopandic K, Sterflinger K. Genomic and transcriptomic analysis of the toluene degrading black yeast Cladophialophora immunda. Sci Rep 2017; 7:11436. [PMID: 28900256 PMCID: PMC5595782 DOI: 10.1038/s41598-017-11807-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/06/2017] [Accepted: 08/30/2017] [Indexed: 12/30/2022] Open
Abstract
Cladophialophora immunda is an ascomycotal species belonging to the group of the black yeasts. These fungi have a thick and melanized cell wall and other physiological adaptations that allow them to cope with several extreme physical and chemical conditions. Members of the group can colonize some of the most extremophilic environments on Earth. Cladophialophora immunda together with a few other species of the order Chaetothyriales show a special association with hydrocarbon-polluted environments. The finding that the fungus is able to completely mineralize toluene makes it an interesting candidate for bioremediation purposes. The present study is the first transcriptomic investigation of a fungus grown in the presence of toluene as sole carbon and energy source. We could observe the activation of genes involved in toluene degradation and several stress response mechanisms which allowed the fungus to survive the toluene exposure. The thorough comparative genomics analysis allowed us to identify several events of horizontal gene transfer between bacteria and Cladophialophora immunda and unveil toluene degradation steps that were previously reported in bacteria. The work presented here aims to give new insights into the ecology of Cladophialophora immunda and its adaptation strategies to hydrocarbon-polluted environments.
Affiliation(s)
- Barbara Blasi
- Department of Biotechnology, VIBT-EQ Extremophile Center, University of Natural Resources and Life Sciences, 1190, Vienna, Austria.
| | - Hakim Tafer
- Department of Biotechnology, VIBT-EQ Extremophile Center, University of Natural Resources and Life Sciences, 1190, Vienna, Austria
| | - Christina Kustor
- Department of Biotechnology, VIBT-EQ Extremophile Center, University of Natural Resources and Life Sciences, 1190, Vienna, Austria
| | - Caroline Poyntner
- Department of Biotechnology, VIBT-EQ Extremophile Center, University of Natural Resources and Life Sciences, 1190, Vienna, Austria
| | - Ksenija Lopandic
- Department of Biotechnology, VIBT-EQ Extremophile Center, University of Natural Resources and Life Sciences, 1190, Vienna, Austria
| | - Katja Sterflinger
- Department of Biotechnology, VIBT-EQ Extremophile Center, University of Natural Resources and Life Sciences, 1190, Vienna, Austria
|
77
|
Andergassen D, Dotter CP, Wenzel D, Sigl V, Bammer PC, Muckenhuber M, Mayer D, Kulinski TM, Theussl HC, Penninger JM, Bock C, Barlow DP, Pauler FM, Hudson QJ. Mapping the mouse Allelome reveals tissue-specific regulation of allelic expression. eLife 2017; 6. [PMID: 28806168 PMCID: PMC5555720 DOI: 10.7554/elife.25125] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 01/13/2017] [Accepted: 06/14/2017] [Indexed: 01/02/2023] Open
Abstract
To determine the dynamics of allelic-specific expression during mouse development, we analyzed RNA-seq data from 23 F1 tissues from different developmental stages, including 19 female tissues allowing X chromosome inactivation (XCI) escapers to also be detected. We demonstrate that allelic expression arising from genetic or epigenetic differences is highly tissue-specific. We find that tissue-specific strain-biased gene expression may be regulated by tissue-specific enhancers or by post-transcriptional differences in stability between the alleles. We also find that escape from X-inactivation is tissue-specific, with leg muscle showing an unexpectedly high rate of XCI escapers. By surveying a range of tissues during development, and performing extensive validation, we are able to provide a high confidence list of mouse imprinted genes including 18 novel genes. This shows that cluster size varies dynamically during development and can be substantially larger than previously thought, with the Igf2r cluster extending over 10 Mb in placenta. DOI:http://dx.doi.org/10.7554/eLife.25125.001
Affiliation(s)
- Daniel Andergassen
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Christoph P Dotter
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Daniel Wenzel
- IMBA, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna, Austria
| | - Verena Sigl
- IMBA, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna, Austria
| | - Philipp C Bammer
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Markus Muckenhuber
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Daniela Mayer
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Tomasz M Kulinski
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | | | - Josef M Penninger
- IMBA, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna, Austria
| | - Christoph Bock
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Denise P Barlow
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Florian M Pauler
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Quanah J Hudson
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
|
78
|
Friedman RC, Kalkhof S, Doppelt-Azeroual O, Mueller SA, Chovancová M, von Bergen M, Schwikowski B. Common and phylogenetically widespread coding for peptides by bacterial small RNAs. BMC Genomics 2017; 18:553. [PMID: 28732463 PMCID: PMC5521070 DOI: 10.1186/s12864-017-3932-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 11/05/2016] [Accepted: 07/09/2017] [Indexed: 12/14/2022] Open
Abstract
Background While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding for small proteins, whether or not they also have a regulatory role at the RNA level. Methods Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling. Results A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated in published ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are predicted novel components of type I toxin/antitoxin systems. Conclusions We predict over two dozen new protein-coding genes per bacterial species, but crucially also quantified the uncertainty in this estimate. 
Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily-accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.
Affiliation(s)
- Robin C Friedman
- Systems Biology Laboratory, Department of Genomes and Genetics, Institut Pasteur, Paris, France; Molecular Microbial Pathogenesis Unit, Department of Cell Biology and Infection, Institut Pasteur, Paris, France; Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| | - Stefan Kalkhof
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany; Current Address: Department of Bioanalytics, University of Applied Sciences and Arts of Coburg, Coburg, Germany
| | - Olivia Doppelt-Azeroual
- Bioinformatics and Biostatistics Hub, C3BI, USR 3756 IP CNRS, Institut Pasteur, Paris, France
| | - Stephan A Mueller
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany; Current Address: Neuroproteomics, German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
| | - Martina Chovancová
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany; Institute of Biochemistry, University of Leipzig, Leipzig, Germany
| | - Benno Schwikowski
- Systems Biology Laboratory, Department of Genomes and Genetics, Institut Pasteur, Paris, France; Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
|
79
|
Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017; 17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS Identity and number of arrestin paralogs were determined searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.
Affiliation(s)
- Henrike Indrischek
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
| | - Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
| | - Vsevolod V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
| | - Eugenia V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria
- Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
|
80
|
Grüning BA, Fallmann J, Yusuf D, Will S, Erxleben A, Eggenhofer F, Houwaart T, Batut B, Videm P, Bagnacani A, Wolfien M, Lott SC, Hoogstrate Y, Hess WR, Wolkenhauer O, Hoffmann S, Akalin A, Ohler U, Stadler PF, Backofen R. The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy. Nucleic Acids Res 2017; 45:W560-W566. [PMID: 28582575 PMCID: PMC5570170 DOI: 10.1093/nar/gkx409] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 03/02/2017] [Revised: 04/13/2017] [Accepted: 05/31/2017] [Indexed: 01/23/2023] Open
Abstract
RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important but not yet standard to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaptation to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools that are dedicated to different research areas of RNA biology including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up-to-date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis. AVAILABILITY The RNA workbench is available at https://github.com/bgruening/galaxy-rna-workbench.
Affiliation(s)
- Björn A. Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, D-79104 Freiburg, Germany
| | - Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
| | - Dilmurat Yusuf
- Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13125, Berlin, Germany
| | - Sebastian Will
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
| | - Anika Erxleben
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
| | - Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
| | - Torsten Houwaart
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
| | - Bérénice Batut
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
| | - Pavankumar Videm
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
| | - Andrea Bagnacani
- Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, D-18051 Rostock, Germany
| | - Markus Wolfien
- Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, D-18051 Rostock, Germany
| | - Steffen C. Lott
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Youri Hoogstrate
- Department of Urology, Erasmus University Medical Center, Wytemaweg 80, 3015 CN Rotterdam, Netherlands
| | - Wolfgang R. Hess
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, D-18051 Rostock, Germany
| | - Steve Hoffmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
| | - Altuna Akalin
- Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13125, Berlin, Germany
| | - Uwe Ohler
- Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13125, Berlin, Germany
- Departments of Biology and Computer Science, Humboldt University, Unter den Linden 6, D-10099 Berlin
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, D-79104 Freiburg, Germany
- BIOSS Centre for Biological Signaling Studies, University of Freiburg, Schänzlestr. 18, D-79104 Freiburg, Germany
|
81
|
Chwalenia K, Facemire L, Li H. Chimeric RNAs in cancer and normal physiology. Wiley Interdiscip Rev RNA 2017; 8. [DOI: 10.1002/wrna.1427] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 12/07/2016] [Revised: 04/27/2017] [Accepted: 04/28/2017] [Indexed: 12/20/2022]
Affiliation(s)
- Katarzyna Chwalenia
- Department of Pathology, School of Medicine; University of Virginia; Charlottesville VA USA
| | - Loryn Facemire
- Department of Pathology, School of Medicine; University of Virginia; Charlottesville VA USA
| | - Hui Li
- Department of Pathology, School of Medicine; University of Virginia; Charlottesville VA USA
- Department of Biochemistry and Molecular Genetics, School of Medicine; University of Virginia; Charlottesville VA USA
|
82
|
Nelson ADL, Devisetty UK, Palos K, Haug-Baltzell AK, Lyons E, Beilstein MA. Evolinc: A Tool for the Identification and Evolutionary Comparison of Long Intergenic Non-coding RNAs. Front Genet 2017; 8:52. [PMID: 28536600 PMCID: PMC5422434 DOI: 10.3389/fgene.2017.00052] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 02/22/2017] [Accepted: 04/12/2017] [Indexed: 11/25/2022] Open
Abstract
Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90% of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on their emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a genomic and transcriptomic comparative analysis workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Here we validate lincRNA catalogs generated with Evolinc-I against previously annotated Arabidopsis and human lincRNA data. Evolinc-I recapitulated earlier findings and uncovered an additional 70 Arabidopsis and 43 human lincRNAs. We demonstrate the usefulness of Evolinc-II by examining the evolutionary histories of a public dataset of 5,361 Arabidopsis lincRNAs. We used Evolinc-II to winnow this dataset to 40 lincRNAs conserved across species in Brassicaceae. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). These latter analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats. The Evolinc pipeline is currently integrated in CyVerse's Discovery Environment and is free for use by researchers.
Affiliation(s)
- Andrew D L Nelson
- Beilstein Lab, School of Plant Sciences, University of Arizona, Tucson, AZ, USA
| | | | - Kyle Palos
- Beilstein Lab, School of Plant Sciences, University of Arizona, Tucson, AZ, USA
| | - Asher K Haug-Baltzell
- Lyons Lab, Genetics Graduate Interdisciplinary Group, University of Arizona, Tucson, AZ, USA
| | - Eric Lyons
- CyVerse, Bio5, University of Arizona, Tucson, AZ, USA; Lyons Lab, Genetics Graduate Interdisciplinary Group, University of Arizona, Tucson, AZ, USA
| | - Mark A Beilstein
- Beilstein Lab, School of Plant Sciences, University of Arizona, Tucson, AZ, USA
|
83
|
Ventola GMM, Noviello TMR, D'Aniello S, Spagnuolo A, Ceccarelli M, Cerulo L. Identification of long non-coding transcripts with feature selection: a comparative study. BMC Bioinformatics 2017; 18:187. [PMID: 28335739 PMCID: PMC5364679 DOI: 10.1186/s12859-017-1594-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 11/10/2016] [Accepted: 03/10/2017] [Indexed: 01/15/2023] Open
Abstract
Background The unveiling of long non-coding RNAs as important gene regulators in many biological contexts has increased the demand for efficient and robust computational methods to identify novel long non-coding RNAs from transcripts assembled with high throughput RNA-seq data. Several classes of sequence-based features have been proposed to distinguish between coding and non-coding transcripts. Among them, open reading frame, conservation scores, nucleotide arrangements, and RNA secondary structure have been used with success in literature to recognize intergenic long non-coding RNAs, a particular subclass of non-coding RNAs. Results In this paper we perform a systematic assessment of a wide collection of features extracted from sequence data. We use most of the features proposed in the literature, and we include, as a novel set of features, the occurrence of repeats contained in transposable elements. The aim is to detect signatures (groups of features) able to distinguish long non-coding transcripts from other classes, both protein-coding and non-coding. We evaluate different feature selection algorithms, test for signature stability, and evaluate the prediction ability of a signature with a machine learning algorithm. The study reveals different signatures in human, mouse, and zebrafish, highlighting that some features are shared among species, while others tend to be species-specific. Compared to coding potential tools and similar supervised approaches, including novel signatures, such as those identified here, in a machine learning algorithm improves the prediction performance, in terms of area under precision and recall curve, by 1 to 24%, depending on the species and on the signature. Conclusions Understanding which features are best suited for the prediction of long non-coding RNAs allows for the development of more effective automatic annotation pipelines especially relevant for poorly annotated genomes, such as zebrafish. 
We provide a web tool that recognizes novel long non-coding RNAs with the obtained signatures from fasta and gtf formats. The tool is available at the following url: http://www.bioinformatics-sannio.org/software/.
Affiliation(s)
- Giovanna M M Ventola
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy.,BioGeM, Institute of Genetic Research "Gaetano Salvatore", c.da Camporeale, Ariano Irpino (AV), 83031, Italy
| | - Teresa M R Noviello
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy.,BioGeM, Institute of Genetic Research "Gaetano Salvatore", c.da Camporeale, Ariano Irpino (AV), 83031, Italy
| | - Salvatore D'Aniello
- Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, 80121, Italy
| | - Antonietta Spagnuolo
- Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, 80121, Italy
| | - Michele Ceccarelli
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy
| | - Luigi Cerulo
- Department of Science and Technology, University of Sannio, via Port'Arsa, 11, Benevento, 82100, Italy.,BioGeM, Institute of Genetic Research "Gaetano Salvatore", c.da Camporeale, Ariano Irpino (AV), 83031, Italy.
| |
|
84
|
An atlas of human long non-coding RNAs with accurate 5' ends. Nature 2017; 543:199-204. [PMID: 28241135 DOI: 10.1038/nature21374] [Citation(s) in RCA: 729] [Impact Index Per Article: 91.1] [Received: 06/14/2016] [Accepted: 01/08/2017] [Indexed: 12/15/2022]
Abstract
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.
|
85
|
Neuhaus K, Landstorfer R, Simon S, Schober S, Wright PR, Smith C, Backofen R, Wecko R, Keim DA, Scherer S. Differentiation of ncRNAs from small mRNAs in Escherichia coli O157:H7 EDL933 (EHEC) by combined RNAseq and RIBOseq - ryhB encodes the regulatory RNA RyhB and a peptide, RyhP. BMC Genomics 2017; 18:216. [PMID: 28245801 PMCID: PMC5331693 DOI: 10.1186/s12864-017-3586-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 01/16/2016] [Accepted: 02/13/2017] [Indexed: 12/14/2022] Open
Abstract
Background While NGS allows rapid global detection of transcripts, it remains difficult to distinguish ncRNAs from short mRNAs. To detect potentially translated RNAs, we developed an improved protocol for bacterial ribosomal footprinting (RIBOseq). This allowed distinguishing ncRNA from mRNA in EHEC. A high ratio of ribosomal footprints per transcript (ribosomal coverage value, RCV) is expected to indicate a translated RNA, while a low RCV should point to a non-translated RNA. Results Based on their low RCV, 150 novel non-translated EHEC transcripts were identified as putative ncRNAs, representing both antisense and intergenic transcripts, 74 of which had expressed homologs in E. coli MG1655. Bioinformatics analysis predicted statistically significant target regulons for 15 of the intergenic transcripts; experimental analysis revealed 4-fold or higher differential expression of 46 novel ncRNAs in different growth media. Out of 329 annotated EHEC ncRNAs, 52 showed an RCV similar to protein-coding genes; of those, 16 had RIBOseq patterns matching annotated genes in other Enterobacteriaceae, and 11 seem to possess a Shine-Dalgarno sequence, suggesting that such ncRNAs may encode small proteins instead of being solely non-coding. To support that the RIBOseq signals reflect translation, we tested the ribosomal-footprint covered ORF of ryhB and found a phenotype for the encoded peptide under iron-limiting conditions. Conclusion Determination of the RCV is a useful approach for a rapid first-step differentiation between bacterial ncRNAs and small mRNAs. Further, many known ncRNAs may encode proteins as well. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3586-9) contains supplementary material, which is available to authorized users.
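The RCV idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Transcript` record, the assumption that both read counts are already normalized for library size, and the cutoff value `0.35` are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    rnaseq_signal: float   # normalized RNA-seq coverage of the transcript (assumed input)
    riboseq_signal: float  # normalized ribosomal footprint coverage of the same region


def rcv(t: Transcript) -> float:
    """Ribosomal coverage value: footprint signal divided by transcript signal."""
    if t.rnaseq_signal <= 0:
        raise ValueError(f"{t.name}: no RNA-seq signal, RCV is undefined")
    return t.riboseq_signal / t.rnaseq_signal


def classify(t: Transcript, threshold: float = 0.35) -> str:
    """First-pass call; the threshold is a hypothetical cutoff, not a published value."""
    return "putative mRNA" if rcv(t) >= threshold else "putative ncRNA"


if __name__ == "__main__":
    for t in (Transcript("candidate_A", 100.0, 50.0), Transcript("candidate_B", 200.0, 10.0)):
        print(t.name, round(rcv(t), 3), classify(t))
```

In practice a cutoff would be calibrated against the RCV distribution of annotated protein-coding genes and known ncRNAs in the same experiment, which is why the call is only described as a first-step differentiation.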
Affiliation(s)
- Klaus Neuhaus
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany.,Core Facility Microbiome/NGS, ZIEL Institute for Food & Health, Weihenstephaner Berg 3, D-85354, Freising, Germany.
| | - Richard Landstorfer
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
| | - Svenja Simon
- Informatik und Informationswissenschaft, Universität Konstanz, D-78457, Konstanz, Germany
| | - Steffen Schober
- Institut für Nachrichtentechnik, Universität Ulm, Albert-Einstein-Allee 43, D-89081, Ulm, Germany
| | - Patrick R Wright
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
| | - Cameron Smith
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
| | - Romy Wecko
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
| | - Daniel A Keim
- Informatik und Informationswissenschaft, Universität Konstanz, D-78457, Konstanz, Germany
| | - Siegfried Scherer
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
| |
|
86
|
Wu T, Du Y. LncRNAs: From Basic Research to Medical Application. Int J Biol Sci 2017; 13:295-307. [PMID: 28367094 PMCID: PMC5370437 DOI: 10.7150/ijbs.16968] [Citation(s) in RCA: 107] [Impact Index Per Article: 13.4] [Received: 07/24/2016] [Accepted: 11/02/2016] [Indexed: 01/17/2023] Open
Abstract
This review summarizes current research on long noncoding RNAs (lncRNAs) and on specific lncRNAs used as molecular biomarkers or therapeutic targets in human cancer and cardiovascular diseases. Following the development of various sequencing technologies, lncRNAs have become one of the least explored research areas. First, the definition and classification of lncRNAs have been repeatedly amended and supplemented because of their complexity and diversity. Second, several methods and strategies have been developed to study the characteristics of lncRNAs, including new species identification, subcellular localization, gain or loss of function, molecular interaction, and bioinformatics analysis. Third, based on current results from basic research, lncRNAs have been shown to work through different forms of interaction involving DNA, RNA, and proteins. Fourth, lncRNAs can play important roles during embryogenesis and organ differentiation. Finally, because of their tissue-specific expression, lncRNAs could be used as biomarkers or therapeutic targets in different kinds of diseases, such as human cancer and cardiovascular diseases.
Affiliation(s)
- Tao Wu
- Cardiovascular Department, The Affiliated Hospital of Medical College, Ningbo University, No.247, Renmin Road, Jiangbei District, Ningbo, China
| | - Yantao Du
- Ningbo Institute of Medical Science, No.42-46, Yangshan Road, Jiangbei District, Ningbo, China
| |
|
87
|
Baumgartner D, Kopf M, Klähn S, Steglich C, Hess WR. Small proteins in cyanobacteria provide a paradigm for the functional analysis of the bacterial micro-proteome. BMC Microbiol 2016; 16:285. [PMID: 27894276 PMCID: PMC5126843 DOI: 10.1186/s12866-016-0896-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 07/21/2016] [Accepted: 11/14/2016] [Indexed: 12/21/2022] Open
Abstract
Background Despite their versatile functions in multimeric protein complexes, in the modification of enzymatic activities, intercellular communication or regulatory processes, proteins shorter than 80 amino acids (μ-proteins) are a systematically underestimated class of gene products in bacteria. Photosynthetic cyanobacteria provide a paradigm for small protein functions due to extensive work on the photosynthetic apparatus that led to the functional characterization of 19 small proteins of less than 50 amino acids. In analogy, previously unstudied small ORFs with similar degrees of conservation might encode small proteins of high relevance also in other functional contexts. Results Here we used comparative transcriptomic information available for two model cyanobacteria, Synechocystis sp. PCC 6803 and Synechocystis sp. PCC 6714 for the prediction of small ORFs. We found 293 transcriptional units containing candidate small ORFs ≤80 codons in Synechocystis sp. PCC 6803, also including the known mRNAs encoding small proteins of the photosynthetic apparatus. From these transcriptional units, 146 are shared between the two strains, 42 are shared with the higher plant Arabidopsis thaliana and 25 with E. coli. To verify the existence of the respective μ-proteins in vivo, we selected five genes as examples to which a FLAG tag sequence was added and re-introduced them into Synechocystis sp. PCC 6803. These were the previously annotated gene ssr1169, two newly defined genes norf1 and norf4, as well as nsiR6 (nitrogen stress-induced RNA 6) and hliR1 (high light-inducible RNA 1), which originally were considered non-coding. Upon activation of expression via the Cu2+-responsive petE promoter or from the native promoters, all five proteins were detected in Western blot experiments.
Conclusions The distribution and conservation of these five genes as well as their regulation of expression and the physico-chemical properties of the encoded proteins underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.
Affiliation(s)
- Desiree Baumgartner
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany
| | - Matthias Kopf
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany.,Present Address: Molecular Health GmbH, Kurfürsten-Anlage 21, 69115, Heidelberg, Germany
| | - Stephan Klähn
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany
| | - Claudia Steglich
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany
| | - Wolfgang R Hess
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany.
| |
|
88
|
Poyntner C, Blasi B, Arcalis E, Mirastschijski U, Sterflinger K, Tafer H. The Transcriptome of Exophiala dermatitidis during Ex-vivo Skin Model Infection. Front Cell Infect Microbiol 2016; 6:136. [PMID: 27822460 PMCID: PMC5075926 DOI: 10.3389/fcimb.2016.00136] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/15/2016] [Accepted: 10/06/2016] [Indexed: 12/12/2022] Open
Abstract
The black yeast Exophiala dermatitidis is a widespread polyextremophile and human pathogen that is found in extreme natural habitats and man-made environments such as dishwashers. It can cause various diseases ranging from phaeohyphomycosis to systemic infections, with fatality rates reaching 40%. While the number of cases in immunocompromised patients is increasing, knowledge of the infections, virulence factors and host response is still scarce. In this study, for the first time, an artificial infection of an ex-vivo skin model with Exophiala dermatitidis was monitored microscopically and transcriptomically. Results show that Exophiala dermatitidis is able to actively grow and penetrate the skin. The analysis of the genomic and RNA-sequencing data delivers a rich and complex transcriptome where circular RNAs, fusion transcripts, long non-coding RNAs and antisense transcripts are found. Changes in transcription strongly affect pathways related to nutrient acquisition, energy metabolism, cell wall, morphological switch, and known virulence factors. The L-Tyrosine melanin pathway is specifically upregulated during infection. Moreover, the production of secondary metabolites, especially alkaloids, is increased. Our study is the first that gives an insight into the complexity of the transcriptome of Exophiala dermatitidis during artificial skin infections and reveals new virulence factors.
Affiliation(s)
- Caroline Poyntner
- Department of Biotechnology, VIBT EQ Extremophile Center, University of Natural Resources and Life Sciences Vienna, Austria
| | - Barbara Blasi
- Department of Biotechnology, VIBT EQ Extremophile Center, University of Natural Resources and Life Sciences Vienna, Austria
| | - Elsa Arcalis
- Department for Applied Genetics and Cell Biology, Molecular Plant Physiology and Crop Biotechnology, University of Natural Resources and Life Sciences Vienna, Austria
| | - Ursula Mirastschijski
- Klinikum Bremen-Mitte, Department of Plastic, Reconstructive and Aesthetic Surgery, Faculty of Biology and Chemistry, Center for Biomolecular Interactions Bremen, University Bremen Bremen, Germany
| | - Katja Sterflinger
- Department of Biotechnology, VIBT EQ Extremophile Center, University of Natural Resources and Life Sciences Vienna, Austria
| | - Hakim Tafer
- Department of Biotechnology, VIBT EQ Extremophile Center, University of Natural Resources and Life Sciences Vienna, Austria
| |
|
89
|
Eggenhofer F, Hofacker IL, Höner Zu Siederdissen C. RNAlien - Unsupervised RNA family model construction. Nucleic Acids Res 2016; 44:8433-41. [PMID: 27330139 PMCID: PMC5041467 DOI: 10.1093/nar/gkw558] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 10/31/2015] [Revised: 06/06/2016] [Accepted: 06/08/2016] [Indexed: 02/06/2023] Open
Abstract
Determining the function of a non-coding RNA requires costly and time-consuming wet-lab experiments. For this reason, computational methods which ascertain the homology of a sequence and thereby deduce functionality and family membership are often exploited. In this fashion, newly sequenced genomes can be annotated in a completely computational way. Covariance models are commonly used to assign novel RNA sequences to a known RNA family. However, to construct such models several examples of the family have to be already known. Moreover, model building is the work of experts who manually edit the necessary RNA alignment and consensus structure. Our method, RNAlien, starting from a single input sequence collects potential family member sequences by multiple iterations of homology search. RNA family models are fully automatically constructed for the found sequences. We have tested our method on a subset of the Rfam RNA family database. RNAlien models are a starting point to construct models of comparable sensitivity and specificity to manually curated ones from the Rfam database. RNAlien Tool and web server are available at http://rna.tbi.univie.ac.at/rnalien/.
Affiliation(s)
- Florian Eggenhofer
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria.,Bioinformatics Group, Department of Computer Science University of Freiburg, Georges-Köhler-Allee, 79110 Freiburg, Germany
| | - Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria.,Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, A-1090 Vienna, Austria
| | - Christian Höner Zu Siederdissen
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria.,Bioinformatics Group, Department of Computer Science, University of Leipzig, D-04107 Leipzig, Germany.,Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
| |
|
90
|
A Genomic Analysis of Factors Driving lincRNA Diversification: Lessons from Plants. G3-GENES GENOMES GENETICS 2016; 6:2881-91. [PMID: 27440919 PMCID: PMC5015945 DOI: 10.1534/g3.116.030338] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/17/2022]
Abstract
Transcriptomic analyses from across eukaryotes indicate that most of the genome is transcribed at some point in the developmental trajectory of an organism. One class of these transcripts is termed long intergenic noncoding RNAs (lincRNAs). Recently, attention has focused on understanding the evolutionary dynamics of lincRNAs, particularly their conservation within genomes. Here, we take a comparative genomic and phylogenetic approach to uncover factors influencing lincRNA emergence and persistence in the plant family Brassicaceae, to which Arabidopsis thaliana belongs. We searched 10 genomes across the family for evidence of > 5000 lincRNA loci from A. thaliana. From loci conserved in the genomes of multiple species, we built alignments and inferred phylogeny. We then used gene tree/species tree reconciliation to examine the duplication history and timing of emergence of these loci. Emergence of lincRNA loci appears to be linked to local duplication events, but, surprisingly, not whole genome duplication events (WGD), or transposable elements. Interestingly, WGD events are associated with the loss of loci for species having undergone relatively recent polyploidy. Lastly, we identify 1180 loci of the 6480 previously annotated A. thaliana lincRNAs (18%) with elevated levels of conservation. These conserved lincRNAs show higher expression, and are enriched for stress-responsiveness and cis-regulatory motifs known as conserved noncoding sequences (CNSs). These data highlight potential functional pathways and suggest that CNSs may regulate neighboring genes at both the genomic and transcriptomic level. In sum, we provide insight into processes that may influence lincRNA diversification by providing an evolutionary context for previously annotated lincRNAs.
|
91
|
Hu L, Xu Z, Hu B, Lu ZJ. COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features. Nucleic Acids Res 2016; 45:e2. [PMID: 27608726 PMCID: PMC5224497 DOI: 10.1093/nar/gkw798] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 12/22/2015] [Revised: 08/25/2016] [Accepted: 08/31/2016] [Indexed: 12/31/2022] Open
Abstract
Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had a better chance than other RNA regions to interact with RNA binding proteins, based on the recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME.
Affiliation(s)
- Long Hu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China.,PKU-Tsinghua-NIBS Graduate Program, School of Life Sciences, Peking University, Beijing 100871, China
| | - Zhiyu Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Boqin Hu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
| |
|
92
|
Nyberg KG, Machado CA. Comparative Expression Dynamics of Intergenic Long Noncoding RNAs in the Genus Drosophila. Genome Biol Evol 2016; 8:1839-58. [PMID: 27189981 PMCID: PMC4943187 DOI: 10.1093/gbe/evw116] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/16/2022] Open
Abstract
Thousands of long noncoding RNAs (lncRNAs) have been annotated in eukaryotic genomes, but comparative transcriptomic approaches are necessary to understand their biological impact and evolution. To facilitate such comparative studies in Drosophila, we identified and characterized lncRNAs in a second Drosophilid—the evolutionary model Drosophila pseudoobscura. Using RNA-Seq and computational filtering of protein-coding potential, we identified 1,589 intergenic lncRNA loci in D. pseudoobscura. We surveyed multiple sex-specific developmental stages and found, like in Drosophila melanogaster, increasingly prolific lncRNA expression through male development and an overrepresentation of lncRNAs in the testes. Other trends seen in D. melanogaster, like reduced pupal expression, were not observed. Nonrandom distributions of female-biased and non-testis-specific male-biased lncRNAs between the X chromosome and autosomes are consistent with selection-based models of gene trafficking to optimize genomic location of sex-biased genes. The numerous testis-specific lncRNAs, however, are randomly distributed between the X and autosomes, and we cannot reject the hypothesis that many of these are likely to be spurious transcripts. Finally, using annotated lncRNAs in both species, we identified 134 putative lncRNA homologs between D. pseudoobscura and D. melanogaster and find that many have conserved developmental expression dynamics, making them ideal candidates for future functional analyses.
Affiliation(s)
- Kevin G Nyberg
- Department of Biology, University of Maryland, College Park
| | | |
|
93
|
Sharma V, Elghafari A, Hiller M. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Res 2016; 44:e103. [PMID: 27016733 PMCID: PMC4914097 DOI: 10.1093/nar/gkw210] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 02/02/2016] [Revised: 03/04/2016] [Accepted: 03/18/2016] [Indexed: 12/03/2022] Open
Abstract
Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes.
Affiliation(s)
- Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany.,Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
| | - Anas Elghafari
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany.,Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany.,Technical University, 01069 Dresden, Germany
| | - Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany.,Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany
| |
|
94
|
Kita Y, Yonemori K, Osako Y, Baba K, Mori S, Maemura K, Natsugoe S. Noncoding RNA and colorectal cancer: its epigenetic role. J Hum Genet 2016; 62:41-47. [PMID: 27278790 DOI: 10.1038/jhg.2016.66] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 04/01/2016] [Revised: 05/02/2016] [Accepted: 05/11/2016] [Indexed: 12/15/2022]
Abstract
The use of novel sequencing and high-throughput techniques has become widespread, and these techniques are now readily available to obtain a comprehensive transcription profile of the human genome. Noncoding RNAs (ncRNAs) are transcripts that have no apparent protein-coding capacity, but they have important roles in human physiology. Most research in this area has focused on micro-RNAs. However, the role of long ncRNAs (lncRNAs) as drivers of tumor suppression and oncogenic functions has recently been examined in numerous cancer types. Epigenetic alterations can reportedly deregulate the expression of any type of transcript. However, the exact mechanisms of epigenetic regulation of lncRNA are still unknown. In this review, the authors primarily focus on the epigenetic effects modulating ncRNA in colorectal cancer (CRC). The authors specifically discuss examples of oncogenic ncRNA in CRC pathobiology, as well as its extended diagnosis, prognosis and therapy.
Affiliation(s)
- Yoshiaki Kita
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| | - Keiichi Yonemori
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| | - Yusaku Osako
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| | - Kenji Baba
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| | - Shinichiro Mori
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| | - Kosei Maemura
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| | - Shoji Natsugoe
- Department of Digestive Surgery, Breast and Thyroid Surgery, Graduate School of Medicine, Kagoshima University, Kagoshima, Japan
| |
|
95
|
Pian C, Zhang G, Chen Z, Chen Y, Zhang J, Yang T, Zhang L. LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature. PLoS One 2016; 11:e0154567. [PMID: 27228152 PMCID: PMC4882039 DOI: 10.1371/journal.pone.0154567] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/18/2015] [Accepted: 04/15/2016] [Indexed: 12/31/2022] Open
Abstract
As a novel class of noncoding RNAs, long noncoding RNAs (lncRNAs) have been verified to be associated with various diseases. As transcripts are generated on a large scale every year, it is important to identify lncRNAs accurately and quickly among thousands of assembled transcripts. To accurately discover new lncRNAs, we developed a random forest (RF) classification tool named LncRNApred, based on a new hybrid feature set. This hybrid feature set includes three newly proposed features: MaxORF, RMaxORF and SNR. LncRNApred classifies lncRNAs and protein-coding transcripts accurately and quickly. Moreover, our RF model only requires training on human coding and non-coding transcripts; transcripts from other species can also be classified using LncRNApred. The results show that our method is more effective than the Coding Potential Calculator (CPC). The web server of LncRNApred is available for free at http://mm20132014.wicp.net:57203/LncRNApred/home.jsp.
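The abstract names MaxORF and RMaxORF without defining them. The Python sketch below shows one plausible reading, stated here as assumptions rather than the published definitions: MaxORF is taken as the length in nucleotides of the longest ATG-to-stop open reading frame on the forward strand (stop codon included), and RMaxORF as that length divided by the transcript length. SNR is not sketched because the abstract gives no indication of how it is computed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def max_orf_length(seq: str) -> int:
    """Longest ATG...stop ORF (in nt, stop included) over the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # ORF closed; look for the next start codon
    return best


def orf_features(seq: str) -> dict:
    """Return the two ORF-derived features under the assumed definitions above."""
    longest = max_orf_length(seq)
    return {
        "MaxORF": longest,
        "RMaxORF": longest / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    print(orf_features("CCATGAAATAGCC"))
```

Features of this kind would then be passed, together with other sequence descriptors, to a classifier such as a random forest; ORFs that lack an in-frame stop codon are ignored in this sketch, which is a simplification.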
Collapse
Affiliation(s)
- Cong Pian
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Guangle Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Zhi Chen
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Yuanyuan Chen
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Jin Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Tao Yang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| | - Liangyun Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, Jiangsu, People’s Republic of China
| |
|
96
|
Rutenberg-Schoenberg M, Sexton AN, Simon MD. The Properties of Long Noncoding RNAs That Regulate Chromatin. Annu Rev Genomics Hum Genet 2016; 17:69-94. [PMID: 27147088 DOI: 10.1146/annurev-genom-090314-024939] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Indexed: 11/09/2022]
Abstract
Beyond coding for proteins, RNA molecules have well-established functions in the posttranscriptional regulation of gene expression. Less clear are the upstream roles of RNA in regulating transcription and chromatin-based processes in the nucleus. RNA is transcribed in the nucleus, so it is logical that RNA could play diverse and broad roles that would impact human physiology. Indeed, this idea is supported by well-established examples of noncoding RNAs that affect chromatin structure and function. There has been dramatic growth in studies focused on the nuclear roles of long noncoding RNAs (lncRNAs). Although little is known about the biochemical mechanisms of these lncRNAs, there is a developing consensus regarding the challenges of defining lncRNA function and mechanism. In this review, we examine the definition, discovery, functions, and mechanisms of lncRNAs. We emphasize areas where challenges remain and where consensus among laboratories has underscored the exciting ways in which human lncRNAs may affect chromatin biology.
Affiliation(s)
- Michael Rutenberg-Schoenberg
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06511; Chemical Biology Institute, Yale University, West Haven, Connecticut 06516
| | - Alec N Sexton
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06511; Chemical Biology Institute, Yale University, West Haven, Connecticut 06516
| | - Matthew D Simon
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06511; Chemical Biology Institute, Yale University, West Haven, Connecticut 06516
| |
|
97
|
Chen J, Shishkin AA, Zhu X, Kadri S, Maza I, Guttman M, Hanna JH, Regev A, Garber M. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol 2016; 17:19. [PMID: 26838501 PMCID: PMC4739325 DOI: 10.1186/s13059-016-0880-9] [Citation(s) in RCA: 124] [Impact Index Per Article: 13.8] [Received: 10/26/2015] [Accepted: 01/14/2016] [Indexed: 12/31/2022]
Abstract
BACKGROUND Recent advances in transcriptome sequencing have enabled the discovery of thousands of long non-coding RNAs (lncRNAs) across many species. Though several lncRNAs have been shown to play important roles in diverse biological processes, the functions and mechanisms of most lncRNAs remain unknown. Two significant obstacles lie between transcriptome sequencing and functional characterization of lncRNAs: identifying truly non-coding genes from de novo reconstructed transcriptomes, and prioritizing the hundreds of resulting putative lncRNAs for downstream experimental interrogation. RESULTS We present slncky, a lncRNA discovery tool that produces a high-quality set of lncRNAs from RNA-sequencing data and further uses evolutionary constraint to prioritize lncRNAs that are likely to be functionally important. Our automated filtering pipeline is comparable to manual curation efforts and more sensitive than previously published computational approaches. Furthermore, we developed a sensitive alignment pipeline for aligning lncRNA loci and propose new evolutionary metrics relevant for analyzing sequence and transcript evolution. Our analysis reveals that evolutionary selection acts in several distinct patterns, and uncovers two notable classes of intergenic lncRNAs: one showing strong purifying selection on RNA sequence and another where constraint is restricted to the regulation but not the sequence of the transcript. CONCLUSION Our results highlight that lncRNAs are not a homogeneous class of molecules but rather a mixture of multiple functional classes with distinct biological mechanisms and/or roles. Our novel comparative methods for lncRNAs reveal 233 constrained lncRNAs out of tens of thousands of currently annotated transcripts, which we make available through the slncky Evolution Browser.
Affiliation(s)
- Jenny Chen
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, MA, 02140, USA
| | - Alexander A Shishkin
- Division of Biology and Biological Engineering, California Institute of Technology, Cambridge, MA, 02140, USA
| | - Xiaopeng Zhu
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01655, USA
| | - Sabah Kadri
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Itay Maza
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Mitchell Guttman
- Division of Biology and Biological Engineering, California Institute of Technology, Cambridge, MA, 02140, USA
| | - Jacob H Hanna
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA; Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02140, USA
| | - Manuel Garber
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01655, USA; Program in Molecular Biology, University of Massachusetts Medical School, Worcester, MA, 01655, USA
| |
|
98
|
Brenes-Álvarez M, Olmedo-Verd E, Vioque A, Muro-Pastor AM. Identification of Conserved and Potentially Regulatory Small RNAs in Heterocystous Cyanobacteria. Front Microbiol 2016; 7:48. [PMID: 26870012 PMCID: PMC4734099 DOI: 10.3389/fmicb.2016.00048] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/14/2015] [Accepted: 01/12/2016] [Indexed: 12/13/2022]
Abstract
Small RNAs (sRNAs) are a growing class of non-protein-coding transcripts that participate in the regulation of virtually every aspect of bacterial physiology. Heterocystous cyanobacteria are a group of photosynthetic organisms that exhibit multicellular behavior and developmental alternatives involving specific transcriptomes exclusive of a given physiological condition or even a cell type. In the context of our ongoing effort to understand developmental decisions in these organisms we have undertaken an approach to the global identification of sRNAs. Using differential RNA-Seq we have previously identified transcriptional start sites for the model heterocystous cyanobacterium Nostoc sp. PCC 7120. Here we combine this dataset with a prediction of Rho-independent transcriptional terminators and an analysis of phylogenetic conservation of potential sRNAs among 89 available cyanobacterial genomes. In contrast to predictive genome-wide approaches, the use of an experimental dataset comprising all active transcriptional start sites (differential RNA-Seq) facilitates the identification of bona fide sRNAs. The output of our approach is a dataset of predicted potential sRNAs in Nostoc sp. PCC 7120, with different degrees of phylogenetic conservation across the 89 cyanobacterial genomes analyzed. Previously described sRNAs appear among the predicted sRNAs, demonstrating the performance of the algorithm. In addition, new predicted sRNAs are now identified that can be involved in regulation of different aspects of cyanobacterial physiology, including adaptation to nitrogen stress, the condition that triggers differentiation of heterocysts (specialized nitrogen-fixing cells). Transcription of several predicted sRNAs that appear exclusively in the genomes of heterocystous cyanobacteria is experimentally verified by Northern blot. Cell-specific transcription of one of these sRNAs, NsiR8 (nitrogen stress-induced RNA 8), in developing heterocysts is also demonstrated.
Affiliation(s)
- Manuel Brenes-Álvarez
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla Sevilla, Spain
| | - Elvira Olmedo-Verd
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla Sevilla, Spain
| | - Agustín Vioque
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla Sevilla, Spain
| | - Alicia M Muro-Pastor
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas and Universidad de Sevilla Sevilla, Spain
| |
|
99
|
Long non-coding RNAs display higher natural expression variation than protein-coding genes in healthy humans. Genome Biol 2016; 17:14. [PMID: 26821746 PMCID: PMC4731934 DOI: 10.1186/s13059-016-0873-8] [Citation(s) in RCA: 113] [Impact Index Per Article: 12.6] [Received: 10/27/2015] [Accepted: 01/06/2016] [Indexed: 02/06/2023]
Abstract
Background Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may ultimately be more numerous than protein-coding genes in the human genome. Despite large numbers of reported lncRNAs, reference annotations are likely incomplete due to their lower and tighter tissue-specific expression compared to mRNAs. An unexplored factor potentially confounding lncRNA identification is inter-individual expression variability. Here, we characterize lncRNA natural expression variability in human primary granulocytes. Results We annotate granulocyte lncRNAs and mRNAs in RNA-seq data from 10 healthy individuals, identifying multiple lncRNAs absent from reference annotations, and use this to investigate three known features (higher tissue-specificity, lower expression, and reduced splicing efficiency) of lncRNAs relative to mRNAs. Expression variability was examined in seven individuals sampled three times at 1- or more than 1-month intervals. We show that lncRNAs display significantly more inter-individual expression variability compared to mRNAs. We confirm this finding in two independent human datasets by analyzing multiple tissues from the GTEx project and lymphoblastoid cell lines from the GEUVADIS project. Using the latter dataset we also show that including more human donors into the transcriptome annotation pipeline allows identification of an increasing number of lncRNAs, but minimally affects mRNA gene number. Conclusions A comprehensive annotation of lncRNAs is known to require an approach that is sensitive to low and tight tissue-specific expression. Here we show that increased inter-individual expression variability is an additional general lncRNA feature to consider when creating a comprehensive annotation of human lncRNAs or proposing their use as prognostic or disease markers. 
Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-0873-8) contains supplementary material, which is available to authorized users.
|
100
|
A Comprehensive Review of Emerging Computational Methods for Gene Identification. Journal of Information Processing Systems 2016. [DOI: 10.3745/jips.04.0023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/03/2022]
|