1. Brunet MA, Levesque SA, Hunting DJ, Cohen AA, Roucou X. Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship. Genome Res 2018; 28:609-624. [PMID: 29626081 PMCID: PMC5932603 DOI: 10.1101/gr.230938.117] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 11/06/2017] [Accepted: 03/27/2018] [Indexed: 12/12/2022]
Abstract
Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes.
Affiliation(s)
- Marie A Brunet
- Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada
- Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
- Sébastien A Levesque
- Pediatric Department, Centre Hospitalier de l'Université de Sherbrooke, Quebec J1H 5N4, Canada
- Darel J Hunting
- Department of Nuclear Medicine & Radiobiology, Université de Sherbrooke, Quebec J1H 5N4, Canada
- Alan A Cohen
- Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- Xavier Roucou
- Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada
- PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
2. Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017; 12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022] Open
Abstract
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
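The ORF screening step mentioned in the abstract above (open reading frames encoding a protein of at least 30 amino acids, considered on both DNA strands) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function name `find_orfs`, the start-codon set (ATG, GTG, TTG), and the choice to count the start codon toward the length are assumptions made for illustration, and the sketch scans any input sequence rather than only intergenic regions.

```python
# Hypothetical helper names; not taken from the cited study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumed bacterial start codons


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=30):
    """List ORFs as (strand, frame, start, end) tuples.

    Coordinates are 0-based on the scanned strand; `end` is exclusive
    and includes the stop codon. Length is counted in codons from the
    most upstream start codon up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i  # remember the most upstream start
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_aa:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

In a study like the one summarized above, candidates from such a scan would only be a starting point; the abstract describes filtering them further by transcription and translation signals.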
Affiliation(s)
- Sarah M. Hücker
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
- Zachary Ardern
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
- Tatyana Goldberg
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Andrea Schafferhans
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Michael Bernhofer
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Gisle Vestergaard
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
- Chase W. Nelson
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, United States of America
- Michael Schloter
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
- Burkhard Rost
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Siegfried Scherer
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
- Klaus Neuhaus
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
3. Neuhaus K, Landstorfer R, Simon S, Schober S, Wright PR, Smith C, Backofen R, Wecko R, Keim DA, Scherer S. Differentiation of ncRNAs from small mRNAs in Escherichia coli O157:H7 EDL933 (EHEC) by combined RNAseq and RIBOseq - ryhB encodes the regulatory RNA RyhB and a peptide, RyhP. BMC Genomics 2017; 18:216. [PMID: 28245801 PMCID: PMC5331693 DOI: 10.1186/s12864-017-3586-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/16/2016] [Accepted: 02/13/2017] [Indexed: 12/14/2022] Open
Abstract
Background: While NGS allows rapid global detection of transcripts, it remains difficult to distinguish ncRNAs from short mRNAs. To detect potentially translated RNAs, we developed an improved protocol for bacterial ribosomal footprinting (RIBOseq). This allowed distinguishing ncRNA from mRNA in EHEC. A high ratio of ribosomal footprints per transcript (ribosomal coverage value, RCV) is expected to indicate a translated RNA, while a low RCV should point to a non-translated RNA. Results: Based on their low RCV, 150 novel non-translated EHEC transcripts were identified as putative ncRNAs, representing both antisense and intergenic transcripts, 74 of which had expressed homologs in E. coli MG1655. Bioinformatics analysis predicted statistically significant target regulons for 15 of the intergenic transcripts; experimental analysis revealed 4-fold or higher differential expression of 46 novel ncRNAs in different growth media. Out of 329 annotated EHEC ncRNAs, 52 showed an RCV similar to protein-coding genes; of those, 16 had RIBOseq patterns matching annotated genes in other Enterobacteriaceae, and 11 seem to possess a Shine-Dalgarno sequence, suggesting that such ncRNAs may encode small proteins instead of being solely non-coding. To support that the RIBOseq signals reflect translation, we tested the ribosomal-footprint covered ORF of ryhB and found a phenotype for the encoded peptide in iron-limiting conditions. Conclusion: Determination of the RCV is a useful approach for a rapid first-step differentiation between bacterial ncRNAs and small mRNAs. Further, many known ncRNAs may encode proteins as well.
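The ribosomal coverage value described in the abstract above is a ratio of ribosomal footprints per transcript. A minimal Python sketch of one way to compute such a ratio follows. It assumes that both the RIBOseq and RNAseq read counts are first normalized to RPKM (reads per kilobase per million mapped reads); the abstract does not state the normalization, so that choice, the function names, and the parameter names are all hypothetical. No cutoff for "high" or "low" RCV is given in the abstract, so none is encoded here.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def ribosomal_coverage_value(footprint_reads, rna_reads, length_nt,
                             total_footprint_reads, total_rna_reads):
    """Ratio of normalized RIBOseq signal to normalized RNAseq signal.

    Assumption: both libraries are normalized to RPKM before the ratio
    is taken. Returns None when the feature has no RNAseq coverage,
    because the ratio is then undefined.
    """
    rna = rpkm(rna_reads, length_nt, total_rna_reads)
    if rna == 0:
        return None
    return rpkm(footprint_reads, length_nt, total_footprint_reads) / rna
```

For example, a 300-nt feature with 200 footprint reads and 100 RNAseq reads, in libraries of one million mapped reads each, gives a ratio of 2 under these assumptions.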
Affiliation(s)
- Klaus Neuhaus
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
- Core Facility Microbiome/NGS, ZIEL Institute for Food & Health, Weihenstephaner Berg 3, D-85354, Freising, Germany
- Richard Landstorfer
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
- Svenja Simon
- Informatik und Informationswissenschaft, Universität Konstanz, D-78457, Konstanz, Germany
- Steffen Schober
- Institut für Nachrichtentechnik, Universität Ulm, Albert-Einstein-Allee 43, D-89081, Ulm, Germany
- Patrick R Wright
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
- Cameron Smith
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
- Romy Wecko
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
- Daniel A Keim
- Informatik und Informationswissenschaft, Universität Konstanz, D-78457, Konstanz, Germany
- Siegfried Scherer
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
4. Singh NP, Tiwari A, Bansal A, Thakur S, Sharma G, Gabrani R. Genome level analysis of bacteriocins of lactic acid bacteria. Comput Biol Chem 2015; 56:1-6. [DOI: 10.1016/j.compbiolchem.2015.02.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/29/2014] [Revised: 02/09/2015] [Accepted: 02/21/2015] [Indexed: 10/23/2022]
5. Crappé J, Van Criekinge W, Menschaert G. Little things make big things happen: A summary of micropeptide encoding genes. EuPA Open Proteomics 2014. [DOI: 10.1016/j.euprot.2014.02.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
6. The Escherichia coli CydX protein is a member of the CydAB cytochrome bd oxidase complex and is required for cytochrome bd oxidase activity. J Bacteriol 2013; 195:3640-50. [PMID: 23749980 DOI: 10.1128/jb.00324-13] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.1] [Indexed: 12/27/2022] Open
Abstract
Cytochrome bd oxidase operons from more than 50 species of bacteria contain a short gene encoding a small protein that ranges from ∼30 to 50 amino acids and is predicted to localize to the cell membrane. Although cytochrome bd oxidases have been studied for more than 70 years, little is known about the role of this small protein, denoted CydX, in oxidase activity. Here we report that Escherichia coli mutants lacking CydX exhibit phenotypes associated with reduced oxidase activity. In addition, cell membrane extracts from ΔcydX mutant strains have reduced oxidase activity in vitro. Consistent with data showing that CydX is required for cytochrome bd oxidase activity, copurification experiments indicate that CydX interacts with the CydAB cytochrome bd oxidase complex. Together, these data support the hypothesis that CydX is a subunit of the CydAB cytochrome bd oxidase complex that is required for complex activity. The results of mutation analysis of CydX suggest that few individual amino acids in the small protein are essential for function, at least in the context of protein overexpression. In addition, the results of analysis of the paralogous small transmembrane protein AppX show that the two proteins could have some overlapping functionality in the cell and that both have the potential to interact with the CydAB complex.
7. Kenney GE, Rosenzweig AC. Genome mining for methanobactins. BMC Biol 2013; 11:17. [PMID: 23442874 PMCID: PMC3621798 DOI: 10.1186/1741-7007-11-17] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 12/21/2012] [Accepted: 02/26/2013] [Indexed: 01/27/2023] Open
Abstract
Background: Methanobactins (Mbns) are a family of copper-binding natural products involved in copper uptake by methanotrophic bacteria. The few Mbns that have been structurally characterized feature copper coordination by two nitrogen-containing heterocycles next to thioamide groups embedded in a peptidic backbone of varying composition. Mbns are proposed to derive from post-translational modification of ribosomally synthesized peptides, but only a few genes encoding potential precursor peptides have been identified. Moreover, the relevance of neighboring genes in these genomes has been unclear. Results: The potential for Mbn production in a wider range of bacterial species was assessed by mining microbial genomes. Operons encoding Mbn-like precursor peptides, MbnAs, were identified in 16 new species, including both methanotrophs and, surprisingly, non-methanotrophs. Along with MbnA, the core of the operon is formed by two putative biosynthetic genes denoted MbnB and MbnC. The species can be divided into five groups on the basis of their MbnA and MbnB sequences and their operon compositions. Additional biosynthetic proteins, including aminotransferases, sulfotransferases and flavin adenine dinucleotide (FAD)-dependent oxidoreductases were also identified in some families. Beyond biosynthetic machinery, a conserved set of transporters was identified, including MATE multidrug exporters and TonB-dependent transporters. Additional proteins of interest include a di-heme cytochrome c peroxidase and a partner protein, the roles of which remain a mystery. Conclusions: This study indicates that Mbn-like compounds may be more widespread than previously thought, but are not present in all methanotrophs. This distribution of species suggests a broader role in metal homeostasis. These data provide a link between precursor peptide sequence and Mbn structure, facilitating predictions of new Mbn structures and supporting a post-translational modification biosynthetic pathway. In addition, testable models for Mbn transport and for methanotrophic copper regulation have emerged. Given the unusual modifications observed in Mbns characterized thus far, understanding the roles of the putative biosynthetic proteins is likely to reveal novel pathways and chemistry.
Affiliation(s)
- Grace E Kenney
- Departments of Molecular Biosciences and of Chemistry, Northwestern University, Evanston, IL 60208, USA