51
Khitun A, Ness TJ, Slavoff SA. Small open reading frames and cellular stress responses. Mol Omics 2019; 15:108-116. [PMID: 30810554 DOI: 10.1039/c8mo00283e] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 12/11/2022]
Abstract
Small open reading frames (smORFs) encoding polypeptides of less than 100 amino acids in eukaryotes (50 amino acids in prokaryotes) were historically excluded from genome annotation. However, recent advances in genomics, ribosome footprinting, and proteomics have revealed thousands of translated smORFs in genomes spanning evolutionary space. These smORFs can encode functional polypeptides, or act as cis-translational regulators. Herein we review evidence that some smORF-encoded polypeptides (SEPs) participate in stress responses in both prokaryotes and eukaryotes, and that some upstream ORFs (uORFs) regulate stress-responsive translation of downstream cistrons in eukaryotic cells. These studies provide insight into a regulated subclass of smORFs and suggest that at least some SEPs may participate in maintenance of cellular homeostasis under stress.
Affiliation(s)
- Alexandra Khitun
  - Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Travis J Ness
  - Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Sarah A Slavoff
  - Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
52
Mat-Sharani S, Firdaus-Raih M. Computational discovery and annotation of conserved small open reading frames in fungal genomes. BMC Bioinformatics 2019; 19:551. [PMID: 30717662 PMCID: PMC7394265 DOI: 10.1186/s12859-018-2550-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/01/2018] [Accepted: 11/30/2018] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND: Small open reading frames (smORF/sORFs) that encode short protein sequences are often overlooked during the standard gene prediction process, thus leading to many sORFs being left undiscovered and/or misannotated. For many genomes, a second round of sORF targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding for 80 amino acid residues or less from 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes. RESULTS: A first set of sORFs was identified from existing annotations that fitted the maximum of 80 residues criterion. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or less in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized. CONCLUSIONS: It is evident that numerous open reading frames that could potentially encode for polypeptides consisting of 80 amino acid residues or less are overlooked during standard gene prediction and annotation. From our results, additional targeted reannotation of genomes is clearly able to complement standard genome annotation to identify sORFs. Due to the lack of, and limitations with experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.
Affiliation(s)
- Shuhaila Mat-Sharani
  - Centre for Frontier Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
  - Malaysia Genome Institute, Ministry of Science, Technology & Innovation, Jalan Bangi, 43000, Kajang, Selangor, Malaysia
- Mohd Firdaus-Raih
  - Centre for Frontier Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
53
Bacterial ribosome heterogeneity: Changes in ribosomal protein composition during transition into stationary growth phase. Biochimie 2019; 156:169-180. [DOI: 10.1016/j.biochi.2018.10.013] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/20/2018] [Accepted: 10/18/2018] [Indexed: 12/11/2022]
54
Affiliation(s)
- Maria E. Sousa
  - Ophthalmology, Jacobs School of Medicine and Biomedical Science, University of New York at Buffalo, Buffalo, NY, United States of America
  - Research Service, Veterans Administration Western New York Healthcare System, Buffalo, NY, United States of America
- Michael H. Farkas
  - Ophthalmology, Jacobs School of Medicine and Biomedical Science, University of New York at Buffalo, Buffalo, NY, United States of America
  - Research Service, Veterans Administration Western New York Healthcare System, Buffalo, NY, United States of America
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Science, State University of New York at Buffalo, Buffalo, NY, United States of America
55
Plágaro AH, Pearman PB, Kaberdin VR. Defining the transcription landscape of the Gram-negative marine bacterium Vibrio harveyi. Genomics 2018; 111:1547-1556. [PMID: 30423347 DOI: 10.1016/j.ygeno.2018.10.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/14/2018] [Revised: 09/13/2018] [Accepted: 10/23/2018] [Indexed: 12/13/2022]
Abstract
Vibrio harveyi is a Gram-negative pathogenic bacterium ubiquitously present in natural aquatic systems. Although environmental adaptability in V. harveyi may be enabled by profound reprogramming of gene expression previously observed during responses to starvation, suboptimal temperatures and other stress factors, the key characteristics of V. harveyi transcripts and operons, such as their boundaries and size as well as location of small RNA genes, remain largely unknown. To reveal the main features of the V. harveyi transcriptome, total RNA of this organism was analyzed by differential RNA sequencing (dRNA-seq). Analysis of the dRNA-seq data made it possible to define the primary transcriptome of V. harveyi along with cis-acting regulatory elements (riboswitches and leader sequences) and new genes. The latter encode a number of putative polypeptides and new phylogenetically conserved antisense RNAs potentially involved in the post-transcriptional control of gene expression.
Affiliation(s)
- Ander Hernández Plágaro
  - Department of Immunology, Microbiology and Parasitology, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
- Peter B Pearman
  - Department of Plant Biology and Ecology, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, Maria Diaz de Haro 3, 48013 Bilbao, Spain
- Vladimir R Kaberdin
  - Department of Immunology, Microbiology and Parasitology, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, Maria Diaz de Haro 3, 48013 Bilbao, Spain
  - Research Centre for Experimental Marine Biology and Biotechnology (PIE-UPV/EHU), 48620 Plentzia, Spain
56
Sharma M, Das M, Diana D, Wedderburn A, Anindya R. Identification of novel open reading frames in the intergenic regions of Mycobacterium leprae genome and detection of transcript by qRT-PCR. Microb Pathog 2018; 124:316-321. [DOI: 10.1016/j.micpath.2018.08.062] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/23/2018] [Revised: 08/28/2018] [Accepted: 08/29/2018] [Indexed: 10/28/2022]
57
Yin X, Wu Orr M, Wang H, Hobbs EC, Shabalina SA, Storz G. The small protein MgtS and small RNA MgrR modulate the PitA phosphate symporter to boost intracellular magnesium levels. Mol Microbiol 2018; 111:131-144. [PMID: 30276893 DOI: 10.1111/mmi.14143] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Accepted: 09/26/2018] [Indexed: 01/04/2023]
Abstract
In response to low levels of magnesium (Mg2+), the PhoQP two component system induces the transcription of two convergent genes, one encoding a 31-amino acid protein denoted MgtS and the second encoding a small, regulatory RNA (sRNA) denoted MgrR. Previous studies showed that the MgtS protein interacts with and stabilizes the MgtA Mg2+ importer to increase intracellular Mg2+ levels, while the MgrR sRNA base pairs with the eptB mRNA thus affecting lipopolysaccharide modification. Surprisingly, we found overexpression of the MgtS protein also leads to induction of the PhoRB regulon. Studies to understand this activation showed that MgtS forms a complex with a second protein, PitA, a cation-phosphate symporter. Given that the additive effect of ∆mgtA and ∆mgtS mutations on intracellular Mg2+ concentrations seen previously is lost in the ∆pitA mutant, we suggest that MgtS binds to and prevents Mg2+ leakage through PitA under Mg2+-limiting conditions. Consistent with a detrimental role of PitA in low Mg2+, we also observe MgrR sRNA repression of PitA synthesis. Thus, PhoQP induces the expression of two convergent small genes in response to Mg2+ limitation whose products act to modulate PitA at different levels to increase intracellular Mg2+.
Affiliation(s)
- Xuefeng Yin
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, 20892-4417, USA
  - Health Science Center, Peking University, Beijing, 100191, China
- Mona Wu Orr
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, 20892-4417, USA
- Hanbo Wang
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, 20892-4417, USA
  - School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong
- Errett C Hobbs
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, 20892-4417, USA
- Svetlana A Shabalina
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Gisela Storz
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, 20892-4417, USA
58
Orf GS, Gisriel C, Redding KE. Evolution of photosynthetic reaction centers: insights from the structure of the heliobacterial reaction center. Photosynthesis Research 2018; 138:11-37. [PMID: 29603081 DOI: 10.1007/s11120-018-0503-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 02/13/2018] [Accepted: 03/22/2018] [Indexed: 05/24/2023]
Abstract
The proliferation of phototrophy within early-branching prokaryotes represented a significant step forward in metabolic evolution. All available evidence supports the hypothesis that the photosynthetic reaction center (RC), the pigment-protein complex in which electromagnetic energy (i.e., photons of visible or near-infrared light) is converted to chemical energy usable by an organism, arose once in Earth's history. This event took place over 3 billion years ago and the basic architecture of the RC has diversified into the distinct versions that now exist. Using our recent 2.2-Å X-ray crystal structure of the homodimeric photosynthetic RC from heliobacteria, we have performed a robust comparison of all known RC types with available structural data. These comparisons have allowed us to generate hypotheses about structural and functional aspects of the common ancestors of extant RCs and to expand upon existing evolutionary schemes. Since the heliobacterial RC is homodimeric and loosely binds (and reduces) quinones, we support the view that it retains more ancestral features than its homologs from other groups. In the evolutionary scenario we propose, the ancestral RC predating the division between Type I and Type II RCs was homodimeric, loosely bound two mobile quinones, and performed an inefficient disproportionation reaction to reduce quinone to quinol. The changes leading to the diversification into Type I and Type II RCs were separate responses to the need to optimize this reaction: the Type I lineage added a [4Fe-4S] cluster to facilitate double reduction of a quinone, while the Type II lineage heterodimerized and specialized the two cofactor branches, fixing the quinone in the QA site. After the Type I/II split, an ancestor to photosystem I fixed its quinone sites and then heterodimerized to bind PsaC as a new subunit, as responses to rising O2 after the appearance of the oxygen-evolving complex in an ancestor of photosystem II. These pivotal events thus gave rise to the diversity that we observe today.
Affiliation(s)
- Gregory S Orf
  - School of Molecular Sciences, Arizona State University, Tempe, AZ, 85287, USA
  - Center for Bioenergy and Photosynthesis, Arizona State University, Tempe, AZ, 85287, USA
- Christopher Gisriel
  - School of Molecular Sciences, Arizona State University, Tempe, AZ, 85287, USA
  - Center for Bioenergy and Photosynthesis, Arizona State University, Tempe, AZ, 85287, USA
  - The Biodesign Center for Applied Structural Discovery, Arizona State University, Tempe, AZ, 85287, USA
- Kevin E Redding
  - School of Molecular Sciences, Arizona State University, Tempe, AZ, 85287, USA
  - Center for Bioenergy and Photosynthesis, Arizona State University, Tempe, AZ, 85287, USA
59
Yu SH, Vogel J, Förstner KU. ANNOgesic: a Swiss army knife for the RNA-seq based annotation of bacterial/archaeal genomes. Gigascience 2018; 7:5087959. [PMID: 30169674 PMCID: PMC6123526 DOI: 10.1093/gigascience/giy096] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 01/27/2018] [Accepted: 08/23/2018] [Indexed: 11/13/2022] Open
Abstract
To understand the gene regulation of an organism of interest, a comprehensive genome annotation is essential. While some features, such as coding sequences, can be computationally predicted with high accuracy based purely on the genomic sequence, others, such as promoter elements or noncoding RNAs, are harder to detect. RNA sequencing (RNA-seq) has proven to be an efficient method to identify these genomic features and to improve genome annotations. However, processing and integrating RNA-seq data in order to generate high-resolution annotations is challenging, time consuming, and requires numerous steps. We have constructed a powerful and modular tool called ANNOgesic that provides the required analyses and simplifies RNA-seq-based bacterial and archaeal genome annotation. It can integrate data from conventional RNA-seq and differential RNA-seq and predicts and annotates numerous features, including small noncoding RNAs, with high precision. The software is available under an open source license (ISCL) at https://pypi.org/project/ANNOgesic/.
Affiliation(s)
- Sung-Huan Yu
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
- Jörg Vogel
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Josef-Schneider-Straße 2, 97080 Würzburg, Germany
- Konrad U Förstner
  - Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2, 97080 Würzburg, Germany
  - ZB MED - Information Center for Life Sciences, Informationservices, Gleueler Straße 60, 50931 Cologne (Köln), Germany
  - Technical University of Cologne, Faculty for Information and Communication Sciences, Claudiusstraße 1, 50678 Cologne (Köln), Germany
60
Investigation of amino acid specificity in the CydX small protein shows sequence plasticity at the functional level. PLoS One 2018; 13:e0198699. [PMID: 29912917 PMCID: PMC6005532 DOI: 10.1371/journal.pone.0198699] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/10/2018] [Accepted: 05/23/2018] [Indexed: 11/19/2022] Open
Abstract
Small proteins are a new and expanding area of research. Many characterized small proteins are composed of a single hydrophobic α-helix, and the functional requirements of their limited amino acid sequence are not well understood. One hydrophobic small protein, CydX, has been shown to be a component of the cytochrome bd oxidase complex in Escherichia coli, and is required for enzyme function. To investigate small protein sequence specificity, an alanine scanning mutagenesis on the small protein CydX was conducted using mutant alleles expressed from the E. coli chromosome at the wild-type locus. The resulting mutant strains were assayed for CydX function. No single amino acid was required to maintain wild-type resistance to β-mercaptoethanol. However, substitutions of 10-amino acid blocks indicated that the N-terminus of the protein was required for wild-type CydX activity. A series of double mutants showed that multiple mutations at the N-terminus led to β-mercaptoethanol sensitivity in vivo. Triple mutants showed both in vivo and in vitro phenotypes. Together, these data provide evidence suggesting a high level of functional plasticity in CydX, in which multiple amino acids may work cooperatively to facilitate CydX function.
61
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Scherer S, Neuhaus K. The Novel Anaerobiosis-Responsive Overlapping Gene ano Is Overlapping Antisense to the Annotated Gene ECs2385 of Escherichia coli O157:H7 Sakai. Front Microbiol 2018; 9:931. [PMID: 29867840 PMCID: PMC5960689 DOI: 10.3389/fmicb.2018.00931] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/01/2018] [Accepted: 04/23/2018] [Indexed: 12/26/2022] Open
Abstract
Current notion presumes that only one protein is encoded at a given bacterial genetic locus. However, transcription and translation of an overlapping open reading frame (ORF) of 186 bp length were discovered by RNAseq and RIBOseq experiments. This ORF is almost completely embedded in the annotated L,D-transpeptidase gene ECs2385 of Escherichia coli O157:H7 Sakai in the antisense reading frame -3. The ORF is transcribed as part of a bicistronic mRNA, which includes the annotated upstream gene ECs2384, encoding a murein lipoprotein. The transcriptional start site of the operon resides 38 bp upstream of the ECs2384 start codon and is driven by a predicted σ70 promoter, which is constitutively active under different growth conditions. The bicistronic operon contains a ρ-independent terminator just upstream of the novel gene, significantly decreasing its transcription. The novel gene can be stably expressed as an EGFP-fusion protein and a translationally arrested mutant of ano, unable to produce the protein, shows a growth advantage in competitive growth experiments compared to the wild type under anaerobiosis. Therefore, the novel antisense overlapping gene is named ano (anaerobiosis responsive overlapping gene). A phylostratigraphic analysis indicates that ano originated very recently de novo by overprinting after the Escherichia/Shigella clade separated from other enterobacteria. Therefore, ano is one of the very rare cases of overlapping genes known in the genus Escherichia.
Affiliation(s)
- Sarah M Hücker
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Sonja Vanderhaeghen
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
  - Institute for Food & Health, Technical University of Munich, Freising, Germany
- Klaus Neuhaus
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
  - Core Facility Microbiome/NGS, Institute for Food & Health, Technical University of Munich, Freising, Germany
62
VanOrsdel CE, Kelly JP, Burke BN, Lein CD, Oufiero CE, Sanchez JF, Wimmers LE, Hearn DJ, Abuikhdair FJ, Barnhart KR, Duley ML, Ernst SEG, Kenerson BA, Serafin AJ, Hemm MR. Identifying New Small Proteins in Escherichia coli. Proteomics 2018; 18:e1700064. [PMID: 29645342 PMCID: PMC6001520 DOI: 10.1002/pmic.201700064] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/05/2017] [Revised: 03/05/2018] [Indexed: 12/11/2022]
Abstract
The number of small proteins (SPs) encoded in the Escherichia coli genome is unknown, as current bioinformatics and biochemical techniques make short gene and small protein identification challenging. One method of small protein identification involves adding an epitope tag to the 3′ end of a short open reading frame (sORF) on the chromosome, with synthesis confirmed by immunoblot assays. In this study, this strategy was used to identify new E. coli small proteins: 80 sORFs in the E. coli genome were tagged and assayed for protein synthesis. The selected sORFs represent diverse sequence characteristics, including degrees of sORF conservation, predicted transmembrane domains, sORF direction with respect to flanking genes, ribosome binding site (RBS) prediction, and ribosome profiling results. Of the 80 sORFs, 36 yielded synthesized proteins, a 45% success rate. Modeling of detected versus non-detected small proteins showed that predictions based on RBS prediction, transcription data, and ribosome profiling had a statistically significant correlation with protein synthesis; however, there was no correlation between current sORF annotation and protein synthesis. These results suggest that substantial numbers of small proteins remain undiscovered in E. coli, and that existing bioinformatics techniques must continue to improve to facilitate identification.
Affiliation(s)
- Caitlin E VanOrsdel
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- John P Kelly
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Brittany N Burke
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Christina D Lein
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Joseph F Sanchez
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Larry E Wimmers
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- David J Hearn
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Fatimeh J Abuikhdair
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Kathryn R Barnhart
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Michelle L Duley
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Sarah E G Ernst
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Briana A Kenerson
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Aubrey J Serafin
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Matthew R Hemm
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
63
Hollerer I, Higdon A, Brar GA. Strategies and Challenges in Identifying Function for Thousands of sORF-Encoded Peptides in Meiosis. Proteomics 2018; 18:e1700274. [PMID: 28929627 PMCID: PMC6135095 DOI: 10.1002/pmic.201700274] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/14/2017] [Indexed: 11/11/2022]
Abstract
Recent genomic analyses have revealed pervasive translation from formerly unrecognized short open reading frames (sORFs) during yeast meiosis. Despite their short length, which has caused these regions to be systematically overlooked by traditional gene annotation approaches, meiotic sORFs share many features with classical genes, implying the potential for similar types of cellular functions. We found that sORF expression accounts for approximately 10-20% of the cellular translation capacity in yeast during meiotic differentiation and occurs within well-defined time windows, suggesting the production of relatively abundant peptides with stage-specific meiotic roles from these regions. Here, we provide arguments supporting this hypothesis and discuss sORF similarities and differences, as a group, to traditional protein coding regions, as well as challenges in defining their specific functions.
Affiliation(s)
- Ina Hollerer
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
- Andrea Higdon
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
- Gloria A Brar
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
64
Brunet MA, Levesque SA, Hunting DJ, Cohen AA, Roucou X. Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship. Genome Res 2018; 28:609-624. [PMID: 29626081 PMCID: PMC5932603 DOI: 10.1101/gr.230938.117] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 11/06/2017] [Accepted: 03/27/2018] [Indexed: 12/12/2022]
Abstract
Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes.
Affiliation(s)
- Marie A Brunet
  - Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada
  - Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
- Sébastien A Levesque
  - Pediatric Department, Centre Hospitalier de l'Université de Sherbrooke, Quebec J1H 5N4, Canada
- Darel J Hunting
  - Department of Nuclear Medicine & Radiobiology, Université de Sherbrooke, Quebec J1H 5N4, Canada
- Alan A Cohen
  - Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- Xavier Roucou
  - Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada
  - PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
65
Mustoe AM, Busan S, Rice GM, Hajdin CE, Peterson BK, Ruda VM, Kubica N, Nutiu R, Baryza JL, Weeks KM. Pervasive Regulatory Functions of mRNA Structure Revealed by High-Resolution SHAPE Probing. Cell 2018; 173:181-195.e18. [PMID: 29551268 DOI: 10.1016/j.cell.2018.02.034] [Citation(s) in RCA: 175] [Impact Index Per Article: 29.2] [Received: 11/12/2017] [Revised: 01/02/2018] [Accepted: 02/15/2018] [Indexed: 11/25/2022]
Abstract
mRNAs can fold into complex structures that regulate gene expression. Resolving such structures de novo has remained challenging and has limited our understanding of the prevalence and functions of mRNA structure. We use SHAPE-MaP experiments in living E. coli cells to derive quantitative, nucleotide-resolution structure models for 194 endogenous transcripts encompassing approximately 400 genes. Individual mRNAs have exceptionally diverse architectures, and most contain well-defined structures. Active translation destabilizes mRNA structure in cells. Nevertheless, mRNA structure remains similar between in-cell and cell-free environments, indicating broad potential for structure-mediated gene regulation. We find that the translation efficiency of endogenous genes is regulated by unfolding kinetics of structures overlapping the ribosome binding site. We discover conserved structured elements in 35% of UTRs, several of which we validate as novel protein binding motifs. RNA structure regulates every gene studied here in a meaningful way, implying that most functional structures remain to be discovered.
Affiliation(s)
- Anthony M Mustoe
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA.
| | - Steven Busan
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA
| | - Greggory M Rice
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA; Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
| | | | - Brant K Peterson
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
| | - Vera M Ruda
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
| | - Neil Kubica
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
| | - Razvan Nutiu
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
| | - Jeremy L Baryza
- Novartis Institutes for Biomedical Research, Inc., Cambridge, MA, USA
| | - Kevin M Weeks
- Department of Chemistry, University of North Carolina, Chapel Hill, NC, USA.
| |
|
66
|
Abstract
Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry.
Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data.
|
67
|
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Wecko R, Simon S, Scherer S, Neuhaus K. A novel short L-arginine responsive protein-coding gene (laoB) antiparallel overlapping to a CadC-like transcriptional regulator in Escherichia coli O157:H7 Sakai originated by overprinting. BMC Evol Biol 2018; 18:21. [PMID: 29433444 PMCID: PMC5810103 DOI: 10.1186/s12862-018-1134-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/22/2017] [Accepted: 01/31/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originated from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail.
Results: A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115.
Conclusions: Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.
Affiliation(s)
- Sarah M Hücker
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.,Fraunhofer ITEM-R, Am Biopark 9, 93053, Regensburg, Germany
| | - Sonja Vanderhaeghen
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
| | - Isabel Abellan-Schneyder
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.,Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
| | - Romy Wecko
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
| | - Svenja Simon
- Department of Computer and Information Science, University of Konstanz, Box 78, 78457, Konstanz, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.,ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
| | - Klaus Neuhaus
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.,Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
| |
|
68
|
Ndah E, Jonckheere V, Giess A, Valen E, Menschaert G, Van Damme P. REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes. Nucleic Acids Res 2017; 45:e168. [PMID: 28977509 PMCID: PMC5714196 DOI: 10.1093/nar/gkx758] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 04/22/2017] [Accepted: 08/17/2017] [Indexed: 12/13/2022] Open
Abstract
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
Affiliation(s)
- Elvis Ndah
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium.,Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium.,Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
| | - Veronique Jonckheere
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium.,Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Adam Giess
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5020, Norway
| | - Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5020, Norway.,Sars International Centre for Marine Molecular Biology, University of Bergen, 5008 Bergen, Norway
| | - Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
| | - Petra Van Damme
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium.,Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| |
|
69
|
Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017; 27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]
Abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
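As an illustration of the "in silico ORFs" idea mentioned in this abstract, the following is a minimal Python sketch of six-frame ORF enumeration that allows alternative start codons. It is not the iPtgxDB implementation. The start-codon set (ATG, GTG, TTG), the `min_codons` threshold, the choice to report every in-frame start for a given stop, and the function names `find_orfs` and `reverse_complement` are all assumptions made for this example.

```python
# Hypothetical sketch: enumerate start-to-stop ORFs in all six reading frames.
# This is an illustration only, not the method used by iPtgxDB.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed start-codon set: ATG plus two common bacterial alternatives.
START_CODONS = {"ATG", "GTG", "TTG"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 10):
    """Yield (strand, frame, start, end, nt_seq) for each start-to-stop ORF.

    Coordinates are 0-based and end-exclusive on the scanned strand.
    min_codons counts coding codons and excludes the stop codon.
    Every in-frame start codon upstream of a stop is reported (assumption).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            starts = []  # start-codon positions seen since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in START_CODONS:
                    starts.append(i)
                elif codon in STOP_CODONS:
                    for st in starts:
                        if (i - st) // 3 >= min_codons:
                            yield strand, frame, st, i + 3, s[st:i + 3]
                    starts = []


if __name__ == "__main__":
    # Toy sequence with an ATG start and an in-frame GTG alternative start.
    for orf in find_orfs("ATGAAAGTGCCCTAA", min_codons=1):
        print(orf)
```

Reporting every candidate start, rather than only the longest ORF per stop codon, keeps alternative N-termini in the search space at the cost of a larger database; a real pipeline would likely collapse or annotate such variants.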
Affiliation(s)
- Ulrich Omasits
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Adithi R Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland.,Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Michael Schmid
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Sandra Goetze
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Damianos Melidis
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Marc Bourqui
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Olga Nikolayeva
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
| | | | - Andrea Patrignani
- Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
| | | | - Juerg E Frey
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Mark D Robinson
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
| | - Bernd Wollscheid
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| |
|
70
|
Yuan P, D'Lima NG, Slavoff SA. Comparative Membrane Proteomics Reveals a Nonannotated E. coli Heat Shock Protein. Biochemistry 2017; 57:56-60. [PMID: 29039649 PMCID: PMC5761644 DOI: 10.1021/acs.biochem.7b00864] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Abstract
Recent advances in proteomics and genomics have enabled discovery of thousands of previously nonannotated small open reading frames (smORFs) in genomes across evolutionary space. Furthermore, quantitative mass spectrometry has recently been applied to analysis of regulated smORF expression. However, bottom-up proteomics has remained relatively insensitive to membrane proteins, suggesting they may have been underdetected in previous studies. In this report, we add biochemical membrane protein enrichment to our previously developed label-free quantitative proteomics protocol, revealing a never-before-identified heat shock protein in Escherichia coli K12. This putative smORF-encoded heat shock protein, GndA, is likely to be ∼36-55 amino acids in length and contains a predicted transmembrane helix. We validate heat shock-regulated expression of the gndA smORF and demonstrate that a GndA-GFP fusion protein cofractionates with the cell membrane. Quantitative membrane proteomics therefore has the ability to reveal nonannotated small proteins that may play roles in bacterial stress responses.
Affiliation(s)
- Peijia Yuan
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States
| | - Nadia G D'Lima
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States
| | - Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06529, United States
| |
|
71
|
D'Lima NG, Khitun A, Rosenbloom AD, Yuan P, Gassaway BM, Barber KW, Rinehart J, Slavoff SA. Comparative Proteomics Enables Identification of Nonannotated Cold Shock Proteins in E. coli. J Proteome Res 2017; 16:3722-3731. [PMID: 28861998 PMCID: PMC5647875 DOI: 10.1021/acs.jproteome.7b00419] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 02/07/2023]
Abstract
Recent advances in mass spectrometry-based proteomics have revealed translation of previously nonannotated microproteins from thousands of small open reading frames (smORFs) in prokaryotic and eukaryotic genomes. Facile methods to determine cellular functions of these newly discovered microproteins are now needed. Here, we couple semiquantitative comparative proteomics with whole-genome database searching to identify two nonannotated, homologous cold shock-regulated microproteins in Escherichia coli K12 substr. MG1655, as well as two additional constitutively expressed microproteins. We apply molecular genetic approaches to confirm expression of these cold shock proteins (YmcF and YnfQ) at reduced temperatures and identify the noncanonical ATT start codons that initiate their translation. These proteins are conserved in related Gram-negative bacteria and are predicted to be structured, which, in combination with their cold shock upregulation, suggests that they are likely to have biological roles in the cell. These results reveal that previously unknown factors are involved in the response of E. coli to lowered temperatures and suggest that further nonannotated, stress-regulated E. coli microproteins may remain to be found. More broadly, comparative proteomics may enable discovery of regulated, and therefore potentially functional, products of smORF translation across many different organisms and conditions.
Affiliation(s)
- Nadia G D'Lima
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States
| | - Alexandra Khitun
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States
| | - Aaron D Rosenbloom
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
| | - Peijia Yuan
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States
| | - Brandon M Gassaway
- Department of Cellular and Molecular Physiology, Yale University, New Haven, Connecticut 06520, United States.,Systems Biology Institute, Yale University, West Haven, Connecticut 06511, United States
| | - Karl W Barber
- Department of Cellular and Molecular Physiology, Yale University, New Haven, Connecticut 06520, United States.,Systems Biology Institute, Yale University, West Haven, Connecticut 06511, United States
| | - Jesse Rinehart
- Department of Cellular and Molecular Physiology, Yale University, New Haven, Connecticut 06520, United States.,Systems Biology Institute, Yale University, West Haven, Connecticut 06511, United States
| | - Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States.,Chemical Biology Institute, Yale University, West Haven, Connecticut 06516, United States.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06529, United States
| |
|
72
|
Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017; 12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022] Open
Abstract
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
Affiliation(s)
- Sarah M. Hücker
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Tatyana Goldberg
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Andrea Schafferhans
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Michael Bernhofer
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Gisle Vestergaard
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Chase W. Nelson
- Sackler Institute for Comparative Genomics, American Museum of Natural History New York, New York, United States of America
| | - Michael Schloter
- Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
| | - Burkhard Rost
- Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| | - Klaus Neuhaus
- Chair for Microbial Ecology, Technische Universität München, Freising, Germany
- Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
| |
|
73
|
Friedman RC, Kalkhof S, Doppelt-Azeroual O, Mueller SA, Chovancová M, von Bergen M, Schwikowski B. Common and phylogenetically widespread coding for peptides by bacterial small RNAs. BMC Genomics 2017; 18:553. [PMID: 28732463 PMCID: PMC5521070 DOI: 10.1186/s12864-017-3932-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 11/05/2016] [Accepted: 07/09/2017] [Indexed: 12/14/2022] Open
Abstract
Background: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding for small proteins, whether or not they also have a regulatory role at the RNA level.
Methods: Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling.
Results: A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated in published ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are predicted novel components of type I toxin/antitoxin systems.
Conclusions: We predict over two dozen new protein-coding genes per bacterial species, but crucially also quantified the uncertainty in this estimate. Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily-accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.
Affiliation(s)
- Robin C Friedman
- Systems Biology Laboratory, Department of Genomes and Genetics, Institut Pasteur, Paris, France.,Molecular Microbial Pathogenesis Unit, Department of Cell Biology and Infection, Institut Pasteur, Paris, France.,Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France.
| | - Stefan Kalkhof
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany.,Current Address: Department of Bioanalytics, University of Applied Sciences and Arts of Coburg, Coburg, Germany
| | - Olivia Doppelt-Azeroual
- Bioinformatics and Biostatistics Hub, C3BI, USR 3756 IP CNRS, Institut Pasteur, Paris, France
| | - Stephan A Mueller
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany.,Current Address: Neuroproteomics, German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
| | - Martina Chovancová
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
| | - Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany.,Institute of Biochemistry, University of Leipzig, Leipzig, Germany
| | - Benno Schwikowski
- Systems Biology Laboratory, Department of Genomes and Genetics, Institut Pasteur, Paris, France.,Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| |
|
74
|
|
75
|
Orfanoudaki G, Markaki M, Chatzi K, Tsamardinos I, Economou A. MatureP: prediction of secreted proteins with exclusive information from their mature regions. Sci Rep 2017; 7:3263. [PMID: 28607462 PMCID: PMC5468347 DOI: 10.1038/s41598-017-03557-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/23/2017] [Accepted: 04/28/2017] [Indexed: 11/09/2022] Open
Abstract
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features exclusively computed from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues and less hydrophobics. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the necessary flexibility required for the maintenance of non-folded states during targeting and secretion with the ability of post-secretion folding. These findings provide novel insight in protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
Affiliation(s)
- Georgia Orfanoudaki
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece
| | - Maria Markaki
- Computer Science Department, University of Crete, Heraklion, Greece
| | - Katerina Chatzi
- KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium
| | - Ioannis Tsamardinos
- Computer Science Department, University of Crete, Heraklion, Greece.,Gnosis Data Analysis PC, Heraklion, Greece
| | - Anastassios Economou
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece.,KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium.
|
76
|
Increasing intracellular magnesium levels with the 31-amino acid MgtS protein. Proc Natl Acad Sci U S A 2017; 114:5689-5694. [PMID: 28512220 DOI: 10.1073/pnas.1703415114] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 11/18/2022] Open
Abstract
Synthesis of the 31-amino acid, inner membrane protein MgtS (formerly denoted YneM) is induced by very low Mg2+ in a PhoPQ-dependent manner in Escherichia coli. Here we report that MgtS acts to increase intracellular Mg2+ levels and maintain cell integrity upon Mg2+ depletion. Upon development of a functional tagged derivative of MgtS, we found that MgtS interacts with MgtA to increase the levels of this P-type ATPase Mg2+ transporter under Mg2+-limiting conditions. Correspondingly, the effects of MgtS upon Mg2+ limitation are lost in a ∆mgtA mutant, and MgtA overexpression can suppress the ∆mgtS phenotype. MgtS stabilization of MgtA provides an additional layer of regulation of this tightly controlled Mg2+ transporter and adds to the list of small proteins that regulate inner membrane transporters.
|
77
|
Abstract
Increasing evidence indicates that many, if not all, small genes encoding proteins ≤100 aa are missing in annotations of bacterial genomes currently available. To uncover unannotated small genes in the model bacterium Salmonella enterica Typhimurium 14028s, we used the genomic technique ribosome profiling, which provides a snapshot of all mRNAs being translated (translatome) in a given growth condition. For comprehensive identification of unannotated small genes, we obtained Salmonella translatomes from four different growth conditions: LB, MOPS rich defined medium, and two infection-relevant conditions low Mg2+ (10 µM) and low pH (5.8). To facilitate the identification of small genes, ribosome profiling data were analyzed in combination with in silico predicted putative open reading frames and transcriptome profiles. As a result, we uncovered 130 unannotated ORFs. Of them, 98% were small ORFs putatively encoding peptides/proteins ≤100 aa, and some of them were only expressed in the infection-relevant low Mg2+ and/or low pH condition. We validated the expression of 25 of these ORFs by western blot, including the smallest, which encodes a peptide of 7 aa residues. Our results suggest that many sequenced bacterial genomes are underannotated with regard to small genes and their gene annotations need to be revised.
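The abstract above combines ribosome profiling with in silico predicted open reading frames. The study's actual prediction procedure is not described here, so the following is only a minimal sketch of the general idea: scan the three forward reading frames of a sequence for ORFs that encode at most 100 residues. The function name `find_small_orfs`, the restriction to the canonical ATG start codon, and the forward-strand-only scan are simplifying assumptions; bacterial annotation pipelines typically also consider alternative start codons and the reverse strand.

```python
from typing import List, Tuple

START = "ATG"                    # assumption: canonical start codon only
STOPS = {"TAA", "TAG", "TGA"}    # standard stop codons


def find_small_orfs(seq: str, max_aa: int = 100) -> List[Tuple[int, int, int]]:
    """Return (start, end, length_aa) for forward-strand ORFs of <= max_aa residues.

    start is the 0-based index of the start codon, end is the index just
    past the stop codon, and length_aa excludes the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # first in-frame start opens an ORF
            elif start is not None and codon in STOPS:
                length_aa = (i - start) // 3   # codons before the stop
                if length_aa <= max_aa:
                    hits.append((start, i + 3, length_aa))
                start = None                   # look for the next ORF in this frame
    return sorted(hits)


# Toy sequence (hypothetical): ATG AAA TTT GGG TAA in frame 2.
print(find_small_orfs("CCATGAAATTTGGGTAACC"))
```

Candidates from a scan like this would still need experimental support, such as ribosome footprints or western blot detection as in the abstract, before being treated as genes.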
|
78
|
Neuhaus K, Landstorfer R, Simon S, Schober S, Wright PR, Smith C, Backofen R, Wecko R, Keim DA, Scherer S. Differentiation of ncRNAs from small mRNAs in Escherichia coli O157:H7 EDL933 (EHEC) by combined RNAseq and RIBOseq - ryhB encodes the regulatory RNA RyhB and a peptide, RyhP. BMC Genomics 2017; 18:216. [PMID: 28245801 PMCID: PMC5331693 DOI: 10.1186/s12864-017-3586-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/16/2016] [Accepted: 02/13/2017] [Indexed: 12/14/2022] Open
Abstract
Background: While NGS allows rapid global detection of transcripts, it remains difficult to distinguish ncRNAs from short mRNAs. To detect potentially translated RNAs, we developed an improved protocol for bacterial ribosomal footprinting (RIBOseq). This allowed distinguishing ncRNA from mRNA in EHEC. A high ratio of ribosomal footprints per transcript (ribosomal coverage value, RCV) is expected to indicate a translated RNA, while a low RCV should point to a non-translated RNA. Results: Based on their low RCV, 150 novel non-translated EHEC transcripts were identified as putative ncRNAs, representing both antisense and intergenic transcripts, 74 of which had expressed homologs in E. coli MG1655. Bioinformatics analysis predicted statistically significant target regulons for 15 of the intergenic transcripts; experimental analysis revealed 4-fold or higher differential expression of 46 novel ncRNA in different growth media. Out of 329 annotated EHEC ncRNAs, 52 showed an RCV similar to protein-coding genes, of those, 16 had RIBOseq patterns matching annotated genes in other enterobacteriaceae, and 11 seem to possess a Shine-Dalgarno sequence, suggesting that such ncRNAs may encode small proteins instead of being solely non-coding. To support that the RIBOseq signals are reflecting translation, we tested the ribosomal-footprint covered ORF of ryhB and found a phenotype for the encoded peptide in iron-limiting condition. Conclusion: Determination of the RCV is a useful approach for a rapid first-step differentiation between bacterial ncRNAs and small mRNAs. Further, many known ncRNAs may encode proteins as well.
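The ribosomal coverage value described in this abstract is a ratio of ribosomal footprints to transcript abundance. A minimal sketch of that ratio is given below, assuming both inputs are already length- and depth-normalised (for example RPKM values). The function names and the cutoff of 0.5 are hypothetical placeholders for illustration and are not taken from the study.

```python
def rcv(footprint_rpkm: float, transcript_rpkm: float) -> float:
    """Ribosomal coverage value: footprint signal per unit of transcript signal."""
    if transcript_rpkm <= 0:
        raise ValueError("transcript abundance must be positive")
    return footprint_rpkm / transcript_rpkm


def classify(footprint_rpkm: float, transcript_rpkm: float,
             threshold: float = 0.5) -> str:
    """Label a transcript by its RCV; the threshold is an assumed placeholder."""
    if rcv(footprint_rpkm, transcript_rpkm) >= threshold:
        return "putative mRNA"      # high RCV: likely translated
    return "putative ncRNA"         # low RCV: likely not translated


# Hypothetical values for two transcripts.
print(classify(80.0, 100.0))
print(classify(5.0, 100.0))
```

In practice a cutoff would be calibrated against the RCV distribution of annotated protein-coding genes rather than fixed in advance.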
Affiliation(s)
- Klaus Neuhaus
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany.,Core Facility Microbiome/NGS, ZIEL Institute for Food & Health, Weihenstephaner Berg 3, D-85354, Freising, Germany.
| | - Richard Landstorfer
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
| | - Svenja Simon
- Informatik und Informationswissenschaft, Universität Konstanz, D-78457, Konstanz, Germany
| | - Steffen Schober
- Institut für Nachrichtentechnik, Universität Ulm, Albert-Einstein-Allee 43, D-89081, Ulm, Germany
| | - Patrick R Wright
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
| | - Cameron Smith
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science and BIOSS Centre for Biological Signaling Studies, Cluster of Excellence, University of Freiburg, D-79110, Freiburg, Germany
| | - Romy Wecko
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
| | - Daniel A Keim
- Informatik und Informationswissenschaft, Universität Konstanz, D-78457, Konstanz, Germany
| | - Siegfried Scherer
- Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, D-85354, Freising, Germany
|
79
|
Yang X, Jensen SI, Wulff T, Harrison SJ, Long KS. Identification and validation of novel small proteins in Pseudomonas putida. Environ Microbiol Rep 2016; 8:966-974. [PMID: 27717237 DOI: 10.1111/1758-2229.12473] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/09/2016] [Revised: 09/06/2016] [Accepted: 09/10/2016] [Indexed: 06/06/2023]
Abstract
Small proteins of 50 amino acids or less have been understudied due to difficulties that impede their annotation and detection. In order to obtain information on small open reading frames (sORFs) in Pseudomonas putida, bioinformatic and proteomic approaches were used to identify putative sORFs in the well-characterized strain KT2440. A plasmid-based system was established for sORF validation, enabling expression of C-terminal sequential peptide affinity tagged variants and their detection via protein immunoblotting. Out of 22 tested putative sORFs, the expression of 14 sORFs was confirmed, where all except one are novel. All of the validated sORFs except one are located adjacent to annotated genes on the same strand and three are in close proximity to genes with known functions. These include an ABC transporter operon and the two transcriptional regulators Fis and CysB involved in biofilm formation and cysteine biosynthesis respectively. The work sheds light on the P. putida small proteome and small protein identification, a necessary first step towards gaining insights into their functions and possible evolutionary implications.
Affiliation(s)
- Xiaochen Yang
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Sheila I Jensen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Tune Wulff
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Scott J Harrison
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
| | - Katherine S Long
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
|
80
|
Baumgartner D, Kopf M, Klähn S, Steglich C, Hess WR. Small proteins in cyanobacteria provide a paradigm for the functional analysis of the bacterial micro-proteome. BMC Microbiol 2016; 16:285. [PMID: 27894276 PMCID: PMC5126843 DOI: 10.1186/s12866-016-0896-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 07/21/2016] [Accepted: 11/14/2016] [Indexed: 12/21/2022] Open
Abstract
Background: Despite their versatile functions in multimeric protein complexes, in the modification of enzymatic activities, intercellular communication or regulatory processes, proteins shorter than 80 amino acids (μ-proteins) are a systematically underestimated class of gene products in bacteria. Photosynthetic cyanobacteria provide a paradigm for small protein functions due to extensive work on the photosynthetic apparatus that led to the functional characterization of 19 small proteins of less than 50 amino acids. In analogy, previously unstudied small ORFs with similar degrees of conservation might encode small proteins of high relevance also in other functional contexts. Results: Here we used comparative transcriptomic information available for two model cyanobacteria, Synechocystis sp. PCC 6803 and Synechocystis sp. PCC 6714, for the prediction of small ORFs. We found 293 transcriptional units containing candidate small ORFs ≤80 codons in Synechocystis sp. PCC 6803, also including the known mRNAs encoding small proteins of the photosynthetic apparatus. From these transcriptional units, 146 are shared between the two strains, 42 are shared with the higher plant Arabidopsis thaliana and 25 with E. coli. To verify the existence of the respective μ-proteins in vivo, we selected five genes as examples to which a FLAG tag sequence was added and re-introduced them into Synechocystis sp. PCC 6803. These were the previously annotated gene ssr1169, two newly defined genes norf1 and norf4, as well as nsiR6 (nitrogen stress-induced RNA 6) and hliR1 (high light-inducible RNA 1), which originally were considered non-coding. Upon activation of expression via the Cu2+-responsive petE promoter or from the native promoters, all five proteins were detected in Western blot experiments.
Conclusions: The distribution and conservation of these five genes as well as their regulation of expression and the physico-chemical properties of the encoded proteins underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.
Affiliation(s)
- Desiree Baumgartner
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany
| | - Matthias Kopf
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany.,Present Address: Molecular Health GmbH, Kurfürsten-Anlage 21, 69115, Heidelberg, Germany
| | - Stephan Klähn
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany
| | - Claudia Steglich
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany
| | - Wolfgang R Hess
- University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Schänzlestr. 1, D-79104, Freiburg, Germany.
|
81
|
Gifsy-1 Prophage IsrK with Dual Function as Small and Messenger RNA Modulates Vital Bacterial Machineries. PLoS Genet 2016; 12:e1005975. [PMID: 27057757 PMCID: PMC4825925 DOI: 10.1371/journal.pgen.1005975] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 09/06/2015] [Accepted: 03/14/2016] [Indexed: 11/20/2022] Open
Abstract
While an increasing number of conserved small regulatory RNAs (sRNAs) are known to function in general bacterial physiology, the roles and modes of action of sRNAs from horizontally acquired genomic regions remain little understood. The IsrK sRNA of Gifsy-1 prophage of Salmonella belongs to the latter class. This regulatory RNA exists in two isoforms. The first forms when a portion of transcripts originating from the isrK promoter reads through the IsrK transcription terminator, producing a translationally inactive mRNA target. Acting in trans, the second isoform, short IsrK RNA, binds the inactive transcript, rendering it translationally active. By switching on translation of the first isoform, short IsrK indirectly activates the production of AntQ, an antiterminator protein located upstream of isrK. Expression of antQ globally interferes with transcription termination, resulting in bacterial growth arrest and ultimately cell death. Escherichia coli and Salmonella cells expressing AntQ display condensed chromatin morphology and localization of UvrD to the nucleoid. The toxic phenotype of AntQ can be rescued by co-expression of the transcription termination factor, Rho, or RNase H, which protects genomic DNA from breaks by resolving R-loops. We propose that AntQ causes conflicts between transcription and replication machineries and thus promotes DNA damage. The isrK locus represents a unique example of an island-encoded sRNA that exerts a highly complex regulatory mechanism to tune the expression of a toxic protein.
|
82
|
Pueyo JI, Magny EG, Sampson CJ, Amin U, Evans IR, Bishop SA, Couso JP. Hemotin, a Regulator of Phagocytosis Encoded by a Small ORF and Conserved across Metazoans. PLoS Biol 2016; 14:e1002395. [PMID: 27015288 PMCID: PMC4807881 DOI: 10.1371/journal.pbio.1002395] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 07/24/2015] [Accepted: 01/29/2016] [Indexed: 12/12/2022] Open
Abstract
Translation of hundreds of small ORFs (smORFs) of less than 100 amino acids has recently been revealed in vertebrates and Drosophila. Some of these peptides have essential and conserved cellular functions. In Drosophila, we have predicted a particular smORF class encoding ~80 aa hydrophobic peptides, which may function in membranes and cell organelles. Here, we characterise hemotin, a gene encoding an 88aa transmembrane smORF peptide localised to early endosomes in Drosophila macrophages. hemotin regulates endosomal maturation during phagocytosis by repressing the cooperation of 14-3-3ζ with specific phosphatidylinositol (PI) enzymes. hemotin mutants accumulate undigested phagocytic material inside enlarged endo-lysosomes and as a result, hemotin mutants have reduced ability to fight bacteria, and hence, have severely reduced life span and resistance to infections. We identify Stannin, a peptide involved in organometallic toxicity, as the Hemotin functional homologue in vertebrates, showing that this novel regulator of phagocytic processing is widely conserved, emphasizing the significance of smORF peptides in cell biology and disease.
Affiliation(s)
- José I. Pueyo
- Brighton and Sussex Medical School, University of Sussex, Brighton, United Kingdom
| | - Emile G. Magny
- Brighton and Sussex Medical School, University of Sussex, Brighton, United Kingdom
| | | | - Unum Amin
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Iwan R. Evans
- Department of Infection and Immunity and the Bateson Centre, University of Sheffield, Sheffield, South Yorkshire, United Kingdom
| | - Sarah A. Bishop
- Brighton and Sussex Medical School, University of Sussex, Brighton, United Kingdom
| | - Juan P. Couso
- Brighton and Sussex Medical School, University of Sussex, Brighton, United Kingdom
|
83
|
Ma J, Diedrich JK, Jungreis I, Donaldson C, Vaughan J, Kellis M, Yates JR, Saghatelian A. Improved Identification and Analysis of Small Open Reading Frame Encoded Polypeptides. Anal Chem 2016; 88:3967-75. [PMID: 27010111 DOI: 10.1021/acs.analchem.6b00191] [Citation(s) in RCA: 95] [Impact Index Per Article: 11.9] [Indexed: 12/31/2022]
Abstract
Computational, genomic, and proteomic approaches have been used to discover nonannotated protein-coding small open reading frames (smORFs). Some novel smORFs have crucial biological roles in cells and organisms, which motivates the search for additional smORFs. Proteomic smORF discovery methods are advantageous because they detect smORF-encoded polypeptides (SEPs) to validate smORF translation and SEP stability. Because SEPs are shorter and less abundant than average proteins, SEP detection using proteomics faces unique challenges. Here, we optimize several steps in the SEP discovery workflow to improve SEP isolation and identification. These changes have led to the detection of several new human SEPs (novel human genes), improved confidence in the SEP assignments, and enabled quantification of SEPs under different cellular conditions. These improvements will allow faster detection and characterization of new SEPs and smORFs.
Affiliation(s)
- Jiao Ma
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States.,Salk Institute for Biological Studies, Clayton Foundation Laboratories for Peptide Biology, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Jolene K Diedrich
- Salk Institute for Biological Studies, Clayton Foundation Laboratories for Peptide Biology, 10010 North Torrey Pines Road, La Jolla, California 92037, United States.,Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, Massachusetts 02139, United States.,The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02139, United States
| | - Cynthia Donaldson
- Salk Institute for Biological Studies, Clayton Foundation Laboratories for Peptide Biology, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Joan Vaughan
- Salk Institute for Biological Studies, Clayton Foundation Laboratories for Peptide Biology, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, Massachusetts 02139, United States.,The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02139, United States
| | - John R Yates
- Salk Institute for Biological Studies, Clayton Foundation Laboratories for Peptide Biology, 10010 North Torrey Pines Road, La Jolla, California 92037, United States.,Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
| | - Alan Saghatelian
- Salk Institute for Biological Studies, Clayton Foundation Laboratories for Peptide Biology, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
|
84
|
Nakahigashi K, Takai Y, Kimura M, Abe N, Nakayashiki T, Shiwa Y, Yoshikawa H, Wanner BL, Ishihama Y, Mori H. Comprehensive identification of translation start sites by tetracycline-inhibited ribosome profiling. DNA Res 2016; 23:193-201. [PMID: 27013550 PMCID: PMC4909307 DOI: 10.1093/dnares/dsw008] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 12/27/2015] [Accepted: 02/06/2016] [Indexed: 01/12/2023] Open
Abstract
Tetracycline-inhibited ribosome profiling (TetRP) provides a powerful new experimental tool for comprehensive genome-wide identification of translation initiation sites in bacteria. We validated TetRP by confirming the translation start sites of protein-coding genes in accordance with the 2006 version of Escherichia coli K-12 annotation record (GenBank U00096.2) and found ∼150 new start sites within 60 nucleotides of the annotated site. This analysis revealed 72 per cent of the genes whose initiation site annotations were changed from the 2006 GenBank record to the newer 2014 annotation record (GenBank U00096.3), indicating a high sensitivity. Also, results from reporter fusion and proteomics of N-terminally enriched peptides showed high specificity of the TetRP results. In addition, we discovered over 300 translation start sites within non-coding, intergenic regions of the genome, using a threshold that retains ∼2,000 known coding genes. While some appear to correspond to pseudogenes, others may encode small peptides or have previously unforeseen roles. In summary, we showed that ribosome profiling upon translation inhibition by tetracycline offers a simple, reliable and comprehensive experimental tool for precise annotation of translation start sites of expressed genes in bacteria.
Affiliation(s)
- Kenji Nakahigashi
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
| | - Yuki Takai
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
| | - Michiko Kimura
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
| | - Nozomi Abe
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
| | - Toru Nakayashiki
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
| | - Yuh Shiwa
- Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Tokyo 156-8502, Japan
| | - Hirofumi Yoshikawa
- Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Tokyo 156-8502, Japan.,Department of Bioscience, Tokyo University of Agriculture, Tokyo 156-8502, Japan
| | - Barry L Wanner
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
| | - Hirotada Mori
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
|
85
|
Neuhaus K, Landstorfer R, Fellner L, Simon S, Schafferhans A, Goldberg T, Marx H, Ozoline ON, Rost B, Kuster B, Keim DA, Scherer S. Translatomics combined with transcriptomics and proteomics reveals novel functional, recently evolved orphan genes in Escherichia coli O157:H7 (EHEC). BMC Genomics 2016; 17:133. [PMID: 26911138 PMCID: PMC4765031 DOI: 10.1186/s12864-016-2456-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 05/27/2015] [Accepted: 02/09/2016] [Indexed: 12/30/2022] Open
Abstract
Background: Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Results: Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. Conclusions: These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
Affiliation(s)
- Klaus Neuhaus
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
| | - Richard Landstorfer
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
| | - Lea Fellner
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
| | - Svenja Simon
- Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany.
| | - Andrea Schafferhans
- Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
| | - Tatyana Goldberg
- Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
| | - Harald Marx
- Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany.
| | - Olga N Ozoline
- Institute of Cell Biophysics, Russian Academy of Sciences, Moscow Region, 142290, Pushchino, Russia.
| | - Burkhard Rost
- Department of Informatics - Bioinformatics & TUM-IAS, Technische Universität München, Boltzmannstraße 3, 85748, Garching, Germany.
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354, Freising, Germany.,Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technische Universität München, Gregor-Mendel-Str. 4, 85354, Freising, Germany.
| | - Daniel A Keim
- Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Konstanz, Germany.
| | - Siegfried Scherer
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs- und Lebensmittelforschung, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.
|
86
|
Saghatelian A, Couso JP. Discovery and characterization of smORF-encoded bioactive polypeptides. Nat Chem Biol 2015; 11:909-16. [PMID: 26575237 PMCID: PMC4956473 DOI: 10.1038/nchembio.1964] [Citation(s) in RCA: 184] [Impact Index Per Article: 20.4] [Received: 10/06/2015] [Accepted: 10/19/2015] [Indexed: 12/13/2022]
Abstract
Analysis of genomes, transcriptomes and proteomes reveals the existence of hundreds to thousands of translated, yet non-annotated, short open reading frames (small ORFs or smORFs). The discovery of smORFs and their protein products, smORF-encoded polypeptides (SEPs), points to a fundamental gap in our knowledge of protein-coding genes. Various studies have identified central roles for smORFs in metabolism, apoptosis and development. The discovery of these bioactive SEPs emphasizes the functional potential of this unexplored class of biomolecules. Here, we provide an overview of this emerging field and highlight the opportunities for chemical biology to answer fundamental questions about these novel genes. Such studies will provide new insights into the protein-coding potential of genomes and identify functional genes with roles in biology and disease.
Affiliation(s)
- Alan Saghatelian
- Clayton Foundation Laboratories for Peptide Biology, Helmsley Center for Genomic Medicine, Salk Institute for Biological Studies, San Diego, CA 92037
| | - Juan Pablo Couso
- School of Life Sciences, University of Sussex, Falmer, Brighton, BN1 6PU, UK
|
87
|
Papanastasiou M, Orfanoudaki G, Kountourakis N, Koukaki M, Sardis MF, Aivaliotis M, Tsolis KC, Karamanou S, Economou A. Rapid label-free quantitative analysis of the E. coli BL21(DE3) inner membrane proteome. Proteomics 2015; 16:85-97. [DOI: 10.1002/pmic.201500304] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 07/25/2015] [Revised: 09/05/2015] [Accepted: 10/12/2015] [Indexed: 12/12/2022]
Affiliation(s)
- Malvina Papanastasiou
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
- Department Pathology & Laboratory Medicine, Perelman School of Medicine; University of Pennsylvania; Philadelphia USA
| | - Georgia Orfanoudaki
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
- Department of Biology; University of Crete; Iraklio Greece
| | - Nikos Kountourakis
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
| | - Marina Koukaki
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
| | - Marios Frantzeskos Sardis
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
- Laboratory of Molecular Bacteriology, Rega Institute, Department of Microbiology and Immunology; Katholieke Universiteit Leuven; Leuven Belgium
| | - Michalis Aivaliotis
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
| | - Konstantinos C. Tsolis
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
- Department of Biology; University of Crete; Iraklio Greece
- Laboratory of Molecular Bacteriology, Rega Institute, Department of Microbiology and Immunology; Katholieke Universiteit Leuven; Leuven Belgium
| | - Spyridoula Karamanou
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
- Laboratory of Molecular Bacteriology, Rega Institute, Department of Microbiology and Immunology; Katholieke Universiteit Leuven; Leuven Belgium
| | - Anastassios Economou
- Institute of Molecular Biology and Biotechnology; Foundation for Research & Technology; Iraklio Greece
- Department of Biology; University of Crete; Iraklio Greece
- Laboratory of Molecular Bacteriology, Rega Institute, Department of Microbiology and Immunology; Katholieke Universiteit Leuven; Leuven Belgium
| |
|
88
|
Tian Y, Yang JE. Emerging landscape of short open reading frame-encoded peptides. Shijie Huaren Xiaohua Zazhi 2015; 23:4954-4960. [DOI: 10.11569/wcjd.v23.i31.4954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023] Open
Abstract
Short open reading frames (sORFs) are a common feature of the genomes of humans and other species, but their coding potential remains unknown. Innovations in proteomics and high-throughput analyses of translation start sites have resulted in the identification of hundreds of putative coding sORFs, and some of them have been verified to be translated into short peptides (<100 amino acids). Moreover, recent findings reveal their diverse functions in various biological processes including development and differentiation. This review discusses the translation, identification and biological function of short peptides.
|
89
|
Leaderless Transcripts and Small Proteins Are Common Features of the Mycobacterial Translational Landscape. PLoS Genet 2015; 11:e1005641. [PMID: 26536359 PMCID: PMC4633059 DOI: 10.1371/journal.pgen.1005641] [Citation(s) in RCA: 163] [Impact Index Per Article: 18.1] [Received: 12/02/2014] [Accepted: 10/10/2015] [Indexed: 11/19/2022] Open
Abstract
RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5’ untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5’ end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5’ ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine Dalgarno site in the 5’ UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression. 
The current paradigm for bacterial translation is based on an mRNA that includes an untranslated leader sequence containing the ribosome-binding site upstream of the initiation codon. We applied genome-scale approaches to map the protein-coding regions in the genomes of Mycobacterium smegmatis and Mycobacterium tuberculosis. We found that nearly one-quarter of mycobacterial transcripts are leaderless in mycobacterial species, thus indicating that ribosomes must recognize these mRNAs by a novel mechanism and suggesting that there are alternative modes of bacterial translation beyond the Escherichia coli paradigm. Our translational profiling showed that many mycobacterial proteins are mis-annotated, and also found many new genes encoding small proteins that had been previously overlooked, which are likely to play novel roles in diverse cellular processes. We also developed a new reporter system that provides mechanistic insights into translation initiation through deep sequencing. Our data show that leaderless translation is a robust process that is conserved in mycobacteria, that leaderless translation only requires that the mRNA begin with a start codon, and predict that mycobacteria encode hundreds of small proteins. This work will help us understand gene structure, genome organization and protein expression in bacteria, and how the translational machinery differs in different organisms.
|
90
|
Akiyama K, Mizuno S, Hizukuri Y, Mori H, Nogi T, Akiyama Y. Roles of the membrane-reentrant β-hairpin-like loop of RseP protease in selective substrate cleavage. eLife 2015; 4. [PMID: 26447507 PMCID: PMC4597795 DOI: 10.7554/elife.08928] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 05/22/2015] [Accepted: 09/10/2015] [Indexed: 11/13/2022] Open
Abstract
Molecular mechanisms underlying substrate recognition and cleavage by Escherichia coli RseP, which belongs to S2P family of intramembrane-cleaving proteases, remain unclear. We examined the function of a conserved region looped into the membrane domain of RseP to form a β-hairpin-like structure near its active site in substrate recognition and cleavage. We observed that mutations disturbing the possible β-strand conformation of the loop impaired RseP proteolytic activity and that some of these mutations resulted in the differential cleavage of different substrates. Co-immunoprecipitation and crosslinking experiments suggest that the loop directly interacts with the transmembrane segments of substrates. Helix-destabilising mutations in the transmembrane segments of substrates suppressed the effect of loop mutations in an allele-specific manner. These results suggest that the loop promotes substrate cleavage by selectively recognising the transmembrane segments of substrates in an extended conformation and by presenting them to the proteolytic active site, which contributes to substrate discrimination.
Affiliation(s)
| | - Shinya Mizuno
- Institute for Virus Research, Kyoto University, Kyoto, Japan
| | - Yohei Hizukuri
- Institute for Virus Research, Kyoto University, Kyoto, Japan
| | - Hiroyuki Mori
- Institute for Virus Research, Kyoto University, Kyoto, Japan
| | - Terukazu Nogi
- Graduate School of Medical Life Science, Yokohama City University, Yokohama, Japan
| | | |
|
91
|
Prasse D, Thomsen J, De Santis R, Muntel J, Becher D, Schmitz RA. First description of small proteins encoded by spRNAs in Methanosarcina mazei strain Gö1. Biochimie 2015; 117:138-48. [DOI: 10.1016/j.biochi.2015.04.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/20/2015] [Accepted: 04/08/2015] [Indexed: 01/06/2023]
|
92
|
Kemp G, Cymer F. Small membrane proteins - elucidating the function of the needle in the haystack. Biol Chem 2015; 395:1365-77. [PMID: 25153378 DOI: 10.1515/hsz-2014-0213] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/16/2014] [Accepted: 08/06/2014] [Indexed: 11/15/2022]
Abstract
Membrane proteins are important mediators between the cell and its environment or between different compartments within a cell. However, much less is known about the structure and function of membrane proteins compared to water-soluble proteins. Moreover, until recently a subset of membrane proteins, those shorter than 100 amino acids, have almost completely evaded detection as a result of technical difficulties. These small membrane proteins (SMPs) have been underrepresented in most genomic and proteomic screens of both pro- and eukaryotic cells and, hence, we know much less about their functions in both. Currently, through a combination of bioinformatics, ribosome profiling, and more sensitive proteomics, large numbers of SMPs are being identified and characterized. Herein we describe recent advances in identifying SMPs from genomic and proteomic datasets and describe examples where SMPs have been successfully characterized biochemically. Finally we give an overview of identified functions of SMPs and speculate on the possible roles SMPs play in the cell.
|
93
|
YjjQ Represses Transcription of flhDC and Additional Loci in Escherichia coli. J Bacteriol 2015; 197:2713-20. [PMID: 26078445 DOI: 10.1128/jb.00263-15] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/06/2015] [Accepted: 06/04/2015] [Indexed: 11/20/2022] Open
Abstract
The presumptive transcriptional regulator YjjQ has been identified as being virulence associated in avian pathogenic Escherichia coli (APEC). In this work, we characterize YjjQ as transcriptional repressor of the flhDC operon, encoding the master regulator of flagellar synthesis, and of additional loci. The latter include gfc (capsule 4 synthesis), ompC (outer membrane porin C), yfiRNB (regulated c-di-GMP synthesis), and loci of poorly defined function (ybhL and ymiA-yciX). We identify the YjjQ DNA-binding sites at the flhDC and gfc promoters and characterize a DNA-binding sequence motif present at all promoters found to be repressed by YjjQ. At the flhDC promoter, the YjjQ DNA-binding site overlaps the RcsA-RcsB DNA-binding site. RcsA-RcsB likewise represses the flhDC promoter, but the repression by YjjQ and that by RcsA-RcsB are independent of each other. These data suggest that YjjQ is an additional regulator involved in the complex control of flhDC at the level of transcription initiation. Furthermore, we show that YjjQ represses motility of the E. coli K-12 laboratory strain and of uropathogenic E. coli (UPEC) strains CFT073 and 536. Regulation of flhDC, yfiRNB, and additional loci by YjjQ may be features relevant for pathogenicity.
Importance: Escherichia coli is a commensal and pathogenic bacterium causing intra- and extraintestinal infections in humans and farm animals. The pathogenicity of E. coli strains is determined by their particular genome content, which includes essential and associated virulence factors that control the cellular physiology in the host environment. However, the gene pools of commensal and pathogenic E. coli are not clearly differentiated, and the function of virulence-associated loci needs to be characterized. In this study, we characterize the function of yjjQ, encoding a transcription regulator that was identified as being virulence associated in avian pathogenic E. coli (APEC). We characterize YjjQ as transcriptional repressor of flagellar motility and of additional loci related to pathogenicity.
|
94
|
Allen RJ, Brenner EP, VanOrsdel CE, Hobson JJ, Hearn DJ, Hemm MR. Conservation analysis of the CydX protein yields insights into small protein identification and evolution. BMC Genomics 2014; 15:946. [PMID: 25475368 PMCID: PMC4325964 DOI: 10.1186/1471-2164-15-946] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 06/20/2014] [Accepted: 11/10/2014] [Indexed: 11/27/2022] Open
Abstract
Background: The reliable identification of proteins containing 50 or fewer amino acids is difficult due to the limited information content in short sequences. The 37 amino acid CydX protein in Escherichia coli is a member of the cytochrome bd oxidase complex, an enzyme found throughout Eubacteria. To investigate the extent of CydX conservation and prevalence and evaluate different methods of small protein homologue identification, we surveyed 1095 Eubacteria species for the presence of the small protein. Results: Over 300 homologues were identified, including 80 unannotated genes. The ability of both closely-related and divergent homologues to complement the E. coli ΔcydX mutant supports our identification techniques, and suggests that CydX homologues retain similar function among divergent species. However, sequence analysis of these proteins shows a great degree of variability, with only a few highly-conserved residues. An analysis of the co-variation between CydX homologues and their corresponding cydA and cydB genes shows a close synteny of the small protein with the CydA long Q-loop. Phylogenetic analysis suggests that the cydABX operon has undergone horizontal gene transfer, although the cydX gene likely evolved in a progenitor of the Alpha, Beta, and Gammaproteobacteria. Further investigation of cydAB operons identified two additional conserved hypothetical small proteins: CydY encoded in CydAQlong operons that lack cydX, and CydZ encoded in more than 150 CydAQshort operons. Conclusions: This study provides a systematic analysis of bioinformatics techniques required for the unique challenges present in small protein identification and phylogenetic analyses. These results elucidate the prevalence of CydX throughout the Proteobacteria, provide insight into the selection pressure and sequence requirements for CydX function, and suggest a potential functional interaction between the small protein and the CydA Q-loop, an enigmatic domain of the cytochrome bd oxidase complex. Finally, these results identify other conserved small proteins encoded in cytochrome bd oxidase operons, suggesting that small protein subunits may be a more common component of these enzymes than previously thought.
Affiliation(s)
| | | | | | | | | | - Matthew R Hemm
- Department of Biological Sciences, Towson University, Towson, MD 21252, USA.
| |
|
95
|
Juhas M, Reuß DR, Zhu B, Commichau FM. Bacillus subtilis and Escherichia coli essential genes and minimal cell factories after one decade of genome engineering. Microbiology (Reading) 2014; 160:2341-2351. [DOI: 10.1099/mic.0.079376-0] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Indexed: 12/12/2022] Open
Abstract
Investigation of essential genes, besides contributing to understanding the fundamental principles of life, has numerous practical applications. Essential genes can be exploited as building blocks of a tightly controlled cell ‘chassis’. Bacillus subtilis and Escherichia coli K-12 are both well-characterized model bacteria used as hosts for a plethora of biotechnological applications. Determination of the essential genes that constitute the B. subtilis and E. coli minimal genomes is therefore of the highest importance. Recent advances have led to the modification of the original B. subtilis and E. coli essential gene sets identified 10 years ago. Furthermore, significant progress has been made in the area of genome minimization of both model bacteria. This review provides an update, with particular emphasis on the current essential gene sets and their comparison with the original gene sets identified 10 years ago. Special attention is focused on the genome reduction analyses in B. subtilis and E. coli and the construction of minimal cell factories for industrial applications.
Affiliation(s)
- Mario Juhas
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
| | - Daniel R. Reuß
- Department of General Microbiology, Georg-August-University Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany
| | - Bingyao Zhu
- Department of General Microbiology, Georg-August-University Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany
| | - Fabian M. Commichau
- Department of General Microbiology, Georg-August-University Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany
| |
|
96
|
Bloch S, Nejman-Faleńczyk B, Dydecka A, Łoś JM, Felczykowska A, Węgrzyn A, Węgrzyn G. Different expression patterns of genes from the exo-xis region of bacteriophage λ and Shiga toxin-converting bacteriophage Ф24B following infection or prophage induction in Escherichia coli. PLoS One 2014; 9:e108233. [PMID: 25310402 PMCID: PMC4195576 DOI: 10.1371/journal.pone.0108233] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/14/2014] [Accepted: 08/28/2014] [Indexed: 11/19/2022] Open
Abstract
Lambdoid bacteriophages serve as useful models in microbiological and molecular studies on basic biological processes. Moreover, this family of viruses plays an important role in the pathogenesis of enterohemorrhagic Escherichia coli (EHEC) strains, as they are carriers of genes coding for Shiga toxins. Efficient expression of these genes requires lambdoid prophage induction and multiplication of the phage genome. Therefore, understanding the mechanisms regulating these processes appears essential for both basic knowledge and potential anti-EHEC applications. The exo-xis region, present in genomes of lambdoid bacteriophages, contains highly conserved genes of largely unknown functions. A recent report indicated that the Ea8.5 protein, encoded in this region, contains a newly discovered fused homeodomain/zinc-finger fold, suggesting its plausible regulatory role. Moreover, subsequent studies demonstrated that overexpression of the exo-xis region from a multicopy plasmid resulted in impaired lysogenization of E. coli and more effective induction of λ and Ф24B prophages. In this report, we demonstrate that after prophage induction, the increase in phage DNA content in the host cells is more efficient in E. coli bearing additional copies of the exo-xis region, while the survival rate of such bacteria is lower, which corroborates previous observations. Importantly, by using quantitative real-time reverse transcription PCR, we have determined patterns of expression of particular genes from this region. Unexpectedly, in both phages λ and Ф24B, these patterns were significantly different not only between conditions of host cell infection by bacteriophages and prophage induction, but also between induction of prophages with various agents (mitomycin C and hydrogen peroxide). This may shed new light on our understanding of the regulation of lambdoid phage development, depending on the mode of lytic cycle initiation.
Affiliation(s)
- Sylwia Bloch
- Department of Molecular Biology, University of Gdańsk, Gdańsk, Poland
| | | | | | - Joanna M. Łoś
- Department of Molecular Biology, University of Gdańsk, Gdańsk, Poland
| | | | - Alicja Węgrzyn
- Department of Microbiology, University of Szczecin, Szczecin, Poland
| | - Grzegorz Węgrzyn
- Department of Molecular Biology, University of Gdańsk, Gdańsk, Poland
| |
|
97
|
Chen H, Luo Q, Yin J, Gao T, Gao H. Evidence for the requirement of CydX in function but not assembly of the cytochrome bd oxidase in Shewanella oneidensis. Biochim Biophys Acta Gen Subj 2014; 1850:318-28. [PMID: 25316290 DOI: 10.1016/j.bbagen.2014.10.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 07/11/2014] [Revised: 09/25/2014] [Accepted: 10/06/2014] [Indexed: 12/19/2022]
Abstract
Background: Cytochrome bd oxidase, existing widely in bacteria, produces a proton motive force by the vectorial charge transfer of protons and more importantly, endows bacteria with a number of vitally important physiological functions, such as enhancing tolerance to various stresses. Although extensively studied as a CydA-CydB two-subunit complex for decades, the complex in certain groups of bacteria is recently found to in fact consist of an additional subunit, which is functionally essential. Methods: We investigated the assembly of the CydA-CydB complex using BiFC. We investigated the function of CydX using mutational analysis. Results: CydX, a 38-amino-acid inner-membrane protein, is associated with the CydA-CydB complex in Shewanella oneidensis, a facultative anaerobe renowned for its respiratory versatility. It is clear that CydX is neither required for the in vivo assembly of the CydA-CydB complex nor relies on the complex for its translocation and integration into the membrane. The N-terminal segment (1-25 amino acid residues) and short periplasmic overhang of CydX, with respect to functionality, are important whereas the remaining C-terminal segment is rather flexible. Conclusion: Based on these findings, we postulate that CydX may function by positioning and stabilizing the prosthetic hemes, especially heme d in the CydA-CydB complex although a role of participating in catalytic reaction is not excluded. General significance: The work provides novel insights into our understanding of the small subunit of the cytochrome bd oxidase.
Affiliation(s)
- Haijiang Chen
- Institute of Microbiology and College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Qixia Luo
- Institute of Microbiology and College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Jianhua Yin
- Institute of Microbiology and College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Tong Gao
- Institute of Microbiology and College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Haichun Gao
- Institute of Microbiology and College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
| |
|
98
|
Aspden JL, Eyre-Walker YC, Phillips RJ, Amin U, Mumtaz MAS, Brocard M, Couso JP. Extensive translation of small Open Reading Frames revealed by Poly-Ribo-Seq. eLife 2014; 3:e03528. [PMID: 25144939 PMCID: PMC4359375 DOI: 10.7554/elife.03528] [Citation(s) in RCA: 231] [Impact Index Per Article: 23.1] [Received: 05/30/2014] [Accepted: 08/19/2014] [Indexed: 12/11/2022] Open
Abstract
Thousands of small Open Reading Frames (smORFs) with the potential to encode small peptides of fewer than 100 amino acids exist in our genomes. However, the number of smORFs actually translated, and their molecular and functional roles are still unclear. In this study, we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions in Drosophila. We detect two types of smORFs bound by multiple ribosomes and thus undergoing productive translation. The ‘longer’ smORFs of around 80 amino acids resemble canonical proteins in translational metrics and conservation, and display a propensity to contain transmembrane motifs. The ‘dwarf’ smORFs are in general shorter (around 20 amino-acid long), are mostly found in 5′-UTRs and non-coding RNAs, are less well conserved, and have no bioinformatic indicators of peptide function. Our findings indicate that thousands of smORFs are translated in metazoan genomes, reinforcing the idea that smORFs are an abundant and fundamental genome component. DOI: http://dx.doi.org/10.7554/eLife.03528.001
To produce a protein, a stretch of DNA must first be transcribed to produce a molecule of messenger RNA (mRNA). The genetic information copied from the DNA is then read three letters at a time, in groups called codons. Each codon either encodes a particular amino acid to be added into a protein or provides further instructions: ‘start codons’ mark the beginning of a protein; ‘stop codons’ mark its end. The DNA between these two points is called an open reading frame (or ORF)—however, not all ORFs produce proteins. Most proteins are made of several hundred amino acids, but the genomes of animals contain thousands of ORFs that would generate much smaller proteins made of fewer than 100 amino acids, if they were translated. It is, however, unclear how many of these small ORFs are converted into mRNA molecules and functional proteins. Ribosomes are large molecular machines that translate the code in mRNA molecules and join together the appropriate amino acids in the right order to make a protein. Ribosome profiling is a technique that identifies which mRNA molecules are translated into proteins by determining the sequences of all the mRNA molecules bound to ribosomes at a particular moment. The mRNA sequences can then be compared with the sequence of the whole genome to work out which ORFs they correspond to. Ribosome profiling has been used to detect translated small ORFs, but the method yields a relatively high false positive rate as some mRNAs can bind to ribosomes without being translated. To better detect small protein-producing ORFs, Aspden et al. developed a technique based on ribosome profiling called Poly-Ribo-Seq. The method takes advantage of the fact that during active translation, clusters of multiple ribosomes, called polysomes, bind mRNAs. Poly-Ribo-Seq isolates these polysomes and determines the sequence bound by each of the ribosomes, thereby reducing the number of false positives. Applying Poly-Ribo-Seq to cells from the fruit fly Drosophila allowed Aspden et al. to identify two types of short ORF. The first type codes for proteins that are around 80 amino acids long and are translated with the same efficiency as larger ORFs. The sequences of these ORFs are found in other species, match at least in part sequences of known functional ORFs, and the proteins produced are found in specific locations inside cells. These small proteins may contribute to membrane integrity or function. Together, these properties suggest that these mRNAs create functional small proteins. The second pool consists of very small ORFs (‘dwarf smORFs’) that code for around 20 amino acids, which are not translated so often and do not show many similarities with other species. As the findings of Aspden et al. suggest that a large fraction of Drosophila small ORFs are translated into proteins, the next challenge will be to determine the roles of these small proteins in cells. DOI: http://dx.doi.org/10.7554/eLife.03528.002
Affiliation(s)
- Julie L Aspden
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | | | - Rose J Phillips
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Unum Amin
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | | | - Michele Brocard
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Juan-Pablo Couso
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| |
|
99
|
Crappé J, Van Criekinge W, Menschaert G. Little things make big things happen: A summary of micropeptide encoding genes. EuPA Open Proteomics 2014. [DOI: 10.1016/j.euprot.2014.02.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
|
100
|
A Novel Small Protein of Bacillus subtilis Involved in Spore Germination and Spore Coat Assembly. Biosci Biotechnol Biochem 2014; 75:1119-28. [DOI: 10.1271/bbb.110029] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
|