1
Fijalkowski I, Snauwaert V, Van Damme P. Proteins à la carte: riboproteogenomic exploration of bacterial N-terminal proteoform expression. mBio 2024; 15:e0033324. [PMID: 38511928 PMCID: PMC11005335 DOI: 10.1128/mbio.00333-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024]
Abstract
In recent years, it has become evident that the true complexity of bacterial proteomes remains underestimated. Gene annotation tools are known to propagate biases and overlook certain classes of truly expressed proteins, particularly proteoforms, that is, protein isoforms arising from a single gene. Recent (re-)annotation efforts rely heavily on ribosome profiling, which provides a direct readout of translation, to fully describe bacterial proteomes. In this study, we employ a robust riboproteogenomic pipeline to conduct a systematic census of expressed N-terminal proteoform pairs in Salmonella, each pair representing two isoforms encoded by a single gene and arising from annotated and alternative translation initiation. Intriguingly, condition-dependent changes in relative utilization of annotated and alternative translation initiation sites (TIS) were observed in several cases. This suggests that TIS selection is subject to regulatory control, adding yet another layer of complexity to our understanding of bacterial proteomes. IMPORTANCE With the emerging theme of genes within genes comprising the existence of alternative open reading frames (ORFs) generated by translation initiation at in-frame start codons, mechanisms that control the relative utilization of annotated and alternative TIS need to be unraveled and our molecular understanding of resulting proteoforms broadened. Utilizing complementary ribosome profiling strategies to map ORF boundaries, we uncovered dual-encoding ORFs generated by in-frame TIS usage in Salmonella. 
Besides demonstrating that alternative TIS usage may generate proteoforms with different characteristics, such as differential localization and specialized function, quantitative aspects of conditional retapamulin-assisted ribosome profiling (Ribo-RET) translation initiation maps offer unprecedented insights into the relative utilization of annotated and alternative TIS, enabling the exploration of gene regulatory mechanisms that control TIS usage and, consequently, the translation of N-terminal proteoform pairs.
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Valdes Snauwaert
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
2
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA-seq and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins. This has revealed their crucial roles in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
3
Simoens L, Fijalkowski I, Van Damme P. Exposing the small protein load of bacterial life. FEMS Microbiol Rev 2023; 47:fuad063. [PMID: 38012116 PMCID: PMC10723866 DOI: 10.1093/femsre/fuad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 11/10/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023]
Abstract
The ever-growing repertoire of genomic techniques continues to expand our understanding of the true diversity and richness of prokaryotic genomes. Riboproteogenomics laid the foundation for dynamic studies of previously overlooked genomic elements. Most strikingly, bacterial genomes were revealed to harbor robust repertoires of small open reading frames (sORFs) encoding a diverse and broadly expressed range of small proteins, or sORF-encoded polypeptides (SEPs). In recent years, continuous efforts led to great improvements in the annotation and characterization of such proteins, yet many challenges remain to fully comprehend the pervasive nature of small proteins and their impact on bacterial biology. In this work, we review the recent developments in the dynamic field of bacterial genome reannotation, catalog the important biological roles carried out by small proteins and identify challenges obstructing the way to full understanding of these elusive proteins.
Affiliation(s)
- Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
4
Identification and characterisation of sPEPs in Cryptococcus neoformans. Fungal Genet Biol 2022; 160:103688. [PMID: 35339703 DOI: 10.1016/j.fgb.2022.103688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/02/2022] [Accepted: 03/21/2022] [Indexed: 11/24/2022]
Abstract
Short open reading frame (sORF)-encoded peptides (sPEPs) have been found across a wide range of genomic locations in a variety of species. To date, their identification, validation, and characterisation in the human fungal pathogen Cryptococcus neoformans have been limited due to a lack of standardised protocols. We have developed an enrichment process that enables sPEP detection within a protein sample from this polysaccharide-encapsulated yeast, and implemented proteogenomics to provide insights into the validity of predicted and hypothetical sORFs annotated in the C. neoformans genome. Novel sORFs were discovered within the 5' and 3' UTRs of known transcripts as well as in "non-coding" RNAs. One novel candidate, dubbed NPB1, that resided in an RNA annotated as "non-coding", was chosen for characterisation. Through the creation of both specific point mutations and a full deletion allele, the function of the new sPEP, Npb1, was shown to resemble that of the bacterial trans-translation protein SmpB.
5
Salvail H, Choi J, Groisman EA. Differential synthesis of novel small protein times Salmonella virulence program. PLoS Genet 2022; 18:e1010074. [PMID: 35245279 PMCID: PMC8896665 DOI: 10.1371/journal.pgen.1010074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 02/03/2022] [Indexed: 11/18/2022]
Abstract
Gene organization in operons enables concerted transcription of functionally related genes and efficient control of cellular processes. Typically, an operon is transcribed as a polycistronic mRNA that is translated into corresponding proteins. Here, we identify a bicistronic operon transcribed as two mRNAs, yet only one allows translation of both genes. We establish that the novel gene ugtS forms an operon with virulence gene ugtL, an activator of the master virulence regulatory system PhoP/PhoQ in Salmonella enterica serovar Typhimurium. Only the longer ugtSugtL mRNA carries the ugtS ribosome binding site and therefore allows ugtS translation. Inside macrophages, the ugtSugtL mRNA species allowing translation of both genes is produced hours before that allowing translation solely of ugtL. The small protein UgtS controls the kinetics of PhoP phosphorylation by antagonizing UgtL activity, preventing premature activation of a critical virulence program. Moreover, S. enterica serovars that infect cold-blooded animals lack ugtS. Our results establish how foreign gene control of ancestral regulators enables pathogens to time their virulence programs. Pathogens must express their virulence genes at precisely the right time to cause disease. Here, we identify a novel small protein that governs a critical virulence program in the pathogen Salmonella enterica serovar Typhimurium (S. Typhimurium). We establish that the novel small protein UgtS prevents the virulence protein UgtL from activating the master virulence regulator PhoP inside macrophages. S. Typhimurium produces two ugtSugtL mRNAs, but only one of them allows ugtS translation. The absence of ugtS from S. enterica serovars that infect cold-blooded animals raises the possibility of UgtS playing a regulatory role during infection of warm-blooded animals. Our findings establish how a horizontally acquired bicistron enables pathogens to time their virulence programs by controlling ancestral regulators.
Affiliation(s)
- Hubert Salvail
- Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, Connecticut, United States of America
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, United States of America
- Jeongjoon Choi
- Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, Connecticut, United States of America
- Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
- Eduardo A. Groisman
- Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, Connecticut, United States of America
- Yale Microbial Sciences Institute, West Haven, Connecticut, United States of America
6
Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022]
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes. Two novel, very long, overlapping genes were found in Pseudomonas aeruginosa. Both overlapping genes, olg1 and olg2, are transcribed, translated, and regulated. Mass spectrometry verifies translation of the overlapping and their mother genes. Both overlapping genes are taxonomically restricted, but under purifying selection.
Affiliation(s)
- Michaela Kreitmeier
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Zachary Ardern
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Miriam Abele
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
- Christina Ludwig
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
- Siegfried Scherer
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Klaus Neuhaus
- Core Facility Microbiome, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
7
Vazquez-Laslop N, Sharma CM, Mankin A, Buskirk AR. Identifying Small Open Reading Frames in Prokaryotes with Ribosome Profiling. J Bacteriol 2022; 204:e0029421. [PMID: 34339296 PMCID: PMC8765392 DOI: 10.1128/jb.00294-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 11/20/2022]
Abstract
Small proteins encoded by open reading frames (ORFs) shorter than 50 codons (small ORFs [sORFs]) are often overlooked by annotation engines and are difficult to characterize using traditional biochemical techniques. Ribosome profiling has tremendous potential to empirically improve the annotations of prokaryotic genomes. Recent improvements in ribosome profiling methods for bacterial model organisms have revealed many new sORFs in well-characterized genomes. Antibiotics that trap ribosomes just after initiation have played a key role in these developments by allowing the unambiguous identification of the start codons (and, hence, the reading frame) for novel ORFs. Here, we describe these new methods and highlight critical controls and considerations for adapting ribosome profiling to different prokaryotic species.
Affiliation(s)
- Nora Vazquez-Laslop
- Center for Biomolecular Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
- Cynthia M. Sharma
- Molecular Infection Biology II, Institute of Molecular Infection Biology, University of Würzburg, Würzburg, Germany
- Alexander Mankin
- Center for Biomolecular Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
- Allen R. Buskirk
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
8
Yadavalli SS, Yuan J. Bacterial Small Membrane Proteins: the Swiss Army Knife of Regulators at the Lipid Bilayer. J Bacteriol 2022; 204:e0034421. [PMID: 34516282 PMCID: PMC8765417 DOI: 10.1128/jb.00344-21] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 11/20/2022]
Abstract
Small membrane proteins represent a subset of recently discovered small proteins (≤100 amino acids), which are a ubiquitous class of emerging regulators underlying bacterial adaptation to environmental stressors. Until relatively recently, small open reading frames encoding these proteins were not designated genes in genome annotations. Therefore, our understanding of small protein biology was primarily limited to a few candidates associated with previously characterized larger partner proteins. Following the first systematic analyses of small proteins in Escherichia coli over a decade ago, numerous small proteins across different bacteria have been uncovered. An estimated one-third of these newly discovered proteins in E. coli are localized to the cell membrane, where they may interact with distinct groups of membrane proteins, such as signal receptors, transporters, and enzymes, and affect their activities. Recently, there has been considerable progress in functionally characterizing small membrane protein regulators aided by innovative tools adapted specifically to study small proteins. Our review covers prototypical proteins that modulate a broad range of cellular processes, such as transport, signal transduction, stress response, respiration, cell division, sporulation, and membrane stability. Thus, small membrane proteins represent a versatile group of physiology regulators at the membrane and the whole cell. Additionally, small membrane proteins have the potential for clinical applications, where some of the proteins may act as antibacterial agents themselves while others serve as alternative drug targets for the development of novel antimicrobials.
Affiliation(s)
- Srujana S. Yadavalli
- Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey, USA
- Department of Genetics, Rutgers University, Piscataway, New Jersey, USA
- Jing Yuan
- Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- LOEWE Center for Synthetic Microbiology (SYNMIKRO), Marburg, Germany
9
Gelhausen R, Müller T, Svensson SL, Alkhnbashi OS, Sharma CM, Eggenhofer F, Backofen R. RiboReport - benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria. Brief Bioinform 2022; 23:6509045. [PMID: 35037022 PMCID: PMC8921622 DOI: 10.1093/bib/bbab549] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/25/2021] [Revised: 11/22/2021] [Accepted: 11/29/2021] [Indexed: 11/19/2022]
Abstract
Small proteins encoded by short open reading frames (ORFs) with 50 codons or fewer are emerging as an important class of cellular macromolecules in diverse organisms. However, they often evade detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation in genomic regions previously thought to be non-coding, driving the development of ORF detection tools using Ribo-seq data. However, only a handful of tools have been designed for bacteria, and these have not yet been systematically compared. Here, we aimed to identify tools that use Ribo-seq data to correctly determine the translational status of annotated bacterial ORFs and also discover novel translated regions with high sensitivity. To this end, we generated a large set of annotated ORFs from four diverse bacterial organisms, manually labeled for their translation status based on Ribo-seq data, which are available for future benchmarking studies. This set was used to investigate the predictive performance of seven Ribo-seq-based ORF detection tools (REPARATION_blast, DeepRibo, Ribo-TISH, PRICE, smORFer, ribotricer and SPECtre), as well as IRSOM, which uses coding potential and RNA-seq coverage only. DeepRibo and REPARATION_blast robustly predicted translated ORFs, including sORFs, with no significant difference for ORFs in close proximity to other genes versus stand-alone genes. However, no tool predicted a set of novel, experimentally verified sORFs with high sensitivity. Start codon predictions with smORFer show the value of initiation site profiling data to further improve the sensitivity of ORF prediction tools in bacteria. Overall, we find that bacterial tools perform well for sORF detection, although there is potential for improving their performance, applicability, usability and reproducibility.
Affiliation(s)
- Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Teresa Müller
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Sarah L Svensson
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Omer S Alkhnbashi
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Cynthia M Sharma
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Str. 2 / D15, 97080, Würzburg, Germany
- Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schänzlestr. 18, 79104, State, Germany
10
Fijalkowski I, Peeters MKR, Van Damme P. Small Protein Enrichment Improves Proteomics Detection of sORF Encoded Polypeptides. Front Genet 2021; 12:713400. [PMID: 34721520 PMCID: PMC8554064 DOI: 10.3389/fgene.2021.713400] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/22/2021] [Accepted: 10/01/2021] [Indexed: 11/13/2022]
Abstract
With the rapid growth in the number of sequenced genomes, genome annotation efforts became almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to their short length and often highly variable physicochemical properties. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and concomitant proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs, making use of the amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility, was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
- Marlies K. R. Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Gent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
11
Abstract
Mg2+ is the most abundant divalent cation in living cells. It is essential for charge neutralization, macromolecule stabilization, and the assembly and activity of ribosomes and as a cofactor for enzymatic reactions. When experiencing low cytoplasmic Mg2+, bacteria adopt two main strategies: They increase the abundance and activity of Mg2+ importers and decrease the abundance of Mg2+-chelating ATP and rRNA. These changes reduce regulated proteolysis by ATP-dependent proteases and protein synthesis in a systemic fashion. In many bacterial species, the transcriptional regulator PhoP controls expression of proteins mediating these changes. The 5' leader region of some mRNAs responds to low cytoplasmic Mg2+ or to disruptions in translation of open reading frames in the leader regions by furthering expression of the associated coding regions, which specify proteins mediating survival when the cytoplasmic Mg2+ concentration is low. Microbial species often utilize similar adaptation strategies to cope with low cytoplasmic Mg2+ despite relying on different genes to do so.
Affiliation(s)
- Eduardo A Groisman
- Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, Connecticut 06536, USA
- Yale Microbial Sciences Institute, West Haven, Connecticut 06516, USA
- Carissa Chan
- Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, Connecticut 06536, USA
12
Abstract
Escherichia coli was one of the first species to have its genome sequenced and remains one of the best-characterized model organisms. Thus, it is perhaps surprising that recent studies have shown that a substantial number of genes have been overlooked. Genes encoding more than 140 small proteins, defined as those containing 50 or fewer amino acids, have been identified in E. coli in the past 10 years, and there is substantial evidence indicating that many more remain to be discovered. This review covers the methods that have been successful in identifying small proteins and the short open reading frames that encode them. The small proteins that have been functionally characterized to date in this model organism are also discussed. It is hoped that the review, along with the associated databases of known as well as predicted but undetected small proteins, will aid in and provide a roadmap for the continued identification and characterization of these proteins in E. coli as well as other bacteria.
13
Bartholomäus A, Kolte B, Mustafayeva A, Goebel I, Fuchs S, Benndorf D, Engelmann S, Ignatova Z. smORFer: a modular algorithm to detect small ORFs in prokaryotes. Nucleic Acids Res 2021; 49:e89. [PMID: 34125903 PMCID: PMC8421149 DOI: 10.1093/nar/gkab477] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/07/2020] [Revised: 04/29/2021] [Accepted: 05/18/2021] [Indexed: 11/15/2022]
Abstract
Emerging evidence places small proteins (≤50 amino acids) more centrally in physiological processes. Yet, their functional identification and the systematic genome annotation of their cognate small open-reading frames (smORFs) remain challenging both experimentally and computationally. Ribosome profiling or Ribo-Seq (that is, deep sequencing of ribosome-protected fragments) enables detection of actively translated open-reading frames (ORFs) and empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple identifiers of ORFs that use the 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation. They have difficulties evaluating prokaryotic genomes due to their unique architecture (e.g. polycistronic messages, overlapping ORFs, leaderless translation, non-canonical initiation etc.). Here, we present a new algorithm, smORFer, which performs with high accuracy in prokaryotic organisms in detecting putative smORFs. The unique feature of smORFer is that it uses an integrated approach and considers structural features of the genetic sequence along with in-frame translation and uses Fourier transform to convert these parameters into a measurable score to faithfully select smORFs. The algorithm is executed in a modular way, and dependent on the data available for a particular organism, different modules can be selected for smORF search.
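To make the idea of scoring 3-nt periodicity with a Fourier transform concrete, here is a minimal Python sketch. It is not smORFer's actual scoring, which the abstract does not specify; the function name `periodicity_score`, the input format (a list of per-nucleotide read counts over one candidate ORF) and the normalisation are all assumptions chosen for illustration.

```python
import cmath
import math


def periodicity_score(coverage):
    """Return the share of non-DC spectral power that sits at a 3-nt period.

    `coverage` is a hypothetical input: per-nucleotide read counts over one
    candidate ORF. Its length is trimmed down to a multiple of 3.
    """
    n = len(coverage) - len(coverage) % 3
    if n < 6:
        raise ValueError("need at least two codons of coverage")
    signal = coverage[:n]

    # A perfectly flat profile carries no periodic signal at all; returning
    # early avoids dividing one rounding error by another.
    if len(set(signal)) == 1:
        return 0.0

    def power(k):
        # Squared magnitude of the k-th discrete Fourier coefficient.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        return abs(coeff) ** 2

    # Frequencies 1 .. n//2 form the one-sided spectrum; k = n/3 is period 3.
    spectrum = {k: power(k) for k in range(1, n // 2 + 1)}
    total = sum(spectrum.values())
    return spectrum[n // 3] / total if total > 0 else 0.0


in_frame = [10, 0, 0] * 20      # reads pile up on the first codon position
off_period = [10, 0, 0, 0] * 15  # periodic, but with a 4-nt period
print(periodicity_score(in_frame) > 0.99)
print(periodicity_score(off_period) < 0.01)
```

A score near 1 means almost all of the variation in coverage repeats every three nucleotides, as expected for in-frame translation; a real pipeline would additionally need a significance threshold and strand-aware coverage, which this sketch leaves out.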
Affiliation(s)
- Alexander Bartholomäus
- GFZ German Research Centre for Geosciences, Section Geomicrobiology, 14473 Potsdam, Germany
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Baban Kolte
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Ayten Mustafayeva
- Helmholtz Center for Infection Research, Microbial Proteomics, 38124 Braunschweig, Germany
- Inst. Microbiology, TU Braunschweig, Braunschweig, Germany
- Ingrid Goebel
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
- Dirk Benndorf
- Otto von Guericke University, Bioprocess Engineering, 39106 Magdeburg, Germany
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Susanne Engelmann
- Helmholtz Center for Infection Research, Microbial Proteomics, 38124 Braunschweig, Germany
- Inst. Microbiology, TU Braunschweig, Braunschweig, Germany
- Zoya Ignatova
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
14
Fijalkowska D, Fijalkowski I, Willems P, Van Damme P. Bacterial riboproteogenomics: the era of N-terminal proteoform existence revealed. FEMS Microbiol Rev 2021; 44:418-431. [PMID: 32386204 DOI: 10.1093/femsre/fuaa013] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/19/2019] [Accepted: 05/07/2020] [Indexed: 12/17/2022]
Abstract
With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. We here discuss underexplored sources of protein diversity and new methodologies for high-throughput genome reannotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. In consequence, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms.
Affiliation(s)
- Daria Fijalkowska
  - Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
- Igor Fijalkowski
  - Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
- Patrick Willems
  - Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
- Petra Van Damme
  - Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, B-9000 Ghent, Belgium
|
15
|
Stringer A, Smith C, Mangano K, Wade JT. Identification of novel translated small ORFs in Escherichia coli using complementary ribosome profiling approaches. J Bacteriol 2021; 204:JB0035221. [PMID: 34662240 PMCID: PMC8765432 DOI: 10.1128/jb.00352-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/01/2021] [Accepted: 10/12/2021] [Indexed: 11/20/2022] Open
Abstract
Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. Thus, we identify ribosome-occupied start and stop codons with high sensitivity for ∼400 novel putative ORFs. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs, and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs. IMPORTANCE Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Recent studies have discovered small proteins by mapping the location of translating ribosomes on RNA using a technique known as ribosome profiling. Discovery of translated sORFs using ribosome profiling can be improved by treating cells with drugs that trap initiating ribosomes. Here, we show that combining these data with equivalent data for cells treated with a drug that stalls terminating ribosomes facilitates the discovery of small proteins. We use this approach to discover 365 putative genes that encode small proteins in Escherichia coli.
Affiliation(s)
- Anne Stringer
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Carol Smith
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Kyle Mangano
  - Center for Biomolecular Sciences, University of Illinois, Chicago, Illinois, USA
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
  - Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, New York, USA
|
16
|
Meydan S, Klepacki D, Mankin AS, Vázquez-Laslop N. Identification of Translation Start Sites in Bacterial Genomes. Methods Mol Biol 2021; 2252:27-55. [PMID: 33765270 DOI: 10.1007/978-1-0716-1150-0_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023]
Abstract
The knowledge of translation start sites is crucial for annotation of genes in bacterial genomes. However, systematic mapping of start codons in bacterial genes has mainly relied on predictions based on protein conservation and mRNA sequence features which, although useful, are not always accurate. We recently found that the pleuromutilin antibiotic retapamulin (RET) is a specific inhibitor of translation initiation that traps ribosomes specifically at start codons, and we used it in combination with ribosome profiling to map start codons in the Escherichia coli genome. This genome-wide strategy, that was named Ribo-RET, not only verifies the position of start codons in already annotated genes but also enables identification of previously unannotated open reading frames and reveals the presence of internal start sites within genes. Here, we provide a detailed Ribo-RET protocol for E. coli. Ribo-RET can be adapted for mapping the start codons of the protein-coding sequences in a variety of bacterial species.
Affiliation(s)
- Sezen Meydan
  - National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Dorota Klepacki
  - Center for Biomolecular Sciences, University of Illinois at Chicago, Chicago, IL, USA
- Alexander S Mankin
  - Center for Biomolecular Sciences, University of Illinois at Chicago, Chicago, IL, USA
- Nora Vázquez-Laslop
  - Center for Biomolecular Sciences, University of Illinois at Chicago, Chicago, IL, USA
|
17
|
The Small Toxic Salmonella Protein TimP Targets the Cytoplasmic Membrane and Is Repressed by the Small RNA TimR. mBio 2020; 11:mBio.01659-20. [PMID: 33172998 PMCID: PMC7667032 DOI: 10.1128/mbio.01659-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022] Open
Abstract
Next-generation sequencing (NGS) has enabled the revelation of a vast number of genomes from organisms spanning all domains of life. To reduce complexity when new genome sequences are annotated, open reading frames (ORFs) shorter than 50 codons in length are generally omitted. However, it has recently become evident that this procedure sorts away ORFs encoding small proteins of high biological significance. For instance, tailored small protein identification approaches have shown that bacteria encode numerous small proteins with important physiological functions. As the number of predicted small ORFs increases, it becomes important to characterize the corresponding proteins. In this study, we discovered a conserved but previously overlooked small enterobacterial protein. We show that this protein, which we dubbed TimP, is a potent toxin that inhibits bacterial growth by targeting the cell membrane. Toxicity is relieved by a small regulatory RNA, which binds the toxin mRNA to inhibit toxin synthesis. Small proteins are gaining increased attention due to their important functions in major biological processes throughout the domains of life. However, their small size and low sequence conservation make them difficult to identify. It is therefore not surprising that enterobacterial ryfA has escaped identification as a small protein coding gene for nearly 2 decades. Since its identification in 2001, ryfA has been thought to encode a noncoding RNA and has been implicated in biofilm formation in Escherichia coli and pathogenesis in Shigella dysenteriae. Although a recent ribosome profiling study suggested ryfA to be translated, the corresponding protein product was not detected. In this study, we provide evidence that ryfA encodes a small toxic inner membrane protein, TimP, overexpression of which causes cytoplasmic membrane leakage. TimP carries an N-terminal signal sequence, indicating that its membrane localization is Sec-dependent. Expression of TimP is repressed by the small RNA (sRNA) TimR, which base pairs with the timP mRNA to inhibit its translation. In contrast to overexpression, endogenous expression of TimP upon timR deletion permits cell growth, possibly indicating a toxicity-independent function in the bacterial membrane.
|
18
|
Arginine-Rich Small Proteins with a Domain of Unknown Function, DUF1127, Play a Role in Phosphate and Carbon Metabolism of Agrobacterium tumefaciens. J Bacteriol 2020; 202:JB.00309-20. [PMID: 33093235 DOI: 10.1128/jb.00309-20] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/22/2020] [Accepted: 07/21/2020] [Indexed: 02/06/2023] Open
Abstract
In any given organism, approximately one-third of all proteins have a yet-unknown function. A widely distributed domain of unknown function is DUF1127. Approximately 17,000 proteins with such an arginine-rich domain are found in 4,000 bacteria. Most of them are single-domain proteins, and a large fraction qualifies as small proteins with fewer than 50 amino acids. We systematically identified and characterized the seven DUF1127 members of the plant pathogen Agrobacterium tumefaciens. They all give rise to authentic proteins and are differentially expressed as shown at the RNA and protein levels. The seven proteins fall into two subclasses on the basis of their length, sequence, and reciprocal regulation by the LysR-type transcription factor LsrB. The absence of all three short DUF1127 proteins caused a striking phenotype in later growth phases and increased cell aggregation and biofilm formation. Protein profiling and transcriptome sequencing (RNA-seq) analysis of the wild type and triple mutant revealed a large number of differentially regulated genes in late exponential and stationary growth. The most affected genes are involved in phosphate uptake, glycine/serine homeostasis, and nitrate respiration. The results suggest a redundant function of the small DUF1127 paralogs in nutrient acquisition and central carbon metabolism of A. tumefaciens. They may be required for diauxic switching between carbon sources when sugar from the medium is depleted. We end by discussing how DUF1127 might confer such a global impact on cell physiology and gene expression. IMPORTANCE Despite being prevalent in numerous ecologically and clinically relevant bacterial species, the biological role of proteins with a domain of unknown function, DUF1127, is unclear. Experimental models are needed to approach their elusive function. We used the phytopathogen Agrobacterium tumefaciens, a natural genetic engineer that causes crown gall disease, and focused on its three small DUF1127 proteins. They have redundant and pervasive roles in nutrient acquisition, cellular metabolism, and biofilm formation. The study shows that small proteins have important previously missed biological functions. How small basic proteins can have such a broad impact is a fascinating prospect of future research.
|
19
|
Small proteins regulate Salmonella survival inside macrophages by controlling degradation of a magnesium transporter. Proc Natl Acad Sci U S A 2020; 117:20235-20243. [PMID: 32753384 DOI: 10.1073/pnas.2006116117] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 02/03/2023] Open
Abstract
All cells require Mg2+ to replicate and proliferate. The macrophage protein Slc11a1 is proposed to protect mice from invading microbes by causing Mg2+ starvation in host tissues. However, the Mg2+ transporter MgtB enables the facultative intracellular pathogen Salmonella enterica serovar Typhimurium to cause disease in mice harboring a functional Slc11a1 protein. Here, we report that, unexpectedly, the Salmonella small protein MgtR promotes MgtB degradation by the protease FtsH, which raises the question: How does Salmonella preserve MgtB to promote survival inside macrophages? We establish that the Salmonella small protein MgtU prevents MgtB proteolysis, even when MgtR is absent. Like MgtB, MgtU is necessary for survival in Slc11a1 +/+ macrophages, resistance to oxidative stress, and growth under Mg2+ limitation conditions. The Salmonella Mg2+ transporter MgtA is not protected by MgtU despite sharing 50% amino acid identity with MgtB and being degraded in an MgtR- and FtsH-dependent manner. Surprisingly, the mgtB, mgtR, and mgtU genes are part of the same transcript, providing a singular example of transcript-specifying proteins that promote and hinder degradation of the same target. Our findings demonstrate that small proteins can confer pathogen survival inside macrophages by altering the abundance of related transporters, thereby furthering homeostasis.
|
20
|
Garai P, Blanc‐Potard A. Uncovering small membrane proteins in pathogenic bacteria: Regulatory functions and therapeutic potential. Mol Microbiol 2020; 114:710-720. [DOI: 10.1111/mmi.14564] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/10/2020] [Revised: 06/19/2020] [Accepted: 06/20/2020] [Indexed: 01/01/2023]
Affiliation(s)
- Preeti Garai
  - Laboratory of Pathogen‐Host Interactions, Université de Montpellier, CNRS‐UMR5235, Montpellier, France
- Anne Blanc‐Potard
  - Laboratory of Pathogen‐Host Interactions, Université de Montpellier, CNRS‐UMR5235, Montpellier, France
|
21
|
Zehentner B, Ardern Z, Kreitmeier M, Scherer S, Neuhaus K. A Novel pH-Regulated, Unusual 603 bp Overlapping Protein Coding Gene pop Is Encoded Antisense to ompA in Escherichia coli O157:H7 (EHEC). Front Microbiol 2020; 11:377. [PMID: 32265854 PMCID: PMC7103648 DOI: 10.3389/fmicb.2020.00377] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/13/2019] [Accepted: 02/20/2020] [Indexed: 12/23/2022] Open
Abstract
Antisense transcription is well known in bacteria. However, translation of antisense RNAs is typically not considered, as the implied overlapping coding at a DNA locus is assumed to be highly improbable. Therefore, such overlapping genes are systematically excluded in prokaryotic genome annotation. Here we report an exceptional 603 bp long open reading frame completely embedded in antisense to the gene of the outer membrane protein ompA. An active σ70 promoter, transcription start site (TSS), Shine-Dalgarno motif and rho-independent terminator were experimentally validated, providing evidence that this open reading frame has all the structural features of a functional gene. Furthermore, ribosomal profiling revealed translation of the mRNA, the protein was detected in Western blots and a pH-dependent phenotype conferred by the protein was shown in competitive overexpression growth experiments of a translationally arrested mutant versus wild type. We designate this novel gene pop (pH-regulated overlapping protein-coding gene), thus adding another example to the growing list of overlapping, protein coding genes in bacteria.
Affiliation(s)
- Barbara Zehentner
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Zachary Ardern
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Michaela Kreitmeier
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
  - ZIEL – Institute for Food & Health, Technical University of Munich, Freising, Germany
- Klaus Neuhaus
  - ZIEL – Institute for Food & Health, Technical University of Munich, Freising, Germany
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technical University of Munich, Freising, Germany
|
22
|
Panagi I, Jennings E, Zeng J, Günster RA, Stones CD, Mak H, Jin E, Stapels DAC, Subari NZ, Pham THM, Brewer SM, Ong SYQ, Monack DM, Helaine S, Thurston TLM. Salmonella Effector SteE Converts the Mammalian Serine/Threonine Kinase GSK3 into a Tyrosine Kinase to Direct Macrophage Polarization. Cell Host Microbe 2020; 27:41-53.e6. [PMID: 31862381 PMCID: PMC6953433 DOI: 10.1016/j.chom.2019.11.002] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 07/23/2019] [Revised: 09/13/2019] [Accepted: 11/06/2019] [Indexed: 12/31/2022]
Abstract
Many Gram-negative bacterial pathogens antagonize anti-bacterial immunity through translocated effector proteins that inhibit pro-inflammatory signaling. In addition, the intracellular pathogen Salmonella enterica serovar Typhimurium initiates an anti-inflammatory transcriptional response in macrophages through its effector protein SteE. However, the target(s) and molecular mechanism of SteE remain unknown. Here, we demonstrate that SteE converts both the amino acid and substrate specificity of the host pleiotropic serine/threonine kinase GSK3. SteE itself is a substrate of GSK3, and phosphorylation of SteE is required for its activity. Remarkably, phosphorylated SteE then forces GSK3 to phosphorylate the non-canonical substrate signal transducer and activator of transcription 3 (STAT3) on tyrosine-705. This results in STAT3 activation, which along with GSK3 is required for SteE-mediated upregulation of the anti-inflammatory M2 macrophage marker interleukin-4Rα (IL-4Rα). Overall, the conversion of GSK3 to a tyrosine-directed kinase represents a tightly regulated event that enables a bacterial virulence protein to reprogram innate immune signaling and establish an anti-inflammatory environment.
Affiliation(s)
- Ioanna Panagi
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Elliott Jennings
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Jingkun Zeng
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Regina A Günster
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Cullum D Stones
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Hazel Mak
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Enkai Jin
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Daphne A C Stapels
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Nur Z Subari
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Trung H M Pham
  - Departments of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Susan M Brewer
  - Departments of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Samantha Y Q Ong
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Denise M Monack
  - Departments of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Sophie Helaine
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
- Teresa L M Thurston
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
|
23
|
Cerqueira FR, Vasconcelos ATR. OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques. Database: The Journal of Biological Databases and Curation 2020; 2020:5989499. [PMID: 33206960 PMCID: PMC7673341 DOI: 10.1093/database/baaa067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 07/11/2020] [Accepted: 07/27/2020] [Indexed: 11/14/2022]
Abstract
Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty in finding patterns in tiny sequences is the main reason that small ORFs are overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.
Affiliation(s)
- Fabio R Cerqueira
  - Department of Production Engineering, Universidade Federal Fluminense, Rua Domingos Silvério s/n, Petrópolis, 25 650-050, Rio de Janeiro, Brazil; Graduate Program in Computer Science, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
|
24
|
Regulation of Bacterial Gene Expression by Transcription Attenuation. Microbiol Mol Biol Rev 2019; 83:83/3/e00019-19. [PMID: 31270135 DOI: 10.1128/mmbr.00019-19] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 12/14/2022] Open
Abstract
A wide variety of mechanisms that control gene expression in bacteria are based on conditional transcription termination. Generally, in these mechanisms, a transcription terminator is located between a promoter and a downstream gene(s), and the efficiency of the terminator is controlled by a regulatory effector that can be a metabolite, protein, or RNA. The most common type of regulation involving conditional termination is transcription attenuation, in which the primary regulatory target is an essential element of a single terminator. The terminator can be either intrinsic or Rho dependent, with each presenting unique regulatory targets. Transcription attenuation mechanisms can be divided into five classes based primarily on the manner in which transcription termination is rendered conditional. This review summarizes each class of control mechanisms from a historical perspective, describes important examples in a physiological context and the current state of knowledge, highlights major advances, and discusses expectations of future discoveries.
|
25
|
Abstract
The origin of novel genes and beneficial functions is of fundamental interest in evolutionary biology. New genes can originate from different mechanisms, including horizontal gene transfer, duplication-divergence, and de novo from noncoding DNA sequences. Comparative genomics has generated strong evidence for de novo emergence of genes in various organisms, but experimental demonstration of this process has been limited to localized randomization in preexisting structural scaffolds. This bypasses the basic requirement of de novo gene emergence, i.e., lack of an ancestral gene. We constructed highly diverse plasmid libraries encoding randomly generated open reading frames and expressed them in Escherichia coli to identify short peptides that could confer a beneficial and selectable phenotype in vivo (in a living cell). Selections on antibiotic-containing agar plates resulted in the identification of three peptides that increased aminoglycoside resistance up to 48-fold. Combining genetic and functional analyses, we show that the peptides are highly hydrophobic, and by inserting into the membrane, they reduce membrane potential, decrease aminoglycoside uptake, and thereby confer high-level resistance. This study demonstrates that randomized DNA sequences can encode peptides that confer selective benefits and illustrates how expression of random sequences could spark the origination of new genes. In addition, our results also show that this question can be addressed experimentally by expression of highly diverse sequence libraries and subsequent selection for specific functions, such as resistance to toxic compounds, the ability to rescue auxotrophic/temperature-sensitive mutants, and growth on normally nonused carbon sources, allowing the exploration of many different phenotypes. IMPORTANCE De novo gene origination from nonfunctional DNA sequences was long assumed to be implausible. However, recent studies have shown that large fractions of genomic noncoding DNA are transcribed and translated, potentially generating new genes. Experimental validation of this process so far has been limited to comparative genomics, in vitro selections, or partial randomizations. Here, we describe selection of novel peptides in vivo using fully random synthetic expression libraries. The peptides confer aminoglycoside resistance by inserting into the bacterial membrane and thereby partly reducing membrane potential and decreasing drug uptake. Our results show that beneficial peptides can be selected from random sequence pools in vivo and support the idea that expression of noncoding sequences could spark the origination of new genes.
|
26
|
Retapamulin-Assisted Ribosome Profiling Reveals the Alternative Bacterial Proteome. Mol Cell 2019; 74:481-493.e6. [PMID: 30904393 DOI: 10.1016/j.molcel.2019.02.017] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 11/19/2018] [Revised: 01/25/2019] [Accepted: 02/12/2019] [Indexed: 12/21/2022]
Abstract
The use of alternative translation initiation sites enables production of more than one protein from a single gene, thereby expanding the cellular proteome. Although several such examples have been serendipitously found in bacteria, genome-wide mapping of alternative translation start sites has been unattainable. We found that the antibiotic retapamulin specifically arrests initiating ribosomes at start codons of the genes. Retapamulin-enhanced Ribo-seq analysis (Ribo-RET) not only allowed mapping of conventional initiation sites at the beginning of the genes, but strikingly, it also revealed putative internal start sites in a number of Escherichia coli genes. Experiments demonstrated that the internal start codons can be recognized by the ribosomes and direct translation initiation in vitro and in vivo. Proteins, whose synthesis is initiated at internal in-frame and out-of-frame start sites, can be functionally important and contribute to the "alternative" bacterial proteome. The internal start sites may also play regulatory roles in gene expression.
|
27
|
Weaver J, Mohammad F, Buskirk AR, Storz G. Identifying Small Proteins by Ribosome Profiling with Stalled Initiation Complexes. mBio 2019; 10:e02819-18. [PMID: 30837344 PMCID: PMC6401488 DOI: 10.1128/mbio.02819-18] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 12/15/2018] [Accepted: 01/24/2019] [Indexed: 11/20/2022] Open
Abstract
Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the total number of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. Not only are the corresponding genes intergenic but they are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context. IMPORTANCE Proteins comprised of 50 or fewer amino acids have been shown to interact with and modulate the functions of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown as the small size of the proteins places serious limitations on their identification, purification, and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.
Affiliation(s)
- Jeremy Weaver
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
- Fuad Mohammad
  - Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Allen R Buskirk
  - Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Gisela Storz
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
|
28
|
A Family of Small Intrinsically Disordered Proteins Involved in Flagellum-Dependent Motility in Salmonella enterica. J Bacteriol 2018; 201:JB.00415-18. [PMID: 30373755 DOI: 10.1128/jb.00415-18] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/09/2018] [Accepted: 10/21/2018] [Indexed: 02/08/2023] Open
Abstract
By screening a collection of Salmonella mutants deleted for genes encoding small proteins of ≤60 amino acids, we identified three paralogous small genes (ymdF, STM14_1829, and yciG) required for wild-type flagellum-dependent swimming and swarming motility. The ymdF, STM14_1829, and yciG genes encode small proteins of 55, 60, and 60 amino acid residues, respectively. A bioinformatics analysis predicted that these small proteins are intrinsically disordered proteins, and circular dichroism analysis of purified recombinant proteins confirmed that all three proteins are unstructured in solution. A mutant deleted for STM14_1829 showed the most severe motility defect, indicating that among the three paralogs, STM14_1829 is a key protein required for wild-type motility. We determined that relative to the wild type, the expression of the flagellin protein FliC is lower in the ΔSTM14_1829 mutant due to the downregulation of the flhDC operon encoding the FlhDC master regulator. By comparing the gene expression profiles between the wild-type and ΔSTM14_1829 strains via RNA sequencing, we found that the gene encoding the response regulator PhoP is upregulated in the ΔSTM14_1829 mutant, suggesting the indirect repression of the flhDC operon by the activated PhoP. Homologs of STM14_1829 are conserved in a wide range of bacteria, including Escherichia coli and Pseudomonas aeruginosa. We showed that the inactivation of STM14_1829 homologs in E. coli and P. aeruginosa also alters motility, suggesting that this family of small intrinsically disordered proteins may play a role in the cellular pathway(s) that affects motility. IMPORTANCE This study reports the identification of a novel family of small intrinsically disordered proteins that are conserved in a wide range of flagellated and nonflagellated bacteria. Although this study identifies the role of these small proteins in the scope of flagellum-dependent motility in Salmonella, they likely play larger roles in a more conserved cellular pathway(s) that indirectly affects flagellum expression in the case of motile bacteria. Small intrinsically disordered proteins have not been well characterized in prokaryotes, and the results of our study provide a basis for their detailed functional characterization.
29
The novel EHEC gene asa overlaps the TEGT transporter gene in antisense and is regulated by NaCl and growth phase. Sci Rep 2018; 8:17875. [PMID: 30552341 PMCID: PMC6294744 DOI: 10.1038/s41598-018-35756-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/20/2018] [Accepted: 11/08/2018] [Indexed: 12/02/2022] Open
Abstract
Only a few overlapping gene pairs are known in the best-analyzed bacterial model organism Escherichia coli. Automatic annotation programs usually annotate only one out of six reading frames at a locus, allowing only small overlaps between protein-coding sequences. However, both RNAseq and RIBOseq show signals corresponding to non-trivially overlapping reading frames in antisense to annotated genes, which may constitute protein-coding genes. The transcription and translation of the novel 264 nt gene asa, which overlaps in antisense to a putative TEGT (Testis-Enhanced Gene Transfer) transporter gene, is detected in pathogenic E. coli, but not in two apathogenic E. coli strains. The gene in E. coli O157:H7 (EHEC) was further analyzed. An overexpression phenotype was identified in two stress conditions, i.e., excess of salt or arginine. For this, EHEC overexpressing asa was grown competitively against EHEC with a translationally arrested asa mutant gene. RT-qPCR revealed conditional expression dependent on growth phase, sodium chloride, and arginine. Two potential promoters were computationally identified and experimentally verified by reporter gene expression and determination of the transcription start site. The protein Asa was verified by Western blot. Close homologues of asa have not been found in protein databases, but bioinformatic analyses showed that it may be membrane associated, having a largely disordered structure.
30
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Scherer S, Neuhaus K. The Novel Anaerobiosis-Responsive Overlapping Gene ano Is Overlapping Antisense to the Annotated Gene ECs2385 of Escherichia coli O157:H7 Sakai. Front Microbiol 2018; 9:931. [PMID: 29867840 PMCID: PMC5960689 DOI: 10.3389/fmicb.2018.00931] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/01/2018] [Accepted: 04/23/2018] [Indexed: 12/26/2022] Open
Abstract
Current notion presumes that only one protein is encoded at a given bacterial genetic locus. However, transcription and translation of an overlapping open reading frame (ORF) of 186 bp length were discovered by RNAseq and RIBOseq experiments. This ORF is almost completely embedded in the annotated L,D-transpeptidase gene ECs2385 of Escherichia coli O157:H7 Sakai in the antisense reading frame -3. The ORF is transcribed as part of a bicistronic mRNA, which includes the annotated upstream gene ECs2384, encoding a murein lipoprotein. The transcriptional start site of the operon resides 38 bp upstream of the ECs2384 start codon and is driven by a predicted σ70 promoter, which is constitutively active under different growth conditions. The bicistronic operon contains a ρ-independent terminator just upstream of the novel gene, significantly decreasing its transcription. The novel gene can be stably expressed as an EGFP-fusion protein and a translationally arrested mutant of ano, unable to produce the protein, shows a growth advantage in competitive growth experiments compared to the wild type under anaerobiosis. Therefore, the novel antisense overlapping gene is named ano (anaerobiosis responsive overlapping gene). A phylostratigraphic analysis indicates that ano originated very recently de novo by overprinting after the Escherichia/Shigella clade separated from other enterobacteria. Therefore, ano is one of the very rare cases of overlapping genes known in the genus Escherichia.
Affiliation(s)
- Sarah M Hücker
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Sonja Vanderhaeghen
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany; Institute for Food & Health, Technical University of Munich, Freising, Germany
- Klaus Neuhaus
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany; Core Facility Microbiome/NGS, Institute for Food & Health, Technical University of Munich, Freising, Germany
31
VanOrsdel CE, Kelly JP, Burke BN, Lein CD, Oufiero CE, Sanchez JF, Wimmers LE, Hearn DJ, Abuikhdair FJ, Barnhart KR, Duley ML, Ernst SEG, Kenerson BA, Serafin AJ, Hemm MR. Identifying New Small Proteins in Escherichia coli. Proteomics 2018; 18:e1700064. [PMID: 29645342 PMCID: PMC6001520 DOI: 10.1002/pmic.201700064] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/05/2017] [Revised: 03/05/2018] [Indexed: 12/11/2022]
Abstract
The number of small proteins (SPs) encoded in the Escherichia coli genome is unknown, as current bioinformatics and biochemical techniques make short gene and small protein identification challenging. One method of small protein identification involves adding an epitope tag to the 3′ end of a short open reading frame (sORF) on the chromosome, with synthesis confirmed by immunoblot assays. In this study, this strategy was used to identify new E. coli small proteins, tagging 80 sORFs in the E. coli genome, and assayed for protein synthesis. The selected sORFs represent diverse sequence characteristics, including degrees of sORF conservation, predicted transmembrane domains, sORF direction with respect to flanking genes, ribosome binding site (RBS) prediction, and ribosome profiling results. Of 80 sORFs, 36 resulted in encoded synthesized proteins—a 45% success rate. Modeling of detected versus non‐detected small proteins analysis showed predictions based on RBS prediction, transcription data, and ribosome profiling had statistically‐significant correlation with protein synthesis; however, there was no correlation between current sORF annotation and protein synthesis. These results suggest substantial numbers of small proteins remain undiscovered in E. coli, and existing bioinformatics techniques must continue to improve to facilitate identification.
Affiliation(s)
- Caitlin E VanOrsdel
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- John P Kelly
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Brittany N Burke
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Christina D Lein
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Joseph F Sanchez
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Larry E Wimmers
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- David J Hearn
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Fatimeh J Abuikhdair
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Kathryn R Barnhart
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Michelle L Duley
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Sarah E G Ernst
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Briana A Kenerson
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Aubrey J Serafin
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
- Matthew R Hemm
  - Department of Biological Sciences, Smith Hall, Towson University, Towson, MD, USA
32
Brunet MA, Levesque SA, Hunting DJ, Cohen AA, Roucou X. Recognition of the polycistronic nature of human genes is critical to understanding the genotype-phenotype relationship. Genome Res 2018; 28:609-624. [PMID: 29626081 PMCID: PMC5932603 DOI: 10.1101/gr.230938.117] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 11/06/2017] [Accepted: 03/27/2018] [Indexed: 12/12/2022]
Abstract
Technological advances promise unprecedented opportunities for whole exome sequencing and proteomic analyses of populations. Currently, data from genome and exome sequencing or proteomic studies are searched against reference genome annotations. This provides the foundation for research and clinical screening for genetic causes of pathologies. However, current genome annotations substantially underestimate the proteomic information encoded within a gene. Numerous studies have now demonstrated the expression and function of alternative (mainly small, sometimes overlapping) ORFs within mature gene transcripts. This has important consequences for the correlation of phenotypes and genotypes. Most alternative ORFs are not yet annotated because of a lack of evidence, and this absence from databases precludes their detection by standard proteomic methods, such as mass spectrometry. Here, we demonstrate how current approaches tend to overlook alternative ORFs, hindering the discovery of new genetic drivers and fundamental research. We discuss available tools and techniques to improve identification of proteins from alternative ORFs and finally suggest a novel annotation system to permit a more complete representation of the transcriptomic and proteomic information contained within a gene. Given the crucial challenge of distinguishing functional ORFs from random ones, the suggested pipeline emphasizes both experimental data and conservation signatures. The addition of alternative ORFs in databases will render identification less serendipitous and advance the pace of research and genomic knowledge. This review highlights the urgent medical and research need to incorporate alternative ORFs in current genome annotations and thus permit their inclusion in hypotheses and models, which relate phenotypes and genotypes.
Affiliation(s)
- Marie A Brunet
  - Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada; Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
- Sébastien A Levesque
  - Pediatric Department, Centre Hospitalier de l'Université de Sherbrooke, Quebec J1H 5N4, Canada
- Darel J Hunting
  - Department of Nuclear Medicine & Radiobiology, Université de Sherbrooke, Quebec J1H 5N4, Canada
- Alan A Cohen
  - Groupe de recherche PRIMUS, Department of Family and Emergency Medicine, Quebec J1H 5N4, Canada
- Xavier Roucou
  - Biochemistry Department, Université de Sherbrooke, Quebec J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Université Laval, Quebec G1V 0A6, Canada
33
Esquirol L, Peat TS, Wilding M, Liu JW, French NG, Hartley CJ, Onagi H, Nebl T, Easton CJ, Newman J, Scott C. An unexpected vestigial protein complex reveals the evolutionary origins of an s-triazine catabolic enzyme. J Biol Chem 2018. [PMID: 29523689 DOI: 10.1074/jbc.ra118.001996] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/06/2022] Open
Abstract
Cyanuric acid is a metabolic intermediate of s-triazines, such as atrazine (a common herbicide) and melamine (used in resins and plastics). Cyanuric acid is mineralized to ammonia and carbon dioxide by the soil bacterium Pseudomonas sp. strain ADP via three hydrolytic enzymes (AtzD, AtzE, and AtzF). Here, we report the purification and biochemical and structural characterization of AtzE. Contrary to previous reports, we found that AtzE is not a biuret amidohydrolase, but instead it catalyzes the hydrolytic deamination of 1-carboxybiuret. X-ray crystal structures of apo AtzE and AtzE bound with the suicide inhibitor phenyl phosphorodiamidate revealed that the AtzE enzyme complex consists of two independent molecules in the asymmetric unit. We also show that AtzE forms an α2β2 heterotetramer with a previously unidentified 68-amino acid-long protein (AtzG) encoded in the cyanuric acid mineralization operon from Pseudomonas sp. strain ADP. Moreover, we observed that AtzG is essential for the production of soluble, active AtzE and that this obligate interaction is a vestige of their shared evolutionary origin. We propose that AtzEG was likely recruited into the cyanuric acid-mineralizing pathway from an ancestral glutamine transamidosome that required protein-protein interactions to enforce the exclusion of solvent from the transamidation reaction.
Affiliation(s)
- Lygie Esquirol
  - Biocatalysis and Synthetic Biology Team; Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601
- Thomas S Peat
  - CSIRO Biomedical Manufacturing, Parkville, Melbourne, Victoria 3052, Australia
- Matthew Wilding
  - Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601; CSIRO Biomedical Manufacturing, Parkville, Melbourne, Victoria 3052, Australia
- Jian-Wei Liu
  - Biocatalysis and Synthetic Biology Team
- Hideki Onagi
  - Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601
- Thomas Nebl
  - CSIRO Biomedical Manufacturing, Parkville, Melbourne, Victoria 3052, Australia
- Christopher J Easton
  - Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601
- Janet Newman
  - CSIRO Biomedical Manufacturing, Parkville, Melbourne, Victoria 3052, Australia
- Colin Scott
  - Biocatalysis and Synthetic Biology Team; Synthetic Biology Future Science Platform, CSIRO Land and Water, Canberra, Australian Capital Territory 2601
34
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Wecko R, Simon S, Scherer S, Neuhaus K. A novel short L-arginine responsive protein-coding gene (laoB) antiparallel overlapping to a CadC-like transcriptional regulator in Escherichia coli O157:H7 Sakai originated by overprinting. BMC Evol Biol 2018; 18:21. [PMID: 29433444 PMCID: PMC5810103 DOI: 10.1186/s12862-018-1134-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/22/2017] [Accepted: 01/31/2018] [Indexed: 11/10/2022] Open
Abstract
Background Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originated from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail. Results A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115. 
Conclusions Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.
Affiliation(s)
- Sarah M Hücker
  - Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Fraunhofer ITEM-R, Am Biopark 9, 93053, Regensburg, Germany
- Sonja Vanderhaeghen
  - Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Isabel Abellan-Schneyder
  - Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Romy Wecko
  - Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Svenja Simon
  - Department of Computer and Information Science, University of Konstanz, Box 78, 78457, Konstanz, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Klaus Neuhaus
  - Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
35
Ndah E, Jonckheere V, Giess A, Valen E, Menschaert G, Van Damme P. REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes. Nucleic Acids Res 2017; 45:e168. [PMID: 28977509 PMCID: PMC5714196 DOI: 10.1093/nar/gkx758] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 04/22/2017] [Accepted: 08/17/2017] [Indexed: 12/13/2022] Open
Abstract
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
Affiliation(s)
- Elvis Ndah
  - VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium; Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
- Veronique Jonckheere
  - VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Adam Giess
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5020, Norway
- Eivind Valen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5020, Norway; Sars International Centre for Marine Molecular Biology, University of Bergen, 5008 Bergen, Norway
- Gerben Menschaert
  - Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
- Petra Van Damme
  - VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
36
Fels U, Gevaert K, Van Damme P. Proteogenomics in Aid of Host-Pathogen Interaction Studies: A Bacterial Perspective. Proteomes 2017; 5:E26. [PMID: 29019919 PMCID: PMC5748561 DOI: 10.3390/proteomes5040026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/25/2017] [Revised: 10/02/2017] [Accepted: 10/08/2017] [Indexed: 12/17/2022] Open
Abstract
By providing useful tools to study host-pathogen interactions, next-generation omics has recently enabled the study of gene expression changes in both pathogen and infected host simultaneously. However, since great discriminative power is required to study pathogen and host simultaneously throughout the infection process, the depth of quantitative gene expression profiling has proven to be unsatisfactory when focusing on bacterial pathogens, thus preferentially requiring specific strategies or the development of novel methodologies based on complementary omics approaches. In this review, we focus on the difficulties encountered when making use of proteogenomics approaches to study bacterial pathogenesis. In addition, we review different omics strategies (i.e., transcriptomics, proteomics and secretomics) and their applications for studying interactions of pathogens with their host.
Affiliation(s)
- Ursula Fels
  - VIB-UGent Center for Medical Biotechnology, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Kris Gevaert
  - VIB-UGent Center for Medical Biotechnology, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Petra Van Damme
  - VIB-UGent Center for Medical Biotechnology, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
  - Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
37
Hücker SM, Ardern Z, Goldberg T, Schafferhans A, Bernhofer M, Vestergaard G, Nelson CW, Schloter M, Rost B, Scherer S, Neuhaus K. Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome. PLoS One 2017; 12:e0184119. [PMID: 28902868 PMCID: PMC5597208 DOI: 10.1371/journal.pone.0184119] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/16/2017] [Accepted: 08/20/2017] [Indexed: 12/29/2022] Open
Abstract
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set.
Affiliation(s)
- Sarah M. Hücker
  - Chair for Microbial Ecology, Technische Universität München, Freising, Germany
  - ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
- Zachary Ardern
  - Chair for Microbial Ecology, Technische Universität München, Freising, Germany
  - ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
- Tatyana Goldberg
  - Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Andrea Schafferhans
  - Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Michael Bernhofer
  - Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Gisle Vestergaard
  - Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
- Chase W. Nelson
  - Sackler Institute for Comparative Genomics, American Museum of Natural History New York, New York, United States of America
- Michael Schloter
  - Research Unit Environmental Genomics, Helmholtz Zentrum München, Neuherberg, Germany
- Burkhard Rost
  - Department of Informatics—Bioinformatics & TUM-IAS, Technische Universität München, Garching, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, Technische Universität München, Freising, Germany
  - ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany
- Klaus Neuhaus
  - Chair for Microbial Ecology, Technische Universität München, Freising, Germany
  - Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Freising, Germany