1
Tufail MA, Jordan B, Hadjeras L, Gelhausen R, Cassidy L, Habenicht T, Gutt M, Hellwig L, Backofen R, Tholey A, Sharma CM, Schmitz RA. Uncovering the small proteome of Methanosarcina mazei using Ribo-seq and peptidomics under different nitrogen conditions. Nat Commun 2024; 15:8659. [PMID: 39370430 PMCID: PMC11456600 DOI: 10.1038/s41467-024-53008-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 09/25/2024] [Indexed: 10/08/2024] Open
Abstract
The mesophilic methanogenic archaeal model organism Methanosarcina mazei strain Gö1 is crucial for climate and environmental research due to its ability to produce methane. Here, we establish a Ribo-seq protocol for M. mazei strain Gö1 under two growth conditions (nitrogen sufficiency and limitation). The translation of 93 previously annotated and 314 unannotated small ORFs, coding for proteins ≤ 70 amino acids, is predicted with high confidence based on Ribo-seq data. LC-MS analysis validates the translation for 62 annotated small ORFs and 26 unannotated small ORFs. Epitope tagging followed by immunoblotting analysis confirms the translation of 13 out of 16 selected unannotated small ORFs. A comprehensive differential transcription and translation analysis reveals that 29 of 314 unannotated small ORFs are differentially regulated in response to nitrogen availability at the transcriptional and 49 at the translational level. A high number of reported small RNAs are emerging as dual-function RNAs, including sRNA154, the central regulatory small RNA of nitrogen metabolism. Several unannotated small ORFs are conserved in Methanosarcina species and overproducing several (small ORF encoded) small proteins suggests key physiological functions. Overall, the comprehensive analysis opens an avenue to elucidate the function(s) of multitudinous small proteins and dual-function RNAs in M. mazei.
Affiliation(s)
- Britta Jordan
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Lydia Hadjeras
  - Institute of Molecular Infection Biology, University of Würzburg, 97080, Würzburg, Germany
- Rick Gelhausen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110, Freiburg, Germany
- Liam Cassidy
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Kiel University, 24105, Kiel, Germany
- Tim Habenicht
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Miriam Gutt
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Lisa Hellwig
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110, Freiburg, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Kiel University, 24105, Kiel, Germany
- Cynthia M Sharma
  - Institute of Molecular Infection Biology, University of Würzburg, 97080, Würzburg, Germany
- Ruth A Schmitz
  - Institute for General Microbiology, Kiel University, 24118, Kiel, Germany
2
Vellappan S, Sun J, Favate J, Jagadeesan P, Cerda D, Shah P, Yadavalli SS. Translational profiling of stress-induced small proteins uncovers an unexpected connection among distinct signaling systems. bioRxiv 2024:2024.09.13.612970. [PMID: 39345582 PMCID: PMC11429745 DOI: 10.1101/2024.09.13.612970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Signaling networks in bacteria enable sensing and adaptation to challenging environments by activating specific genes that help counteract stressors. Small proteins (≤ 50 amino acids long) are a rising class of bacterial stress response regulators. Escherichia coli encodes over 150 small proteins, most of which lack known phenotypes and their biological roles remain elusive. Using magnesium limitation as a stressor, we investigate small proteins induced in response to stress using ribosome profiling, RNA sequencing, and transcriptional reporter assays. We uncover 17 small proteins with increased translation initiation, a majority of which are transcriptionally upregulated by the PhoQ-PhoP two-component signaling system, crucial for magnesium homeostasis. Next, we describe small protein-specific deletion and overexpression phenotypes, which underscore the physiological significance of their expression in low magnesium stress. Most remarkably, our study reveals that a small membrane protein YoaI is an unusual connector of the major signaling networks - PhoR-PhoB and EnvZ-OmpR in E. coli, advancing our understanding of small protein regulators of cellular signaling.
Highlights
- Ribo-RET identifies 17 small proteins induced under low Mg2+ stress in E. coli
- Many of these proteins are transcriptionally activated by PhoQP signaling system
- Half of the stress-induced small proteins localize to the membrane
- Deletion or overexpression of specific small proteins affects growth under stress
- Small protein YoaI connects PhoR-PhoB and EnvZ-OmpR signaling networks
Affiliation(s)
- Sangeevan Vellappan
  - Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ USA
  - Department of Genetics, School of Arts and Sciences, Rutgers University, Piscataway, NJ USA
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, USA
- Junhong Sun
  - Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ USA
- John Favate
  - Department of Genetics, School of Arts and Sciences, Rutgers University, Piscataway, NJ USA
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, USA
- Pranavi Jagadeesan
  - Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ USA
- Debbie Cerda
  - Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ USA
  - Department of Genetics, School of Arts and Sciences, Rutgers University, Piscataway, NJ USA
- Premal Shah
  - Department of Genetics, School of Arts and Sciences, Rutgers University, Piscataway, NJ USA
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, USA
- Srujana S. Yadavalli
  - Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ USA
  - Department of Genetics, School of Arts and Sciences, Rutgers University, Piscataway, NJ USA
3
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. Biophys Rep 2024; 4:100167. [PMID: 38909903 PMCID: PMC11305224 DOI: 10.1016/j.bpr.2024.100167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/09/2024] [Accepted: 06/20/2024] [Indexed: 06/25/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode for functional proteins, producing smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. In addition, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from noncoding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado
- Irwin Jungreis
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- Jeffre Allen
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts
- John L Rinn
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Biochemistry, University of Colorado Boulder, Boulder, Colorado
- Loren E Hough
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado; Department of Physics, University of Colorado Boulder, Boulder, Colorado
4
Duan Y, Santos-Júnior CD, Schmidt TS, Fullam A, de Almeida BLS, Zhu C, Kuhn M, Zhao XM, Bork P, Coelho LP. A catalog of small proteins from the global microbiome. Nat Commun 2024; 15:7563. [PMID: 39214983 PMCID: PMC11364881 DOI: 10.1038/s41467-024-51894-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] Open
Abstract
Small open reading frames (smORFs) shorter than 100 codons are widespread and perform essential roles in microorganisms, where they encode proteins active in several cell functions, including signal pathways, stress response, and antibacterial activities. However, the ecology, distribution and role of small proteins in the global microbiome remain unknown. Here, we construct a global microbial smORFs catalog (GMSC) derived from 63,410 publicly available metagenomes across 75 distinct habitats and 87,920 high-quality isolate genomes. GMSC contains 965 million non-redundant smORFs with comprehensive annotations. We find that archaea harbor more smORFs proportionally than bacteria. We moreover provide a tool called GMSC-mapper to identify and annotate small proteins from microbial (meta)genomes. Overall, this publicly-available resource demonstrates the immense and underexplored diversity of small proteins.
Affiliation(s)
- Yiqian Duan
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Célio Dias Santos-Júnior
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Laboratory of Microbial Processes & Biodiversity - LMPB; Department of Hydrobiology, Universidade Federal de São Carlos - UFSCar, São Carlos, São Paulo, Brazil
- Thomas Sebastian Schmidt
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - APC Microbiome and School of Medicine, University College Cork, Cork, Ireland
- Anthony Fullam
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Breno L S de Almeida
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Chengkai Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Michael Kuhn
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
  - Lingang Laboratory, Shanghai, 200031, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Peer Bork
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Max Delbrück Centre for Molecular Medicine, Berlin, Germany
  - Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Luis Pedro Coelho
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, QLD, Australia
  - Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia
5
Mohsen JJ, Mohsen MG, Jiang K, Landajuela A, Quinto L, Isaacs FJ, Karatekin E, Slavoff SA. Cellular function of the GndA small open reading frame-encoded polypeptide during heat shock. bioRxiv 2024:2024.06.29.601336. [PMID: 38979229 PMCID: PMC11230408 DOI: 10.1101/2024.06.29.601336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Over the past 15 years, hundreds of previously undiscovered bacterial small open reading frame (sORF)-encoded polypeptides (SEPs) of fewer than fifty amino acids have been identified, and biological functions have been ascribed to an increasing number of SEPs from intergenic regions and small RNAs. However, despite numbering in the dozens in Escherichia coli, and hundreds to thousands in humans, same-strand nested sORFs that overlap protein coding genes in alternative reading frames remain understudied. In order to provide insight into this enigmatic class of unannotated genes, we characterized GndA, a 36-amino acid, heat shock-regulated SEP encoded within the +2 reading frame of the gnd gene in E. coli K-12 MG1655. We show that GndA pulls down components of respiratory complex I (RCI) and is required for proper localization of an RCI subunit during heat shock. At high temperature GndA deletion (ΔGndA) cells exhibit perturbations in cell growth, NADH/NAD+ ratio, and expression of a number of genes including several associated with oxidative stress. These findings suggest that GndA may function in maintenance of homeostasis during heat shock. Characterization of GndA therefore supports the nascent but growing consensus that functional, overlapping genes occur in genomes from viruses to humans.
Affiliation(s)
- Jessica J. Mohsen
  - Department of Chemistry, Yale University, New Haven, CT 06511
  - Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT 06516
- Michael G. Mohsen
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511
  - Howard Hughes Medical Institute, Yale University, New Haven, CT 06511
- Kevin Jiang
  - Department of Chemistry, Yale University, New Haven, CT 06511
  - Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT 06516
- Ane Landajuela
  - Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06510
  - Nanobiology Institute, Yale University, West Haven, CT 06516
- Laura Quinto
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511
  - Systems Biology Institute, Yale University, West Haven, CT 06516
- Farren J. Isaacs
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511
  - Systems Biology Institute, Yale University, West Haven, CT 06516
- Erdem Karatekin
  - Department of Cellular and Molecular Physiology, Yale School of Medicine, New Haven, CT 06510
  - Nanobiology Institute, Yale University, West Haven, CT 06516
  - Wu Tsai Institute, Yale University, New Haven, CT 06511
  - Université de Paris, Saints-Pères Paris Institute for the Neurosciences (SPPIN), Centre National de la Recherche Scientifique (CNRS), 75006 Paris, France
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511
- Sarah A. Slavoff
  - Department of Chemistry, Yale University, New Haven, CT 06511
  - Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT 06516
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511
6
Sinha PR, Balasubramanian R, Hegde SR. Integrated sequence and -omic features reveal novel small proteome of Mycobacterium tuberculosis. Front Microbiol 2024; 15:1335310. [PMID: 38812687 PMCID: PMC11133741 DOI: 10.3389/fmicb.2024.1335310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 04/15/2024] [Indexed: 05/31/2024] Open
Abstract
Bioinformatic studies on small proteins are under-represented due to difficulties in annotation posed by their small size. However, recent discoveries emphasize the functional significance of small proteins in cellular processes including cell signaling, metabolism, and adaptation to stress. In this study, we utilized a Random Forest classifier trained on sequence features, RNA-Seq, and Ribo-Seq data to uncover small proteins (smORFs) in M. tuberculosis. Independent predictions for the exponential and starvation conditions resulted in 695 potential smORFs. We examined the functional implications of these smORFs using homology searches, LC-MS/MS, and ChIP-seq data, testing their expression in diverse growth conditions, and identifying protein domains. We provide evidence that some of these smORFs could be part of operons, or exist as upstream ORFs. This expanded data resource for the proteins of M. tuberculosis would aid in fine-tuning the existing protein and gene regulatory networks, thereby improving system-wide studies. The primary goal of this study was to uncover and characterize smORFs in M. tuberculosis through bioinformatic analysis, shedding light on their functional roles and genomic organization. Further investigation of these potential smORFs would provide valuable insights into the genome organization and functional diversity of the M. tuberculosis proteome.
Affiliation(s)
- Shubhada R. Hegde
  - Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, India
7
Whited AM, Jungreis I, Allen J, Cleveland CL, Mudge JM, Kellis M, Rinn JL, Hough LE. Biophysical characterization of high-confidence, small human proteins. bioRxiv 2024:2024.04.12.589296. [PMID: 38659920 PMCID: PMC11042228 DOI: 10.1101/2024.04.12.589296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Significant efforts have been made to characterize the biophysical properties of proteins. Small proteins have received less attention because their annotation has historically been less reliable. However, recent improvements in sequencing, proteomics, and bioinformatics techniques have led to the high-confidence annotation of small open reading frames (smORFs) that encode for functional proteins, producing smORF-encoded proteins (SEPs). SEPs have been found to perform critical functions in several species, including humans. While significant efforts have been made to annotate SEPs, less attention has been given to the biophysical properties of these proteins. We characterized the distributions of predicted and curated biophysical properties, including sequence composition, structure, localization, function, and disease association of a conservative list of previously identified human SEPs. We found significant differences between SEPs and both larger proteins and control sets. Additionally, we provide an example of how our characterization of biophysical properties can contribute to distinguishing protein-coding smORFs from non-coding ones in otherwise ambiguous cases.
Affiliation(s)
- A M Whited
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Irwin Jungreis
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Jeffre Allen
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Biochemistry, University of Colorado Boulder, CO, USA
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- John L Rinn
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Biochemistry, University of Colorado Boulder, CO, USA
- Loren E Hough
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Physics, University of Colorado Boulder, CO, USA
8
Miravet-Verde S, Mazzolini R, Segura-Morales C, Broto A, Lluch-Senar M, Serrano L. ProTInSeq: transposon insertion tracking by ultra-deep DNA sequencing to identify translated large and small ORFs. Nat Commun 2024; 15:2091. [PMID: 38453908 PMCID: PMC10920889 DOI: 10.1038/s41467-024-46112-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 02/14/2024] [Indexed: 03/09/2024] Open
Abstract
Identifying open reading frames (ORFs) being translated is not a trivial task. ProTInSeq is a technique designed to characterize proteomes by sequencing transposon insertions engineered to express a selection marker when they occur in-frame within a protein-coding gene. In the bacterium Mycoplasma pneumoniae, ProTInSeq identifies 83% of its annotated proteins, along with 5 proteins and 153 small ORF-encoded proteins (SEPs; ≤100 aa) that were not previously annotated. Moreover, ProTInSeq can be utilized for detecting translational noise, as well as for relative quantification and transmembrane topology estimation of fitness and non-essential proteins. By integrating various identification approaches, the number of initially annotated SEPs in this bacterium increases from 27 to 329, with a quarter of them predicted to possess antimicrobial potential. Herein, we describe a methodology complementary to Ribo-Seq and mass spectrometry that can identify SEPs while providing other insights in a proteome with a flexible and cost-effective DNA ultra-deep sequencing approach.
Affiliation(s)
- Samuel Miravet-Verde
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zurich, Zurich, Switzerland
- Carolina Segura-Morales
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
- Alicia Broto
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
- Maria Lluch-Senar
  - Pulmobiotics, Dr Aiguader 88, 08003, Barcelona, Spain
  - Institute of Biotechnology and Biomedicine "Vicent Villar Palasi" (IBB), Universitat Autònoma de Barcelona, Barcelona, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Lluis Companys 23, 08010, Barcelona, Spain
9
Mohsen JJ, Martel AA, Slavoff SA. Microproteins-Discovery, structure, and function. Proteomics 2023; 23:e2100211. [PMID: 37603371 PMCID: PMC10841188 DOI: 10.1002/pmic.202100211] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/04/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/22/2023]
Abstract
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Affiliation(s)
- Jessica J. Mohsen
  - Department of Chemistry, Yale University, New Haven, CT, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Alina A. Martel
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
- Sarah A. Slavoff
  - Department of Chemistry, Yale University, New Haven, CT, USA
  - Institute of Biomolecular Design and Discovery, Yale University, West Haven, CT, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
10
Dimonaco NJ, Clare A, Kenobi K, Aubrey W, Creevey CJ. StORF-Reporter: finding genes between genes. Nucleic Acids Res 2023; 51:11504-11517. [PMID: 37897345 PMCID: PMC10682499 DOI: 10.1093/nar/gkad814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 09/04/2023] [Accepted: 09/27/2023] [Indexed: 10/30/2023] Open
Abstract
Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
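The stop-to-stop idea in this abstract can be illustrated with a minimal sketch. This is not the StORF-Reporter implementation: the function name `find_storfs`, the `min_codons` parameter, and the restriction to the forward strand with the standard stop codons TAA, TAG and TGA are all assumptions made for illustration only.

```python
# Illustrative sketch only; not the StORF-Reporter code.
# Assumption: standard stop codons, forward strand, 0-based coordinates.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_storfs(region: str, min_codons: int = 10):
    """Return (frame, start, end, sequence) for each stretch of codons lying
    between two consecutive in-frame stop codons in an unannotated region.

    start is the first base after the upstream stop; end is the first base of
    the downstream stop (exclusive), so region[start:end] is the sequence.
    """
    region = region.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None  # position of the most recent in-frame stop codon
        for pos in range(frame, len(region) - 2, 3):
            if region[pos:pos + 3] in STOP_CODONS:
                if prev_stop is not None:
                    start = prev_stop + 3
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        storfs.append((frame, start, pos, region[start:pos]))
                prev_stop = pos
    return storfs


# Toy region: a stop codon, four codons, then another stop codon.
toy_region = "TAA" + "ATGGCC" * 2 + "TGA"
toy_hits = find_storfs(toy_region, min_codons=4)
```

Because the search is delimited by stop codons rather than by a start codon, a candidate found this way does not depend on which start codon a gene uses, which matches the motivation given in the abstract. A fuller treatment would also scan the reverse complement and filter candidates further.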
Affiliation(s)
- Nicholas J Dimonaco
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, Wales, UK
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
  - Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, ON, Canada
  - School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
- Amanda Clare
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Kim Kenobi
  - Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, Wales, UK
- Wayne Aubrey
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Christopher J Creevey
  - School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
11
Chen Y, Cao X, Loh KH, Slavoff SA. Chemical labeling and proteomics for characterization of unannotated small and alternative open reading frame-encoded polypeptides. Biochem Soc Trans 2023; 51:1071-1082. [PMID: 37171061 PMCID: PMC10317152 DOI: 10.1042/bst20221074] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/03/2023] [Revised: 03/27/2023] [Accepted: 04/13/2023] [Indexed: 05/13/2023]
Abstract
Thousands of unannotated small and alternative open reading frames (smORFs and alt-ORFs, respectively) have recently been revealed in mammalian genomes. While hundreds of mammalian smORF- and alt-ORF-encoded proteins (SEPs and alt-proteins, respectively) affect cell proliferation, the overwhelming majority of smORFs and alt-ORFs remain uncharacterized at the molecular level. Complicating the task of identifying the biological roles of smORFs and alt-ORFs, the SEPs and alt-proteins that they encode exhibit limited sequence homology to protein domains of known function. Experimental techniques for the functionalization of these gene classes are therefore required. Approaches combining chemical labeling and quantitative proteomics have greatly advanced our ability to identify and characterize functional SEPs and alt-proteins in high throughput. In this review, we briefly describe the principles of proteomic discovery of SEPs and alt-proteins, then summarize how these technologies interface with chemical labeling for identification of SEPs and alt-proteins with specific properties, as well as in defining the interactome of SEPs and alt-proteins.
Collapse
Affiliation(s)
- Yanran Chen
- Department of Chemistry, Yale University, New Haven, CT, U.S.A
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Xiongwen Cao
- Department of Chemistry, Yale University, New Haven, CT, U.S.A
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT, U.S.A
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Ken H. Loh
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT, U.S.A
- Sarah A. Slavoff
- Department of Chemistry, Yale University, New Haven, CT, U.S.A
- Institute for Biomolecular Design and Discovery, Yale University, West Haven, CT, U.S.A
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, U.S.A
12
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. About the dark corners in the gene function space of Escherichia coli remaining without illumination by scientific literature. Biol Direct 2023; 18:7. [PMID: 36855185 PMCID: PMC9976479 DOI: 10.1186/s13062-023-00362-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/11/2022] [Accepted: 02/21/2023] [Indexed: 03/02/2023] Open
Abstract
BACKGROUND Although Escherichia coli (E. coli) is the most studied prokaryotic organism in the history of life sciences, many molecular mechanisms and gene functions encoded in its genome remain to be discovered. This work aims at quantifying the illumination of the E. coli gene function space by the scientific literature and how close we are towards the goal of a complete list of E. coli gene functions. RESULTS The scientific literature about E. coli protein-coding genes has been mapped onto the genome via the mentioning of names for genomic regions in scientific articles both for the case of the strain K-12 MG1655 as well as for the 95%-threshold softcore genome of 1324 E. coli strains with known complete genomes. The article match was quantified with the ratio of a given gene name's occurrence to the mentioning of any gene names in the paper. The various genome regions have an extremely uneven literature coverage. A group of elite genes with ≥ 100 full publication equivalents (FPEs, FPE = 1 is an idealized publication devoted to just a single gene) attracts the lion's share of the papers. For K-12, ~ 65% of the literature covers just 342 elite genes; for the softcore genome, ~ 68% of the FPEs are about only 342 elite gene families (GFs). We also find that most genes/GFs have at least one mention in a dedicated scientific article (with the exception of at least 137 protein-coding transcripts for K-12 and 26 GFs from the softcore genome). Whereas the literature growth rates were highest for uncharacterized or understudied genes until 2005-2010 compared with other groups of genes, they became negative thereafter. At the same time, literature for anyhow well-studied genes started to grow explosively with threshold T10 (≥ 10 FPEs). Typically, a body of ~ 20 actual articles generated over ~ 15 years of research effort was necessary to reach T10. Lineage-specific co-occurrence analysis of genes belonging to the accessory genome of E. coli together with genomic co-localization and sequence-analytic exploration hints at the previously completely uncharacterized genes yahV and yddL being associated with osmotic stress response/motility mechanisms. CONCLUSION If the numbers of scientific articles about uncharacterized and understudied genes remain at least at present levels, full gene function lists for the strain K-12 MG1655 and the E. coli softcore genome are in reach within the next 25-30 years. Once the literature body for a gene crosses 10 FPEs, most of the critical fundamental research risk appears overcome and steady incremental research becomes possible.
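The FPE measure described in this abstract (each article contributes to a gene the ratio of that gene's name occurrences to all gene-name mentions in the article) can be illustrated with a minimal sketch. The function name `full_publication_equivalents`, the toy mention lists, and the placeholder `geneA` are assumptions made for illustration only; this is not the authors' pipeline, and the numbers are not data from the paper.

```python
from collections import Counter, defaultdict


def full_publication_equivalents(papers):
    """Sum, per gene, the fraction of gene-name mentions it receives in each paper.

    `papers` is an iterable of lists; each list holds the gene names
    mentioned in one article (repeats allowed). Hypothetical input format.
    """
    fpe = defaultdict(float)
    for mentions in papers:
        total = len(mentions)
        if total == 0:
            continue  # an article without gene mentions contributes nothing
        for gene, count in Counter(mentions).items():
            fpe[gene] += count / total
    return dict(fpe)


# Toy input: three invented articles, not real literature data.
toy_papers = [
    ["geneA", "geneA", "geneA"],       # devoted to one gene: adds 1 FPE
    ["geneA", "yahV"],                 # split evenly: 0.5 each
    ["yahV", "yddL", "yddL", "yddL"],  # 0.25 for yahV, 0.75 for yddL
]
scores = full_publication_equivalents(toy_papers)
print(scores)
```

Under this reading, the per-gene scores always sum to the number of articles that mention at least one gene, which is why a single-gene article counts as exactly one FPE.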
Affiliation(s)
- Erwin Tantoso
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- Birgit Eisenhaber
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- Swati Sinha
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Frank Eisenhaber
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore
13
Chen Z, Meng J, Zhao S, Yin C, Luan Y. sORFPred: A Method Based on Comprehensive Features and Ensemble Learning to Predict the sORFs in Plant LncRNAs. Interdiscip Sci 2023; 15:189-201. [PMID: 36705893 DOI: 10.1007/s12539-023-00552-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 01/11/2023] [Accepted: 01/13/2023] [Indexed: 01/28/2023]
Abstract
Long non-coding RNAs (lncRNAs) are important regulators of biological processes. It has recently been shown that some lncRNAs include small open reading frames (sORFs) that can encode small peptides of no more than 100 amino acids. However, existing methods are commonly applied to human and animal datasets and still suffer from low feature representation capability. Thus, accurate and credible prediction of sORFs with coding ability in plant lncRNAs is imperative. This paper proposes a new method termed sORFPred, in which we design a model named MCSEN by combining multi-scale convolution and Squeeze-and-Excitation Networks to fully mine distinct information embedded in sORFs, integrate and optimize multiple sequence-based and physicochemical feature descriptors, and build a two-layer prediction classifier based on a Bayesian optimization algorithm and Extra Trees. sORFPred has been evaluated on sORF datasets of three species and on an experimentally validated sORF dataset. Results indicate that sORFPred outperforms existing methods and achieves 97.28% accuracy, 97.06% precision, 97.52% recall, and 97.29% F1-score on Arabidopsis thaliana, which shows a significant improvement in prediction performance compared to various conventional shallow machine learning and deep learning models.
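As a quick consistency check on the metrics quoted in this abstract, the F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The helper name `f1_score` is an assumption for this sketch; nothing here reproduces sORFPred itself.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for Arabidopsis thaliana, in percent.
reported_precision = 97.06
reported_recall = 97.52

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 97.29
```

The recomputed value agrees with the 97.29% F1-score stated in the abstract.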
Affiliation(s)
- Ziwei Chen
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Siyuan Zhao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Chao Yin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Yushi Luan
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
14
Identification and analysis of smORFs in Chlamydomonas reinhardtii. Genomics 2022; 114:110444. [PMID: 35933072 DOI: 10.1016/j.ygeno.2022.110444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/22/2022] [Revised: 07/06/2022] [Accepted: 07/31/2022] [Indexed: 11/24/2022]
Abstract
Small open reading frames (smORFs) have been acknowledged as important contributors to organism function, from bacteria to higher eukaryotes. However, smORFs in green algae remain largely uninvestigated, despite their importance in ecology and evolution. We applied bioinformatic analysis, ribosome profiling, and small peptide proteomics to provide a genome-wide, high-confidence smORF database in the model green alga Chlamydomonas reinhardtii. The whole genome was screened first to mine potential coding smORFs. Then conservation analysis, ribosome profiling, and proteomics data were processed to identify conserved smORFs and generate translation evidence. The combination of procedures resulted in 2014 smORFs that might exist in the C. reinhardtii genome. The expression of smORFs under Cd treatment suggested that two smORFs might participate in redox reactions, three in inorganic phosphate transport, and one in DNA repair under stress. Our study built a genome-wide database in C. reinhardtii, providing target smORFs for further research.
15
Méheust R, Castelle CJ, Jaffe AL, Banfield JF. Conserved and lineage-specific hypothetical proteins may have played a central role in the rise and diversification of major archaeal groups. BMC Biol 2022; 20:154. [PMID: 35790962 PMCID: PMC9258230 DOI: 10.1186/s12915-022-01348-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/27/2022] [Accepted: 06/09/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Archaea play fundamental roles in the environment, for example by methane production and consumption, ammonia oxidation, protein degradation, carbon compound turnover, and sulfur compound transformations. Recent genomic analyses have profoundly reshaped our understanding of the distribution and functionalities of Archaea and their roles in eukaryotic evolution. RESULTS Here, 1179 representative genomes were selected from 3197 archaeal genomes. Clustering of the representative genomes based on the content of 10,866 newly defined archaeal protein families (that will serve as a community resource) recapitulates archaeal phylogeny. We identified the co-occurring proteins that distinguish the major lineages. Those with metabolic roles were consistent with experimental data. However, two families specific to Asgard were determined to be new eukaryotic signature proteins. Overall, the blocks of lineage-specific families are dominated by proteins that lack functional predictions. CONCLUSIONS Given that these hypothetical proteins are near ubiquitous within major archaeal groups, we propose that they were important in the origin of most of the major archaeal lineages. Interestingly, although there were clearly phylum-specific co-occurring proteins, no such blocks of protein families were shared across superphyla, suggesting a burst-like origin of new lineages early in archaeal evolution.
Affiliation(s)
- Raphaël Méheust
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, Evry, France
- Cindy J Castelle
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Alexander L Jaffe
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Jillian F Banfield
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
16
Fijalkowski I, Willems P, Jonckheere V, Simoens L, Van Damme P. Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides. MICROLIFE 2022; 3:uqac005. [PMID: 37223358 PMCID: PMC10117744 DOI: 10.1093/femsml/uqac005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/13/2021] [Revised: 04/18/2022] [Accepted: 04/29/2022] [Indexed: 05/25/2023]
Abstract
Genomic studies of bacteria have long pointed toward widespread prevalence of small open reading frames (sORFs) encoding for short proteins, <100 amino acids in length. Despite the mounting genomic evidence of their robust expression, relatively little progress has been made in their mass spectrometry-based detection and various blanket statements have been used to explain this observed discrepancy. In this study, we provide a large-scale riboproteogenomics investigation of the challenging nature of proteomic detection of such small proteins as informed by conditional translation data. A panel of physiochemical properties alongside recently developed mass spectrometry detectability metrics was interrogated to provide a comprehensive evidence-based assessment of sORF-encoded polypeptide (SEP) detectability. Moreover, a large-scale proteomics and translatomics compendium of proteins produced by Salmonella Typhimurium (S. Typhimurium), a model human pathogen, across a panel of growth conditions is presented and used in support of our in silico SEP detectability analysis. This integrative approach is used to provide a data-driven census of small proteins expressed by S. Typhimurium across growth phases and infection-relevant conditions. Taken together, our study pinpoints current limitations in proteomics-based detection of novel small proteins currently missing from bacterial genome annotations.
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Patrick Willems
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Veronique Jonckheere
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
17
Abstract
While most small, regulatory RNAs are thought to be “noncoding,” a few have been found to also encode a small protein. Here we describe a 164-nucleotide RNA that encodes a 28-amino acid, amphipathic protein, which interacts with aerobic glycerol-3-phosphate dehydrogenase and increases dehydrogenase activity but also base pairs with two mRNAs to reduce expression. The coding and base-pairing sequences overlap, and the two regulatory functions compete. Bacteria have evolved small RNAs (sRNAs) to regulate numerous biological processes and stress responses. While sRNAs generally are considered to be “noncoding,” a few have been found to also encode a small protein. Here we describe one such dual-function RNA that modulates carbon utilization in Escherichia coli. The 164-nucleotide RNA was previously shown to encode a 28-amino acid protein (denoted AzuC). We discovered the membrane-associated AzuC protein interacts with GlpD, the aerobic glycerol-3-phosphate dehydrogenase, and increases dehydrogenase activity. Overexpression of the RNA encoding AzuC results in a growth defect in glycerol and galactose medium. The defect in galactose medium was still observed for a stop codon mutant derivative, suggesting a second role for the RNA. Consistent with this observation, we found that cadA and galE are repressed by base pairing with the RNA (denoted AzuR). Interestingly, AzuC translation interferes with the observed repression of cadA and galE by the RNA and base pairing interferes with AzuC translation, demonstrating that the translation and base-pairing functions compete.
18
Yadavalli SS, Yuan J. Bacterial Small Membrane Proteins: the Swiss Army Knife of Regulators at the Lipid Bilayer. J Bacteriol 2022; 204:e0034421. [PMID: 34516282 PMCID: PMC8765417 DOI: 10.1128/jb.00344-21] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 11/20/2022] Open
Abstract
Small membrane proteins represent a subset of recently discovered small proteins (≤100 amino acids), which are a ubiquitous class of emerging regulators underlying bacterial adaptation to environmental stressors. Until relatively recently, small open reading frames encoding these proteins were not designated genes in genome annotations. Therefore, our understanding of small protein biology was primarily limited to a few candidates associated with previously characterized larger partner proteins. Following the first systematic analyses of small proteins in Escherichia coli over a decade ago, numerous small proteins across different bacteria have been uncovered. An estimated one-third of these newly discovered proteins in E. coli are localized to the cell membrane, where they may interact with distinct groups of membrane proteins, such as signal receptors, transporters, and enzymes, and affect their activities. Recently, there has been considerable progress in functionally characterizing small membrane protein regulators aided by innovative tools adapted specifically to study small proteins. Our review covers prototypical proteins that modulate a broad range of cellular processes, such as transport, signal transduction, stress response, respiration, cell division, sporulation, and membrane stability. Thus, small membrane proteins represent a versatile group of physiology regulators at the membrane and the whole cell. Additionally, small membrane proteins have the potential for clinical applications, where some of the proteins may act as antibacterial agents themselves while others serve as alternative drug targets for the development of novel antimicrobials.
Affiliation(s)
- Srujana S. Yadavalli
- Waksman Institute of Microbiology, Rutgers University, Piscataway, New Jersey, USA
- Department of Genetics, Rutgers University, Piscataway, New Jersey, USA
- Jing Yuan
- Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
- LOEWE Center for Synthetic Microbiology (SYNMIKRO), Marburg, Germany
19
Expression of the DeaD RNA helicase is regulated at multiple levels through its long mRNA 5' untranslated region. J Bacteriol 2022; 204:e0061321. [PMID: 35041499 DOI: 10.1128/jb.00613-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
DEAD-box proteins (DBPs) are a prominent class of RNA remodeling proteins that alter RNA structure, a process they typically perform through an ATP-dependent RNA helicase activity. Although many DBPs have been characterized at the structural and functional level in detail, much less is known about how they are regulated. We previously showed that the messenger RNA (mRNA) for the Escherichia coli (E. coli) DeaD DBP contains an unusually long 5' untranslated region (5' UTR) of 838 nucleotides (nts) and that it is the primary RNA determinant of DeaD autoregulation. We speculated that such a long and complex 5' UTR might regulate deaD expression in additional ways. Here we show that the deaD mRNA 5' UTR regulates deaD expression at two additional levels: through temperature-dependent expression and through a stem-loop structure overlapping the start codon. These results support the hypothesis that a long 5' UTR can regulate gene expression through multiple mechanisms. IMPORTANCE The expression of genes is frequently regulated by determinants within the 5' UTR. Although many different regulatory mechanisms that operate via the 5' UTR have been described, the functional relevance of genes with long UTRs is less clear. Here, we show that the 838 nt long 5' UTR in the deaD mRNA regulates the expression of DeaD at multiple levels. We propose that long UTRs originate to provide precise control of gene expression through multiple regulatory mechanisms, and they are indicators of the importance of their associated gene products for cellular adaptation to different environments.
20
Miyakoshi M, Okayama H, Lejars M, Kanda T, Tanaka Y, Itaya K, Okuno M, Itoh T, Iwai N, Wachi M. Mining RNA-seq data reveals the massive regulon of GcvB small RNA and its physiological significance in maintaining amino acid homeostasis in Escherichia coli. Mol Microbiol 2022; 117:160-178. [PMID: 34543491 PMCID: PMC9299463 DOI: 10.1111/mmi.14814] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/04/2021] [Revised: 09/15/2021] [Accepted: 09/17/2021] [Indexed: 11/30/2022]
Abstract
Bacterial small RNAs regulate the expression of multiple genes through imperfect base-pairing with target mRNAs mediated by RNA chaperone proteins such as Hfq. GcvB is the master sRNA regulator of amino acid metabolism and transport in a wide range of Gram-negative bacteria. Recently, independent RNA-seq approaches identified a plethora of transcripts interacting with GcvB in Escherichia coli. In this study, the compilation of RIL-seq, CLASH, and MAPS data sets allowed us to identify GcvB targets with high accuracy. We validated 21 new GcvB targets repressed at the posttranscriptional level, raising the number of direct targets to >50 genes in E. coli. Among its multiple seed sequences, GcvB utilizes either R1 or R3 to regulate most of these targets. Furthermore, we demonstrated that both R1 and R3 seed sequences are required to fully repress the expression of gdhA, cstA, and sucC genes. In contrast, the ilvLXGMEDA polycistronic mRNA is targeted by GcvB through at least four individual binding sites in the mRNA. Finally, we revealed that GcvB is involved in the susceptibility of peptidase-deficient E. coli strain (Δpeps) to Ala-Gln dipeptide by regulating both Dpp dipeptide importer and YdeE dipeptide exporter via R1 and R3 seed sequences, respectively.
Affiliation(s)
- Masatoshi Miyakoshi
- Department of Biomedical Science, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Haruna Okayama
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
- Maxence Lejars
- Department of Biomedical Science, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Takeshi Kanda
- Department of Biomedical Science, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Yuki Tanaka
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
- Kaori Itaya
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
- Miki Okuno
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
- Present address: School of Medicine, Kurume University, Kurume, Japan
- Takehiko Itoh
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
- Noritaka Iwai
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
- Masaaki Wachi
- Department of Life Science and Technology, Tokyo Institute of Technology, Yokohama, Japan
21
Miyakoshi M, Morita T, Kobayashi A, Berger A, Takahashi H, Gotoh Y, Hayashi T, Tanaka K. Glutamine synthetase mRNA releases sRNA from its 3'UTR to regulate carbon/nitrogen metabolic balance in Enterobacteriaceae. eLife 2022; 11:82411. [PMID: 36440827 PMCID: PMC9731577 DOI: 10.7554/elife.82411] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Accepted: 11/27/2022] [Indexed: 11/29/2022] Open
Abstract
Glutamine synthetase (GS) is the key enzyme of nitrogen assimilation induced under nitrogen limiting conditions. The carbon skeleton of glutamate and glutamine, 2-oxoglutarate, is supplied from the TCA cycle, but how this metabolic flow is controlled in response to nitrogen availability remains unknown. We show that the expression of the E1o component of 2-oxoglutarate dehydrogenase, SucA, is repressed under nitrogen limitation in Salmonella enterica and Escherichia coli. The repression is exerted at the post-transcriptional level by an Hfq-dependent sRNA GlnZ generated from the 3'UTR of the GS-encoding glnA mRNA. Enterobacterial GlnZ variants contain a conserved seed sequence and primarily regulate sucA through base-pairing far upstream of the translation initiation region. During growth on glutamine as the nitrogen source, the glnA 3'UTR deletion mutants expressed SucA at higher levels than the S. enterica and E. coli wild-type strains, respectively. In E. coli, the transcriptional regulator Nac also participates in the repression of sucA. Lastly, this study clarifies that the release of GlnZ from the glnA mRNA by RNase E is essential for the post-transcriptional regulation of sucA. Thus, the mRNA coordinates the two independent functions to balance the supply and demand of the fundamental metabolites.
Affiliation(s)
- Masatoshi Miyakoshi
- Department of Infection Biology, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Transborder Medical Research Center, University of Tsukuba, Tsukuba, Japan
- International Joint Degree Master’s Program in Agro-Biomedical Science in Food and Health (GIP-TRIAD), University of Tsukuba, Tsukuba, Japan
- Teppei Morita
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
- Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Asaki Kobayashi
- Transborder Medical Research Center, University of Tsukuba, Tsukuba, Japan
- Anna Berger
- International Joint Degree Master’s Program in Agro-Biomedical Science in Food and Health (GIP-TRIAD), University of Tsukuba, Tsukuba, Japan
- Yasuhiro Gotoh
- Department of Bacteriology, Faculty of Medical Sciences, Kyushu University, Fukuoka, Japan
- Tetsuya Hayashi
- Department of Bacteriology, Faculty of Medical Sciences, Kyushu University, Fukuoka, Japan
- Kan Tanaka
- Laboratory for Chemistry and Life Science, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
22
Fijalkowski I, Peeters MKR, Van Damme P. Small Protein Enrichment Improves Proteomics Detection of sORF Encoded Polypeptides. Front Genet 2021; 12:713400. [PMID: 34721520 PMCID: PMC8554064 DOI: 10.3389/fgene.2021.713400] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/22/2021] [Accepted: 10/01/2021] [Indexed: 11/13/2022] Open
Abstract
With the rapid growth in the number of sequenced genomes, genome annotation efforts became almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to their short length and often highly variable physiochemical properties. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and concomitant proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs making use of the amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
- Marlies K. R. Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Gent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
23
Venkat K, Hoyos M, Haycocks JR, Cassidy L, Engelmann B, Rolle-Kampczyk U, von Bergen M, Tholey A, Grainger DC, Papenfort K. A dual-function RNA balances carbon uptake and central metabolism in Vibrio cholerae. EMBO J 2021; 40:e108542. [PMID: 34612526 PMCID: PMC8672173 DOI: 10.15252/embj.2021108542] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/26/2021] [Revised: 08/31/2021] [Accepted: 09/02/2021] [Indexed: 11/22/2022] Open
Abstract
Bacterial small RNAs (sRNAs) are well known to modulate gene expression by base pairing with trans‐encoded transcripts and are typically non‐coding. However, several sRNAs have been reported to also contain an open reading frame and thus are considered dual‐function RNAs. In this study, we discovered a dual‐function RNA from Vibrio cholerae, called VcdRP, harboring a 29 amino acid small protein (VcdP), as well as a base‐pairing sequence. Using a forward genetic screen, we identified VcdRP as a repressor of cholera toxin production and linked this phenotype to the inhibition of carbon transport by the base‐pairing segment of the regulator. By contrast, we demonstrate that the VcdP small protein acts downstream of carbon transport by binding to citrate synthase (GltA), the first enzyme of the citric acid cycle. Interaction of VcdP with GltA results in increased enzyme activity, and together VcdR and VcdP reroute carbon metabolism. We further show that transcription of vcdRP is repressed by CRP, allowing us to provide a model in which VcdRP employs two different molecular mechanisms to synchronize central metabolism in V. cholerae.
Affiliation(s)
- Kavyaa Venkat
- Institute of Microbiology, Friedrich Schiller University, Jena, Germany
| | - Mona Hoyos
- Institute of Microbiology, Friedrich Schiller University, Jena, Germany
| | - James Rj Haycocks
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics, University of Kiel, Kiel, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, University of Kiel, Kiel, Germany
| | - David C Grainger
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Kai Papenfort
- Institute of Microbiology, Friedrich Schiller University, Jena, Germany
- Microverse Cluster, Friedrich Schiller University Jena, Jena, Germany
| |
|
24
|
Abstract
Escherichia coli was one of the first species to have its genome sequenced and remains one of the best-characterized model organisms. Thus, it is perhaps surprising that recent studies have shown that a substantial number of genes have been overlooked. Genes encoding more than 140 small proteins, defined as those containing 50 or fewer amino acids, have been identified in E. coli in the past 10 years, and there is substantial evidence indicating that many more remain to be discovered. This review covers the methods that have been successful in identifying small proteins and the short open reading frames that encode them. The small proteins that have been functionally characterized to date in this model organism are also discussed. It is hoped that the review, along with the associated databases of known as well as predicted but undetected small proteins, will aid in and provide a roadmap for the continued identification and characterization of these proteins in E. coli as well as other bacteria.
|
25
|
Bartholomäus A, Kolte B, Mustafayeva A, Goebel I, Fuchs S, Benndorf D, Engelmann S, Ignatova Z. smORFer: a modular algorithm to detect small ORFs in prokaryotes. Nucleic Acids Res 2021; 49:e89. [PMID: 34125903 PMCID: PMC8421149 DOI: 10.1093/nar/gkab477] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/07/2020] [Revised: 04/29/2021] [Accepted: 05/18/2021] [Indexed: 11/15/2022] Open
Abstract
Emerging evidence places small proteins (≤50 amino acids) more centrally in physiological processes. Yet, their functional identification and the systematic genome annotation of their cognate small open-reading frames (smORFs) remain challenging both experimentally and computationally. Ribosome profiling or Ribo-Seq (that is, deep sequencing of ribosome-protected fragments) enables detection of actively translated open-reading frames (ORFs) and empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple identifiers of ORFs that use the 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation. They have difficulties evaluating prokaryotic genomes due to their unique architecture (e.g. polycistronic messages, overlapping ORFs, leaderless translation, non-canonical initiation, etc.). Here, we present a new algorithm, smORFer, which performs with high accuracy in prokaryotic organisms in detecting putative smORFs. The unique feature of smORFer is that it uses an integrated approach: it considers structural features of the genetic sequence along with in-frame translation and uses a Fourier transform to convert these parameters into a measurable score to faithfully select smORFs. The algorithm is executed in a modular way, and depending on the data available for a particular organism, different modules can be selected for the smORF search.
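The idea of turning 3-nt periodicity into a score with a Fourier transform can be illustrated with a minimal sketch. This is not smORFer's implementation: the function name `periodicity_score`, the per-nucleotide `coverage` input, and the normalisation by total non-constant spectral power are all assumptions chosen for illustration.

```python
import cmath


def periodicity_score(coverage):
    """Fraction of non-constant spectral power found at a period of 3 nt.

    `coverage` is a hypothetical list of per-nucleotide read counts over a
    candidate ORF. Illustrative only; not the smORFer scoring scheme.
    """
    n = len(coverage) - len(coverage) % 3  # trim to whole codons
    if n < 6:
        return 0.0
    signal = coverage[:n]
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant component

    def power(k):
        # Squared magnitude of the k-th discrete Fourier coefficient.
        s = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(centered))
        return abs(s) ** 2

    total = sum(power(k) for k in range(1, n // 2 + 1))
    if total == 0:
        return 0.0
    # Frequency index n/3 corresponds to one cycle every 3 nucleotides.
    return power(n // 3) / total
```

Under these assumptions, a profile with reads concentrated at one codon position scores close to 1, while a flat profile or one with a different period scores close to 0.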
Affiliation(s)
- Alexander Bartholomäus
- GFZ German Research Centre for Geosciences, Section Geomicrobiology, 14473 Potsdam, Germany
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Baban Kolte
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Ayten Mustafayeva
- Helmholtz Center for Infection Research, Microbial Proteomics, 38124 Braunschweig, Germany
- Inst. Microbiology, TU Braunschweig, Braunschweig, Germany
| | - Ingrid Goebel
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| | - Dirk Benndorf
- Otto von Guericke University, Bioprocess Engineering, 39106 Magdeburg, Germany
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
| | - Susanne Engelmann
- Helmholtz Center for Infection Research, Microbial Proteomics, 38124 Braunschweig, Germany
- Inst. Microbiology, TU Braunschweig, Braunschweig, Germany
| | - Zoya Ignatova
- Inst. Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, 20146 Hamburg, Germany
| |
|
26
|
Fuchs S, Kucklick M, Lehmann E, Beckmann A, Wilkens M, Kolte B, Mustafayeva A, Ludwig T, Diwo M, Wissing J, Jänsch L, Ahrens CH, Ignatova Z, Engelmann S. Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach. PLoS Genet 2021; 17:e1009585. [PMID: 34061833 PMCID: PMC8195425 DOI: 10.1371/journal.pgen.1009585] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/24/2020] [Revised: 06/11/2021] [Accepted: 05/07/2021] [Indexed: 01/08/2023] Open
Abstract
Small proteins play essential roles in bacterial physiology and virulence; however, automated algorithms for genome annotation are often not yet able to accurately predict the corresponding genes. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Out of these, 24 (ranging from 9 to 99 aa) were novel and not contained in the genome annotation used. 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
Affiliation(s)
- Stephan Fuchs
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
| | - Martin Kucklick
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Erik Lehmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Alexander Beckmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maya Wilkens
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Baban Kolte
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Ayten Mustafayeva
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Tobias Ludwig
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maurice Diwo
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Josef Wissing
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Lothar Jänsch
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Zoya Ignatova
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Susanne Engelmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| |
|
27
|
Tharakan R, Sawa A. Minireview: Novel Micropeptide Discovery by Proteomics and Deep Sequencing Methods. Front Genet 2021; 12:651485. [PMID: 34025718 PMCID: PMC8136307 DOI: 10.3389/fgene.2021.651485] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/09/2021] [Accepted: 03/22/2021] [Indexed: 12/12/2022] Open
Abstract
A novel class of small proteins, called micropeptides, has recently been discovered in the genome. These proteins, which have been found to play important roles in many physiological and cellular systems, are shorter than 100 amino acids and were overlooked during previous genome annotations. Discovery and characterization of more micropeptides has been ongoing, often using -omics methods such as proteomics, RNA sequencing, and ribosome profiling. In this review, we survey the recent advances in the micropeptides field and describe the methodological and conceptual challenges facing future micropeptide endeavors.
Affiliation(s)
- Ravi Tharakan
- National Institute on Aging, National Institutes of Health, Baltimore, MD, United States
| | - Akira Sawa
- Departments of Psychiatry, Neuroscience, Biomedical Engineering, and Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| |
|
28
|
Gerovac M, Vogel J, Smirnov A. The World of Stable Ribonucleoproteins and Its Mapping With Grad-Seq and Related Approaches. Front Mol Biosci 2021; 8:661448. [PMID: 33898526 PMCID: PMC8058203 DOI: 10.3389/fmolb.2021.661448] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/30/2021] [Accepted: 03/04/2021] [Indexed: 12/13/2022] Open
Abstract
Macromolecular complexes of proteins and RNAs are essential building blocks of cells. These stable supramolecular particles can be viewed as minimal biochemical units whose structural organization, i.e., the way the RNA and the protein interact with each other, is directly linked to their biological function. Whether those are dynamic regulatory ribonucleoproteins (RNPs) or integrated molecular machines involved in gene expression, the comprehensive knowledge of these units is critical to our understanding of key molecular mechanisms and cell physiology phenomena. Such is the goal of diverse complexomic approaches and in particular of the recently developed gradient profiling by sequencing (Grad-seq). By separating cellular protein and RNA complexes on a density gradient and quantifying their distributions genome-wide by mass spectrometry and deep sequencing, Grad-seq charts global landscapes of native macromolecular assemblies. In this review, we propose a function-based ontology of stable RNPs and discuss how Grad-seq and related approaches transformed our perspective of bacterial and eukaryotic ribonucleoproteins by guiding the discovery of new RNA-binding proteins and unusual classes of noncoding RNAs. We highlight some methodological aspects and developments that make it possible to further boost the power of this technique and to look for exciting new biology in understudied and challenging biological models.
Affiliation(s)
- Milan Gerovac
- Institute of Molecular Infection Biology (IMIB), University of Würzburg, Würzburg, Germany
| | - Jörg Vogel
- Institute of Molecular Infection Biology (IMIB), University of Würzburg, Würzburg, Germany
- Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
| | - Alexandre Smirnov
- UMR 7156—Génétique Moléculaire, Génomique, Microbiologie (GMGM), University of Strasbourg, CNRS, Strasbourg, France
- University of Strasbourg Institute for Advanced Study (USIAS), Strasbourg, France
| |
|
29
|
Steinberg R, Koch HG. The largely unexplored biology of small proteins in pro- and eukaryotes. FEBS J 2021; 288:7002-7024. [PMID: 33780127 DOI: 10.1111/febs.15845] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/27/2021] [Revised: 03/11/2021] [Accepted: 03/26/2021] [Indexed: 12/29/2022]
Abstract
The large abundance of small open reading frames (smORFs) in prokaryotic and eukaryotic genomes and the plethora of smORF-encoded small proteins became only apparent with the constant advancements in bioinformatic, genomic, proteomic, and biochemical tools. Small proteins are typically defined as proteins of < 50 amino acids in prokaryotes and of less than 100 amino acids in eukaryotes, and their importance for cell physiology and cellular adaptation is only beginning to emerge. In contrast to antimicrobial peptides, which are secreted by prokaryotic and eukaryotic cells for combatting pathogens and competitors, small proteins act within the producing cell mainly by stabilizing protein assemblies and by modifying the activity of larger proteins. Production of small proteins is frequently linked to stress conditions or environmental changes, and therefore, cells seem to use small proteins as intracellular modifiers for adjusting cell metabolism to different intra- and extracellular cues. However, the size of small proteins imposes a major challenge for the cellular machinery required for protein folding and intracellular trafficking and recent data indicate that small proteins can engage distinct trafficking pathways. In the current review, we describe the diversity of small proteins in prokaryotes and eukaryotes, highlight distinct and common features, and illustrate how they are handled by the protein trafficking machineries in prokaryotic and eukaryotic cells. Finally, we also discuss future topics of research on this fascinating but largely unexplored group of proteins.
Affiliation(s)
- Ruth Steinberg
- Institute for Biochemistry and Molecular Biology, Zentrum für Biochemie und Molekulare Medizin (ZMBZ), Faculty of Medicine, Albert-Ludwigs-Universität Freiburg, Germany
| | - Hans-Georg Koch
- Institute for Biochemistry and Molecular Biology, Zentrum für Biochemie und Molekulare Medizin (ZMBZ), Faculty of Medicine, Albert-Ludwigs-Universität Freiburg, Germany
| |
|
30
|
The History of Colistin Resistance Mechanisms in Bacteria: Progress and Challenges. Microorganisms 2021; 9:microorganisms9020442. [PMID: 33672663 PMCID: PMC7924381 DOI: 10.3390/microorganisms9020442] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 01/20/2021] [Revised: 02/17/2021] [Accepted: 02/18/2021] [Indexed: 12/13/2022] Open
Abstract
Since 2015, the discovery of colistin resistance genes has been limited to the characterization of new mobile colistin resistance (mcr) gene variants. However, given the complexity of the mechanisms involved, there are many colistin-resistant bacterial strains whose mechanism remains unknown and whose exploitation requires complementary technologies. In this review, through the history of colistin, we underline the methods used over the last decades, both old and recent, to facilitate the discovery of the main colistin resistance mechanisms and how new technological approaches may help to improve the rapid and efficient exploration of new target genes. To accomplish this, a systematic search was carried out via PubMed and Google Scholar on published data concerning polymyxin resistance from 1950 to 2020 using terms most related to colistin. This review first explores the history of the discovery of the mechanisms of action and resistance to colistin, based on the technologies deployed. Then we focus on the most advanced technologies used, such as MALDI-TOF-MS, high throughput sequencing or the genetic toolbox. Finally, we outline promising new approaches, such as omics tools and CRISPR-Cas9, as well as the challenges they face. Much has been achieved since the discovery of polymyxins, through several innovative technologies. Nevertheless, colistin resistance mechanisms remain very complex.
|
31
|
Schlesinger D, Elsässer SJ. Revisiting sORFs: overcoming challenges to identify and characterize functional microproteins. FEBS J 2021; 289:53-74. [PMID: 33595896 DOI: 10.1111/febs.15769] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 08/21/2020] [Revised: 01/17/2021] [Accepted: 02/15/2021] [Indexed: 02/07/2023]
Abstract
Short ORFs (sORFs), that is, occurrences of a start and stop codon within 100 codons or less, can be found in organisms of all domains of life, outnumbering annotated protein-coding ORFs by orders of magnitude. Even though functional proteins smaller than 100 amino acids are known, the coding potential of sORFs has often been overlooked, as it is not trivial to predict and test for functionality within the large number of sORFs. Recent advances in ribosome profiling and mass spectrometry approaches, together with refined bioinformatic predictions, have enabled a huge leap forward in this field and identified thousands of likely coding sORFs. A relatively low number of small proteins or microproteins produced from these sORFs have been characterized so far on the molecular, structural, and/or mechanistic level. These however display versatile and, in some cases, essential cellular functions, allowing for the exciting possibility that many more, previously unknown small proteins might be encoded in the genome, waiting to be discovered. This review will give an overview of the steadily growing microprotein field, focusing on eukaryotic small proteins. We will discuss emerging themes in the molecular action of microproteins, as well as advances and challenges in microprotein identification and characterization.
Affiliation(s)
- Dörte Schlesinger
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
| | - Simon J Elsässer
- Science for Life Laboratory, Division of Genome Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Ming Wai Lau Centre for Reparative Medicine, Stockholm node, Karolinska Institutet, Stockholm, Sweden
| |
|
32
|
Gunnarsson S, Prabakaran S. In silico identification of novel open reading frames in Plasmodium falciparum oocyte and salivary gland sporozoites using proteogenomics framework. Malar J 2021; 20:71. [PMID: 33546698 PMCID: PMC7866754 DOI: 10.1186/s12936-021-03598-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2020] [Accepted: 01/16/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Plasmodium falciparum causes the deadliest form of malaria, which remains one of the most prevalent infectious diseases. Unfortunately, the only licensed vaccine showed limited protection and resistance to anti-malarial drugs is increasing, which can be largely attributed to the biological complexity of the parasite’s life cycle. The progression from one developmental stage to another in P. falciparum involves drastic changes in gene expression, and its infectivity to human hosts varies greatly depending on the stage. Approaches to identify candidate genes that are responsible for the development of infectivity to human hosts typically involve differential gene expression analysis between stages. However, the detection may be limited to annotated proteins and open reading frames (ORFs) predicted using restrictive criteria. Methods: The above problem is particularly relevant for P. falciparum, whose genome annotation is relatively incomplete given its clinical significance. In this work, a systems proteogenomics approach was used to address this challenge, as it allows computational detection of unannotated, novel Open Reading Frames (nORFs), which are neglected by conventional analyses. Two transcriptome/proteome pairs were obtained from a previous study, one collected in the mosquito-infectious oocyst sporozoite stage and the other in the human-infectious salivary gland sporozoite stage. They were then re-analysed using the proteogenomics framework to identify nORFs in each stage. Results: Translational products of nORFs that map to antisense, intergenic, intronic, 3′ UTR and 5′ UTR regions, as well as alternative reading frames of canonical proteins, were detected. Some of these nORFs also showed differential expression between the two life cycle stages studied. Their regulatory roles were explored through further bioinformatics analyses including the expression regulation on the parent reference genes, in silico structure prediction, and gene ontology term enrichment analysis. Conclusion: The identification of nORFs in P. falciparum sporozoites highlights the biological complexity of the parasite. Although the analyses are solely computational, these results provide a starting point for further experimental validation of the existence and functional roles of these nORFs.
Affiliation(s)
- Sophie Gunnarsson
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| | - Sudhakaran Prabakaran
- Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
| |
|
33
|
Adams PP, Baniulyte G, Esnault C, Chegireddy K, Singh N, Monge M, Dale RK, Storz G, Wade JT. Regulatory roles of Escherichia coli 5' UTR and ORF-internal RNAs detected by 3' end mapping. eLife 2021; 10:62438. [PMID: 33460557 PMCID: PMC7815308 DOI: 10.7554/elife.62438] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 08/25/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023] Open
Abstract
Many bacterial genes are regulated by RNA elements in their 5´ untranslated regions (UTRs). However, the full complement of these elements is not known even in the model bacterium Escherichia coli. Using complementary RNA-sequencing approaches, we detected large numbers of 3´ ends in 5´ UTRs and open reading frames (ORFs), suggesting extensive regulation by premature transcription termination. We documented regulation for multiple transcripts, including spermidine induction involving Rho and translation of an upstream ORF for an mRNA encoding a spermidine efflux pump. In addition to discovering novel sites of regulation, we detected short, stable RNA fragments derived from 5´ UTRs and sequences internal to ORFs. Characterization of three of these transcripts, including an RNA internal to an essential cell division gene, revealed that they have independent functions as sRNA sponges. Thus, these data uncover an abundance of cis- and trans-acting RNA regulators in bacterial 5´ UTRs and internal to ORFs. In most organisms, specific segments of a cell’s genetic information are copied to form single-stranded molecules of various sizes and purposes. Each of these RNA molecules, as they are known, is constructed as a chain that starts at the 5´ end and terminates at the 3´ end. Certain RNAs carry the information present in a gene, which provides the instructions that a cell needs to build proteins. Some, however, are ‘non-coding’ and instead act to fine-tune the activity of other RNAs. These regulatory RNAs can be separate from the RNAs they control, or they can be embedded in the very sequences they regulate; new evidence also shows that certain regulatory RNAs can act in both ways. Many regulatory RNAs are yet to be catalogued, even in simple, well-studied species such as the bacterium Escherichia coli. Here, Adams et al. aimed to better characterize the regulatory RNAs present in E. coli by mapping out the 3´ ends of every RNA molecule in the bacterium. 
This revealed many new regulatory RNAs and offered insights into where these sequences are located. For instance, the results show that several of these RNAs were embedded within RNA produced from larger genes. Some were nested in coding RNAs, and were parts of a longer RNA sequence that is adjacent to the protein coding segment. Others, however, were present within the instructions that code for a protein. The work by Adams et al. reveals that regulatory RNAs can be located in unexpected places, and provides a method for identifying them. This can be applied to other types of bacteria, in particular in species with few known RNA regulators.
Affiliation(s)
- Philip P Adams
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, United States
- Postdoctoral Research Associate Program, National Institute of General Medical Sciences, National Institutes of Health, Bethesda, United States
| | - Gabriele Baniulyte
- Wadsworth Center, New York State Department of Health, Albany, United States
| | - Caroline Esnault
- Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, United States
| | - Kavya Chegireddy
- Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, United States
| | - Navjot Singh
- Wadsworth Center, New York State Department of Health, Albany, United States
| | - Molly Monge
- Wadsworth Center, New York State Department of Health, Albany, United States
| | - Ryan K Dale
- Bioinformatics and Scientific Programming Core, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, United States
| | - Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, United States
| | - Joseph T Wade
- Wadsworth Center, New York State Department of Health, Albany, United States
- Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, United States
| |
|
34
|
Bogati B, Wadsworth N, Barrera F, Fozo EM. Improved growth of Escherichia coli in aminoglycoside antibiotics by the zor-orz toxin-antitoxin system. J Bacteriol 2021; 204:JB0040721. [PMID: 34570627 PMCID: PMC8765423 DOI: 10.1128/jb.00407-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/06/2021] [Accepted: 09/21/2021] [Indexed: 11/20/2022] Open
Abstract
Type I toxin-antitoxin systems consist of a small protein (under 60 amino acids) whose overproduction can result in cell growth stasis or death, and a small RNA that represses translation of the toxin mRNA. Despite their potential toxicity, type I toxin proteins are increasingly linked to improved survival of bacteria in stressful environments and antibiotic persistence. While the interaction of toxin mRNAs with their cognate antitoxin sRNAs in some systems is well characterized, additional translational control of many toxins and their biological roles are not well understood. Using an ectopic overexpression system, we show that the efficient translation of a chromosomally encoded type I toxin, ZorO, requires mRNA processing of its long 5' untranslated region (UTR; Δ28 UTR). The severity of ZorO-induced toxicity, in terms of growth inhibition, membrane depolarization, and ATP depletion, was significantly increased when ZorO was expressed from the Δ28 UTR versus the full-length UTR. ZorO did not form large pores, as evidenced by a liposomal leakage assay, in vivo morphological analyses, and measurement of ATP loss. Further, increasing the copy number of the entire zor-orz locus significantly improved growth of bacterial cells in the presence of kanamycin and increased the minimum inhibitory concentration against kanamycin and gentamycin; however, no such benefit was observed against other antibiotics. This supports a role for the zor-orz locus as a protective measure against specific stress agents and suggests that it is likely not part of a general stress response mechanism. Combined, these data provide further insight into the possible native functions of type I toxin proteins. IMPORTANCE Bacterial species can harbor gene pairs known as type I toxin-antitoxin systems where one gene encodes a small protein that is toxic to the bacteria producing it and a second gene encodes a small RNA antitoxin to prevent toxicity. 
While artificial overproduction of type I toxin proteins can lead to cell growth inhibition and cell lysis, the endogenous translation of type I toxins appears to be tightly regulated. Here, we show translational regulation controls production of the ZorO type I toxin and prevents subsequent negative effects on the cell. Further, we demonstrate a role for zorO and its cognate antitoxin in improved growth of E. coli in the presence of aminoglycoside antibiotics.
Affiliation(s)
- Bikash Bogati
- Department of Microbiology, University of Tennessee, Knoxville, Tennessee, USA
| | - Nicholas Wadsworth
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee, USA
| | - Francisco Barrera
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee, USA
| | - Elizabeth M. Fozo
- Department of Microbiology, University of Tennessee, Knoxville, Tennessee, USA
| |
|
35
|
Stringer A, Smith C, Mangano K, Wade JT. Identification of novel translated small ORFs in Escherichia coli using complementary ribosome profiling approaches. J Bacteriol 2021; 204:JB0035221. [PMID: 34662240 PMCID: PMC8765432 DOI: 10.1128/jb.00352-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/01/2021] [Accepted: 10/12/2021] [Indexed: 11/20/2022] Open
Abstract
Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. Thus, we identify ribosome-occupied start and stop codons with high sensitivity for ∼400 novel putative ORFs. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs, and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs. IMPORTANCE Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Recent studies have discovered small proteins by mapping the location of translating ribosomes on RNA using a technique known as ribosome profiling. Discovery of translated sORFs using ribosome profiling can be improved by treating cells with drugs that trap initiating ribosomes. Here, we show that combining these data with equivalent data for cells treated with a drug that stalls terminating ribosomes facilitates the discovery of small proteins. We use this approach to discover 365 putative genes that encode small proteins in Escherichia coli.
Affiliation(s)
- Anne Stringer
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Carol Smith
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Kyle Mangano
- Center for Biomolecular Sciences, University of Illinois, Chicago, Illinois, USA
- Joseph T. Wade
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, New York, USA
|
36
|
Aoyama JJ, Raina M, Storz G. Synthetic dual-function RNA reveals features necessary for target regulation. J Bacteriol 2021; 204:JB0034521. [PMID: 34460309 PMCID: PMC8765420 DOI: 10.1128/jb.00345-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2021] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open
Abstract
Small base pairing RNAs (sRNAs) and small proteins comprise two classes of regulators that allow bacterial cells to adapt to a wide variety of growth conditions. A limited number of transcripts encoding both of these activities, regulation of mRNA expression by base pairing and synthesis of a small regulatory protein, have been identified. Given that few have been characterized, little is known about the interplay between the two regulatory functions. To investigate the competition between the two activities, we constructed synthetic dual-function RNAs, hereafter referred to as MgtSR or MgtRS, comprised of the Escherichia coli sRNA MgrR and the open reading frame encoding the small protein MgtS. MgrR is a 98 nt base pairing sRNA that negatively regulates eptB encoding phosphoethanolamine transferase. MgtS is a 31 aa small inner membrane protein that is required for the accumulation of MgtA, a magnesium (Mg2+) importer. Expression of the separate genes encoding MgrR and MgtS is normally induced in response to low Mg2+ by the PhoQP two-component system. By generating various versions of this synthetic dual-function RNA, we probed how the organization of components and the distance between the coding and base pairing sequences contribute to the proper function of both activities of a dual-function RNA. By understanding the features of natural and synthetic dual-function RNAs, future synthetic molecules can be designed to maximize their regulatory impact. IMPORTANCE Dual-function RNAs in bacteria encode a small protein and also base pair with mRNAs to act as small, regulatory RNAs. Given that only a limited number of dual-function RNAs have been characterized, further study of these regulators is needed to increase understanding of their features. This study demonstrates that a functional synthetic dual-regulator can be constructed from separate components and used to study the functional organization of dual-function RNAs, with the goal of exploiting these regulators.
Affiliation(s)
- Jordan J. Aoyama
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
- Biological Sciences Graduate Program, University of Maryland, College Park, Maryland, USA
- Medha Raina
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland, USA
|
37
|
The Small Toxic Salmonella Protein TimP Targets the Cytoplasmic Membrane and Is Repressed by the Small RNA TimR. mBio 2020; 11:mBio.01659-20. [PMID: 33172998 PMCID: PMC7667032 DOI: 10.1128/mbio.01659-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022] Open
Abstract
Next-generation sequencing (NGS) has enabled the revelation of a vast number of genomes from organisms spanning all domains of life. To reduce complexity when new genome sequences are annotated, open reading frames (ORFs) shorter than 50 codons in length are generally omitted. However, it has recently become evident that this procedure sorts away ORFs encoding small proteins of high biological significance. For instance, tailored small protein identification approaches have shown that bacteria encode numerous small proteins with important physiological functions. As the number of predicted small ORFs increases, it becomes important to characterize the corresponding proteins. In this study, we discovered a conserved but previously overlooked small enterobacterial protein. We show that this protein, which we dubbed TimP, is a potent toxin that inhibits bacterial growth by targeting the cell membrane. Toxicity is relieved by a small regulatory RNA, which binds the toxin mRNA to inhibit toxin synthesis. Small proteins are gaining increased attention due to their important functions in major biological processes throughout the domains of life. However, their small size and low sequence conservation make them difficult to identify. It is therefore not surprising that enterobacterial ryfA has escaped identification as a small protein coding gene for nearly 2 decades. Since its identification in 2001, ryfA has been thought to encode a noncoding RNA and has been implicated in biofilm formation in Escherichia coli and pathogenesis in Shigella dysenteriae. Although a recent ribosome profiling study suggested ryfA to be translated, the corresponding protein product was not detected. In this study, we provide evidence that ryfA encodes a small toxic inner membrane protein, TimP, overexpression of which causes cytoplasmic membrane leakage. TimP carries an N-terminal signal sequence, indicating that its membrane localization is Sec-dependent. Expression of TimP is repressed by the small RNA (sRNA) TimR, which base pairs with the timP mRNA to inhibit its translation. In contrast to overexpression, endogenous expression of TimP upon timR deletion permits cell growth, possibly indicating a toxicity-independent function in the bacterial membrane.
|
38
|
Arginine-Rich Small Proteins with a Domain of Unknown Function, DUF1127, Play a Role in Phosphate and Carbon Metabolism of Agrobacterium tumefaciens. J Bacteriol 2020; 202:JB.00309-20. [PMID: 33093235 DOI: 10.1128/jb.00309-20] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/22/2020] [Accepted: 07/21/2020] [Indexed: 02/06/2023] Open
Abstract
In any given organism, approximately one-third of all proteins have a yet-unknown function. A widely distributed domain of unknown function is DUF1127. Approximately 17,000 proteins with such an arginine-rich domain are found in 4,000 bacteria. Most of them are single-domain proteins, and a large fraction qualifies as small proteins with fewer than 50 amino acids. We systematically identified and characterized the seven DUF1127 members of the plant pathogen Agrobacterium tumefaciens. They all give rise to authentic proteins and are differentially expressed as shown at the RNA and protein levels. The seven proteins fall into two subclasses on the basis of their length, sequence, and reciprocal regulation by the LysR-type transcription factor LsrB. The absence of all three short DUF1127 proteins caused a striking phenotype in later growth phases and increased cell aggregation and biofilm formation. Protein profiling and transcriptome sequencing (RNA-seq) analysis of the wild type and triple mutant revealed a large number of differentially regulated genes in late exponential and stationary growth. The most affected genes are involved in phosphate uptake, glycine/serine homeostasis, and nitrate respiration. The results suggest a redundant function of the small DUF1127 paralogs in nutrient acquisition and central carbon metabolism of A. tumefaciens. They may be required for diauxic switching between carbon sources when sugar from the medium is depleted. We end by discussing how DUF1127 might confer such a global impact on cell physiology and gene expression. IMPORTANCE Despite being prevalent in numerous ecologically and clinically relevant bacterial species, the biological role of proteins with a domain of unknown function, DUF1127, is unclear. Experimental models are needed to approach their elusive function. We used the phytopathogen Agrobacterium tumefaciens, a natural genetic engineer that causes crown gall disease, and focused on its three small DUF1127 proteins. They have redundant and pervasive roles in nutrient acquisition, cellular metabolism, and biofilm formation. The study shows that small proteins have important previously missed biological functions. How small basic proteins can have such a broad impact is a fascinating prospect of future research.
|
39
|
Steinberg R, Origi A, Natriashvili A, Sarmah P, Licheva M, Walker PM, Kraft C, High S, Luirink J, Shi WQ, Helmstädter M, Ulbrich MH, Koch HG. Posttranslational insertion of small membrane proteins by the bacterial signal recognition particle. PLoS Biol 2020; 18:e3000874. [PMID: 32997663 PMCID: PMC7549839 DOI: 10.1371/journal.pbio.3000874] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/05/2020] [Revised: 10/12/2020] [Accepted: 09/02/2020] [Indexed: 01/05/2023] Open
Abstract
Small membrane proteins represent a largely unexplored yet abundant class of proteins in pro- and eukaryotes. They essentially consist of a single transmembrane domain and are associated with stress response mechanisms in bacteria. How these proteins are inserted into the bacterial membrane is unknown. Our study revealed that in Escherichia coli, the 27-amino-acid-long model protein YohP is recognized by the signal recognition particle (SRP), as indicated by in vivo and in vitro site-directed cross-linking. Cross-links to SRP were also observed for a second small membrane protein, the 33-amino-acid-long YkgR. However, in contrast to the canonical cotranslational recognition by SRP, SRP was found to bind to YohP posttranslationally. In vitro protein transport assays in the presence of a SecY inhibitor and proteoliposome studies demonstrated that SRP and its receptor FtsY are essential for the posttranslational membrane insertion of YohP by either the SecYEG translocon or by the YidC insertase. Furthermore, our data showed that the yohP mRNA localized preferentially and translation-independently to the bacterial membrane in vivo. In summary, our data revealed that YohP engages a unique SRP-dependent posttranslational insertion pathway that is likely preceded by an mRNA targeting step. This further highlights the enormous plasticity of bacterial protein transport machineries. Small membrane proteins represent a largely unexplored yet abundant class of proteins, but how they are inserted into the bacterial membrane is unknown. This study identifies a novel posttranslational protein transport pathway that relies on the signal recognition particle and the SecYEG translocon/YidC insertase.
Affiliation(s)
- Ruth Steinberg
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Andrea Origi
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Ana Natriashvili
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Pinku Sarmah
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Mariya Licheva
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Princess M. Walker
- Department of Chemistry, Ball State University, Muncie, Indiana, United States of America
- Claudine Kraft
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Stephen High
- School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- Joen Luirink
- Molecular Microbiology, AIMMS, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Wei Q. Shi
- Department of Chemistry, Ball State University, Muncie, Indiana, United States of America
- Martin Helmstädter
- Internal Medicine IV, Department of Medicine, Medical Center − University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Maximilian H. Ulbrich
- Internal Medicine IV, Department of Medicine, Medical Center − University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- BIOSS Centre for Biological Signalling Studies, University of Freiburg, Freiburg, Germany
- Hans-Georg Koch
- Institute of Biochemistry and Molecular Biology, ZBMZ, Faculty of Medicine, Albert-Ludwigs-University Freiburg, Freiburg, Germany
|
40
|
Canestrari JG, Lasek-Nesselquist E, Upadhyay A, Rofaeil M, Champion MM, Wade JT, Derbyshire KM, Gray TA. Polycysteine-encoding leaderless short ORFs function as cysteine-responsive attenuators of operonic gene expression in mycobacteria. Mol Microbiol 2020; 114:93-108. [PMID: 32181921 PMCID: PMC8764745 DOI: 10.1111/mmi.14498] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/06/2020] [Accepted: 03/12/2020] [Indexed: 12/11/2022]
Abstract
Genome-wide transcriptomic analyses have revealed abundant expressed short open reading frames (ORFs) in bacteria. Whether these short ORFs, or the small proteins they encode, are functional remains an open question. One quarter of mycobacterial mRNAs are leaderless, beginning with a 5'-AUG or GUG initiation codon. Leaderless mRNAs often encode unannotated short ORFs as the first gene of a polycistronic transcript. Here, we show that polycysteine-encoding leaderless short ORFs function as cysteine-responsive attenuators of operonic gene expression. Detailed mutational analysis shows that one polycysteine short ORF controls expression of the downstream genes. Our data indicate that ribosomes stalled in the polycysteine tract block mRNA structures that otherwise sequester the ribosome-binding site of the 3' gene. We assessed endogenous proteomic responses to cysteine limitation in Mycobacterium smegmatis using mass spectrometry. Six cysteine metabolic loci having unannotated polycysteine-encoding leaderless short ORF architectures responded to cysteine limitation, revealing widespread cysteine-responsive attenuation in mycobacteria. Individual leaderless short ORFs confer independent operon-level control, while their shared dependence on cysteine ensures a collective response mediated by ribosome pausing. We propose the term ribulon to classify ribosome-directed regulons. Regulon-level coordination by ribosomes on sensory short ORFs illustrates one utility of the many unannotated short ORFs expressed in bacterial genomes.
Affiliation(s)
- Jill G Canestrari
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Erica Lasek-Nesselquist
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Ashutosh Upadhyay
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Martina Rofaeil
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN, USA
- Matthew M Champion
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN, USA
- Joseph T Wade
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Keith M Derbyshire
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Todd A Gray
- Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, NY, USA
|
41
|
Gelsinger DR, Dallon E, Reddy R, Mohammad F, Buskirk A, DiRuggiero J. Ribosome profiling in archaea reveals leaderless translation, novel translational initiation sites, and ribosome pausing at single codon resolution. Nucleic Acids Res 2020; 48:5201-5216. [PMID: 32382758 PMCID: PMC7261190 DOI: 10.1093/nar/gkaa304] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 01/24/2020] [Revised: 04/09/2020] [Accepted: 04/22/2020] [Indexed: 12/22/2022] Open
Abstract
High-throughput methods, such as ribosome profiling, have revealed the complexity of translation regulation in Bacteria and Eukarya with large-scale effects on cellular functions. In contrast, the translational landscape in Archaea remains mostly unexplored. Here, we developed ribosome profiling in a model archaeon, Haloferax volcanii, elucidating, for the first time, the translational landscape of a representative of the third domain of life. We determined the ribosome footprint of H. volcanii to be comparable in size to that of the Eukarya. We linked footprint lengths to initiating and elongating states of the ribosome on leadered transcripts, operons, and on leaderless transcripts, the latter representing 70% of H. volcanii transcriptome. We manipulated ribosome activity with translation inhibitors to reveal ribosome pausing at specific codons. Lastly, we found that the drug harringtonine arrested ribosomes at initiation sites in this archaeon. This drug treatment allowed us to confirm known translation initiation sites and also reveal putative novel initiation sites in intergenic regions and within genes. Ribosome profiling revealed an uncharacterized complexity of translation in this archaeon with bacteria-like, eukarya-like, and potentially novel translation mechanisms. These mechanisms are likely to be functionally essential and to contribute to an expanded proteome with regulatory roles in gene expression.
Affiliation(s)
- Emma Dallon
- Department of Biology, the Johns Hopkins University, Baltimore, MD, USA
- Rahul Reddy
- Department of Biology, the Johns Hopkins University, Baltimore, MD, USA
- Fuad Mohammad
- Department of Molecular Biology and Genetics, the Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Allen R Buskirk
- Department of Molecular Biology and Genetics, the Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jocelyne DiRuggiero
- Department of Biology, the Johns Hopkins University, Baltimore, MD, USA
- Department of Earth and Planetary Sciences, the Johns Hopkins University, Baltimore, MD, USA
|
42
|
Du D, Neuberger A, Orr MW, Newman CE, Hsu PC, Samsudin F, Szewczak-Harris A, Ramos LM, Debela M, Khalid S, Storz G, Luisi BF. Interactions of a Bacterial RND Transporter with a Transmembrane Small Protein in a Lipid Environment. Structure 2020; 28:625-634.e6. [PMID: 32348749 PMCID: PMC7267776 DOI: 10.1016/j.str.2020.03.013] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 12/08/2019] [Revised: 02/14/2020] [Accepted: 03/27/2020] [Indexed: 12/01/2022]
Abstract
The small protein AcrZ in Escherichia coli interacts with the transmembrane portion of the multidrug efflux pump AcrB and increases resistance of the bacterium to a subset of the antibiotic substrates of that transporter. It is not clear how the physical association of the two proteins selectively changes activity of the pump for defined substrates. Here, we report cryo-EM structures of AcrB and the AcrBZ complex in lipid environments, and comparisons suggest that conformational changes occur in the drug-binding pocket as a result of AcrZ binding. Simulations indicate that cardiolipin preferentially interacts with the AcrBZ complex, due to increased contact surface, and we observe that chloramphenicol sensitivity of bacteria lacking AcrZ is exacerbated when combined with cardiolipin deficiency. Taken together, the data suggest that AcrZ and lipid cooperate to allosterically modulate AcrB activity. This mode of regulation by a small protein and lipid may occur for other membrane proteins.
Affiliation(s)
- Dijun Du
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Arthur Neuberger
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Mona Wu Orr
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-4417, USA
- Catherine E Newman
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Pin-Chia Hsu
- School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK
- Firdaus Samsudin
- School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK
- Andrzej Szewczak-Harris
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Leana M Ramos
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-4417, USA
- Mekdes Debela
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Syma Khalid
- School of Chemistry, University of Southampton, Southampton SO17 1BJ, UK
- Gisela Storz
- Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD 20892-4417, USA
- Ben F Luisi
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
|
43
|
Yadavalli SS, Goh T, Carey JN, Malengo G, Vellappan S, Nickels BE, Sourjik V, Goulian M, Yuan J. Functional determinants of a small protein controlling a broadly conserved bacterial sensor kinase. J Bacteriol 2020; 202:JB.00305-20. [PMID: 32482726 PMCID: PMC8404706 DOI: 10.1128/jb.00305-20] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/22/2020] [Accepted: 05/22/2020] [Indexed: 12/14/2022] Open
Abstract
The PhoQ/PhoP two-component system plays a vital role in the regulation of Mg2+ homeostasis, resistance to acid and hyperosmotic stress, cationic antimicrobial peptides, and virulence in Escherichia coli, Salmonella and related bacteria. Previous studies have shown that MgrB, a 47 amino acid membrane protein that is part of the PhoQ/PhoP regulon, inhibits the histidine kinase PhoQ. MgrB is part of a negative feedback loop modulating this two-component system that prevents hyperactivation of PhoQ and may also provide an entry point for additional input signals for the PhoQ/PhoP pathway. To explore the mechanism of action of MgrB, we have analyzed the effects of point mutations, C-terminal truncations and transmembrane region swaps on MgrB activity. In contrast with two other known membrane protein regulators of histidine kinases in E. coli, we find that the MgrB TM region is necessary for PhoQ inhibition. Our results indicate that the TM region mediates interactions with PhoQ and that W20 is a key residue for PhoQ/MgrB complex formation. Additionally, mutations of the MgrB cytosolic region suggest that the two N-terminal lysines play an important role in regulating PhoQ activity. Alanine scanning mutagenesis of the periplasmic region of MgrB further indicates that, with the exception of a few highly conserved residues, most residues are not essential for MgrB's function as a PhoQ inhibitor. Our results indicate that the regulatory function of the small protein MgrB depends on distinct contributions from multiple residues spread across the protein. Interestingly, the TM region also appears to interact with other non-cognate histidine kinases in a bacterial two-hybrid assay, suggesting a potential route for evolving new small protein modulators of histidine kinases.
Affiliation(s)
- Srujana S Yadavalli
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Genetics and Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ 08854, USA
- Ted Goh
- Department of Biology, Swarthmore College, Swarthmore, Pennsylvania 19081, USA
- Boston University School of Medicine, Boston, Massachusetts 02118, USA
- Jeffrey N Carey
- Biochemistry and Molecular Biophysics Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Gabriele Malengo
- Max Planck Institute for Terrestrial Microbiology, 35043 Marburg, Germany
- LOEWE Center for Synthetic Microbiology (SYNMIKRO), 35043 Marburg, Germany
- Sangeevan Vellappan
- Molecular Biosciences Graduate Program, Rutgers University, Piscataway, NJ 08854
- Bryce E Nickels
- Department of Genetics and Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ 08854, USA
- Victor Sourjik
- Max Planck Institute for Terrestrial Microbiology, 35043 Marburg, Germany
- LOEWE Center for Synthetic Microbiology (SYNMIKRO), 35043 Marburg, Germany
- Mark Goulian
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Physics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Jing Yuan
- Max Planck Institute for Terrestrial Microbiology, 35043 Marburg, Germany
- LOEWE Center for Synthetic Microbiology (SYNMIKRO), 35043 Marburg, Germany
|
44
|
Kubatova N, Pyper DJ, Jonker HRA, Saxena K, Remmel L, Richter C, Brantl S, Evguenieva‐Hackenberg E, Hess WR, Klug G, Marchfelder A, Soppa J, Streit W, Mayzel M, Orekhov VY, Fuxreiter M, Schmitz RA, Schwalbe H. Rapid Biophysical Characterization and NMR Spectroscopy Structural Analysis of Small Proteins from Bacteria and Archaea. Chembiochem 2020; 21:1178-1187. [PMID: 31705614 PMCID: PMC7217052 DOI: 10.1002/cbic.201900677] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/05/2019] [Indexed: 01/08/2023]
Abstract
Proteins encoded by small open reading frames (sORFs) have a widespread occurrence in diverse microorganisms and can be of high functional importance. However, due to annotation biases and their technically challenging direct detection, these small proteins have been overlooked for a long time and were only recently rediscovered. The currently rapidly growing number of such proteins requires efficient methods to investigate their structure-function relationship. Herein, a method is presented for fast determination of the conformational properties of small proteins. Their small size makes them perfectly amenable for solution-state NMR spectroscopy. NMR spectroscopy can provide detailed information about their conformational states (folded, partially folded, and unstructured). In the context of the priority program on small proteins funded by the German research foundation (SPP2002), 27 small proteins from 9 different bacterial and archaeal organisms have been investigated. It is found that most of these small proteins are unstructured or partially folded. Bioinformatics tools predict that some of these unstructured proteins can potentially fold upon complex formation. A protocol for fast NMR spectroscopy structure elucidation is described for the small proteins that adopt a persistently folded structure by implementation of new NMR technologies, including automated resonance assignment and nonuniform sampling in combination with targeted acquisition.
Collapse
Affiliation(s)
- Nina Kubatova
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| | - Dennis J. Pyper
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| | - Hendrik R. A. Jonker
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| | - Krishna Saxena
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| | - Laura Remmel
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| | - Christian Richter
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| | - Sabine Brantl
- AG Bakteriengenetik, Matthias-Schleiden-Institut, Philosophenweg 12, 07743 Jena, Germany
| | - Elena Evguenieva‐Hackenberg
- Institute for Microbiology and Molecular Biology, Justus Liebig University Giessen, Heinrich-Buff-Ring 26, 35392 Giessen, Germany
| | - Wolfgang R. Hess
- Faculty of Biology, Genetics and Experimental Bioinformatics, Albert Ludwigs University Freiburg, Schänzlestrasse 1, 79104 Freiburg, Germany
| | - Gabriele Klug
- Institute for Microbiology and Molecular Biology, Justus Liebig University Giessen, Heinrich-Buff-Ring 26, 35392 Giessen, Germany
| | | | - Jörg Soppa
- Institute for Molecular Biosciences, Johann Wolfgang Goethe University, Max-von-Laue-Strasse 9, 60438 Frankfurt am Main, Germany
| | - Wolfgang Streit
- Department of Microbiology and Biotechnology, University of Hamburg, Ohnhorststrasse 18, 22609 Hamburg, Germany
| | - Maxim Mayzel
- Swedish NMR Centre, University of Gothenburg, P. O. Box 465, 40530 Gothenburg, Sweden
| | - Vladislav Y. Orekhov
- Swedish NMR Centre, University of Gothenburg, P. O. Box 465, 40530 Gothenburg, Sweden
- Department of Chemistry and Molecular Biology, University of Gothenburg, Kemigården 4, 41296 Gothenburg, Sweden
| | - Monika Fuxreiter
- MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Nagyerdei krt 98, 4032 Debrecen, Hungary
| | - Ruth A. Schmitz
- Institute for General Microbiology, Christian Albrechts University Kiel, Am Botanischen Garten 1–9, 24118 Kiel, Germany
| | - Harald Schwalbe
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance (BMRZ), Johann Wolfgang Goethe University, Max-von-Laue-Strasse 7, 60438 Frankfurt/Main, Germany
| |
|
45
|
Cao X, Slavoff SA. Non-AUG start codons: Expanding and regulating the small and alternative ORFeome. Exp Cell Res 2020; 391:111973. [PMID: 32209305 DOI: 10.1016/j.yexcr.2020.111973] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 10/01/2019] [Revised: 03/10/2020] [Accepted: 03/18/2020] [Indexed: 01/17/2023]
Abstract
Recent ribosome profiling and proteomic studies have revealed the presence of thousands of novel coding sequences, referred to as small open reading frames (sORFs), in prokaryotic and eukaryotic genomes. These genes have defied discovery via traditional genomic tools not only because they tend to be shorter than standard gene annotation length cutoffs, but also because they are, as a class, enriched in sequence properties previously assumed to be unusual, including non-AUG start codons. In this review, we summarize what is currently known about the incidence, efficiency, and mechanism of non-AUG start codon usage in prokaryotes and eukaryotes, and provide examples of regulatory and functional sORFs that initiate at non-AUG codons. While only a handful of non-AUG-initiated novel genes have been characterized in detail to date, their participation in important biological processes suggests that an improved understanding of this class of genes is needed.
Affiliation(s)
- Xiongwen Cao
- Department of Chemistry, Yale University, New Haven, CT, 06520, United States; Chemical Biology Institute, Yale University, West Haven, CT, 06516, United States
| | - Sarah A Slavoff
- Department of Chemistry, Yale University, New Haven, CT, 06520, United States; Chemical Biology Institute, Yale University, West Haven, CT, 06516, United States; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06529, United States.
| |
|
46
|
Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, Medetgul-Ernar K, Bowman RW, Hines CP, Iannotta J, Parikh SB, McLysaght A, Camacho CJ, O'Donnell AF, Ideker T, Carvunis AR. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat Commun 2020; 11:781. [PMID: 32034123 PMCID: PMC7005711 DOI: 10.1038/s41467-020-14500-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 04/25/2019] [Accepted: 12/20/2019] [Indexed: 11/14/2022] Open
Abstract
Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection. There is increasing evidence that protein-coding genes can emerge de novo from noncoding genomic regions. Vakirlis et al. propose that sequences encoding transmembrane polypeptides can emerge de novo in thymine-rich genomic regions and provide organisms with fitness benefits.
Affiliation(s)
- Nikolaos Vakirlis
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, 2, Ireland
| | - Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Brian Hsu
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - S Branden Van Oss
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Kate Medetgul-Ernar
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
| | - Ray W Bowman
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States
| | - Cameron P Hines
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States
| | - John Iannotta
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Aoife McLysaght
- Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, 2, Ireland
| | - Carlos J Camacho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States
| | - Allyson F O'Donnell
- Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, United States.
| | - Trey Ideker
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA, 92093, United States.
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, United States.
| |
|
47
|
Cerqueira FR, Vasconcelos ATR. OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques. Database: The Journal of Biological Databases and Curation 2020; 2020:5989499. [PMID: 33206960 PMCID: PMC7673341 DOI: 10.1093/database/baaa067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 07/11/2020] [Accepted: 07/27/2020] [Indexed: 11/14/2022]
Abstract
Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty of finding patterns in tiny sequences is the main reason small ORFs are overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs, for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.
Affiliation(s)
- Fabio R Cerqueira
- Department of Production Engineering, Universidade Federal Fluminense, Rua Domingos Silvério s/n, Petrópolis, 25 650-050, Rio de Janeiro, Brazil; Graduate Program in Computer Science, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
| | | |
|
48
|
Homologous bd oxidases share the same architecture but differ in mechanism. Nat Commun 2019; 10:5138. [PMID: 31723136 PMCID: PMC6853902 DOI: 10.1038/s41467-019-13122-4] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 07/09/2019] [Accepted: 10/22/2019] [Indexed: 11/25/2022] Open
Abstract
Cytochrome bd oxidases are terminal reductases of bacterial and archaeal respiratory chains. The enzyme couples the oxidation of ubiquinol or menaquinol with the reduction of dioxygen to water, thus contributing to the generation of the protonmotive force. Here, we determine the structure of the Escherichia coli bd oxidase treated with the specific inhibitor aurachin by cryo-electron microscopy (cryo-EM). The major subunits CydA and CydB are related by a pseudo-twofold symmetry. The heme b and d cofactors are found in CydA, while ubiquinone-8 is bound at the homologous positions in CydB to stabilize its structure. The architecture of the E. coli enzyme is highly similar to that of Geobacillus thermodenitrificans; however, the positions of heme b595 and d are interchanged, and a common oxygen channel is blocked by a fourth subunit and substituted by a narrower, alternative channel. Thus, with the same overall fold, the homologous enzymes exhibit a different mechanism. Cytochrome bd oxidases couple quinol oxidation and the release of protons to the periplasmic side with proton uptake from the cytoplasmic side to reduce dioxygen to water, and they are the terminal reductases in bacterial and archaeal respiratory chains. Here the authors present the cryo-EM structure of Escherichia coli bd oxidase and discuss mechanistic implications.
|
49
|
Synthetic hydrophobic peptides derived from MgtR weaken Salmonella pathogenicity and work with a different mode of action than endogenously produced peptides. Sci Rep 2019; 9:15253. [PMID: 31649255 PMCID: PMC6813294 DOI: 10.1038/s41598-019-51760-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/23/2019] [Accepted: 09/24/2019] [Indexed: 12/03/2022] Open
Abstract
Due to the antibiotic resistance crisis, novel therapeutic strategies need to be developed against bacterial pathogens. Hydrophobic bacterial peptides (small proteins under 50 amino acids) have emerged as regulatory molecules that can interact with bacterial membrane proteins to modulate their activity and/or stability. Among them, the Salmonella MgtR peptide promotes the degradation of MgtC, a virulence factor involved in Salmonella intramacrophage replication, thus providing the basis for an antivirulence strategy. We demonstrate here that endogenous overproduction of MgtR reduced Salmonella replication inside macrophages and lowered MgtC protein level, whereas a peptide variant of MgtR (MgtR-S17I), which does not interact with MgtC, had no effect. We then used synthetic peptides to evaluate their action upon exogenous addition. Unexpectedly, upon addition of synthetic peptides, both MgtR and its variant MgtR-S17I reduced Salmonella intramacrophage replication and lowered MgtC and MgtB protein levels, suggesting a different mechanism of action of exogenously added peptides versus endogenously produced peptides. The synthetic peptides did not act by reducing bacterial viability. We next tested their effect on various recombinant proteins produced in Escherichia coli and showed that the level of several inner membrane proteins was strongly reduced upon addition of both peptides, whereas cytoplasmic or outer membrane proteins remained unaffected. Moreover, the α-helical structure of synthetic MgtR is important for its biological activity, whereas helix-helix interacting motif is dispensable. Cumulatively, these results provide perspectives for new antivirulence strategies with the use of peptides that act by reducing the level of inner membrane proteins, including virulence factors.
|
50
|
Zheng GZ, Li W, Liu ZY. Alternative role of noncoding RNAs: coding and noncoding properties. J Zhejiang Univ Sci B 2019; 20:920-927. [PMID: 31595728 DOI: 10.1631/jzus.b1900336] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]
Abstract
Noncoding RNAs (ncRNAs) have played a critical role in cellular biological functions. Recently, some peptides or proteins originating from annotated ncRNAs were identified in organism development and various diseases. Here, we briefly review several novel peptides translated by annotated ncRNAs and related key functions. In addition, we summarize the potential mechanism of bifunctional ncRNAs and propose a specific "switch" triggering the transformation from the noncoding to the coding state under certain stimuli or cellular stress. The coding properties of ncRNAs and their peptide products may provide a novel horizon in proteomic research and can be regarded as a potential therapeutic target for the treatment of various diseases.
Affiliation(s)
- Gui-Zhen Zheng
- Department of Emergency Internal Medicine, Shanghai East Hospital, Tongji University, Shanghai 200120, China
| | - Wei Li
- Department of General Surgery, Changzheng Hospital, Second Military Medical University, Shanghai 200003, China
| | - Zhi-Yong Liu
- Department of Laboratory Diagnostics, Changhai Hospital, Second Military Medical University, Shanghai 200433, China; Kunming General Hospital of Chengdu Military Command, Kunming 650032, China
| |
|