1
|
Duan Y, Santos-Júnior CD, Schmidt TS, Fullam A, de Almeida BLS, Zhu C, Kuhn M, Zhao XM, Bork P, Coelho LP. A catalog of small proteins from the global microbiome. Nat Commun 2024; 15:7563. [PMID: 39214983 PMCID: PMC11364881 DOI: 10.1038/s41467-024-51894-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] Open
Abstract
Small open reading frames (smORFs) shorter than 100 codons are widespread and perform essential roles in microorganisms, where they encode proteins active in several cell functions, including signal pathways, stress response, and antibacterial activities. However, the ecology, distribution and role of small proteins in the global microbiome remain unknown. Here, we construct a global microbial smORFs catalog (GMSC) derived from 63,410 publicly available metagenomes across 75 distinct habitats and 87,920 high-quality isolate genomes. GMSC contains 965 million non-redundant smORFs with comprehensive annotations. We find that archaea harbor more smORFs proportionally than bacteria. We moreover provide a tool called GMSC-mapper to identify and annotate small proteins from microbial (meta)genomes. Overall, this publicly-available resource demonstrates the immense and underexplored diversity of small proteins.
Collapse
Affiliation(s)
- Yiqian Duan
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Célio Dias Santos-Júnior
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Laboratory of Microbial Processes & Biodiversity - LMPB; Department of Hydrobiology, Universidade Federal de São Carlos - UFSCar, São Carlos, São Paulo, Brazil
| | - Thomas Sebastian Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- APC Microbiome and School of Medicine, University College Cork, Cork, Ireland
| | - Anthony Fullam
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Breno L S de Almeida
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Chengkai Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
- Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China.
- Lingang Laboratory, Shanghai, 200031, China.
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China.
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| | - Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, QLD, Australia.
- Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia.
| |
Collapse
|
2
|
Sun Z, Ning Z, Figeys D. The Landscape and Perspectives of the Human Gut Metaproteomics. Mol Cell Proteomics 2024; 23:100763. [PMID: 38608842 PMCID: PMC11098955 DOI: 10.1016/j.mcpro.2024.100763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 02/26/2024] [Accepted: 04/09/2024] [Indexed: 04/14/2024] Open
Abstract
The human gut microbiome is closely associated with human health and diseases. Metaproteomics has emerged as a valuable tool for studying the functionality of the gut microbiome by analyzing the entire proteins present in microbial communities. Recent advancements in liquid chromatography and tandem mass spectrometry (LC-MS/MS) techniques have expanded the detection range of metaproteomics. However, the overall coverage of the proteome in metaproteomics is still limited. While metagenomics studies have revealed substantial microbial diversity and functional potential of the human gut microbiome, few studies have summarized and studied the human gut microbiome landscape revealed with metaproteomics. In this article, we present the current landscape of human gut metaproteomics studies by re-analyzing the identification results from 15 published studies. We quantified the limited proteome coverage in metaproteomics and revealed a high proportion of annotation coverage of metaproteomics-identified proteins. We conducted a preliminary comparison between the metaproteomics view and the metagenomics view of the human gut microbiome, identifying key areas of consistency and divergence. Based on the current landscape of human gut metaproteomics, we discuss the feasibility of using metaproteomics to study functionally unknown proteins and propose a whole workflow peptide-centric analysis. Additionally, we suggest enhancing metaproteomics analysis by refining taxonomic classification and calculating confidence scores, as well as developing tools for analyzing the interaction between taxonomy and function.
Collapse
Affiliation(s)
- Zhongzhi Sun
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada.
| |
Collapse
|
3
|
Schumacher K, Gelhausen R, Kion-Crosby W, Barquist L, Backofen R, Jung K. Ribosome profiling reveals the fine-tuned response of Escherichia coli to mild and severe acid stress. mSystems 2023; 8:e0103723. [PMID: 37909716 PMCID: PMC10746267 DOI: 10.1128/msystems.01037-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Accepted: 09/28/2023] [Indexed: 11/03/2023] Open
Abstract
IMPORTANCE Bacteria react very differently to survive in acidic environments, such as the human gastrointestinal tract. Escherichia coli is one of the extremely acid-resistant bacteria and has a variety of acid-defense mechanisms. Here, we provide the first genome-wide overview of the adaptations of E. coli K-12 to mild and severe acid stress at both the transcriptional and translational levels. Using ribosome profiling and RNA sequencing, we uncover novel adaptations to different degrees of acidity, including previously hidden stress-induced small proteins and novel key transcription factors for acid defense, and report mRNAs with pH-dependent differential translation efficiency. In addition, we distinguish between acid-specific adaptations and general stress response mechanisms using denoising autoencoders. This workflow represents a powerful approach that takes advantage of next-generation sequencing techniques and machine learning to systematically analyze bacterial stress responses.
Collapse
Affiliation(s)
- Kilian Schumacher
- Faculty of Biology, Microbiology, Ludwig-Maximilians-Universität München, Martinsried, Germany
| | - Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
| | - Willow Kion-Crosby
- Helmholtz Institute for RNA-based Infection Research (HIRI)/Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
- University of Würzburg, Faculty of Medicine, Würzburg, Germany
| | - Lars Barquist
- Helmholtz Institute for RNA-based Infection Research (HIRI)/Helmholtz Centre for Infection Research (HZI), Würzburg, Germany
- University of Würzburg, Faculty of Medicine, Würzburg, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
| | - Kirsten Jung
- Faculty of Biology, Microbiology, Ludwig-Maximilians-Universität München, Martinsried, Germany
| |
Collapse
|
4
|
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Collapse
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
| | - Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
| |
Collapse
|
5
|
Anders J, Stadler PF. RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions. J Integr Bioinform 2023; 20:jib-2022-0046. [PMID: 37615674 PMCID: PMC10757073 DOI: 10.1515/jib-2022-0046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2022] [Accepted: 02/15/2023] [Indexed: 08/25/2023] Open
Abstract
The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious, and sometimes difficult to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignment that are needed as input for RNAcode. The service automatizes the entire pre- and postprocessing and thus makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.
Collapse
Affiliation(s)
- John Anders
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107Leipzig, Germany
| | - Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, D-04107Leipzig, Germany
- Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090Wien, Austria
- Facultad de Ciencias, Universidad National de Colombia, Sede Bogotá, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501, USA
| |
Collapse
|
6
|
Hadjeras L, Heiniger B, Maaß S, Scheuer R, Gelhausen R, Azarderakhsh S, Barth-Weber S, Backofen R, Becher D, Ahrens CH, Sharma CM, Evguenieva-Hackenberg E. Unraveling the small proteome of the plant symbiont Sinorhizobium meliloti by ribosome profiling and proteogenomics. MICROLIFE 2023; 4:uqad012. [PMID: 37223733 PMCID: PMC10117765 DOI: 10.1093/femsml/uqad012] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 02/08/2023] [Accepted: 03/07/2023] [Indexed: 05/25/2023]
Abstract
The soil-dwelling plant symbiont Sinorhizobium meliloti is a major model organism of Alphaproteobacteria. Despite numerous detailed OMICS studies, information about small open reading frame (sORF)-encoded proteins (SEPs) is largely missing, because sORFs are poorly annotated and SEPs are hard to detect experimentally. However, given that SEPs can fulfill important functions, identification of translated sORFs is critical for analyzing their roles in bacterial physiology. Ribosome profiling (Ribo-seq) can detect translated sORFs with high sensitivity, but is not yet routinely applied to bacteria because it must be adapted for each species. Here, we established a Ribo-seq procedure for S. meliloti 2011 based on RNase I digestion and detected translation for 60% of the annotated coding sequences during growth in minimal medium. Using ORF prediction tools based on Ribo-seq data, subsequent filtering, and manual curation, the translation of 37 non-annotated sORFs with ≤ 70 amino acids was predicted with confidence. The Ribo-seq data were supplemented by mass spectrometry (MS) analyses from three sample preparation approaches and two integrated proteogenomic search database (iPtgxDB) types. Searches against standard and 20-fold smaller Ribo-seq data-informed custom iPtgxDBs confirmed 47 annotated SEPs and identified 11 additional novel SEPs. Epitope tagging and Western blot analysis confirmed the translation of 15 out of 20 SEPs selected from the translatome map. Overall, by combining MS and Ribo-seq approaches, the small proteome of S. meliloti was substantially expanded by 48 novel SEPs. Several of them are part of predicted operons and/or are conserved from Rhizobiaceae to Bacteria, suggesting important physiological functions.
Collapse
Affiliation(s)
- Lydia Hadjeras
- Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
| | - Benjamin Heiniger
- Molecular Ecology,
Agroscope and SIB Swiss Institute of Bioinformatics, 8046 Zurich, Switzerland
| | - Sandra Maaß
- Institute of Microbiology, University of Greifswald, 17489 Greifswald, Germany
| | - Robina Scheuer
- Institute of Microbiology and Molecular Biology, University of Giessen, 35392 Giessen, Germany
| | - Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
| | - Saina Azarderakhsh
- Institute of Microbiology and Molecular Biology, University of Giessen, 35392 Giessen, Germany
| | - Susanne Barth-Weber
- Institute of Microbiology and Molecular Biology, University of Giessen, 35392 Giessen, Germany
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
| | - Dörte Becher
- Institute of Microbiology, University of Greifswald, 17489 Greifswald, Germany
| | - Christian H Ahrens
- Molecular Ecology, Agroscope and SIB Swiss Institute of Bioinformatics, 8046 Zurich, Switzerland
| | - Cynthia M Sharma
- Institute of Molecular Infection Biology, University of Würzburg, 97080 Würzburg, Germany
| | | |
Collapse
|
7
|
Hadjeras L, Bartel J, Maier LK, Maaß S, Vogel V, Svensson SL, Eggenhofer F, Gelhausen R, Müller T, Alkhnbashi OS, Backofen R, Becher D, Sharma CM, Marchfelder A. Revealing the small proteome of Haloferax volcanii by combining ribosome profiling and small-protein optimized mass spectrometry. MICROLIFE 2023; 4:uqad001. [PMID: 37223747 PMCID: PMC10117724 DOI: 10.1093/femsml/uqad001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/01/2022] [Revised: 11/29/2022] [Accepted: 01/13/2023] [Indexed: 05/25/2023]
Abstract
In contrast to extensively studied prokaryotic 'small' transcriptomes (encompassing all small noncoding RNAs), small proteomes (here defined as including proteins ≤70 aa) are only now entering the limelight. The absence of a complete small protein catalogue in most prokaryotes precludes our understanding of how these molecules affect physiology. So far, archaeal genomes have not yet been analyzed broadly with a dedicated focus on small proteins. Here, we present a combinatorial approach, integrating experimental data from small protein-optimized mass spectrometry (MS) and ribosome profiling (Ribo-seq), to generate a high confidence inventory of small proteins in the model archaeon Haloferax volcanii. We demonstrate by MS and Ribo-seq that 67% of the 317 annotated small open reading frames (sORFs) are translated under standard growth conditions. Furthermore, annotation-independent analysis of Ribo-seq data showed ribosomal engagement for 47 novel sORFs in intergenic regions. A total of seven of these were also detected by proteomics, in addition to an eighth novel small protein solely identified by MS. We also provide independent experimental evidence in vivo for the translation of 12 sORFs (annotated and novel) using epitope tagging and western blotting, underlining the validity of our identification scheme. Several novel sORFs are conserved in Haloferax species and might have important functions. Based on our findings, we conclude that the small proteome of H. volcanii is larger than previously appreciated, and that combining MS with Ribo-seq is a powerful approach for the discovery of novel small protein coding genes in archaea.
Collapse
Affiliation(s)
- Lydia Hadjeras
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
| | - Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
| | | | - Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
| | - Verena Vogel
- Biology II, Ulm University, Albert-Einstein-Allee 11, 89081 Ulm, Germany
| | - Sarah L Svensson
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
| | - Florian Eggenhofer
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Rick Gelhausen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Teresa Müller
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
| | - Omer S Alkhnbashi
- Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
| | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
| | - Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
| | - Cynthia M Sharma
- Department of Molecular Infection Biology II, Institute of Molecular Infection Biology (IMIB), University of Würzburg, Josef-Schneider-Straße 2 / D15, 97080 Würzburg, Germany
| | - Anita Marchfelder
- Biology II, Ulm University, Albert-Einstein-Allee 11, 89081 Ulm, Germany
| |
Collapse
|
8
|
Malekos E, Carpenter S. Short open reading frame genes in innate immunity: from discovery to characterization. Trends Immunol 2022; 43:741-756. [PMID: 35965152 PMCID: PMC10118063 DOI: 10.1016/j.it.2022.07.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 07/11/2022] [Accepted: 07/13/2022] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technologies have greatly expanded the size of the known transcriptome. Many newly discovered transcripts are classified as long noncoding RNAs (lncRNAs) which are assumed to affect phenotype through sequence and structure and not via translated protein products despite the vast majority of them harboring short open reading frames (sORFs). Recent advances have demonstrated that the noncoding designation is incorrect in many cases and that sORF-encoded peptides (SEPs) translated from these transcripts are important contributors to diverse biological processes. Interest in SEPs is at an early stage and there is evidence for the existence of thousands of SEPs that are yet unstudied. We hope to pique interest in investigating this unexplored proteome by providing a discussion of SEP characterization generally and describing specific discoveries in innate immunity.
Collapse
Affiliation(s)
- Eric Malekos
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA; Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Susan Carpenter
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA; Department of Molecular Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, USA.
| |
Collapse
|
9
|
Fijalkowski I, Willems P, Jonckheere V, Simoens L, Van Damme P. Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides. MICROLIFE 2022; 3:uqac005. [PMID: 37223358 PMCID: PMC10117744 DOI: 10.1093/femsml/uqac005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/13/2021] [Revised: 04/18/2022] [Accepted: 04/29/2022] [Indexed: 05/25/2023]
Abstract
Genomic studies of bacteria have long pointed toward widespread prevalence of small open reading frames (sORFs) encoding for short proteins, <100 amino acids in length. Despite the mounting genomic evidence of their robust expression, relatively little progress has been made in their mass spectrometry-based detection and various blanket statements have been used to explain this observed discrepancy. In this study, we provide a large-scale riboproteogenomics investigation of the challenging nature of proteomic detection of such small proteins as informed by conditional translation data. A panel of physiochemical properties alongside recently developed mass spectrometry detectability metrics was interrogated to provide a comprehensive evidence-based assessment of sORF-encoded polypeptide (SEP) detectability. Moreover, a large-scale proteomics and translatomics compendium of proteins produced by Salmonella Typhimurium (S. Typhimurium), a model human pathogen, across a panel of growth conditions is presented and used in support of our in silico SEP detectability analysis. This integrative approach is used to provide a data-driven census of small proteins expressed by S. Typhimurium across growth phases and infection-relevant conditions. Taken together, our study pinpoints current limitations in proteomics-based detection of novel small proteins currently missing from bacterial genome annotations.
Collapse
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Patrick Willems
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Veronique Jonckheere
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| |
Collapse
|
10
|
Zhang Z, Li Y, Yuan W, Wang Z, Wan C. Proteomic-driven identification of short open reading frame-encoded peptides. Proteomics 2022; 22:e2100312. [PMID: 35384297 DOI: 10.1002/pmic.202100312] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2022] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 11/10/2022]
Abstract
Accumulating evidence has shown that a large number of short open reading frames (sORFs) also have the ability to encode proteins. The discovery of sORFs opens up a new research area, leading to the identification and functional study of sORF encoded peptides (SEPs) at the omics level. Besides bioinformatics prediction and ribosomal profiling, mass spectrometry (MS) has become a significant tool as it directly detects the sequence of SEPs. Though MS-based proteomics methods have proved to be effective for qualitative and quantitative analysis of SEPs, the detection of SEPs is still a great challenge due to their low abundance and short sequence. To illustrate the progress in method development, we described and discussed the main steps of large-scale proteomics identification of SEPs, including SEP extraction and enrichment, MS detection, data processing and quality control, quantification, and function prediction and validation methods. This article is protected by copyright. All rights reserved.
Collapse
Affiliation(s)
- Zheng Zhang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Yujie Li
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Wenqian Yuan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Zhiwei Wang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Cuihong Wan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| |
Collapse
|
11
|
Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022] Open
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
Collapse
Affiliation(s)
- Michaela Kreitmeier
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Miriam Abele
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Christina Ludwig
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Klaus Neuhaus
- Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| |
Collapse
|
12
|
Weidenbach K, Gutt M, Cassidy L, Chibani C, Schmitz RA. Small Proteins in Archaea, a Mainly Unexplored World. J Bacteriol 2022; 204:e0031321. [PMID: 34543104 PMCID: PMC8765429 DOI: 10.1128/jb.00313-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open
Abstract
In recent years, increasing numbers of small proteins have moved into the focus of science. Small proteins have been identified and characterized in all three domains of life, but the majority remains functionally uncharacterized, lack secondary structure, and exhibit limited evolutionary conservation. While quite a few have already been described for bacteria and eukaryotic organisms, the amount of known and functionally analyzed archaeal small proteins is still very limited. In this review, we compile the current state of research, show strategies for systematic approaches for global identification of small archaeal proteins, and address selected functionally characterized examples. Besides, we document exemplarily for one archaeon the tool development and optimization to identify small proteins using genome-wide approaches.
Collapse
Affiliation(s)
- Katrin Weidenbach
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| | - Miriam Gutt
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| | - Liam Cassidy
- AG Proteomics & Bioanalytics, Institute for Experimental Medicine, Christian Albrechts University, Kiel, Germany
| | - Cynthia Chibani
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| | - Ruth A. Schmitz
- Institute for General Microbiology, Christian Albrechts University, Kiel, Germany
| |
Collapse
|
13
|
Ahrens CH, Wade JT, Champion MM, Langer JD. A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry. J Bacteriol 2022; 204:e0035321. [PMID: 34748388 PMCID: PMC8765459 DOI: 10.1128/jb.00353-21] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
Small proteins of up to ∼50 amino acids are an abundant class of biomolecules across all domains of life. Yet due to the challenges inherent in their size, they are often missed in genome annotations, and are difficult to identify and characterize using standard experimental approaches. Consequently, we still know few small proteins even in well-studied prokaryotic model organisms. Mass spectrometry (MS) has great potential for the discovery, validation, and functional characterization of small proteins. However, standard MS approaches are poorly suited to the identification of both known and novel small proteins due to limitations at each step of a typical proteomics workflow, i.e., sample preparation, protease digestion, liquid chromatography, MS data acquisition, and data analysis. Here, we outline the major MS-based workflows and bioinformatic pipelines used for small protein discovery and validation. Special emphasis is placed on highlighting the adjustments required to improve detection and data quality for small proteins. We discuss both the unbiased detection of small proteins and the targeted analysis of small proteins of interest. Finally, we provide guidelines to prioritize novel small proteins, and an outlook on methods with particular potential to further improve comprehensive discovery and characterization of small proteins.
Collapse
Affiliation(s)
- Christian H. Ahrens
- Agroscope, Method Development and Analytics & SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Joseph T. Wade
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, New York, USA
| | - Matthew M. Champion
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
| | - Julian D. Langer
- Mass Spectrometry and Proteomics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Proteomics, Max Planck Institute for Brain Research, Frankfurt am Main, Germany
| |
Collapse
|
14
|
Fijalkowski I, Peeters MKR, Van Damme P. Small Protein Enrichment Improves Proteomics Detection of sORF Encoded Polypeptides. Front Genet 2021; 12:713400. [PMID: 34721520 PMCID: PMC8554064 DOI: 10.3389/fgene.2021.713400] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Accepted: 10/01/2021] [Indexed: 11/13/2022] Open
Abstract
With the rapid growth in the number of sequenced genomes, genome annotation efforts became almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to their short length and often highly variable physiochemical properties. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and concomitantly proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs making use of the of amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.
Collapse
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
| | - Marlies K. R. Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Gent, Belgium
| | - Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
| |
Collapse
|
15
|
Sedghi L, DiMassa V, Harrington A, Lynch SV, Kapila YL. The oral microbiome: Role of key organisms and complex networks in oral health and disease. Periodontol 2000 2021; 87:107-131. [PMID: 34463991 PMCID: PMC8457218 DOI: 10.1111/prd.12393] [Citation(s) in RCA: 293] [Impact Index Per Article: 73.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
States of oral health and disease reflect the compositional and functional capacities of, as well as the interspecies interactions within, the oral microbiota. The oral cavity exists as a highly dynamic microbial environment that harbors many distinct substrata and microenvironments that house diverse microbial communities. Specific to the oral cavity, the nonshedding dental surfaces facilitate the development of highly complex polymicrobial biofilm communities, characterized not only by the distinct microbes comprising them, but cumulatively by their activities. Adding to this complexity, the oral cavity faces near-constant environmental challenges, including those from host diet, salivary flow, masticatory forces, and introduction of exogenous microbes. The composition of the oral microbiome is shaped throughout life by factors including host genetics, maternal transmission, as well as environmental factors, such as dietary habits, oral hygiene practice, medications, and systemic factors. This dynamic ecosystem presents opportunities for oral microbial dysbiosis and the development of dental and periodontal diseases. The application of both in vitro and culture-independent approaches has broadened the mechanistic understandings of complex polymicrobial communities within the oral cavity, as well as the environmental, local, and systemic underpinnings that influence the dynamics of the oral microbiome. Here, we review the present knowledge and current understanding of microbial communities within the oral cavity and the influences and challenges upon this system that encourage homeostasis or provoke microbiome perturbation, and thus contribute to states of oral health or disease.
Collapse
Affiliation(s)
- Lea Sedghi
- Department of Orofacial SciencesSchool of DentistryUniversity of California San FranciscoSan FranciscoCaliforniaUSA
| | - Vincent DiMassa
- Department of MedicineUniversity of California San FranciscoSan FranciscoCaliforniaUSA
| | - Anthony Harrington
- Department of MedicineUniversity of California San FranciscoSan FranciscoCaliforniaUSA
| | - Susan V. Lynch
- Department of MedicineUniversity of California San FranciscoSan FranciscoCaliforniaUSA
| | - Yvonne L. Kapila
- Department of Orofacial SciencesSchool of DentistryUniversity of California San FranciscoSan FranciscoCaliforniaUSA
| |
Collapse
|
16
|
Peeters MKR, Baggerman G, Gabriels R, Pepermans E, Menschaert G, Boonen K. Ion Mobility Coupled to a Time-of-Flight Mass Analyzer Combined With Fragment Intensity Predictions Improves Identification of Classical Bioactive Peptides and Small Open Reading Frame-Encoded Peptides. Front Cell Dev Biol 2021; 9:720570. [PMID: 34604223 PMCID: PMC8484717 DOI: 10.3389/fcell.2021.720570] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2021] [Accepted: 08/25/2021] [Indexed: 12/29/2022] Open
Abstract
Bioactive peptides exhibit key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) have been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome as the lack of cleavage enzyme specification and large search space complicates conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (tims) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequent machine learning tools MS2PIP, predicting fragment ion intensities and DeepLC, predicting retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator. Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides, of which 48 originating from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields ensuring an improved coverage of the neuropeptidome in the mouse brain.
Collapse
Affiliation(s)
- Marlies K. R. Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Geert Baggerman
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, Flemish Institute for Technological Research, Mol, Belgium
| | - Ralf Gabriels
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Flanders Institute for Biotechnology, Ghent, Belgium
| | - Elise Pepermans
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, Flemish Institute for Technological Research, Mol, Belgium
| | - Gerben Menschaert
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- OHMX.bio, Ghent, Belgium
| | - Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, Flemish Institute for Technological Research, Mol, Belgium
| |
Collapse
|
17
|
Cassidy L, Kaulich PT, Maaß S, Bartel J, Becher D, Tholey A. Bottom-up and top-down proteomic approaches for the identification, characterization, and quantification of the low molecular weight proteome with focus on short open reading frame-encoded peptides. Proteomics 2021; 21:e2100008. [PMID: 34145981 DOI: 10.1002/pmic.202100008] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 01/14/2023]
Abstract
The recent discovery of alternative open reading frames creates a need for suitable analytical approaches to verify their translation and to characterize the corresponding gene products at the molecular level. As the analysis of small proteins within a background proteome by means of classical bottom-up proteomics is challenging, method development for the analysis of small open reading frame encoded peptides (SEPs) have become a focal point for research. Here, we highlight bottom-up and top-down proteomics approaches established for the analysis of SEPs in both pro- and eukaryotes. Major steps of analysis, including sample preparation and (small) proteome isolation, separation and mass spectrometry, data interpretation and quality control, quantification, the analysis of post-translational modifications, and exploration of functional aspects of the SEPs by means of proteomics technologies are described. These methods do not exclusively cover the analytics of SEPs but simultaneously include the low molecular weight proteome, and moreover, can also be used for the proteome-wide analysis of proteolytic processing events.
Collapse
Affiliation(s)
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| |
Collapse
|
18
|
Fuchs S, Kucklick M, Lehmann E, Beckmann A, Wilkens M, Kolte B, Mustafayeva A, Ludwig T, Diwo M, Wissing J, Jänsch L, Ahrens CH, Ignatova Z, Engelmann S. Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach. PLoS Genet 2021; 17:e1009585. [PMID: 34061833 PMCID: PMC8195425 DOI: 10.1371/journal.pgen.1009585] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Revised: 06/11/2021] [Accepted: 05/07/2021] [Indexed: 01/08/2023] Open
Abstract
Small proteins play essential roles in bacterial physiology and virulence, however, automated algorithms for genome annotation are often not yet able to accurately predict the corresponding genes. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Out of these 24 (ranging from 9 to 99 aa) were novel and not contained in the used genome annotation.144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
Collapse
Affiliation(s)
- Stephan Fuchs
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
| | - Martin Kucklick
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Erik Lehmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Alexander Beckmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maya Wilkens
- Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Baban Kolte
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Ayten Mustafayeva
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Tobias Ludwig
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Maurice Diwo
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| | - Josef Wissing
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Lothar Jänsch
- Helmholtz Center for Infection Research GmbH, Cellular Proteomics, Braunschweig, Germany
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Zoya Ignatova
- University of Hamburg, Institute of Biochemistry and Molecular Biology, Hamburg, Germany
| | - Susanne Engelmann
- University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany
- Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Braunschweig, Germany
| |
Collapse
|
19
|
A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations. BMC Bioinformatics 2021; 22:277. [PMID: 34039272 PMCID: PMC8157683 DOI: 10.1186/s12859-021-04159-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open
Abstract
Background Small Proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions. Results We observe that number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins. Conclusions The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04159-8.
Collapse
|
20
|
Kaulich PT, Cassidy L, Bartel J, Schmitz RA, Tholey A. Multi-protease Approach for the Improved Identification and Molecular Characterization of Small Proteins and Short Open Reading Frame-Encoded Peptides. J Proteome Res 2021; 20:2895-2903. [PMID: 33760615 DOI: 10.1021/acs.jproteome.1c00115] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
The identification of proteins below approximately 70-100 amino acids in bottom-up proteomics is still a challenging task due to the limited number of peptides generated by proteolytic digestion. This includes the short open reading frame-encoded peptides (SEPs), which are a subset of the small proteins that were not previously annotated or that are alternatively encoded. Here, we systematically investigated the use of multiple proteases (trypsin, chymotrypsin, LysC, LysargiNase, and GluC) in GeLC-MS/MS analysis to improve the sequence coverage and the number of identified peptides for small proteins, with a focus on SEPs, in the archaeon Methanosarcina mazei. Combining the data of all proteases, we identified 63 small proteins and additional 28 SEPs with at least two unique peptides, while only 55 small proteins and 22 SEP could be identified using trypsin only. For 27 small proteins and 12 SEPs, a complete sequence coverage was achieved. Moreover, for five SEPs, incorrectly predicted translation start points or potential in vivo proteolytic processing were identified, confirming the data of a previous top-down proteomics study of this organism. The results show clearly that a multi-protease approach allows to improve the identification and molecular characterization of small proteins and SEPs. LC-MS data: ProteomeXchange PXD023921.
Collapse
Affiliation(s)
- Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel 24105, Germany
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel 24105, Germany
| | - Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald 17489, Germany
| | - Ruth A Schmitz
- Institute for General Microbiology, Christian-Albrechts-Universität zu Kiel, Kiel 24118, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel 24105, Germany
| |
Collapse
|
21
|
Petruschke H, Schori C, Canzler S, Riesbeck S, Poehlein A, Daniel R, Frei D, Segessemann T, Zimmerman J, Marinos G, Kaleta C, Jehmlich N, Ahrens CH, von Bergen M. Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome. MICROBIOME 2021; 9:55. [PMID: 33622394 PMCID: PMC7903761 DOI: 10.1186/s40168-020-00981-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Accepted: 12/16/2020] [Indexed: 05/13/2023]
Abstract
BACKGROUND The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. RESULTS We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. CONCLUSIONS We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract. Video abstract.
Collapse
Affiliation(s)
- Hannes Petruschke
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Christian Schori
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Sebastian Canzler
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Sarah Riesbeck
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Anja Poehlein
- Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
| | - Rolf Daniel
- Institute of Microbiology and Genetics, Department of Genomic and Applied Microbiology, Georg-August University of Göttingen, Göttingen, Germany
| | - Daniel Frei
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Tina Segessemann
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
| | - Johannes Zimmerman
- Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| | - Georgios Marinos
- Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| | - Christoph Kaleta
- Research Group Medical Systems Biology, Institute for Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| | - Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Christian H Ahrens
- Agroscope, Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland.
| | - Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany.
- Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany.
| |
Collapse
|
22
|
Stringer A, Smith C, Mangano K, Wade JT. Identification of novel translated small ORFs in Escherichia coli using complementary ribosome profiling approaches. J Bacteriol 2021; 204:JB0035221. [PMID: 34662240 PMCID: PMC8765432 DOI: 10.1128/jb.00352-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Accepted: 10/12/2021] [Indexed: 11/20/2022] Open
Abstract
Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Ribosome profiling has been used to infer the existence of small proteins by detecting the translation of the corresponding open reading frames (ORFs). Detection of translated short ORFs by ribosome profiling can be improved by treating cells with drugs that stall ribosomes at specific codons. Here, we combine the analysis of ribosome profiling data for Escherichia coli cells treated with antibiotics that stall ribosomes at either start or stop codons. Thus, we identify ribosome-occupied start and stop codons with high sensitivity for ∼400 novel putative ORFs. The newly discovered ORFs are mostly short, with 365 encoding proteins of <51 amino acids. We validate translation of several selected short ORFs, and show that many likely encode unstable proteins. Moreover, we present evidence that most of the newly identified short ORFs are not under purifying selection, suggesting they do not impact cell fitness, although a small subset have the hallmarks of functional ORFs. IMPORTANCE Small proteins of <51 amino acids are abundant across all domains of life but are often overlooked because their small size makes them difficult to predict computationally, and they are refractory to standard proteomic approaches. Recent studies have discovered small proteins by mapping the location of translating ribosomes on RNA using a technique known as ribosome profiling. Discovery of translated sORFs using ribosome profiling can be improved by treating cells with drugs that trap initiating ribosomes. Here, we show that combining these data with equivalent data for cells treated with a drug that stalls terminating ribosomes facilitates the discovery of small proteins. We use this approach to discover 365 putative genes that encode small proteins in Escherichia coli.
Collapse
Affiliation(s)
- Anne Stringer
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
| | - Carol Smith
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
| | - Kyle Mangano
- Center for Biomolecular Sciences, University of Illinois, Chicago, Illinois, USA
| | - Joseph T. Wade
- Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, New York, USA
| |
Collapse
|
23
|
The Small Toxic Salmonella Protein TimP Targets the Cytoplasmic Membrane and Is Repressed by the Small RNA TimR. mBio 2020; 11:mBio.01659-20. [PMID: 33172998 PMCID: PMC7667032 DOI: 10.1128/mbio.01659-20] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Next-generation sequencing (NGS) has enabled the revelation of a vast number of genomes from organisms spanning all domains of life. To reduce complexity when new genome sequences are annotated, open reading frames (ORFs) shorter than 50 codons in length are generally omitted. However, it has recently become evident that this procedure sorts away ORFs encoding small proteins of high biological significance. For instance, tailored small protein identification approaches have shown that bacteria encode numerous small proteins with important physiological functions. As the number of predicted small ORFs increase, it becomes important to characterize the corresponding proteins. In this study, we discovered a conserved but previously overlooked small enterobacterial protein. We show that this protein, which we dubbed TimP, is a potent toxin that inhibits bacterial growth by targeting the cell membrane. Toxicity is relieved by a small regulatory RNA, which binds the toxin mRNA to inhibit toxin synthesis. Small proteins are gaining increased attention due to their important functions in major biological processes throughout the domains of life. However, their small size and low sequence conservation make them difficult to identify. It is therefore not surprising that enterobacterial ryfA has escaped identification as a small protein coding gene for nearly 2 decades. Since its identification in 2001, ryfA has been thought to encode a noncoding RNA and has been implicated in biofilm formation in Escherichia coli and pathogenesis in Shigella dysenteriae. Although a recent ribosome profiling study suggested ryfA to be translated, the corresponding protein product was not detected. In this study, we provide evidence that ryfA encodes a small toxic inner membrane protein, TimP, overexpression of which causes cytoplasmic membrane leakage. TimP carries an N-terminal signal sequence, indicating that its membrane localization is Sec-dependent. Expression of TimP is repressed by the small RNA (sRNA) TimR, which base pairs with the timP mRNA to inhibit its translation. In contrast to overexpression, endogenous expression of TimP upon timR deletion permits cell growth, possibly indicating a toxicity-independent function in the bacterial membrane.
Collapse
|
24
|
Cassidy L, Helbig AO, Kaulich PT, Weidenbach K, Schmitz RA, Tholey A. Multidimensional separation schemes enhance the identification and molecular characterization of low molecular weight proteomes and short open reading frame-encoded peptides in top-down proteomics. J Proteomics 2020; 230:103988. [PMID: 32949814 DOI: 10.1016/j.jprot.2020.103988] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 08/17/2020] [Accepted: 09/14/2020] [Indexed: 12/13/2022]
Abstract
Short open reading frame-encoded peptides (SEP) represent a widely undiscovered part of the proteome. The detailed analysis of SEP has, despite inherent limitations such as incomplete sequence coverage, challenges encountered with protein inference, the identification of posttranslational modifications and the assignment of potential N- and C-terminal truncations, predominantly been assessed using bottom-up proteomic workflows. The use of top-down based proteomic workflows is capable of providing an unparalleled level of characterization information, which is of increased importance in the case of alternatively encoded protein products. However, top-down based analysis is not without its own limitations, for which efficient separation prior to MS analysis is a major issue. We established a sample preparation approach for the combined bottom-up and top-down proteomic analysis of SEP. Key improvements were made by the application of solid phase extraction (SPE), which supported enrichment of proteins below ca. 20 kDa, followed by 2D-LC-MS top-down analysis encompassing both HCD and EThcD ion activation. Bottom-up experiments were used to support and confirm top-down data interpretation. This strategy allowed for the top-down characterization of 36 proteoforms mapping to 12 SEP from the archaeon Methanosarcina mazei strain Gö1, with the concurrent detection and identification of several posttranslational modifications in SEP. BIOLOGICAL SIGNIFICANCE: Small or short open reading frames (sORF) have been widely neglected in genome research in the past. With their increasing discovery, the question about the presence and molecular function of their translation products, the short open reading frame-encoded peptides (SEP), arises. As these small proteins are usually below the 10 kDa range, the number of peptides identifiable by bottom-up proteomics is limited which hampers both the identification and the recognition of potential posttranslational modifications. The presented top-down approach allowed for the detection of full length SEP, as well as of terminally truncated proteoforms, and further enabled the identification of disulfide bonds in these small proteins. This demonstrates, that this yet widely undiscovered part of the proteome undergoes the same modifications as classical proteins which is an essential step for future understanding of the biological functions of these molecules.
Collapse
Affiliation(s)
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Andreas O Helbig
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany
| | - Kathrin Weidenbach
- Institute for General Microbiology, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany
| | - Ruth A Schmitz
- Institute for General Microbiology, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany.
| |
Collapse
|
25
|
Bartel J, Varadarajan AR, Sura T, Ahrens CH, Maaß S, Becher D. Optimized Proteomics Workflow for the Detection of Small Proteins. J Proteome Res 2020; 19:4004-4018. [DOI: 10.1021/acs.jproteome.0c00286] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Affiliation(s)
- Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Adithi R. Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Thomas Sura
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Christian H. Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| | - Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
| |
Collapse
|
26
|
Kaulich PT, Cassidy L, Weidenbach K, Schmitz RA, Tholey A. Complementarity of Different SDS‐PAGE Gel Staining Methods for the Identification of Short Open Reading Frame‐Encoded Peptides. Proteomics 2020; 20:e2000084. [DOI: 10.1002/pmic.202000084] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2020] [Revised: 06/15/2020] [Indexed: 12/14/2022]
Affiliation(s)
- Philipp T. Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel 24105 Germany
| | - Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel 24105 Germany
| | - Katrin Weidenbach
- Institute for General Microbiology Christian‐Albrechts‐Universität zu Kiel Kiel 24118 Germany
| | - Ruth A. Schmitz
- Institute for General Microbiology Christian‐Albrechts‐Universität zu Kiel Kiel 24118 Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian‐Albrechts‐Universität zu Kiel Kiel 24105 Germany
| |
Collapse
|