1
|
Fijalkowski I, Snauwaert V, Van Damme P. Proteins à la carte: riboproteogenomic exploration of bacterial N-terminal proteoform expression. mBio 2024; 15:e0033324. [PMID: 38511928 PMCID: PMC11005335 DOI: 10.1128/mbio.00333-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024] Open
Abstract
In recent years, it has become evident that the true complexity of bacterial proteomes remains underestimated. Gene annotation tools are known to propagate biases and overlook certain classes of truly expressed proteins, particularly proteoforms-protein isoforms arising from a single gene. Recent (re-)annotation efforts heavily rely on ribosome profiling by providing a direct readout of translation to fully describe bacterial proteomes. In this study, we employ a robust riboproteogenomic pipeline to conduct a systematic census of expressed N-terminal proteoform pairs, representing two isoforms encoded by a single gene raised by annotated and alternative translation initiation, in Salmonella. Intriguingly, conditional-dependent changes in relative utilization of annotated and alternative translation initiation sites (TIS) were observed in several cases. This suggests that TIS selection is subject to regulatory control, adding yet another layer of complexity to our understanding of bacterial proteomes. IMPORTANCE With the emerging theme of genes within genes comprising the existence of alternative open reading frames (ORFs) generated by translation initiation at in-frame start codons, mechanisms that control the relative utilization of annotated and alternative TIS need to be unraveled and our molecular understanding of resulting proteoforms broadened. Utilizing complementary ribosome profiling strategies to map ORF boundaries, we uncovered dual-encoding ORFs generated by in-frame TIS usage in Salmonella. Besides demonstrating that alternative TIS usage may generate proteoforms with different characteristics, such as differential localization and specialized function, quantitative aspects of conditional retapamulin-assisted ribosome profiling (Ribo-RET) translation initiation maps offer unprecedented insights into the relative utilization of annotated and alternative TIS, enabling the exploration of gene regulatory mechanisms that control TIS usage and, consequently, the translation of N-terminal proteoform pairs.
Collapse
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Valdes Snauwaert
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| |
Collapse
|
2
|
Simoens L, Fijalkowski I, Van Damme P. Exposing the small protein load of bacterial life. FEMS Microbiol Rev 2023; 47:fuad063. [PMID: 38012116 PMCID: PMC10723866 DOI: 10.1093/femsre/fuad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Revised: 11/10/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] Open
Abstract
The ever-growing repertoire of genomic techniques continues to expand our understanding of the true diversity and richness of prokaryotic genomes. Riboproteogenomics laid the foundation for dynamic studies of previously overlooked genomic elements. Most strikingly, bacterial genomes were revealed to harbor robust repertoires of small open reading frames (sORFs) encoding a diverse and broadly expressed range of small proteins, or sORF-encoded polypeptides (SEPs). In recent years, continuous efforts led to great improvements in the annotation and characterization of such proteins, yet many challenges remain to fully comprehend the pervasive nature of small proteins and their impact on bacterial biology. In this work, we review the recent developments in the dynamic field of bacterial genome reannotation, catalog the important biological roles carried out by small proteins and identify challenges obstructing the way to full understanding of these elusive proteins.
Collapse
Affiliation(s)
- Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
| | - Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
| | - Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
| |
Collapse
|
3
|
Fijalkowski I, Willems P, Jonckheere V, Simoens L, Van Damme P. Hidden in plain sight: challenges in proteomics detection of small ORF-encoded polypeptides. MICROLIFE 2022; 3:uqac005. [PMID: 37223358 PMCID: PMC10117744 DOI: 10.1093/femsml/uqac005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/13/2021] [Revised: 04/18/2022] [Accepted: 04/29/2022] [Indexed: 05/25/2023]
Abstract
Genomic studies of bacteria have long pointed toward widespread prevalence of small open reading frames (sORFs) encoding for short proteins, <100 amino acids in length. Despite the mounting genomic evidence of their robust expression, relatively little progress has been made in their mass spectrometry-based detection and various blanket statements have been used to explain this observed discrepancy. In this study, we provide a large-scale riboproteogenomics investigation of the challenging nature of proteomic detection of such small proteins as informed by conditional translation data. A panel of physiochemical properties alongside recently developed mass spectrometry detectability metrics was interrogated to provide a comprehensive evidence-based assessment of sORF-encoded polypeptide (SEP) detectability. Moreover, a large-scale proteomics and translatomics compendium of proteins produced by Salmonella Typhimurium (S. Typhimurium), a model human pathogen, across a panel of growth conditions is presented and used in support of our in silico SEP detectability analysis. This integrative approach is used to provide a data-driven census of small proteins expressed by S. Typhimurium across growth phases and infection-relevant conditions. Taken together, our study pinpoints current limitations in proteomics-based detection of novel small proteins currently missing from bacterial genome annotations.
Collapse
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Patrick Willems
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Veronique Jonckheere
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| | - Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, 9000 Ghent, Belgium
| |
Collapse
|
4
|
Kreitmeier M, Ardern Z, Abele M, Ludwig C, Scherer S, Neuhaus K. Spotlight on alternative frame coding: Two long overlapping genes in Pseudomonas aeruginosa are translated and under purifying selection. iScience 2022; 25:103844. [PMID: 35198897 PMCID: PMC8850804 DOI: 10.1016/j.isci.2022.103844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2021] [Revised: 10/14/2021] [Accepted: 01/27/2022] [Indexed: 12/13/2022] Open
Abstract
The existence of overlapping genes (OLGs) with significant coding overlaps revolutionizes our understanding of genomic complexity. We report two exceptionally long (957 nt and 1536 nt), evolutionarily novel, translated antisense open reading frames (ORFs) embedded within annotated genes in the pathogenic Gram-negative bacterium Pseudomonas aeruginosa. Both OLG pairs show sequence features consistent with being genes and transcriptional signals in RNA sequencing. Translation of both OLGs was confirmed by ribosome profiling and mass spectrometry. Quantitative proteomics of samples taken during different phases of growth revealed regulation of protein abundances, implying biological functionality. Both OLGs are taxonomically restricted, and likely arose by overprinting within the genus. Evidence for purifying selection further supports functionality. The OLGs reported here, designated olg1 and olg2, are the longest yet proposed in prokaryotes and are among the best attested in terms of translation and evolutionary constraint. These results highlight a potentially large unexplored dimension of prokaryotic genomes.
Collapse
Affiliation(s)
- Michaela Kreitmeier
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Zachary Ardern
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Miriam Abele
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Christina Ludwig
- Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technische Universität München, Gregor-Mendel-Strasse 4, 85354 Freising, Germany
| | - Siegfried Scherer
- Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Klaus Neuhaus
- Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354 Freising, Germany
| |
Collapse
|
5
|
Willems P, Ndah E, Jonckheere V, Van Breusegem F, Van Damme P. To New Beginnings: Riboproteogenomics Discovery of N-Terminal Proteoforms in Arabidopsis Thaliana. FRONTIERS IN PLANT SCIENCE 2022; 12:778804. [PMID: 35069635 PMCID: PMC8770321 DOI: 10.3389/fpls.2021.778804] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/17/2021] [Accepted: 11/18/2021] [Indexed: 06/14/2023]
Abstract
Alternative translation initiation is a widespread event in biology that can shape multiple protein forms or proteoforms from a single gene. However, the respective contribution of alternative translation to protein complexity remains largely enigmatic. By complementary ribosome profiling and N-terminal proteomics (i.e., riboproteogenomics), we provide clear-cut evidence for ~90 N-terminal proteoform pairs shaped by (alternative) translation initiation in Arabidopsis thaliana. Next to several cases additionally confirmed by directed mutagenesis, identified alternative protein N-termini follow the enzymatic rules of co-translational N-terminal protein acetylation and initiator methionine removal. In contrast to other eukaryotic models, N-terminal acetylation in plants cannot generally be considered as a proxy of translation initiation because of its posttranslational occurrence on mature proteolytic neo-termini (N-termini) localized in the chloroplast stroma. Quantification of N-terminal acetylation revealed differing co- vs. posttranslational N-terminal acetylation patterns. Intriguingly, our data additionally hints to alternative translation initiation serving as a common mechanism to supply protein copies in multiple cellular compartments, as alternative translation sites are often in close proximity to cleavage sites of N-terminal transit sequences of nuclear-encoded chloroplastic and mitochondrial proteins. Overall, riboproteogenomics screening enables the identification of (differential localized) N-terminal proteoforms raised upon alternative translation.
Collapse
Affiliation(s)
- Patrick Willems
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Vlaams Instituut voor Biotechnologie (VIB)-Center for Plant Systems Biology, Ghent, Belgium
| | - Elvis Ndah
- integrative Riboproteogenomics, Interactomics and Proteomics Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Veronique Jonckheere
- integrative Riboproteogenomics, Interactomics and Proteomics Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| | - Frank Van Breusegem
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Vlaams Instituut voor Biotechnologie (VIB)-Center for Plant Systems Biology, Ghent, Belgium
| | - Petra Van Damme
- integrative Riboproteogenomics, Interactomics and Proteomics Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
| |
Collapse
|
6
|
Fijalkowski I, Peeters MKR, Van Damme P. Small Protein Enrichment Improves Proteomics Detection of sORF Encoded Polypeptides. Front Genet 2021; 12:713400. [PMID: 34721520 PMCID: PMC8554064 DOI: 10.3389/fgene.2021.713400] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Accepted: 10/01/2021] [Indexed: 11/13/2022] Open
Abstract
With the rapid growth in the number of sequenced genomes, genome annotation efforts became almost exclusively reliant on automated pipelines. Despite their unquestionable utility, these methods have been shown to underestimate the true complexity of the studied genomes, with small open reading frames (sORFs; ORFs typically considered shorter than 300 nucleotides) and, in consequence, their protein products (sORF encoded polypeptides or SEPs) being the primary example of a poorly annotated and highly underexplored class of genomic elements. With the advent of advanced translatomics such as ribosome profiling, reannotation efforts have progressed a great deal in providing translation evidence for numerous, previously unannotated sORFs. However, proteomics validation of these riboproteogenomics discoveries remains challenging due to their short length and often highly variable physiochemical properties. In this work we evaluate and compare tailored, yet easily adaptable, protein extraction methodologies for their efficacy in the extraction and concomitantly proteomics detection of SEPs expressed in the prokaryotic model pathogen Salmonella typhimurium (S. typhimurium). Further, an optimized protocol for the enrichment and efficient detection of SEPs making use of the of amphipathic polymer amphipol A8-35 and relying on differential peptide vs. protein solubility was developed and compared with global extraction methods making use of chaotropic agents. Given the versatile biological functions SEPs have been shown to exert, this work provides an accessible protocol for proteomics exploration of this fascinating class of small proteins.
Collapse
Affiliation(s)
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
| | - Marlies K. R. Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Gent, Belgium
| | - Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Gent, Belgium
| |
Collapse
|
7
|
Jonckheere V, Van Damme P. N-Terminal Acetyltransferase Naa40p Whereabouts Put into N-Terminal Proteoform Perspective. Int J Mol Sci 2021; 22:ijms22073690. [PMID: 33916271 PMCID: PMC8037211 DOI: 10.3390/ijms22073690] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Revised: 03/28/2021] [Accepted: 03/28/2021] [Indexed: 11/21/2022] Open
Abstract
The evolutionary conserved N-alpha acetyltransferase Naa40p is among the most selective N-terminal acetyltransferases (NATs) identified to date. Here we identified a conserved N-terminally truncated Naa40p proteoform named Naa40p25 or short Naa40p (Naa40S). Intriguingly, although upon ectopic expression in yeast, both Naa40p proteoforms were capable of restoring N-terminal acetylation of the characterized yeast histone H2A Naa40p substrate, the Naa40p histone H4 substrate remained N-terminally free in human haploid cells specifically deleted for canonical Naa40p27 or 237 amino acid long Naa40p (Naa40L), but expressing Naa40S. Interestingly, human Naa40L and Naa40S displayed differential expression and subcellular localization patterns by exhibiting a principal nuclear and cytoplasmic localization, respectively. Furthermore, Naa40L was shown to be N-terminally myristoylated and to interact with N-myristoyltransferase 1 (NMT1), implicating NMT1 in steering Naa40L nuclear import. Differential interactomics data obtained by biotin-dependent proximity labeling (BioID) further hints to context-dependent roles of Naa40p proteoforms. More specifically, with Naa40S representing the main co-translationally acting actor, the interactome of Naa40L was enriched for nucleolar proteins implicated in ribosome biogenesis and the assembly of ribonucleoprotein particles, overall indicating a proteoform-specific segregation of previously reported Naa40p activities. Finally, the yeast histone variant H2A.Z and the transcriptionally regulatory protein Lge1 were identified as novel Naa40p substrates, expanding the restricted substrate repertoire of Naa40p with two additional members and further confirming Lge1 as being the first redundant yNatA and yNatD substrate identified to date.
Collapse
|
8
|
Willems P, Fels U, Staes A, Gevaert K, Van Damme P. Use of Hybrid Data-Dependent and -Independent Acquisition Spectral Libraries Empowers Dual-Proteome Profiling. J Proteome Res 2021; 20:1165-1177. [PMID: 33467856 PMCID: PMC7871992 DOI: 10.1021/acs.jproteome.0c00350] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2020] [Indexed: 01/01/2023]
Abstract
In the context of bacterial infections, it is imperative that physiological responses can be studied in an integrated manner, meaning a simultaneous analysis of both the host and the pathogen responses. To improve the sensitivity of detection, data-independent acquisition (DIA)-based proteomics was found to outperform data-dependent acquisition (DDA) workflows in identifying and quantifying low-abundant proteins. Here, by making use of representative bacterial pathogen/host proteome samples, we report an optimized hybrid library generation workflow for DIA mass spectrometry relying on the use of data-dependent and in silico-predicted spectral libraries. When compared to searching DDA experiment-specific libraries only, the use of hybrid libraries significantly improved peptide detection to an extent suggesting that infection-relevant host-pathogen conditions could be profiled in sufficient depth without the need of a priori bacterial pathogen enrichment when studying the bacterial proteome. Proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD017904 and PXD017945.
Collapse
Affiliation(s)
- Patrick Willems
- Department
of Biochemistry and Microbiology, Ghent
University, Ghent 9000, Belgium
- Department
of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9000, Belgium
- VIB-UGent
Center for Plant Systems Biology, Ghent 9052, Belgium
| | - Ursula Fels
- Department
of Biochemistry and Microbiology, Ghent
University, Ghent 9000, Belgium
- VIB-UGent
Center for Medical Biotechnology, Ghent 9052, Belgium
| | - An Staes
- VIB-UGent
Center for Medical Biotechnology, Ghent 9052, Belgium
- Department
of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium
| | - Kris Gevaert
- VIB-UGent
Center for Medical Biotechnology, Ghent 9052, Belgium
- Department
of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium
| | - Petra Van Damme
- Department
of Biochemistry and Microbiology, Ghent
University, Ghent 9000, Belgium
| |
Collapse
|
9
|
Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage. mSystems 2020; 5:5/5/e00833-20. [PMID: 33109751 PMCID: PMC7593589 DOI: 10.1128/msystems.00833-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open
Abstract
Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years. Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. This data analysis encompasses the searching of cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. When this strategy is applied, an enhanced proteome depth is achieved, as well as greater confidence for unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis using available prokaryotic (proteome) data sets holds great promise to assist in experimentally based genome annotation. IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that by advanced (re)analysis of omics data, a higher proteome coverage and sensitive detection of unannotated ORFs can be achieved, which can be exploited for conditional bacterial genome (re)annotation, which is especially relevant in view of annotating the wealth of sequenced prokaryotic genomes obtained in recent years.
Collapse
|