1
Lewis JM, Jebeli L, Coulon PML, Lay CE, Scott NE. Glycoproteomic and proteomic analysis of Burkholderia cenocepacia reveals glycosylation events within FliF and MotB are dispensable for motility. Microbiol Spectr 2024; 12:e0034624. [PMID: 38709084 PMCID: PMC11237607 DOI: 10.1128/spectrum.00346-24] [Received: 02/10/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024] Open
Abstract
Across the Burkholderia genus, O-linked protein glycosylation is highly conserved. While the inhibition of glycosylation has been shown to be detrimental for virulence in Burkholderia cepacia complex species, such as Burkholderia cenocepacia, little is known about how specific glycosylation sites impact protein functionality. Within this study, we sought to improve our understanding of the breadth, dynamics, and requirement for glycosylation across the B. cenocepacia O-glycoproteome. Assessing the B. cenocepacia glycoproteome across different culture media using complementary glycoproteomic approaches, we increase the known glycoproteome to 141 glycoproteins. Leveraging this repertoire of glycoproteins, we quantitatively assessed the glycoproteome of B. cenocepacia using Data-Independent Acquisition (DIA), revealing that the B. cenocepacia glycoproteome is largely stable across conditions, with most glycoproteins constitutively expressed. Examination of how the absence of glycosylation impacts the glycoproteome reveals that the protein abundance of only five glycoproteins (BCAL1086, BCAL2974, BCAL0525, BCAM0505, and BCAL0127) is altered by the loss of glycosylation. Assessing ΔfliF (ΔBCAL0525), ΔmotB (ΔBCAL0127), and ΔBCAM0505 strains, we demonstrate that the loss of FliF, and to a lesser extent MotB, mirrors the proteomic effects observed in the absence of glycosylation in ΔpglL. While both MotB and FliF are essential for motility, we find that loss of glycosylation sites in MotB or FliF does not impact motility, supporting that these sites are dispensable for function. Combined, this work broadens our understanding of the B. cenocepacia glycoproteome, supporting that the loss of glycoproteins in the absence of glycosylation is not an indicator of the requirement for glycosylation for protein function.
IMPORTANCE Burkholderia cenocepacia is an opportunistic pathogen of concern within the Cystic Fibrosis community. Despite a greater appreciation of the unique physiology of B. cenocepacia gained over the last 20 years, a complete understanding of the proteome, and especially the O-glycoproteome, is lacking. In this study, we utilize systems biology approaches to expand the known B. cenocepacia glycoproteome as well as track the dynamics of glycoproteins across growth phases, culturing media, and in response to the loss of glycosylation. We show that the glycoproteome of B. cenocepacia is largely stable across conditions and that the loss of glycosylation only impacts five glycoproteins, including the motility-associated proteins FliF and MotB. Examination of MotB and FliF shows that, while these proteins are essential for motility, glycosylation is dispensable. Combined, this work supports that B. cenocepacia glycosylation can be dispensable for protein function and may influence protein properties beyond stability.
Affiliation(s)
- Jessica M Lewis
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Leila Jebeli
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Pauline M L Coulon
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Catrina E Lay
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Nichollas E Scott
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
2
D'Ermo G, Audebert S, Camoin L, Planer-Friedrich B, Casiot-Marouani C, Delpoux S, Lebrun R, Guiral M, Schoepp-Cothenet B. Quantitative proteomics reveals the Sox system's role in sulphur and arsenic metabolism of phototroph Halorhodospira halophila. Environ Microbiol 2024; 26:e16655. [PMID: 38897608 DOI: 10.1111/1462-2920.16655] [Received: 02/08/2024] [Accepted: 05/07/2024] [Indexed: 06/21/2024]
Abstract
The metabolic process of purple sulphur bacteria's anoxygenic photosynthesis has been primarily studied in Allochromatium vinosum, a member of the Chromatiaceae family. However, the metabolic processes of purple sulphur bacteria from the Ectothiorhodospiraceae and Halorhodospiraceae families remain unexplored. We have analysed the proteome of Halorhodospira halophila, a member of the Halorhodospiraceae family, which was cultivated with various sulphur compounds. This analysis allowed us to reconstruct the first comprehensive sulphur-oxidative photosynthetic network for this family. Some members of the Ectothiorhodospiraceae family have been shown to use arsenite as a photosynthetic electron donor. Therefore, we analysed the proteome response of Halorhodospira halophila when grown under arsenite and sulphide conditions. Our analyses using ion chromatography-inductively coupled plasma mass spectrometry showed that thioarsenates are chemically formed under these conditions. However, they are more extensively generated and converted in the presence of bacteria, suggesting a biological process. Our quantitative proteomics revealed that the SoxAXYZB system, typically dedicated to thiosulphate oxidation, is overproduced under these growth conditions. Additionally, two electron carriers, cytochrome c551/c5 and HiPIP III, are also overproduced. Electron paramagnetic resonance spectroscopy suggested that these transporters participate in the reduction of the photosynthetic Reaction Centre. These results support the idea of a chemically and biologically formed thioarsenate being oxidized by the Sox system, with cytochrome c551/c5 and HiPIP III directing electrons towards the Reaction Centre.
Affiliation(s)
- Giulia D'Ermo
  - Aix-Marseille Université, CNRS, BIP-UMR 7281, Marseille, France
- Stéphane Audebert
  - Aix-Marseille Université, Inserm, CNRS, Institut Paoli-Calmettes, CRCM, Marseille Protéomique, Marseille, France
- Luc Camoin
  - Aix-Marseille Université, Inserm, CNRS, Institut Paoli-Calmettes, CRCM, Marseille Protéomique, Marseille, France
- Britta Planer-Friedrich
  - Environmental Geochemistry, Bayreuth Centre for Ecology and Environmental Research (BAYCEER), University of Bayreuth, Bayreuth, Germany
- Sophie Delpoux
  - Laboratoire HydroSciences Montpellier, Univ. Montpellier, CNRS, IRD, Montpellier, France
- Régine Lebrun
  - Aix-Marseille Université, CNRS, IMM-FR3479, Marseille Protéomique, Marseille, France
- Marianne Guiral
  - Aix-Marseille Université, CNRS, BIP-UMR 7281, Marseille, France
3
Kelly S, Tham JL, McKeever K, Dillon E, O'Connell D, Scholz D, Simpson JC, O'Connor K, Narancic T, Cagney G. Comprehensive Proteomics Analysis of Polyhydroxyalkanoate (PHA) Biology in Pseudomonas putida KT2440: The Outer Membrane Lipoprotein OprL is a Newly Identified Phasin. Mol Cell Proteomics 2024; 23:100765. [PMID: 38608840 PMCID: PMC11103573 DOI: 10.1016/j.mcpro.2024.100765] [Received: 10/13/2023] [Revised: 03/01/2024] [Accepted: 03/23/2024] [Indexed: 04/14/2024] Open
Abstract
Pseudomonas putida KT2440 is an important bioplastic-producing industrial microorganism capable of synthesizing the polymeric carbon-rich storage material, polyhydroxyalkanoate (PHA). PHA is sequestered in discrete PHA granules, or carbonosomes, and accumulates under conditions of stress, for example, low levels of available nitrogen. The pha locus responsible for PHA metabolism encodes both anabolic and catabolic enzymes, a transcription factor, and carbonosome-localized proteins termed phasins. The functions of phasins are incompletely understood, but genetic disruption of their function causes PHA-related phenotypes. To improve our understanding of these proteins, we investigated the PHA pathways of P. putida KT2440 using three types of experiments. First, we profiled cells grown in nitrogen-limited and nitrogen-excess media using global expression proteomics, identifying sets of proteins found to coordinately increase or decrease within clustered pathways. Next, we analyzed the protein composition of isolated carbonosomes, identifying two new putative components. We carried out physical interaction screens focused on PHA-related proteins, generating a protein-protein network comprising 434 connected proteins. Finally, we confirmed that the outer membrane protein OprL (the Pal component of the Pal-Tol system) localizes to the carbonosome and shows a PHA-related phenotype and therefore is a novel phasin. The combined datasets represent a valuable overview of the protein components of the PHA system in P. putida, highlighting the complex nature of regulatory interactions responsive to nutrient stress.
Affiliation(s)
- Siobhan Kelly
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland
- Jia-Lynn Tham
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland
- Kate McKeever
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland
- Eugene Dillon
  - UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- David O'Connell
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland
- Dimitri Scholz
  - UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Jeremy C Simpson
  - UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland; UCD Earth Institute, University College Dublin, Belfield, Dublin, Ireland
- Kevin O'Connor
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland; UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin, Ireland
- Tanja Narancic
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland
- Gerard Cagney
  - BiOrbic - Bioeconomy Research Centre, University College Dublin, Belfield, Dublin, Ireland; UCD Conway Institute, University College Dublin, Belfield, Dublin, Ireland; School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin, Ireland
4
Genth J, Schäfer K, Cassidy L, Graspeuntner S, Rupp J, Tholey A. Identification of proteoforms of short open reading frame-encoded peptides in Blautia producta under different cultivation conditions. Microbiol Spectr 2023; 11:e0252823. [PMID: 37782090 PMCID: PMC10715070 DOI: 10.1128/spectrum.02528-23] [Received: 06/16/2023] [Accepted: 08/14/2023] [Indexed: 10/03/2023] Open
Abstract
IMPORTANCE The identification of short open reading frame-encoded peptides (SEP) and different proteoforms in single cultures of gut microbes offers new insights into a largely neglected part of the microbial proteome landscape. This is of particular importance as SEP provide various predicted functions, such as acting as antimicrobial peptides, maintaining cell homeostasis under stress conditions, or even contributing to the virulence pattern. They are, thus, taking a poorly understood role in structure and function of microbial networks in the human body. A better understanding of SEP in the context of human health requires a precise understanding of the abundance of SEP both in commensal microbes as well as pathogens. For the gut beneficial B. producta, we demonstrate the importance of specific environmental conditions for biosynthesis of SEP expanding previous findings about their role in microbial interactions.
Affiliation(s)
- Jerome Genth
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Kathrin Schäfer
  - Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
- Liam Cassidy
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Simon Graspeuntner
  - Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
  - German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Lübeck, Germany
- Jan Rupp
  - Department of Infectious Diseases and Microbiology, University of Lübeck, Lübeck, Germany
  - German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Lübeck, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
5
Spät P, Krauspe V, Hess WR, Maček B, Nalpas N. Deep Proteogenomics of a Photosynthetic Cyanobacterium. J Proteome Res 2023; 22:1969-1983. [PMID: 37146978 PMCID: PMC10243305 DOI: 10.1021/acs.jproteome.3c00065] [Received: 02/02/2023] [Indexed: 05/07/2023]
Abstract
Cyanobacteria, the evolutionary ancestors of plant chloroplasts, contribute substantially to the Earth's biogeochemical cycles and are of great interest for a sustainable economy. Knowledge of protein expression is the key to understanding cyanobacterial metabolism; however, proteome studies in cyanobacteria are limited and cover only a fraction of the theoretical proteome. Here, we performed a comprehensive proteogenomic analysis of the model cyanobacterium Synechocystis sp. PCC 6803 to characterize the expressed (phospho)proteome, re-annotate known and discover novel open reading frames (ORFs). By mapping extensive shotgun mass spectrometry proteomics data onto a six-frame translation of the Synechocystis genome, we refined the genomic annotation of 64 ORFs, including eight completely novel ORFs. Our study presents the largest reported (phospho)proteome dataset for a unicellular cyanobacterium, covering the expression of about 80% of the theoretical proteome under various cultivation conditions, such as nitrogen or carbon limitation. We report 568 phosphorylated S/T/Y sites that are present on numerous regulatory proteins, including the transcriptional regulators cyAbrB1 and cyAbrB2. We also catalogue the proteins that have never been detected under laboratory conditions and found that a large portion of them is plasmid-encoded. This dataset will serve as a resource, providing dedicated information on growth condition-dependent protein expression and phosphorylation.
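The six-frame translation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the standard genetic code, an unambiguous DNA sequence, and hypothetical function names chosen only for this example. Each strand is translated at three offsets, and stop codons are kept as `*` so that candidate ORFs can be split out downstream.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Translate a DNA sequence in all six reading frames.

    Keys are '+1'..'+3' (forward strand) and '-1'..'-3' (reverse strand).
    """
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a proteogenomic search, the stretches between stop codons in each frame would typically be filtered by length and written to a FASTA database; that step is omitted here.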
Affiliation(s)
- Philipp Spät
  - Quantitative Proteomics, Interfaculty Institute of Cell Biology, University of Tuebingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
- Vanessa Krauspe
  - Genetics & Experimental Bioinformatics, Institute of Biology III, University of Freiburg, Schänzlestraße 1, 79104 Freiburg im Breisgau, Germany
- Wolfgang R. Hess
  - Genetics & Experimental Bioinformatics, Institute of Biology III, University of Freiburg, Schänzlestraße 1, 79104 Freiburg im Breisgau, Germany
- Boris Maček
  - Quantitative Proteomics, Interfaculty Institute of Cell Biology, University of Tuebingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
- Nicolas Nalpas
  - Quantitative Proteomics, Interfaculty Institute of Cell Biology, University of Tuebingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
6
Potgieter MG, Nel AJM, Fortuin S, Garnett S, Wendoh JM, Tabb DL, Mulder NJ, Blackburn JM. MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets. PLoS Comput Biol 2023; 19:e1011163. [PMID: 37327214 PMCID: PMC10310047 DOI: 10.1371/journal.pcbi.1011163] [Received: 11/01/2022] [Revised: 06/29/2023] [Accepted: 05/08/2023] [Indexed: 06/18/2023] Open
Abstract
BACKGROUND Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines. RESULTS We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches.
MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation. CONCLUSIONS By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
Affiliation(s)
- Matthys G. Potgieter
  - Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Andrew J. M. Nel
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Suereta Fortuin
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Shaun Garnett
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Jerome M. Wendoh
  - Division of Immunology, Department of Pathology, University of Cape Town, Cape Town, South Africa
- David L. Tabb
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences; African Microbiome Institute; South African Tuberculosis Bioinformatics Initiative; Stellenbosch University, Cape Town, South Africa
- Nicola J. Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jonathan M. Blackburn
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
7
Cormican JA, Horokhovskyi Y, Soh WT, Mishto M, Liepe J. inSPIRE: An Open-Source Tool for Increased Mass Spectrometry Identification Rates Using Prosit Spectral Prediction. Mol Cell Proteomics 2022; 21:100432. [PMID: 36280141 PMCID: PMC9720494 DOI: 10.1016/j.mcpro.2022.100432] [Received: 05/26/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 11/05/2022] Open
Abstract
Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.
Affiliation(s)
- John A Cormican
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Yehor Horokhovskyi
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Wai Tuck Soh
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Michele Mishto
  - Centre for Inflammation Biology and Cancer Immunology (CIBCI) & Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
- Juliane Liepe
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
8
Aggarwal S, Raj A, Kumar D, Dash D, Yadav AK. False discovery rate: the Achilles' heel of proteogenomics. Brief Bioinform 2022; 23:6582880. [PMID: 35534181 DOI: 10.1093/bib/bbac163] [Received: 12/30/2021] [Revised: 03/14/2022] [Accepted: 04/12/2022] [Indexed: 12/25/2022] Open
Abstract
Proteogenomics refers to the integrated analysis of the genome and proteome that leverages mass-spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms and find sequence variants to develop novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control the error rates, proteogenomics depends on the target-decoy search strategy, the de-facto method for false discovery rate (FDR) estimation in proteomics. The proteogenomic databases constructed from three- or six-frame nucleotide database translation not only increase the search space and compute-time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies such as two-pass database search or peptide-class-specific FDR can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret the proteogenomics results appropriately and control false positives and negatives in a more informed manner. In this review, first, we briefly discuss the proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.
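As an illustration of the target-decoy strategy this abstract refers to, the following minimal sketch estimates FDR and q-values from a ranked list of peptide-spectrum matches. It assumes a concatenated target-decoy search, higher scores being better, and the simple estimator FDR = decoys / targets; the function name and input layout are hypothetical, and the review's modified strategies (two-pass search, peptide-class-specific FDR) are not implemented here.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for PSMs from a concatenated target-decoy search.

    psms: iterable of (score, is_decoy) tuples; higher score means better match.
    Returns a list of (score, is_decoy, q_value) sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each score threshold: decoys accepted / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of FDR taken from the worst score upwards.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    for score, is_decoy, q in target_decoy_qvalues(example):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
```

A peptide-class-specific variant would apply the same function separately to each class of PSMs (for example, known versus novel peptides) instead of pooling them, which is one of the adjustments the review discusses for inflated proteogenomic databases.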
Affiliation(s)
- Suruchi Aggarwal
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
- Anurag Raj
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Dhirendra Kumar
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
- Debasis Dash
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Amit Kumar Yadav
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
9
IntroSpect: Motif-Guided Immunopeptidome Database Building Tool to Improve the Sensitivity of HLA I Binding Peptide Identification by Mass Spectrometry. Biomolecules 2022; 12:biom12040579. [PMID: 35454168 PMCID: PMC9025654 DOI: 10.3390/biom12040579] [Received: 03/07/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 01/02/2023] Open
Abstract
Although database search tools originally developed for shotgun proteomics have been widely used in immunopeptidomic mass spectrometry identifications, they have been reported to achieve undesirably low sensitivities or high false positive rates as a result of the hugely inflated search space caused by the lack of specific enzymatic digestion in immunopeptidomics. To overcome this problem, we developed a motif-guided immunopeptidome database building tool named IntroSpect, which is designed to first learn the peptide motifs from high-confidence hits in the initial search, and then build a targeted database for a refined search. Evaluated on 18 representative HLA class I datasets, IntroSpect can improve the sensitivity by an average of 76%, compared to conventional searches with unspecific digestions, while maintaining a very high level of accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so that it performs equally well on both well-studied and poorly-studied HLA types, unlike the previously developed method SpectMHC. We have also designed IntroSpect to keep a global FDR that can be conveniently controlled, similar to a conventional database search. Finally, we demonstrate the practical value of IntroSpect by discovering neoepitopes from MS data directly, an important application in cancer immunotherapies. IntroSpect is freely available to download and use.
10
Vreeke GJ, Lubbers W, Vincken JP, Wierenga PA. A method to identify and quantify the complete peptide composition in protein hydrolysates. Anal Chim Acta 2022; 1201:339616. [DOI: 10.1016/j.aca.2022.339616] [Received: 10/29/2021] [Revised: 01/19/2022] [Accepted: 02/14/2022] [Indexed: 11/26/2022]
11
Ahrens CH, Wade JT, Champion MM, Langer JD. A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry. J Bacteriol 2022; 204:e0035321. [PMID: 34748388 PMCID: PMC8765459 DOI: 10.1128/jb.00353-21] [Indexed: 11/30/2022] Open
Abstract
Small proteins of up to ∼50 amino acids are an abundant class of biomolecules across all domains of life. Yet due to the challenges inherent in their size, they are often missed in genome annotations, and are difficult to identify and characterize using standard experimental approaches. Consequently, we still know few small proteins even in well-studied prokaryotic model organisms. Mass spectrometry (MS) has great potential for the discovery, validation, and functional characterization of small proteins. However, standard MS approaches are poorly suited to the identification of both known and novel small proteins due to limitations at each step of a typical proteomics workflow, i.e., sample preparation, protease digestion, liquid chromatography, MS data acquisition, and data analysis. Here, we outline the major MS-based workflows and bioinformatic pipelines used for small protein discovery and validation. Special emphasis is placed on highlighting the adjustments required to improve detection and data quality for small proteins. We discuss both the unbiased detection of small proteins and the targeted analysis of small proteins of interest. Finally, we provide guidelines to prioritize novel small proteins, and an outlook on methods with particular potential to further improve comprehensive discovery and characterization of small proteins.
Affiliation(s)
- Christian H. Ahrens
  - Agroscope, Method Development and Analytics & SIB Swiss Institute of Bioinformatics, Wädenswil, Switzerland
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
  - Department of Biomedical Sciences, School of Public Health, University at Albany, Albany, New York, USA
- Matthew M. Champion
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
- Julian D. Langer
  - Mass Spectrometry and Proteomics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
  - Proteomics, Max Planck Institute for Brain Research, Frankfurt am Main, Germany
|
12
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
- Ying Yang
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Yuanliang Zhang
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Kecheng Li
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Hongmin Cai
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
- Hongwei Wang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
- Qian Zhao
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
|
13
|
Hayes AJ, Lewis JM, Davies MR, Scott NE. Burkholderia PglL enzymes are Serine preferring oligosaccharyltransferases which target conserved proteins across the Burkholderia genus. Commun Biol 2021; 4:1045. [PMID: 34493791 PMCID: PMC8423747 DOI: 10.1038/s42003-021-02588-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/20/2021] [Accepted: 08/23/2021] [Indexed: 12/14/2022]
Abstract
Glycosylation is increasingly recognised as a common protein modification within bacterial proteomes. While great strides have been made in identifying species that contain glycosylation systems, our understanding of the proteins and sites targeted by these systems is far more limited. Within this work we explore the conservation of glycoproteins and glycosylation sites across the pan-Burkholderia glycoproteome. Using a multi-protease glycoproteomic approach, we generate high-confidence glycoproteomes in two widely utilized B. cenocepacia strains, K56-2 and H111. This resource reveals glycosylation occurs exclusively at Serine residues and that glycoproteins/glycosylation sites are highly conserved across B. cenocepacia isolates. This preference for glycosylation at Serine residues is observed across at least 9 Burkholderia glycoproteomes, supporting that Serine is the dominant residue targeted by PglL-mediated glycosylation across the Burkholderia genus. Combined, this work demonstrates that PglL enzymes of the Burkholderia genus are Serine-preferring oligosaccharyltransferases that target conserved and shared protein substrates. Hayes et al provide a glycosylation site focused analysis of the glycoproteome of two widely utilized B. cenocepacia strains, K56-2 and H111. This team demonstrates that within these glycoproteomes Serine is the sole residue targeted for protein glycosylation and that glycoproteins/glycosylation sites are highly conserved across B. cenocepacia isolates.
Affiliation(s)
- Andrew J Hayes
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Jessica M Lewis
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Mark R Davies
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Nichollas E Scott
  - Department of Microbiology and Immunology, University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
|
14
|
Cassidy L, Kaulich PT, Maaß S, Bartel J, Becher D, Tholey A. Bottom-up and top-down proteomic approaches for the identification, characterization, and quantification of the low molecular weight proteome with focus on short open reading frame-encoded peptides. Proteomics 2021; 21:e2100008. [PMID: 34145981 DOI: 10.1002/pmic.202100008] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/03/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 01/14/2023]
Abstract
The recent discovery of alternative open reading frames creates a need for suitable analytical approaches to verify their translation and to characterize the corresponding gene products at the molecular level. As the analysis of small proteins within a background proteome by means of classical bottom-up proteomics is challenging, method development for the analysis of small open reading frame encoded peptides (SEPs) has become a focal point for research. Here, we highlight bottom-up and top-down proteomics approaches established for the analysis of SEPs in both pro- and eukaryotes. Major steps of analysis, including sample preparation and (small) proteome isolation, separation and mass spectrometry, data interpretation and quality control, quantification, the analysis of post-translational modifications, and exploration of functional aspects of the SEPs by means of proteomics technologies are described. These methods do not exclusively cover the analytics of SEPs but simultaneously include the low molecular weight proteome, and moreover, can also be used for the proteome-wide analysis of proteolytic processing events.
Affiliation(s)
- Liam Cassidy
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Philipp T Kaulich
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Sandra Maaß
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Jürgen Bartel
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Dörte Becher
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
|
15
|
Razban RM, Dasmeh P, Serohijos AWR, Shakhnovich EI. Avoidance of protein unfolding constrains protein stability in long-term evolution. Biophys J 2021; 120:2413-2424. [PMID: 33932438 PMCID: PMC8390877 DOI: 10.1016/j.bpj.2021.03.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/16/2020] [Revised: 02/24/2021] [Accepted: 03/17/2021] [Indexed: 11/28/2022]
Abstract
Every amino acid residue can influence a protein's overall stability, making stability highly susceptible to change throughout evolution. We consider the distribution of protein stabilities evolutionarily permittable under two previously reported protein fitness functions: flux dynamics and misfolding avoidance. We develop an evolutionary dynamics theory and find that it agrees better with an extensive protein stability data set for dihydrofolate reductase orthologs under the misfolding avoidance fitness function rather than the flux dynamics fitness function. Further investigation with ribonuclease H data demonstrates that not any misfolded state is avoided; rather, it is only the unfolded state. At the end, we discuss how our work pertains to the universal protein abundance-evolutionary rate correlation seen across organisms' proteomes. We derive a closed-form expression relating protein abundance to evolutionary rate that captures Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens experimental trends without fitted parameters.
Affiliation(s)
- Rostam M Razban
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Pouria Dasmeh
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts; Departement de Biochimie, Université de Montréal, Montreal, Quebec, Canada
- Eugene I Shakhnovich
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
|
16
|
A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations. BMC Bioinformatics 2021; 22:277. [PMID: 34039272 PMCID: PMC8157683 DOI: 10.1186/s12859-021-04159-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/02/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023]
Abstract
Background: Small Proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions.
Results: We observe that number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information.
|
17
|
Fauvet B, Finka A, Castanié-Cornet MP, Cirinesi AM, Genevaux P, Quadroni M, Goloubinoff P. Bacterial Hsp90 Facilitates the Degradation of Aggregation-Prone Hsp70-Hsp40 Substrates. Front Mol Biosci 2021; 8:653073. [PMID: 33937334 PMCID: PMC8082187 DOI: 10.3389/fmolb.2021.653073] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/13/2021] [Accepted: 03/17/2021] [Indexed: 01/27/2023]
Abstract
In eukaryotes, the 90-kDa heat shock proteins (Hsp90s) are profusely studied chaperones that, together with 70-kDa heat shock proteins (Hsp70s), control protein homeostasis. In bacteria, however, the function of Hsp90 (HtpG) and its collaboration with Hsp70 (DnaK) remains poorly characterized. To uncover physiological processes that depend on HtpG and DnaK, we performed comparative quantitative proteomic analyses of insoluble and total protein fractions from unstressed wild-type (WT) Escherichia coli and from knockout mutants ΔdnaKdnaJ (ΔKJ), ΔhtpG (ΔG), and ΔdnaKdnaJΔhtpG (ΔKJG). Whereas the ΔG mutant showed no detectable proteomic differences with wild-type, ΔKJ expressed more chaperones, proteases and ribosomes and expressed dramatically less metabolic and respiratory enzymes. Unexpectedly, we found that the triple mutant ΔKJG showed higher levels of metabolic and respiratory enzymes than ΔKJ, suggesting that bacterial Hsp90 mediates the degradation of aggregation-prone Hsp70-Hsp40 substrates. Further in vivo experiments suggest that such Hsp90-mediated degradation possibly occurs through the HslUV protease.
Affiliation(s)
- Bruno Fauvet
  - Department of Plant Molecular Biology (DBMV), University of Lausanne, Lausanne, Switzerland
- Andrija Finka
  - Department of Ecology, Agronomy and Aquaculture, University of Zadar, Zadar, Croatia
- Marie-Pierre Castanié-Cornet
  - Laboratoire de Microbiologie et de Génétique Moléculaires, Center de Biologie Intégrative, CNRS, Université de Toulouse, Toulouse, France
- Anne-Marie Cirinesi
  - Laboratoire de Microbiologie et de Génétique Moléculaires, Center de Biologie Intégrative, CNRS, Université de Toulouse, Toulouse, France
- Pierre Genevaux
  - Laboratoire de Microbiologie et de Génétique Moléculaires, Center de Biologie Intégrative, CNRS, Université de Toulouse, Toulouse, France
- Manfredo Quadroni
  - Protein Analysis Facility, University of Lausanne, Lausanne, Switzerland
- Pierre Goloubinoff
  - Department of Plant Molecular Biology (DBMV), University of Lausanne, Lausanne, Switzerland
|
18
|
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022]
Abstract
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
Affiliation(s)
- Steven Verbruggen
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
- Siegfried Gessulat
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Ralf Gabriels
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Sven Degroeve
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Lennart Martens
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Wim Van Criekinge
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Mathias Wilhelm
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Gerben Menschaert
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
|
19
|
Carter CW. Simultaneous codon usage, the origin of the proteome, and the emergence of de-novo proteins. Curr Opin Struct Biol 2021; 68:142-148. [PMID: 33529785 DOI: 10.1016/j.sbi.2021.01.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/04/2020] [Accepted: 01/05/2021] [Indexed: 12/21/2022]
Abstract
Genetic coding generally uses only one of a gene's two strands; its complement serving as template for replication. Aminoacyl-tRNA synthetases, aaRS, apparently first emerged as pairs on bidirectional genes, in which anticodons in the template strand served as codons for an entirely different protein. Interpreting both strands in frame constrained such genes sufficiently that it was rapidly superseded, leaving only traces in the elevated pairing between codon middle bases in antiparallel alignments. Codon assignments actually promote using information from both strands in multiple reading frames. Related phenomena, known as overprinting, are widely associated with viruses. In-frame bidirectional coding and overprinting nevertheless imply different structural and functional relationships, and different roles in generating folded proteins throughout the evolution of the proteome.
Affiliation(s)
- Charles W Carter
  - Department of Biochemistry, Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, United States
|
20
|
McCausland JW, Yang X, Squyres GR, Lyu Z, Bruce KE, Lamanna MM, Söderström B, Garner EC, Winkler ME, Xiao J, Liu J. Treadmilling FtsZ polymers drive the directional movement of sPG-synthesis enzymes via a Brownian ratchet mechanism. Nat Commun 2021; 12:609. [PMID: 33504807 PMCID: PMC7840769 DOI: 10.1038/s41467-020-20873-y] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 07/28/2020] [Accepted: 12/15/2020] [Indexed: 01/30/2023]
Abstract
The FtsZ protein is a central component of the bacterial cell division machinery. It polymerizes at mid-cell and recruits more than 30 proteins to assemble into a macromolecular complex to direct cell wall constriction. FtsZ polymers exhibit treadmilling dynamics, driving the processive movement of enzymes that synthesize septal peptidoglycan (sPG). Here, we combine theoretical modelling with single-molecule imaging of live bacterial cells to show that FtsZ's treadmilling drives the directional movement of sPG enzymes via a Brownian ratchet mechanism. The processivity of the directional movement depends on the binding potential between FtsZ and the sPG enzyme, and on a balance between the enzyme's diffusion and FtsZ's treadmilling speed. We propose that this interplay may provide a mechanism to control the spatiotemporal distribution of active sPG enzymes, explaining the distinct roles of FtsZ treadmilling in modulating cell wall constriction rate observed in different bacteria.
Affiliation(s)
- Joshua W McCausland
  - Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Xinxing Yang
  - Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Georgia R Squyres
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA
- Zhixin Lyu
  - Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Kevin E Bruce
  - Department of Biology, Indiana University Bloomington, Bloomington, IN, 47405, USA
- Melissa M Lamanna
  - Department of Biology, Indiana University Bloomington, Bloomington, IN, 47405, USA
- Bill Söderström
  - The ithree Institute, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- Ethan C Garner
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, USA
- Malcolm E Winkler
  - Department of Biology, Indiana University Bloomington, Bloomington, IN, 47405, USA
- Jie Xiao
  - Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Jian Liu
  - Department of Cell Biology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
|
21
|
Fauvet B, Finka A, Castanié-Cornet MP, Cirinesi AM, Genevaux P, Quadroni M, Goloubinoff P. Bacterial Hsp90 Facilitates the Degradation of Aggregation-Prone Hsp70-Hsp40 Substrates. Front Mol Biosci 2021. [PMID: 33937334 DOI: 10.1101/451989] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/17/2023]
|
22
|
Midha MK, Kusebauch U, Shteynberg D, Kapil C, Bader SL, Reddy PJ, Campbell DS, Baliga NS, Moritz RL. A comprehensive spectral assay library to quantify the Escherichia coli proteome by DIA/SWATH-MS. Sci Data 2020; 7:389. [PMID: 33184295 PMCID: PMC7665006 DOI: 10.1038/s41597-020-00724-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 04/16/2020] [Accepted: 10/05/2020] [Indexed: 02/06/2023]
Abstract
Data-Independent Acquisition (DIA) is a method to improve consistent identification and precise quantitation of peptides and proteins by mass spectrometry (MS). The targeted data analysis strategy in DIA relies on spectral assay libraries that are generally derived from a priori measurements of peptides for each species. Although Escherichia coli (E. coli) is among the best studied model organisms, so far there is no spectral assay library for the bacterium publicly available. Here, we generated a spectral assay library for 4,014 of the 4,389 annotated E. coli proteins using one- and two-dimensional fractionated samples, and ion mobility separation enabling deep proteome coverage. We demonstrate the utility of this high-quality library with robustness in quantitation of the E. coli proteome and with rapid-chromatography to enhance throughput by targeted DIA-MS. The spectral assay library supports the detection and quantification of 91.5% of all E. coli proteins at high-confidence with 56,182 proteotypic peptides, making it a valuable resource for the scientific community. Data and spectral libraries are available via ProteomeXchange (PXD020761, PXD020785) and SWATHAtlas (SAL00222-28).
Affiliation(s)
- Mukul K Midha
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
- Ulrike Kusebauch
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
- David Shteynberg
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
- Charu Kapil
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
- Samuel L Bader
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
- David S Campbell
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
- Nitin S Baliga
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
  - Departments of Biology and Microbiology, University of Washington, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
  - Lawrence Berkeley National Lab, Berkeley, CA, USA
- Robert L Moritz
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
|
23
|
Bartel J, Varadarajan AR, Sura T, Ahrens CH, Maaß S, Becher D. Optimized Proteomics Workflow for the Detection of Small Proteins. J Proteome Res 2020; 19:4004-4018. [DOI: 10.1021/acs.jproteome.0c00286] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Affiliation(s)
- Jürgen Bartel
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
- Adithi R. Varadarajan
  - Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Thomas Sura
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
- Christian H. Ahrens
  - Agroscope, Research Group Molecular Diagnostics, Genomics & Bioinformatics and SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Sandra Maaß
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
- Dörte Becher
  - Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, D-17489 Greifswald, Germany
|
24
|
Guo Q, Li D, Zhai Y, Gu Z. CCPRD: A Novel Analytical Framework for the Comprehensive Proteomic Reference Database Construction of NonModel Organisms. ACS Omega 2020; 5:15370-15384. [PMID: 32637811 PMCID: PMC7331046 DOI: 10.1021/acsomega.0c01278] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/01/2020] [Accepted: 06/09/2020] [Indexed: 06/11/2023]
Abstract
Protein reference databases are a critical part of producing efficient proteomic analyses. However, the method for constructing clean, efficient, and comprehensive protein reference databases of nonmodel organisms is lacking. Existing methods either do not have contamination control procedures, or these methods rely on a three-frame and/or six-frame translation that sharply increases the search space and the need for computational resources. Herein, we propose a framework for constructing a customized comprehensive proteomic reference database (CCPRD) from draft genomes and deep sequencing transcriptomes. Its effectiveness is demonstrated by incorporating the proteomes of nematocysts from endoparasitic cnidarian: myxozoans. By applying customized contamination removal procedures, contaminations in omic data were successfully identified and removed. This is an effective method that does not result in overdecontamination. This can be shown by comparing the CCPRD MS results with an artificially contaminated database and another database with removed contaminations in genomes and transcriptomes added back. CCPRD outperformed traditional frame-based methods by identifying 35.2-50.7% more peptides and 35.8-43.8% more proteins, with a maximum of 84.6% in size reduction. A BUSCO analysis showed that the CCPRD maintained a relatively high level of completeness compared to traditional methods. These results confirm the superiority of the CCPRD over existing methods in peptide and protein identification numbers, database size, and completeness. By providing a general framework for generating the reference database, the CCPRD, which does not need a high-quality genome, can potentially be applied to nonmodel organisms and significantly contribute to proteomic research.
Affiliation(s)
- Qingxiang Guo
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| | - Dan Li
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| | - Yanhua Zhai
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| | - Zemao Gu
- Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, Hubei Province 430070, PR China
- Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan 430070, PR China
| |
|
25
|
Weldatsadik R, Datta N, Kolmeder C, Vuopio J, Kere J, Wilkman S, Flatt J, Vuento R, Haapasalo K, Keskitalo S, Varjosalo M, Jokiranta T. Pool-seq driven proteogenomic database for Group G Streptococcus. J Proteomics 2019; 201:84-92. [DOI: 10.1016/j.jprot.2019.04.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/24/2019] [Revised: 03/29/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
|
26
|
Peritore-Galve FC, Schneider DJ, Yang Y, Thannhauser TW, Smart CD, Stodghill P. Proteome Profile and Genome Refinement of the Tomato-Pathogenic Bacterium Clavibacter michiganensis subsp. michiganensis. Proteomics 2019; 19:e1800224. [PMID: 30648817 DOI: 10.1002/pmic.201800224] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/25/2018] [Revised: 11/29/2018] [Indexed: 11/07/2022]
Affiliation(s)
- F Christopher Peritore-Galve
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Geneva, NY, 14456, USA
| | - David J Schneider
- Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK, S7N 4J8, Canada
| | - Yong Yang
- United States Department of Agriculture (USDA), Agricultural Research Service, Robert W. Holley Center, Ithaca, NY, 14853, USA
| | - Theodore W Thannhauser
- United States Department of Agriculture (USDA), Agricultural Research Service, Robert W. Holley Center, Ithaca, NY, 14853, USA
| | - Christine D Smart
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Geneva, NY, 14456, USA
| | - Paul Stodghill
- United States Department of Agriculture (USDA), Agricultural Research Service, Robert W. Holley Center, Ithaca, NY, 14853, USA
| |
|
27
|
Ren Z, Qi D, Pugh N, Li K, Wen B, Zhou R, Xu S, Liu S, Jones AR. Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets. Mol Cell Proteomics 2019; 18:86-98. [PMID: 30293062 PMCID: PMC6317475 DOI: 10.1074/mcp.ra118.000832] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/03/2018] [Revised: 08/31/2018] [Indexed: 01/22/2023] Open
Abstract
Rice (Oryza sativa) is one of the most important worldwide crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl plants, and common contaminants in which to search for protein-level evidence. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide spectrum matches from 47K peptides and 8,187 protein groups. 4168 peptides were initially classed as putative novel peptides (not matching official genes). Following a strict filtration scheme to rule out other possible explanations, we discovered 1,584 high confidence novel peptides. The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. 80% of the novel peptides had an ortholog match in the curated protein sequence set from at least one other plant species. For the peptides clustering in intergenic regions (and thus potentially new genes), 101 loci were identified, for which 43 had a high-confidence hit for a protein domain. Our results can be displayed as tracks on the Ensembl genome or other browsers supporting Track Hubs, to support re-annotation of the rice genome.
Affiliation(s)
- Zhe Ren
- BGI-Shenzhen, Shenzhen 518083, China
| | - Da Qi
- BGI-Shenzhen, Shenzhen 518083, China
| | - Nina Pugh
- Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
| | - Kai Li
- BGI-Shenzhen, Shenzhen 518083, China
| | - Bo Wen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030; Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
| | - Ruo Zhou
- BGI-Shenzhen, Shenzhen 518083, China
| | - Shaohang Xu
- BGI-Shenzhen, Shenzhen 518083, China
| | - Siqi Liu
- BGI-Shenzhen, Shenzhen 518083, China
| | - Andrew R Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool, L69 7ZB, UK
| |
|
28
|
In-depth analysis of Bacillus subtilis proteome identifies new ORFs and traces the evolutionary history of modified proteins. Sci Rep 2018; 8:17246. [PMID: 30467398 PMCID: PMC6250715 DOI: 10.1038/s41598-018-35589-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/03/2018] [Accepted: 11/07/2018] [Indexed: 01/05/2023] Open
Abstract
Bacillus subtilis is a sporulating Gram-positive bacterium widely used in basic research and biotechnology. Despite being one of the best-characterized bacterial model organisms, recent proteomics studies identified only about 50% of its theoretical protein count. Here we combined several hundred MS measurements to obtain a comprehensive map of the proteome, phosphoproteome and acetylome of B. subtilis grown at 37 °C in minimal medium. We covered 75% of the theoretical proteome (3,159 proteins), detected 1,085 phosphorylation and 4,893 lysine acetylation sites and performed a systematic bioinformatic characterization of the obtained data. A subset of analyzed MS files allowed us to reconstruct a network of Hanks-type protein kinases, Ser/Thr/Tyr phosphatases and their substrates. We applied genomic phylostratigraphy to gauge the evolutionary age of B. subtilis protein classes and revealed that protein modifications were present on the oldest bacterial proteins. Finally, we performed a proteogenomic analysis by mapping all MS spectra onto a six-frame translation of the B. subtilis genome and found evidence for 19 novel ORFs. We provide the most extensive overview of the proteome and post-translational modifications for B. subtilis to date, with insights into functional annotation and evolutionary aspects of the B. subtilis genome.
|
29
|
Ndah E, Jonckheere V, Giess A, Valen E, Menschaert G, Van Damme P. REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes. Nucleic Acids Res 2017; 45:e168. [PMID: 28977509 PMCID: PMC5714196 DOI: 10.1093/nar/gkx758] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 04/22/2017] [Accepted: 08/17/2017] [Indexed: 12/13/2022] Open
Abstract
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
Affiliation(s)
- Elvis Ndah
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium; Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
| | - Veronique Jonckheere
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Adam Giess
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5020, Norway
| | - Eivind Valen
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5020, Norway; Sars International Centre for Marine Molecular Biology, University of Bergen, 5008 Bergen, Norway
| | - Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, B-9000 Ghent, Belgium
| | - Petra Van Damme
- VIB-UGent Center for Medical Biotechnology, B-9000 Ghent, Belgium; Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| |
|
30
|
Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017; 27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]
Abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
Affiliation(s)
- Ulrich Omasits
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Adithi R Varadarajan
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland; Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Michael Schmid
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Sandra Goetze
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Damianos Melidis
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Marc Bourqui
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Olga Nikolayeva
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
| | | | - Andrea Patrignani
- Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
| | | | - Juerg E Frey
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| | - Mark D Robinson
- Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
| | - Bernd Wollscheid
- Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
| | - Christian H Ahrens
- Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
| |
|
31
|
Heunis T, Dippenaar A, Warren RM, van Helden PD, van der Merwe RG, Gey van Pittius NC, Pain A, Sampson SL, Tabb DL. Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates. J Proteome Res 2017; 16:3841-3851. [PMID: 28820946 DOI: 10.1021/acs.jproteome.7b00483] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 01/12/2023]
Abstract
Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.
Affiliation(s)
- Tiaan Heunis
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - Anzaan Dippenaar
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - Robin M Warren
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - Paul D van Helden
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - Ruben G van der Merwe
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - Nicolaas C Gey van Pittius
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - Arnab Pain
- Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Samantha L Sampson
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| | - David L Tabb
- DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, SAMRC Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7505, South Africa
| |
|
32
|
Li H, Park J, Kim H, Hwang KB, Paek E. Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments. J Proteome Res 2017; 16:2231-2239. [PMID: 28452485 DOI: 10.1021/acs.jproteome.7b00033] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methods-global, separate, and multistage FDR estimation, respectively, coupled to a target-decoy search and a mixture model-based method-on novel peptide identification. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.
Affiliation(s)
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul 06978, Republic of Korea
| | - Jonghun Park
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
| | - Hyunwoo Kim
- Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
| | - Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul 06978, Republic of Korea
| | - Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
| |
|
33
|
Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 04/13/2017] [Indexed: 12/20/2022] Open
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Affiliation(s)
- Kelly V Ruggles
- Department of Medicine, New York University School of Medicine, New York, New York 10016
| | - Karsten Krug
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Xiaojing Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - Karl R Clauser
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Jing Wang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - Samuel H Payne
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
| | - David Fenyö
- Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016; Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - D R Mani
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| |
|
34
|
Takai K. Translational resistivity/conductivity of coding sequences during exponential growth of Escherichia coli. J Theor Biol 2017; 413:66-71. [PMID: 27876621 DOI: 10.1016/j.jtbi.2016.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2016] [Revised: 11/11/2016] [Accepted: 11/17/2016] [Indexed: 11/28/2022]
Abstract
Codon adaptation index (CAI) has been widely used for prediction of expression of recombinant genes in Escherichia coli and other organisms. However, CAI has no mechanistic basis that rationalizes its application to estimation of translational efficiency. Here, I propose a model based on which we could consider how codon usage is related to the level of expression during exponential growth of bacteria. In this model, translation of a gene is considered as an analog of electric current, and an analog of electric resistance corresponding to each gene is considered. "Translational resistance" is dependent on the steady-state concentration and the sequence of the mRNA species, and "translational resistivity" is dependent only on the mRNA sequence. The latter is the sum of two parts: one is the resistivity for the elongation reaction (coding sequence resistivity), and the other comes from all of the other steps of the decoding reaction. This electric circuit model clearly shows that some conditions should be met for codon composition of a coding sequence to correlate well with its expression level. On the other hand, I calculated relative frequency of each of the 61 sense codon triplets translated during exponential growth of E. coli from a proteomic dataset covering over 2600 proteins. A tentative method for estimating relative coding sequence resistivity based on the data is presented.
Affiliation(s)
- Kazuyuki Takai
- Department of Materials Sciences and Biotechnology, Graduate School of Science and Engineering, Ehime University, 3 Bunkyo-cho, Matsuyama, Ehime 790-8577, Japan.
| |
|
35
|
|
36
|
Li H, Joh YS, Kim H, Paek E, Lee SW, Hwang KB. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification. BMC Genomics 2016; 17:1031. [PMID: 28155652 PMCID: PMC5259817 DOI: 10.1186/s12864-016-3327-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022] Open
Abstract
Background: Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available. Results: To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring-metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability in proteogenomic search. However, no one method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. Conclusions: We propose to use a set of search result validation methods with separate filtering, for sensitive and reliable identification of peptides in proteogenomic search.
Affiliation(s)
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
| | - Yoon Sung Joh
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
| | - Hyunwoo Kim
- Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
| | - Eunok Paek
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
| | - Sang-Won Lee
- Department of Chemistry, Research Institute for Natural Sciences, Korea University, Seoul, 02841, Republic of Korea
| | - Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea.
| |
|
37
|
Pettersen VK, Mosevoll KA, Lindemann PC, Wiker HG. Coordination of Metabolism and Virulence Factors Expression of Extraintestinal Pathogenic Escherichia coli Purified from Blood Cultures of Patients with Sepsis. Mol Cell Proteomics 2016; 15:2890-907. [PMID: 27364158 PMCID: PMC5013306 DOI: 10.1074/mcp.m116.060582] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/22/2016] [Indexed: 02/06/2023] Open
Abstract
One of the trademarks of extraintestinal pathogenic Escherichia coli is adaptation of metabolism and basic physiology to diverse host sites. However, little is known about how this common human pathogen adapts to permit survival and growth in blood. We used label-free quantitative proteomics to characterize five E. coli strains purified from clinical blood cultures associated with sepsis and urinary tract infections. Further comparison of proteome profiles of the clinical strains and a reference uropathogenic E. coli strain 536 cultivated in blood culture and on two different solid media distinguished cellular features altered in response to the pathogenically relevant condition. The analysis covered nearly 60% of the strains' predicted proteomes, and included quantitative description based on label-free intensity scores for 90% of the detected proteins. Statistical comparison of anaerobic and aerobic blood cultures revealed 32 differentially expressed proteins (1.5% of the shared proteins), mostly associated with acquisition and utilization of metal ions critical for anaerobic or aerobic respiration. Analysis of variance identified significantly altered amounts of 47 proteins shared by the strains (2.7%), including proteins involved in vitamin B6 metabolism and virulence. Although the proteomes derived from blood cultures were fairly similar for the investigated strains, quantitative proteomic comparison to the growth on solid media identified 200 proteins with substantially changed levels (11% of the shared proteins). Blood culture was characterized by up-regulation of anaerobic fermentative metabolism and multiple virulence traits, including cell motility and iron acquisition. In response to growth on solid media, there were increased levels of proteins functional in aerobic respiration, catabolism of medium-specific carbon sources and protection against oxidative and osmotic stresses.
These results demonstrate on the expressed proteome level that expression of extraintestinal virulence factors and overall cellular metabolism closely reflects specific growth conditions. Data are available via ProteomeXchange with identifier PXD002912.
Affiliation(s)
- Veronika Kuchařová Pettersen
- The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, N-5021 Bergen, Norway
- Paul Christoffer Lindemann
- The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, N-5021 Bergen, Norway; Department of Microbiology, Haukeland University Hospital, N-5021 Bergen, Norway
- Harald G Wiker
- The Gade Research Group for Infection and Immunity, Department of Clinical Science, University of Bergen, N-5021 Bergen, Norway; Department of Microbiology, Haukeland University Hospital, N-5021 Bergen, Norway
|
38
|
Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. Annu Rev Anal Chem (Palo Alto Calif) 2016; 9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Indexed: 05/09/2023]
Abstract
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Affiliation(s)
- Gloria M Sheynkman
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215; Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115; Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Michael R Shortreed
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706; Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706
|
39
|
Shi L, Ravikumar V, Derouiche A, Macek B, Mijakovic I. Tyrosine 601 of Bacillus subtilis DnaK Undergoes Phosphorylation and Is Crucial for Chaperone Activity and Heat Shock Survival. Front Microbiol 2016; 7:533. [PMID: 27148221 PMCID: PMC4835898 DOI: 10.3389/fmicb.2016.00533] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/04/2016] [Accepted: 03/31/2016] [Indexed: 01/10/2023] Open
Abstract
In order to screen for cellular substrates of the Bacillus subtilis BY-kinase PtkA, and its cognate phosphotyrosine-protein phosphatase PtpZ, we performed a triple Stable Isotope Labeling by Amino acids in Cell culture-based quantitative phosphoproteome analysis. Detected tyrosine phosphorylation sites for which the phosphorylation level decreased in the ΔptkA strain and increased in the ΔptpZ strain, compared to the wild type (WT), were considered as potential substrates of PtkA/PtpZ. One of those sites was the residue tyrosine 601 of the molecular chaperone DnaK. We confirmed that DnaK is a substrate of PtkA and PtpZ by in vitro phosphorylation and dephosphorylation assays. In vitro, DnaK Y601F mutant exhibited impaired interaction with its co-chaperones DnaJ and GrpE, along with diminished capacity to hydrolyze ATP and assist the re-folding of denatured proteins. In vivo, loss of DnaK phosphorylation in the mutant strain dnaK Y601F, or in the strain overexpressing the phosphatase PtpZ, led to diminished survival upon heat shock, consistent with the in vitro results. The decreased survival of the mutant dnaK Y601F at an elevated temperature could be rescued by complementing with the WT dnaK allele expressed ectopically. We concluded that the residue tyrosine 601 of DnaK can be phosphorylated and dephosphorylated by PtkA and PtpZ, respectively. Furthermore, Y601 is important for DnaK chaperone activity and heat shock survival of B. subtilis.
Affiliation(s)
- Lei Shi
- Division of Systems and Synthetic Biology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Vaishnavi Ravikumar
- Proteome Center Tübingen, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany
- Abderahmane Derouiche
- Division of Systems and Synthetic Biology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Boris Macek
- Proteome Center Tübingen, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany
- Ivan Mijakovic
- Division of Systems and Synthetic Biology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
|
40
|
Potgieter MG, Nakedi KC, Ambler JM, Nel AJM, Garnett S, Soares NC, Mulder N, Blackburn JM. Proteogenomic Analysis of Mycobacterium smegmatis Using High Resolution Mass Spectrometry. Front Microbiol 2016; 7:427. [PMID: 27092112 PMCID: PMC4821088 DOI: 10.3389/fmicb.2016.00427] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/25/2015] [Accepted: 03/16/2016] [Indexed: 11/30/2022] Open
Abstract
Biochemical evidence is vital for accurate genome annotation. The integration of experimental data collected at the proteome level using high resolution mass spectrometry allows for improvements in genome annotation by providing evidence for novel gene models, while validating or modifying others. Here, we report the results of a proteogenomic analysis of a reference strain of Mycobacterium smegmatis (mc²155), a fast growing model organism for the pathogenic Mycobacterium tuberculosis, the causative agent of tuberculosis. By integrating high throughput LC/MS/MS proteomic data with genomic six frame translation and ab initio gene prediction databases, a total of 2887 ORFs were identified, including 2810 ORFs annotated to a Reference protein, and 63 ORFs not previously annotated to a Reference protein. Further, the translational start site (TSS) was validated for 558 Reference proteome gene models, while upstream translational evidence was identified for 81. In addition, N-terminus derived peptide identifications allowed for downstream TSS modification of a further 24 gene models. We validated the existence of six previously described interrupted coding sequences at the peptide level, and provide evidence for four novel frameshift positions. Analysis of peptide posterior error probability (PEP) scores indicates high-confidence novel peptide identifications and shows that the genome of M. smegmatis mc²155 is not yet fully annotated. Data are available via ProteomeXchange with identifier PXD003500.
Affiliation(s)
- Matthys G Potgieter
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Kehilwe C Nakedi
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Jon M Ambler
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Andrew J M Nel
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Shaun Garnett
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Nelson C Soares
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
- Jonathan M Blackburn
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa
|
41
|
Wessels HJCT, de Almeida NM, Kartal B, Keltjens JT. Bacterial Electron Transfer Chains Primed by Proteomics. Adv Microb Physiol 2016; 68:219-352. [PMID: 27134025 DOI: 10.1016/bs.ampbs.2016.02.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Electron transport phosphorylation is the central mechanism by which most prokaryotic species harvest, as ATP, the energy released in the respiration of their substrates. Microorganisms have evolved incredible variations on this principle, most of which are perhaps still unknown, considering that only a fraction of the microbial richness is known. Besides these variations, microbial species may show substantial versatility in using respiratory systems. In this connection, regulatory mechanisms control the expression of these respiratory enzyme systems and their assembly at the translational and posttranslational levels, to optimally accommodate changes in the supply of their energy substrates. Here, we present an overview of methods and techniques from the field of proteomics to explore bacterial electron transfer chains and their regulation at levels ranging from the whole organism down to the Ångstrom scales of protein structures. From the survey of the literature on this subject, it is concluded that proteomics has indeed contributed substantially to our understanding of bacterial respiratory mechanisms, often in elegant combinations with genetic and biochemical approaches. However, we also note that advanced proteomics offers a wealth of opportunities that have not been exploited at all, or are at best underexploited, in hypothesis-driving and hypothesis-driven research on bacterial bioenergetics. Examples obtained from the related area of mitochondrial oxidative phosphorylation research, where the application of advanced proteomics is more common, may illustrate these opportunities.
Affiliation(s)
- H J C T Wessels
- Nijmegen Center for Mitochondrial Disorders, Radboud Proteomics Centre, Translational Metabolic Laboratory, Radboud University Medical Center, Nijmegen, The Netherlands
- N M de Almeida
- Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands
- B Kartal
- Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands; Laboratory of Microbiology, Ghent University, Ghent, Belgium
- J T Keltjens
- Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands
|
42
|
Nakahigashi K, Takai Y, Kimura M, Abe N, Nakayashiki T, Shiwa Y, Yoshikawa H, Wanner BL, Ishihama Y, Mori H. Comprehensive identification of translation start sites by tetracycline-inhibited ribosome profiling. DNA Res 2016; 23:193-201. [PMID: 27013550 PMCID: PMC4909307 DOI: 10.1093/dnares/dsw008] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 12/27/2015] [Accepted: 02/06/2016] [Indexed: 01/12/2023] Open
Abstract
Tetracycline-inhibited ribosome profiling (TetRP) provides a powerful new experimental tool for comprehensive genome-wide identification of translation initiation sites in bacteria. We validated TetRP by confirming the translation start sites of protein-coding genes in accordance with the 2006 version of Escherichia coli K-12 annotation record (GenBank U00096.2) and found ∼150 new start sites within 60 nucleotides of the annotated site. This analysis revealed 72 per cent of the genes whose initiation site annotations were changed from the 2006 GenBank record to the newer 2014 annotation record (GenBank U00096.3), indicating a high sensitivity. Also, results from reporter fusion and proteomics of N-terminally enriched peptides showed high specificity of the TetRP results. In addition, we discovered over 300 translation start sites within non-coding, intergenic regions of the genome, using a threshold that retains ∼2,000 known coding genes. While some appear to correspond to pseudogenes, others may encode small peptides or have previously unforeseen roles. In summary, we showed that ribosome profiling upon translation inhibition by tetracycline offers a simple, reliable and comprehensive experimental tool for precise annotation of translation start sites of expressed genes in bacteria.
Affiliation(s)
- Kenji Nakahigashi
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
- Yuki Takai
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
- Michiko Kimura
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Nozomi Abe
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan
- Toru Nakayashiki
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
- Yuh Shiwa
- Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Tokyo 156-8502, Japan
- Hirofumi Yoshikawa
- Genome Research Center, NODAI Research Institute, Tokyo University of Agriculture, Tokyo 156-8502, Japan; Department of Bioscience, Tokyo University of Agriculture, Tokyo 156-8502, Japan
- Barry L Wanner
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Hirotada Mori
- Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
|
43
|
Ang KS, Kyriakopoulos S, Li W, Lee DY. Multi-omics data driven analysis establishes reference codon biases for synthetic gene design in microbial and mammalian cells. Methods 2016; 102:26-35. [PMID: 26850284 DOI: 10.1016/j.ymeth.2016.01.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/22/2015] [Revised: 01/08/2016] [Accepted: 01/19/2016] [Indexed: 11/19/2022] Open
Abstract
In this study, we analyzed multi-omics data and subsets thereof to establish reference codon usage biases for codon optimization in synthetic gene design. Specifically, publicly available genomic, transcriptomic, proteomic and translatomic data for microbial and mammalian expression hosts, Escherichia coli, Saccharomyces cerevisiae, Pichia pastoris and Chinese hamster ovary (CHO) cells, were compiled to derive their individual codon and codon pair frequencies. Then, host-dependent and -omics-specific codon biases were generated and compared by principal component analysis and hierarchical clustering. Interestingly, our results indicated similar codon bias patterns among the highly expressed transcripts, highly abundant proteins, and efficiently translated mRNA in microbial cells, despite the general lack of correlation between mRNA and protein expression levels. However, for CHO cells, the codon bias patterns among the various -omics subsets are not distinguishable, forming one cluster. Thus, we further investigated the effect of different input codon biases on codon-optimized sequences using the codon context (CC) and individual codon usage (ICU) design parameters, via an in silico case study on the expression of the human IFNγ sequence in CHO cells. The results supported that CC is a more robust design parameter than ICU for improved heterologous gene design.
Affiliation(s)
- Kok Siong Ang
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore 117585, Singapore; NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore
- Sarantos Kyriakopoulos
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), 20 Biopolis Way, #06-01 Centros, Singapore 138668, Singapore
- Wei Li
- Sangon Biotech (Shanghai) Co., Ltd., 698 Xiangmin Road, SongJiang District, Shanghai 201611, China
- Dong-Yup Lee
- Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore 117585, Singapore; NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), Life Sciences Institute, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore; Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), 20 Biopolis Way, #06-01 Centros, Singapore 138668, Singapore
|
44
|
Proteogenomic Tools and Approaches to Explore Protein Coding Landscapes of Eukaryotic Genomes. Adv Exp Med Biol 2016; 926:1-10. [DOI: 10.1007/978-3-319-42316-6_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
|
45
|
Zickmann F, Renard BY. MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms. Bioinformatics 2015; 31:i106-15. [PMID: 26072472 PMCID: PMC4765881 DOI: 10.1093/bioinformatics/btv236] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023] Open
Abstract
Summary: Ongoing advances in high-throughput technologies have facilitated accurate proteomic measurements and provide a wealth of information on genomic and transcript level. In proteogenomics, this multi-omics data is combined to analyze unannotated organisms and to allow more accurate sample-specific predictions. Existing analysis methods still mainly depend on six-frame translations or reference protein databases that are extended by transcriptomic information or known single nucleotide polymorphisms (SNPs). However, six-frames introduce an artificial sixfold increase of the target database and SNP integration requires a suitable database summarizing results from previous experiments. We overcome these limitations by introducing MSProGene, a new method for integrative proteogenomic analysis based on customized RNA-Seq driven transcript databases. MSProGene is independent from existing reference databases or annotated SNPs and avoids large six-frame translated databases by constructing sample-specific transcripts. In addition, it creates a network combining RNA-Seq and peptide information that is optimized by a maximum-flow algorithm. It thereby also allows resolving the ambiguity of shared peptides for protein inference. We applied MSProGene on three datasets and show that it facilitates a database-independent reliable yet accurate prediction on gene and protein level and additionally identifies novel genes. Availability and implementation: MSProGene is written in Java and Python. It is open source and available at http://sourceforge.net/projects/msprogene/. Contact: renardb@rki.de
Affiliation(s)
- Franziska Zickmann
- Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany
- Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany
|
46
|
Bin Goh WW, Guo T, Aebersold R, Wong L. Quantitative proteomics signature profiling based on network contextualization. Biol Direct 2015; 10:71. [PMID: 26666224 PMCID: PMC4678536 DOI: 10.1186/s13062-015-0098-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/15/2015] [Accepted: 11/30/2015] [Indexed: 12/02/2022] Open
Abstract
Background: We present a network-based method, namely quantitative proteomic signature profiling (qPSP), that improves the biological content of proteomic data by converting protein expressions into hit-rates in protein complexes. Results: We demonstrate, using two clinical proteomics datasets, that qPSP produces robust discrimination between phenotype classes (e.g. normal vs. disease) and uncovers phenotype-relevant protein complexes. Regardless of acquisition paradigm, comparisons of qPSP against conventional methods (e.g. t-test or hypergeometric test) demonstrate that it produces more stable and consistent predictions, even at small sample size. We show that qPSP is theoretically robust to noise, and that this robustness to noise is also observable in practice. Comparative analysis of hit-rates and protein expressions in significant complexes reveals that hit-rates are a useful means of summarizing differential behavior in a complex-specific manner. Conclusions: Given qPSP's ability to discriminate phenotype classes even at small sample sizes, high robustness to noise, and better summary statistics, it can be deployed towards analysis of highly heterogeneous clinical proteomics data. Reviewers: This article was reviewed by Frank Eisenhaber and Sebastian Maurer-Stroh.
Affiliation(s)
- Wilson Wen Bin Goh
- School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Tianjin City, 300072, China; Center for Interdisciplinary Cardiovascular Sciences, Harvard Medical School, Boston, USA; Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; School of Computing, National University of Singapore, Singapore, Singapore
- Tiannan Guo
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore, Singapore
|
47
|
Schmidt A, Kochanowski K, Vedelaar S, Ahrné E, Volkmer B, Callipo L, Knoops K, Bauer M, Aebersold R, Heinemann M. The quantitative and condition-dependent Escherichia coli proteome. Nat Biotechnol 2015; 34:104-10. [PMID: 26641532 PMCID: PMC4888949 DOI: 10.1038/nbt.3418] [Citation(s) in RCA: 490] [Impact Index Per Article: 54.4] [Received: 04/28/2015] [Accepted: 10/28/2015] [Indexed: 12/26/2022]
Abstract
Measuring precise concentrations of proteins can provide insights into biological processes. Here, we use efficient protein extraction and sample fractionation and state-of-the-art quantitative mass spectrometry techniques to generate a comprehensive, condition-dependent protein abundance map of Escherichia coli. We measure cellular protein concentrations for 55% of predicted E. coli genes (>2300 proteins) under 22 different experimental conditions and identify methylation and N-terminal protein acetylations previously not known to be prevalent in bacteria. We uncover system-wide proteome allocation, expression regulation, and post-translational adaptations. These data provide a valuable resource for the systems biology and broader E. coli research communities.
Affiliation(s)
- Karl Kochanowski
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Silke Vedelaar
- Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, the Netherlands
- Erik Ahrné
- Biozentrum, University of Basel, Basel, Switzerland
- Benjamin Volkmer
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Luciano Callipo
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Kèvin Knoops
- Molecular Cell Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, the Netherlands
- Manuel Bauer
- Biozentrum, University of Basel, Basel, Switzerland
- Ruedi Aebersold
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland
- Matthias Heinemann
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, the Netherlands
|
48
|
Shanmugam AK, Nesvizhskii AI. Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics. J Proteome Res 2015; 14:5169-78. [PMID: 26569054 DOI: 10.1021/acs.jproteome.5b00504] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]
Abstract
In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the Global Proteome Machine Database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential search space incompleteness in the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
Affiliation(s)
- Avinash K Shanmugam
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
- Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
|
49
|
Kumar D, Mondal AK, Kutum R, Dash D. Proteogenomics of rare taxonomic phyla: A prospective treasure trove of protein coding genes. Proteomics 2015; 16:226-40. [PMID: 26773550 DOI: 10.1002/pmic.201500263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/30/2015] [Revised: 09/18/2015] [Accepted: 09/28/2015] [Indexed: 01/04/2023]
Abstract
Sustainable innovations in sequencing technologies have resulted in a torrent of microbial genome sequencing projects. However, the prokaryotic genomes sequenced so far are unequally distributed along their phylogenetic tree; a few phyla contain the majority, while the rest have only a few representatives. Accurate genome annotation lags far behind genome sequencing. While automated computational prediction, aided by comparative genomics, remains a popular choice for genome annotation, a substantial fraction of these annotations is erroneous. Proteogenomics utilizes protein-level experimental observations to annotate protein coding genes on a genome-wide scale. Benefits of proteogenomics include discovery and correction of gene annotations regardless of their phylogenetic conservation. This not only allows detection of common, conserved proteins but also the discovery of protein products of rare genes that may be horizontally transferred or taxonomy specific. The chances of encountering such genes are higher in rare phyla that comprise a small number of complete genome sequences. We collated all bacterial and archaeal proteogenomic studies carried out to date and reviewed them in the context of genome sequencing projects. Here, we present a comprehensive list of microbial proteogenomic studies and their taxonomic distribution, and urge targeted proteogenomics of underexplored taxa to build an extensive reference of protein coding genes.
Affiliation(s)
- Dhirendra Kumar
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Anupam Kumar Mondal
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Rintu Kutum
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
- Debasis Dash
- G. N. Ramachandran Knowledge Center of Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, South Campus, Sukhdev Vihar, Delhi, India
|
50
|
Yagoub D, Tay AP, Chen Z, Hamey JJ, Cai C, Chia SZ, Hart-Smith G, Wilkins MR. Proteogenomic Discovery of a Small, Novel Protein in Yeast Reveals a Strategy for the Detection of Unannotated Short Open Reading Frames. J Proteome Res 2015; 14:5038-47. [DOI: 10.1021/acs.jproteome.5b00734] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]
Affiliation(s)
- Daniel Yagoub
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Aidan P. Tay
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Zhiliang Chen
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Joshua J. Hamey
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Curtis Cai
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Samantha Z. Chia
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Gene Hart-Smith
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Marc R. Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
|