1
Gabriel W, Picciani M, The M, Wilhelm M. Deep Learning-Assisted Analysis of Immunopeptidomics Data. Methods Mol Biol 2024; 2758:457-483. [PMID: 38549030 DOI: 10.1007/978-1-0716-3646-6_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.
Affiliation(s)
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
2
Cormican JA, Horokhovskyi Y, Soh WT, Mishto M, Liepe J. inSPIRE: An Open-Source Tool for Increased Mass Spectrometry Identification Rates Using Prosit Spectral Prediction. Mol Cell Proteomics 2022; 21:100432. [PMID: 36280141 PMCID: PMC9720494 DOI: 10.1016/j.mcpro.2022.100432] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/26/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 11/05/2022] Open
Abstract
Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.
Affiliation(s)
- John A Cormican
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Yehor Horokhovskyi
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Wai Tuck Soh
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Michele Mishto
  - Centre for Inflammation Biology and Cancer Immunology (CIBCI) & Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
- Juliane Liepe
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
3
Sagawa CHD, Zaini PA, de A. B. Assis R, Saxe H, Salemi M, Jacobson A, Wilmarth PA, Phinney BS, Dandekar AM. Deep Learning Neural Network Prediction Method Improves Proteome Profiling of Vascular Sap of Grapevines during Pierce's Disease Development. Biology (Basel) 2020; 9:biology9090261. [PMID: 32882865 PMCID: PMC7565608 DOI: 10.3390/biology9090261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/18/2020] [Revised: 08/24/2020] [Accepted: 08/28/2020] [Indexed: 12/31/2022]
Abstract
Plant secretome studies highlight the importance of vascular plant defense proteins against pathogens. Studies on Pierce’s disease of grapevines caused by the xylem-limited bacterium Xylella fastidiosa (Xf) have detected proteins and pathways associated with its pathobiology. Despite the biological importance of the secreted proteins in the extracellular space to plant survival and development, proteome studies are scarce due to methodological challenges. Prosit, a deep learning neural network prediction method, is a powerful tool for improving proteome profiling by data-independent acquisition (DIA). We explored the potential of Prosit’s in silico spectral library predictions to improve DIA proteomic analysis of vascular leaf sap from grapevines with Pierce’s disease. The combination of DIA and Prosit-predicted libraries increased the total number of identified grapevine proteins from 145 to 360 and Xf proteins from 18 to 90 compared to gas-phase fractionation (GPF) libraries. The new proteins increased the range of molecular weights, assisted in the identification of more exclusive peptides per protein, and increased identification of low-abundance proteins. These improvements allowed identification of new functional pathways associated with cellular responses to oxidative stress, to be investigated further.
Affiliation(s)
- Cíntia Helena Duarte Sagawa
  - Department of Plant Sciences, University of California, Davis, 1 Shields Ave, CA 95616, USA
- Paulo A. Zaini
  - Department of Plant Sciences, University of California, Davis, 1 Shields Ave, CA 95616, USA
- Renata de A. B. Assis
  - Department of Plant Sciences, University of California, Davis, 1 Shields Ave, CA 95616, USA
  - Departamento de Ciências Biológicas, Instituto de Ciências Exatas e Biológicas, Núcleo de Pesquisas em Ciências Biológicas, Universidade Federal de Ouro Preto, 122-Bauxita, Ouro Preto-MG 35400-000, Brazil
- Houston Saxe
  - Department of Plant Sciences, University of California, Davis, 1 Shields Ave, CA 95616, USA
- Michelle Salemi
  - Proteomics Core Facility, University of California, Davis, 1 Shields Ave, CA 95616, USA
- Aaron Jacobson
  - Department of Plant Sciences, University of California, Davis, 1 Shields Ave, CA 95616, USA
- Phillip A. Wilmarth
  - Proteomics Shared Resource, Oregon Health and Science University, Medical Research Building, 3252 SW Research Drive, Portland, OR 97239, USA
- Brett S. Phinney
  - Proteomics Core Facility, University of California, Davis, 1 Shields Ave, CA 95616, USA
- Abhaya M. Dandekar
  - Department of Plant Sciences, University of California, Davis, 1 Shields Ave, CA 95616, USA
4
Verbruggen S, Ndah E, Van Criekinge W, Gessulat S, Kuster B, Wilhelm M, Van Damme P, Menschaert G. PROTEOFORMER 2.0: Further Developments in the Ribosome Profiling-assisted Proteogenomic Hunt for New Proteoforms. Mol Cell Proteomics 2019; 18:S126-S140. [PMID: 31040227 PMCID: PMC6692777 DOI: 10.1074/mcp.ra118.001218] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 11/16/2018] [Revised: 04/30/2019] [Indexed: 12/20/2022] Open
Abstract
PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat data. The different proteoform calling strategies are used alongside one another and in the end combined together with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field. 
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.
Affiliation(s)
- Steven Verbruggen
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Elvis Ndah
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Wim Van Criekinge
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Siegfried Gessulat
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany; SAP SE, Potsdam, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany
- Mathias Wilhelm
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany
- Petra Van Damme
  - VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Gerben Menschaert
  - BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium