1
Picciani M, Gabriel W, Giurcoiu VG, Shouman O, Hamood F, Lautenbacher L, Jensen CB, Müller J, Kalhor M, Soleymaniniya A, Kuster B, The M, Wilhelm M. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 2024; 24:e2300112. [PMID: 37672792 DOI: 10.1002/pmic.202300112] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high-quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open-source Python package of our spectral library generation and rescoring pipeline, which was previously available only online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses in two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.
Affiliation(s)
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Victor-George Giurcoiu
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Omar Shouman
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Firas Hamood
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Cecilia Bang Jensen
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Julian Müller
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mostafa Kalhor
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Armin Soleymaniniya
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
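The Oktoberfest abstract describes rescoring search engine results with predicted peptide properties. Prosit-style models also predict retention time, so one plausible rescoring feature is the gap between a peptide's observed and predicted retention time. The sketch below is illustrative only and is not Oktoberfest's API: the `PSM` record, the linear calibration against high-confidence matches, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical minimal peptide-spectrum match record (not Oktoberfest's data model)."""
    peptide: str
    observed_rt: float    # observed retention time, e.g. in minutes
    predicted_irt: float  # predicted indexed retention time (iRT)


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibration needs at least two distinct iRT values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_error_features(psms, calibration):
    """Map predicted iRT onto the observed RT scale using the calibration PSMs
    (assumed to be high-confidence matches), then return the absolute
    difference between observed and aligned predicted RT for each PSM."""
    slope, intercept = fit_linear(
        [p.predicted_irt for p in calibration],
        [p.observed_rt for p in calibration],
    )
    return [abs(p.observed_rt - (slope * p.predicted_irt + intercept)) for p in psms]
```

In practice a rescoring pipeline would compute many such features per match and hand them to a rescoring tool; a straight-line calibration is just the simplest choice, and real chromatographic gradients may call for a nonlinear alignment.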
2
Palstrøm NB, Campbell AJ, Lindegaard CA, Cakar S, Matthiesen R, Beck HC. Spectral library search for improved TMTpro labelled peptide assignment in human plasma proteomics. Proteomics 2024; 24:e2300236. [PMID: 37706597 DOI: 10.1002/pmic.202300236] [Received: 06/01/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 09/15/2023]
Abstract
Clinical biomarker discovery is often based on the analysis of human plasma samples. However, the high dynamic range and complexity of plasma pose significant challenges to mass spectrometry-based proteomics. Current methods for improving protein identifications require laborious pre-analytical sample preparation. In this study, we developed and evaluated a TMTpro-specific spectral library for improved protein identification in human plasma proteomics. The library was constructed by LC-MS/MS analysis of highly fractionated TMTpro-tagged human plasma, human cell lysates, and relevant arterial tissues. The library was curated using several quality filters to ensure reliable peptide identifications. Our results show that spectral library searching using the TMTpro spectral library improves the identification of proteins in plasma samples compared to conventional sequence database searching. Protein identifications made by the spectral library search engine were highly complementary to those made by the sequence database search engine, indicating the feasibility of increasing the number of protein identifications without additional pre-analytical sample preparation. The TMTpro-specific spectral library provides a resource for future plasma proteomics research and for optimizing search algorithms for greater accuracy and speed of protein identification in human plasma proteomics, and is made publicly available to the research community via ProteomeXchange with identifier PXD042546.
Affiliation(s)
- Nicolai B Palstrøm
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Amanda J Campbell
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Samir Cakar
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Rune Matthiesen
  - Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal
- Hans C Beck
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
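Spectral library searching, as evaluated in the Palstrøm et al. abstract, scores each query spectrum against previously acquired library spectra of candidate peptides instead of against theoretical spectra generated from a sequence database. The sketch below shows that generic idea with a binned cosine (normalized dot product) score and a precursor m/z filter. It is not the search engine used in the study; the dictionary layout, the 0.02 m/z bin width, and the 10 ppm precursor tolerance are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.02):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine(a, b):
    """Normalized dot product between two sparse binned spectra."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, library, tol_ppm=10.0):
    """Return (score, peptide) for the best-scoring library entry that has the
    same charge and a precursor m/z within tol_ppm; (0.0, None) if none match."""
    q_bins = bin_spectrum(query["peaks"])
    best = (0.0, None)
    for entry in library:
        if entry["charge"] != query["charge"]:
            continue
        ppm = abs(entry["precursor_mz"] - query["precursor_mz"]) / entry["precursor_mz"] * 1e6
        if ppm > tol_ppm:
            continue
        score = cosine(q_bins, bin_spectrum(entry["peaks"]))
        if score > best[0]:
            best = (score, entry["peptide"])
    return best
```

A production engine would pre-bin the library once, normalize intensities, and report a false discovery rate, for example via decoy library entries; the loop above only conveys the matching step.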
3
Gabriel W, Picciani M, The M, Wilhelm M. Deep Learning-Assisted Analysis of Immunopeptidomics Data. Methods Mol Biol 2024; 2758:457-483. [PMID: 38549030 DOI: 10.1007/978-1-0716-3646-6_25] [Indexed: 04/02/2024]
Abstract
Liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) is the primary method for obtaining direct evidence of the presentation of disease- or patient-specific peptides by human leukocyte antigen (HLA) molecules. However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures for using state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or help assess the presence of cryptic peptides, such as spliced peptides.
Affiliation(s)
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
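The fragment ion intensity-based matching scores mentioned in the immunopeptidomics abstract compare predicted and observed intensities of the same fragment ions. One score commonly used in Prosit-based rescoring is the normalized spectral contrast angle, which maps the cosine similarity of the two intensity vectors onto a 0-to-1 scale. A minimal sketch, assuming the observed and predicted intensities have already been matched to the same ordered list of fragment ions; that alignment step and the function name are assumptions, not taken from the source.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral contrast angle SA = 1 - 2 * arccos(cos_sim) / pi.

    Both inputs are intensity vectors aligned to the same fragment ions
    (alignment is assumed to have happened beforehand). For non-negative
    intensities the result lies in [0, 1]: 1 means identical relative
    intensities, 0 means no shared signal."""
    if len(observed) != len(predicted):
        raise ValueError("vectors must be aligned to the same fragment ions")
    norm_o = math.sqrt(sum(x * x for x in observed))
    norm_p = math.sqrt(sum(x * x for x in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0  # an empty spectrum cannot match anything
    cos_sim = sum(o * p for o, p in zip(observed, predicted)) / (norm_o * norm_p)
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against rounding just outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi
```

Because the cosine is scale-invariant, the score ignores absolute intensity and rewards only agreement in relative fragment intensities, which is what makes predicted spectra directly comparable to measured ones.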
4
McGann CD, Barshop WD, Canterbury JD, Lin C, Gabriel W, Huang J, Bergen D, Zabrouskov V, Melani RD, Wilhelm M, McAlister GC, Schweppe DK. Real-Time Spectral Library Matching for Sample Multiplexed Quantitative Proteomics. J Proteome Res 2023; 22:2836-2846. [PMID: 37557900 DOI: 10.1021/acs.jproteome.3c00085] [Indexed: 08/11/2023]
Abstract
Sample-multiplexed quantitative proteomics assays have proved to be a highly versatile means of assaying molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce data acquisition efficiency and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application that reads diverse spectral library file types, from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly when quantifying from chimeric fragment spectra. We used RTLS to profile proteome responses to small-molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample-multiplexed analyses.
Affiliation(s)
- Chris D McGann
  - University of Washington, Seattle, Washington 98105, United States
- Chuwei Lin
  - University of Washington, Seattle, Washington 98105, United States
- Jingjing Huang
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- David Bergen
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Vlad Zabrouskov
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Rafael D Melani
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Devin K Schweppe
  - University of Washington, Seattle, Washington 98105, United States
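Real-time library matching, as described in the RTLS abstract, has to finish within milliseconds of scan acquisition, so candidate library entries must be located without scanning the whole library. The sketch below is hypothetical and does not reflect the actual RTLS implementation: it indexes library entries by precursor m/z for binary-search lookup, uses a plain cosine score as a stand-in for whatever score RTLS computes, and applies an assumed threshold of 0.7 to decide whether to trigger a follow-up quantitative scan.

```python
import bisect
import math


class PrecursorIndex:
    """Library entries sorted by precursor m/z, so candidates inside a
    tolerance window are found by binary search instead of a full scan."""

    def __init__(self, entries):
        # entries: iterable of (precursor_mz, peptide, {fragment_bin: intensity})
        self.entries = sorted(entries, key=lambda e: e[0])
        self.mzs = [e[0] for e in self.entries]

    def candidates(self, precursor_mz, tol_ppm=10.0):
        delta = precursor_mz * tol_ppm * 1e-6
        lo = bisect.bisect_left(self.mzs, precursor_mz - delta)
        hi = bisect.bisect_right(self.mzs, precursor_mz + delta)
        return self.entries[lo:hi]


def cosine(a, b):
    """Normalized dot product between two sparse {bin: intensity} spectra."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def should_trigger(index, precursor_mz, spectrum, min_score=0.7):
    """Hypothetical triggering rule: request a quantitative scan only if some
    library candidate matches the MS2 spectrum at least as well as min_score.
    Returns (decision, best_score)."""
    best = max(
        (cosine(spectrum, peaks) for _, _, peaks in index.candidates(precursor_mz)),
        default=0.0,
    )
    return best >= min_score, best
```

Sorting once up front keeps each lookup logarithmic in library size, which is the property a millisecond budget needs; the threshold value and score are placeholders to be tuned on real data.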
5
Paulo JA. Isobaric labeling: Expanding the breadth, accuracy, depth, and diversity of sample multiplexing. Proteomics 2022; 22:e2200328. [PMID: 36089831 PMCID: PMC10777124 DOI: 10.1002/pmic.202200328] [Received: 08/25/2022] [Revised: 08/25/2022] [Accepted: 08/26/2022] [Indexed: 11/10/2022]
Abstract
Isobaric labeling has rapidly become a predominant strategy for proteome-wide abundance measurements. Coupled to mass spectrometry, sample multiplexing techniques using isobaric labeling are unparalleled for profiling proteins and posttranslational modifications across multiple samples in a single experiment. Here, I highlight aspects of isobaric labeling in the context of expanding the breadth of multiplexing, improving quantitative accuracy and proteome depth, and developing a wide range of diverse applications. I underscore two facets that enhance quantitative accuracy and reproducibility, specifically the availability of quality control standards for isobaric labeling experiments and the evolution of data acquisition methods. I also emphasize the necessity for standardized methodologies, particularly for emerging high-throughput workflows. Future developments in sample multiplexing will further strengthen the importance of isobaric labeling for comprehensive proteome profiling.
Affiliation(s)
- Joao A Paulo
  - Department of Cell Biology, Blavatnik Institute at Harvard Medical School, Boston, Massachusetts, USA