1
|
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. MASS SPECTROMETRY REVIEWS 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of Posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role in the functionality and these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Collapse
Affiliation(s)
- Daniel Flender
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- Health Unit, VITO, Mol, Belgium
| | - Frédérique Vilenne
- Health Unit, VITO, Mol, Belgium
- Data Science Institute, University of Hasselt, Hasselt, Belgium
| | - Charlotte Adams
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- ImmuneSpec, Niel, Belgium
| | - Dirk Valkenborg
- Data Science Institute, University of Hasselt, Hasselt, Belgium
| | - Geert Baggerman
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- ImmuneSpec, Niel, Belgium
| |
Collapse
|
2
|
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification. Mol Cell Proteomics 2024; 23:100798. [PMID: 38871251 PMCID: PMC11269915 DOI: 10.1016/j.mcpro.2024.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024] Open
Abstract
Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
Collapse
Affiliation(s)
- Mostafa Kalhor
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Joel Lapin
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany.
| |
Collapse
|
3
|
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. mSphere 2024; 9:e0079323. [PMID: 38780289 PMCID: PMC11332332 DOI: 10.1128/msphere.00793-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024] Open
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies. IMPORTANCE Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results. To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.
Collapse
Affiliation(s)
- Katherine Do
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Reid Wagner
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Dechen Bhuming
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Andrew T. Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Amy P. N. Skubitz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA
| | - James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
| |
Collapse
|
4
|
Picciani M, Gabriel W, Giurcoiu VG, Shouman O, Hamood F, Lautenbacher L, Jensen CB, Müller J, Kalhor M, Soleymaniniya A, Kuster B, The M, Wilhelm M. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 2024; 24:e2300112. [PMID: 37672792 DOI: 10.1002/pmic.202300112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.
Collapse
Affiliation(s)
- Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Wassim Gabriel
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Victor-George Giurcoiu
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Omar Shouman
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Firas Hamood
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Ludwig Lautenbacher
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Cecilia Bang Jensen
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Julian Müller
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mostafa Kalhor
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Armin Soleymaniniya
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Matthew The
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| |
Collapse
|
5
|
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.21.568121. [PMID: 38045370 PMCID: PMC10690215 DOI: 10.1101/2023.11.21.568121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/05/2023]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification and prioritization of microbial and host proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant (to generate peptide-spectral matches (PSMs) and quantification), PepQuery2 (to verify the quality of PSMs), and Unipept and MSstatsTMT (for taxonomy and functional annotation). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
Collapse
|
6
|
Jia W, Peng J, Zhang Y, Zhu J, Qiang X, Zhang R, Shi L. Exploring novel ANGICon-EIPs through ameliorated peptidomics techniques: Can deep learning strategies as a core breakthrough in peptide structure and function prediction? Food Res Int 2023; 174:113640. [PMID: 37986483 DOI: 10.1016/j.foodres.2023.113640] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2023] [Revised: 10/23/2023] [Accepted: 10/24/2023] [Indexed: 11/22/2023]
Abstract
Dairy-derived angiotensin-I-converting enzyme inhibitory peptides (ANGICon-EIPs) have been regarded as a relatively safe supplementary diet-therapy strategy for individuals with hypertension, and short-chain peptides may have more relevant antihypertensive benefits due to their direct intestinal absorption. Our previous explorations have confirmed that endogenous goat milk short-chain peptides are also an essential source of ANGICon-EIPs. Nonetheless, there are limited explorations on endogenous ANGICon-EIPs owing to the limitations of the extraction and enrichment of endogenous peptides, currently. This review outlined ameliorated pre-treatment strategies, data acquisition methods, and tools for the prediction of peptide structure and function, aiming to provide creative ideas for discovering novel ANGICon-EIPs. Currently, deep learning-based peptide structure and function prediction algorithms have achieved significant advancements. The convolutional neural network (CNN) and peptide sequence-based multi-label deep learning approach for determining the multi-functionalities of bioactive peptides (MLBP) can predict multiple peptide functions with absolute true value and accuracy of 0.699 and 0.708, respectively. Utilizing peptide sequence input, torsion angles, and inter-residue distance to train neural networks, APPTEST predicted the average backbone root mean square deviation (RMSD) value of peptide (5-40 aa) structures as low as 1.96 Å. Overall, with the exploration of more neural network architectures, deep learning could be considered a critical research tool to reduce the cost and improve the efficiency of identifying novel endogenous ANGICon-EIPs.
Collapse
Affiliation(s)
- Wei Jia
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China; Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China; Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an 710021, China.
| | - Jian Peng
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Yan Zhang
- Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
| | - Jiying Zhu
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Xin Qiang
- Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
| | - Rong Zhang
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| | - Lin Shi
- School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
| |
Collapse
|
7
|
McGann CD, Barshop WD, Canterbury JD, Lin C, Gabriel W, Huang J, Bergen D, Zabrouskov V, Melani RD, Wilhelm M, McAlister GC, Schweppe DK. Real-Time Spectral Library Matching for Sample Multiplexed Quantitative Proteomics. J Proteome Res 2023; 22:2836-2846. [PMID: 37557900 DOI: 10.1021/acs.jproteome.3c00085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/11/2023]
Abstract
Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.
Collapse
Affiliation(s)
- Chris D McGann
- University of Washington, Seattle, Washington 98105, United States
| | | | | | - Chuwei Lin
- University of Washington, Seattle, Washington 98105, United States
| | | | - Jingjing Huang
- Thermo Fisher Scientific, San Jose, California 95134, United States
| | - David Bergen
- Thermo Fisher Scientific, San Jose, California 95134, United States
| | - Vlad Zabrouskov
- Thermo Fisher Scientific, San Jose, California 95134, United States
| | - Rafael D Melani
- Thermo Fisher Scientific, San Jose, California 95134, United States
| | | | | | - Devin K Schweppe
- University of Washington, Seattle, Washington 98105, United States
| |
Collapse
|
8
|
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements included availability of software that would span diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains these software and associated training resources to empower researcher-driven analyses. EXPERT OPINION The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Collapse
Affiliation(s)
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Matthias Bernt
- Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
| | | | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
| | - Valentin Loux
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
| | - Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | | | | | - W C Mike Thang
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
| | - Cameron Hyde
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Sippy Downs, University of the Sunshine Coast, Australia
| | - Gareth Price
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
| | - Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|
9
|
Wilburn DB, Shannon AE, Spicer V, Richards AL, Yeung D, Swaney DL, Krokhin OV, Searle BC. Deep learning from harmonized peptide libraries enables retention time prediction of diverse post translational modifications. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.30.542978. [PMID: 37398395 PMCID: PMC10312522 DOI: 10.1101/2023.05.30.542978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/04/2023]
Abstract
In proteomics experiments, peptide retention time (RT) is an orthogonal property to fragmentation when assessing detection confidence. Advances in deep learning enable accurate RT prediction for any peptide from sequence alone, including those yet to be experimentally observed. Here we present Chronologer, an open-source software tool for rapid and accurate peptide RT prediction. Using new approaches to harmonize and false-discovery correct across independently collected datasets, Chronologer is built on a massive database with >2.2 million peptides including 10 common post-translational modification (PTM) types. By linking knowledge learned across diverse peptide chemistries, Chronologer predicts RTs with less than two-thirds the error of other deep learning tools. We show how RT for rare PTMs, such as OGlcNAc, can be learned with high accuracy using as few as 10-100 example peptides in newly harmonized datasets. This iteratively updatable workflow enables Chronologer to comprehensively predict RTs for PTM-marked peptides across entire proteomes.
Collapse
|