1. Buur LM, Declercq A, Strobl M, Bouwmeester R, Degroeve S, Martens L, Dorfer V, Gabriels R. MS2Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0. J Proteome Res 2024; 23:3200-3207. [PMID: 38491990 DOI: 10.1021/acs.jproteome.3c00785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2024]
Abstract
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.
Affiliation(s)
- Louise M Buur
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Marina Strobl
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Viktoria Dorfer
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium

2. Lautenbacher L, Yang KL, Kockmann T, Panse C, Chambers M, Kahl E, Yu F, Gabriel W, Bold D, Schmidt T, Li K, MacLean B, Nesvizhskii AI, Wilhelm M. Koina: Democratizing machine learning for proteomics research. bioRxiv 2024:2024.06.01.596953. [PMID: 38895358 PMCID: PMC11185529 DOI: 10.1101/2024.06.01.596953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Recent developments in machine-learning (ML) and deep-learning (DL) have immense potential for applications in proteomics, such as generating spectral libraries, improving peptide identification, and optimizing targeted acquisition modes. Although new ML/DL models for various applications and peptide properties are frequently published, the rate at which these models are adopted by the community is slow, which is mostly due to technical challenges. We believe that, for the community to make better use of state-of-the-art models, more attention should be spent on making models easy to use and accessible by the community. To facilitate this, we developed Koina, an open-source containerized, decentralized and online-accessible high-performance prediction service that enables ML/DL model usage in any pipeline. Using the widely used FragPipe computational platform as example, we show how Koina can be easily integrated with existing proteomics software tools and how these integrations improve data analysis.
Affiliation(s)
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Kevin L. Yang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Tobias Kockmann
  - Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Christian Panse
  - Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, CH-1015 Lausanne, Switzerland
- Matthew Chambers
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Elias Kahl
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Fengchao Yu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Dulguun Bold
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Kai Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Brendan MacLean
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Alexey I. Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany

3. Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
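The link between search-space inflation, decoy matches, and identifications at a fixed false discovery rate can be illustrated with a minimal target-decoy sketch. This is not taken from any of the reviewed pipelines: the function names `qvalues` and `count_accepted`, the simple decoys/targets FDR estimate, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical helper names; a PSM is represented as (score, is_decoy).
def qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    Assumption: FDR at a score cutoff is estimated as #decoys / #targets
    among all PSMs scoring at or above that cutoff.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # obtained by taking a running minimum from the worst score upward.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]


def count_accepted(psms: List[Tuple[float, bool]], fdr_threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1 for _, is_decoy, q in qvalues(psms) if not is_decoy and q <= fdr_threshold
    )


# Toy data: a decoy outscores a correct target before rescoring,
# and is pushed to the bottom of the list afterwards.
before = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
after = [(10.0, False), (9.0, False), (8.0, False), (3.0, True)]
print(count_accepted(before), count_accepted(after))
```

In this toy case a single high-scoring decoy blocks every target ranked below it at a 1% threshold; a rescoring step that separates targets from decoys more cleanly recovers that target without changing the threshold.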
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium

4. Gomez-Zepeda D, Arnold-Schild D, Beyrle J, Declercq A, Gabriels R, Kumm E, Preikschat A, Łącki MK, Hirschler A, Rijal JB, Carapito C, Martens L, Distler U, Schild H, Tenzer S. Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model. Nat Commun 2024; 15:2288. [PMID: 38480730 PMCID: PMC10937930 DOI: 10.1038/s41467-024-46380-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 02/26/2024] [Indexed: 03/17/2024] Open
Abstract
Human leukocyte antigen (HLA) class I peptide ligands (HLAIps) are key targets for developing vaccines and immunotherapies against infectious pathogens or cancer cells. Identifying HLAIps is challenging due to their high diversity, low abundance, and patient individuality. Here, we develop a highly sensitive method for identifying HLAIps using liquid chromatography-ion mobility-tandem mass spectrometry (LC-IMS-MS/MS). In addition, we train a timsTOF-specific peak intensity MS2PIP model for tryptic and non-tryptic peptides and implement it in MS2Rescore (v3) together with the CCS predictor from ionmob. The optimized method, Thunder-DDA-PASEF, semi-selectively fragments singly and multiply charged HLAIps based on their IMS and m/z. Moreover, the method employs the high sensitivity mode and extended IMS resolution with fewer MS/MS frames (300 ms TIMS ramp, 3 MS/MS frames), doubling the coverage of immunopeptidomics analyses, compared to the proteomics-tailored DDA-PASEF (100 ms TIMS ramp, 10 MS/MS frames). Additionally, rescoring boosts the HLAIps identification by 41.7% to 33%, resulting in 5738 HLAIps from as little as one million JY cell equivalents, and 14,516 HLAIps from 20 million. This enables in-depth profiling of HLAIps from diverse human cell lines and human plasma. Finally, profiling JY and Raji cells transfected to express the SARS-CoV-2 spike protein results in 16 spike HLAIps, thirteen of which have been reported to elicit immune responses in human patients.
Affiliation(s)
- David Gomez-Zepeda
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
- Danielle Arnold-Schild
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Julian Beyrle
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Elena Kumm
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Annica Preikschat
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Mateusz Krzysztof Łącki
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Aurélie Hirschler
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Jeewan Babu Rijal
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Christine Carapito
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ute Distler
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Hansjörg Schild
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Stefan Tenzer
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany

5. Liao H, Barra C, Zhou Z, Peng X, Woodhouse I, Tailor A, Parker R, Carré A, Borrow P, Hogan MJ, Paes W, Eisenlohr LC, Mallone R, Nielsen M, Ternette N. MARS an improved de novo peptide candidate selection method for non-canonical antigen target discovery in cancer. Nat Commun 2024; 15:661. [PMID: 38253617 PMCID: PMC10803737 DOI: 10.1038/s41467-023-44460-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 12/14/2023] [Indexed: 01/24/2024] Open
Abstract
Understanding the nature and extent of non-canonical human leukocyte antigen (HLA) presentation in tumour cells is a priority for target antigen discovery for the development of next generation immunotherapies in cancer. We here employ a de novo mass spectrometric sequencing approach with a refined, MHC-centric analysis strategy to detect non-canonical MHC-associated peptides specific to cancer without any prior knowledge of the target sequence from genomic or RNA sequencing data. Our strategy integrates MHC binding rank, Average local confidence scores, and peptide Retention time prediction for improved de novo candidate Selection; culminating in the machine learning model MARS. We benchmark our model on a large synthetic peptide library dataset and reanalysis of a published dataset of high-quality non-canonical MHC-associated peptide identifications in human cancer. We achieve almost 2-fold improvement for high quality spectral assignments in comparison to de novo sequencing alone with an estimated accuracy of above 85.7% when integrated with a stepwise peptide sequence mapping strategy. Finally, we utilize MARS to detect and validate lncRNA-derived peptides in human cervical tumour resections, demonstrating its suitability to discover novel, immunogenic, non-canonical peptide sequences in primary tumour tissue.
Affiliation(s)
- Hanqing Liao
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Zhicheng Zhou
  - Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
- Xu Peng
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Isaac Woodhouse
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Arun Tailor
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Robert Parker
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Alexia Carré
  - Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
- Persephone Borrow
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Michael J Hogan
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Wayne Paes
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Laurence C Eisenlohr
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Roberto Mallone
  - Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
  - Assistance Publique Hôpitaux de Paris, Service de Diabétologie et Immunologie Clinique, Cochin Hospital, 75014, Paris, France
- Nicola Ternette
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
  - University of Utrecht, Department of Pharmaceutical Sciences, 3584 CH, Utrecht, The Netherlands

6. Fan KT, Hsu CW, Chen YR. Mass spectrometry in the discovery of peptides involved in intercellular communication: From targeted to untargeted peptidomics approaches. Mass Spectrom Rev 2023; 42:2404-2425. [PMID: 35765846 DOI: 10.1002/mas.21789] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/23/2021] [Revised: 03/17/2022] [Accepted: 04/08/2022] [Indexed: 06/15/2023]
Abstract
Endogenous peptide hormones represent an essential class of biomolecules, which regulate cell-cell communications in diverse physiological processes of organisms. Mass spectrometry (MS) has been developed to be a powerful technology for identifying and quantifying peptides in a highly efficient manner. However, it is difficult to directly identify these peptide hormones due to their diverse characteristics, dynamic regulations, low abundance, and existence in a complicated biological matrix. Here, we summarize and discuss the roles of targeted and untargeted MS in discovering peptide hormones using bioassay-guided purification, bioinformatics screening, or the peptidomics-based approach. Although the peptidomics approach is expected to discover novel peptide hormones unbiasedly, only a limited number of successful cases have been reported. The critical challenges and corresponding measures for peptidomics from the steps of sample preparation, peptide extraction, and separation to the MS data acquisition and analysis are also discussed. We also identify emerging technologies and methods that can be integrated into the discovery platform toward the comprehensive study of endogenous peptide hormones.
Affiliation(s)
- Kai-Ting Fan
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Wei Hsu
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Yet-Ran Chen
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan

7. Lei JT, Jaehnig EJ, Smith H, Holt MV, Li X, Anurag M, Ellis MJ, Mills GB, Zhang B, Labrie M. The Breast Cancer Proteome and Precision Oncology. Cold Spring Harb Perspect Med 2023; 13:a041323. [PMID: 37137501 PMCID: PMC10547392 DOI: 10.1101/cshperspect.a041323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
The goal of precision oncology is to translate the molecular features of cancer into predictive and prognostic tests that can be used to individualize treatment leading to improved outcomes and decreased toxicity. Success for this strategy in breast cancer is exemplified by efficacy of trastuzumab in tumors overexpressing ERBB2 and endocrine therapy for tumors that are estrogen receptor positive. However, other effective treatments, including chemotherapy, immune checkpoint inhibitors, and CDK4/6 inhibitors are not associated with strong predictive biomarkers. Proteomics promises another tier of information that, when added to genomic and transcriptomic features (proteogenomics), may create new opportunities to improve both treatment precision and therapeutic hypotheses. Here, we review both mass spectrometry-based and antibody-dependent proteomics as complementary approaches. We highlight how these methods have contributed toward a more complete understanding of breast cancer and describe the potential to guide diagnosis and treatment more accurately.
Affiliation(s)
- Jonathan T Lei
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Eric J Jaehnig
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Hannah Smith
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
- Matthew V Holt
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Xi Li
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
- Meenakshi Anurag
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Matthew J Ellis
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Gordon B Mills
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Marilyne Labrie
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA

8. Zhang B, Bassani-Sternberg M. Current perspectives on mass spectrometry-based immunopeptidomics: the computational angle to tumor antigen discovery. J Immunother Cancer 2023; 11:e007073. [PMID: 37899131 PMCID: PMC10619091 DOI: 10.1136/jitc-2023-007073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/21/2023] [Indexed: 10/31/2023] Open
Abstract
Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means for identifying tumor antigens encoded within the canonical open reading frames of annotated coding genes and non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA and ribosomal sequencing data analysis, careful incorporation of predicted novel protein sequences into the reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. Advancements in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
Affiliation(s)
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Michal Bassani-Sternberg
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland

9. McGann CD, Barshop WD, Canterbury JD, Lin C, Gabriel W, Huang J, Bergen D, Zabrouskov V, Melani RD, Wilhelm M, McAlister GC, Schweppe DK. Real-Time Spectral Library Matching for Sample Multiplexed Quantitative Proteomics. J Proteome Res 2023; 22:2836-2846. [PMID: 37557900 DOI: 10.1021/acs.jproteome.3c00085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]
Abstract
Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.
Affiliation(s)
- Chris D McGann
  - University of Washington, Seattle, Washington 98105, United States
- Chuwei Lin
  - University of Washington, Seattle, Washington 98105, United States
- Jingjing Huang
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- David Bergen
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Vlad Zabrouskov
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Rafael D Melani
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Devin K Schweppe
  - University of Washington, Seattle, Washington 98105, United States

10. Yang KL, Yu F, Teo GC, Li K, Demichev V, Ralser M, Nesvizhskii AI. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023; 14:4539. [PMID: 37500632 PMCID: PMC10374903 DOI: 10.1038/s41467-023-40129-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 11/07/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
Peptide identification in liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments relies on computational algorithms for matching acquired MS/MS spectra against sequences of candidate peptides using database search tools, such as MSFragger. Here, we present a new tool, MSBooster, for rescoring peptide-to-spectrum matches using additional features incorporating deep learning-based predictions of peptide properties, such as LC retention time, ion mobility, and MS/MS spectra. We demonstrate the utility of MSBooster, in tandem with MSFragger and Percolator, in several different workflows, including nonspecific searches (immunopeptidomics), direct identification of peptides from data independent acquisition data, single-cell proteomics, and data generated on an ion mobility separation-enabled timsTOF MS platform. MSBooster is fast, robust, and fully integrated into the widely used FragPipe computational platform.
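As a rough illustration of what prediction-based rescoring features can look like, the sketch below compares observed and predicted fragment intensities with a normalized spectral angle and adds an absolute retention-time error. It is not MSBooster's implementation: the function names, the feature names, and the choice of similarity measure are assumptions, and the inputs are assumed to be intensity vectors already aligned on the same fragment ions.

```python
import math
from typing import Dict, Sequence

# Hypothetical helper: similarity between aligned intensity vectors.
def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral angle in [0, 1]; 1 means the same intensity pattern."""
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must be aligned to the same fragment ions")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0  # no signal to compare
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical feature names; a real pipeline would define its own set.
def rescoring_features(
    observed: Sequence[float],
    predicted: Sequence[float],
    observed_rt: float,
    predicted_rt: float,
) -> Dict[str, float]:
    """Bundle per-PSM features that a downstream rescoring model could consume."""
    return {
        "spectral_angle": spectral_angle(observed, predicted),
        "rt_abs_error": abs(observed_rt - predicted_rt),
    }


features = rescoring_features([0.9, 0.1, 0.4], [1.0, 0.0, 0.5], 31.2, 30.0)
print(features)
```

Features of this kind would be appended to the search engine's own scores before a semi-supervised rescoring step; how they are weighted is left to that downstream model.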
Affiliation(s)
- Kevin L Yang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Guo Ci Teo
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Kai Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vadim Demichev
  - Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Markus Ralser
  - Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
  - Nuffield Department of Medicine, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
- Alexey I Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA

11. Prensner JR, Abelin JG, Kok LW, Clauser KR, Mudge JM, Ruiz-Orera J, Bassani-Sternberg M, Deutsch EW, van Heesch S. What can Ribo-seq and proteomics tell us about the non-canonical proteome? bioRxiv 2023:2023.05.16.541049. [PMID: 37292611 PMCID: PMC10245706 DOI: 10.1101/2023.05.16.541049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Ribosome profiling (Ribo-seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of non-canonical sites of ribosome translation outside of the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7,000 non-canonical open reading frames (ORFs) are translated, which, at first glance, has the potential to expand the number of human protein-coding sequences by 30%, from ∼19,500 annotated CDSs to over 26,000. Yet, additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product and what fraction of those can be understood as proteins according to conventional understanding of the term. Adding further complication is the fact that published estimates of non-canonical ORFs vary widely by around 30-fold, from several thousand to several hundred thousand. The summation of this research has left the genomics and proteomics communities both excited by the prospect of new coding regions in the human genome, but searching for guidance on how to proceed. Here, we discuss the current state of non-canonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein-coding".
In brief: The human genome encodes thousands of non-canonical open reading frames (ORFs) in addition to protein-coding genes. As a nascent field, many questions remain regarding non-canonical ORFs. How many exist? Do they encode proteins? What level of evidence is needed for their verification? Central to these debates has been the advent of ribosome profiling (Ribo-seq) as a method to discern genome-wide ribosome occupancy, and immunopeptidomics as a method to detect peptides that are processed and presented by MHC molecules and not observed in traditional proteomics experiments. This article provides a synthesis of the current state of non-canonical ORF research and proposes standards for their future investigation and reporting.
Highlights:
- Combined use of Ribo-seq and proteomics-based methods enables optimal confidence in detecting non-canonical ORFs and their protein products.
- Ribo-seq can provide more sensitive detection of non-canonical ORFs, but data quality and analytical pipelines will impact results.
- Non-canonical ORF catalogs are diverse and span both high-stringency and low-stringency ORF nominations.
- A framework for standardized non-canonical ORF evidence will advance the research field.
Affiliation(s)
- John R. Prensner
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | | | - Leron W. Kok
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| | - Karl R. Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center Bugnon 25A, 1005 Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1005 Lausanne, Switzerland
- Agora Cancer Research Centre, 1011 Lausanne, Switzerland
| | - Eric W. Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
| |
|
12
|
Lichti CF, Wan X. Using mass spectrometry to identify neoantigens in autoimmune diseases: The type 1 diabetes example. Semin Immunol 2023; 66:101730. [PMID: 36827760 PMCID: PMC10324092 DOI: 10.1016/j.smim.2023.101730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 02/06/2023] [Accepted: 02/09/2023] [Indexed: 02/24/2023]
Abstract
In autoimmune diseases, recognition of self-antigens presented by major histocompatibility complex (MHC) molecules elicits unexpected attack of tissue by autoantibodies and/or autoreactive T cells. Post-translational modification (PTM) may alter the MHC-binding motif or TCR contact residues in a peptide antigen, transforming the tolerance to self to autoreactivity. Mass spectrometry-based immunopeptidomics provides a valuable mechanism for identifying MHC ligands that contain PTMs and can thus provide valuable insights into pathogenesis and therapeutics of autoimmune diseases. A plethora of PTMs have been implicated in this process, and this review highlights their formation and identification.
Affiliation(s)
- Cheryl F Lichti
- Department of Pathology and Immunology, Division of Immunobiology, The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, 660 S. Euclid Ave, Campus Box 8118, St. Louis, MO 63110, USA.
| | - Xiaoxiao Wan
- Department of Pathology and Immunology, Division of Immunobiology, The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, 660 S. Euclid Ave, Campus Box 8118, St. Louis, MO 63110, USA.
| |
|
13
|
Yi X, Wen B, Ji S, Saltzman A, Jaehnig EJ, Lei JT, Gao Q, Zhang B. Deep learning prediction boosts phosphoproteomics-based discoveries through improved phosphopeptide identification. bioRxiv 2023:2023.01.11.523329. [PMID: 36711982 PMCID: PMC9882090 DOI: 10.1101/2023.01.11.523329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples, but low phosphopeptide identification rate in data analysis limits the potential of this technology. Here we present DeepRescore2, a computational workflow that leverages deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. Using a state-of-the-art computational workflow as a benchmark, DeepRescore2 increases the number of correctly identified peptide-spectrum matches by 17% in a synthetic dataset and identifies 19%-46% more phosphopeptides in biological datasets. In a liver cancer dataset, 30% of the significantly altered phosphosites between tumor and normal tissues and 60% of the prognosis-associated phosphosites identified from DeepRescore2-processed data could not be identified based on the state-of-the-art workflow. Notably, DeepRescore2-processed data uniquely identifies EGFR hyperactivation as a new target in poor-prognosis liver cancer, which is validated experimentally. Integration of deep learning prediction in DeepRescore2 improves phosphopeptide identification and facilitates biological discoveries.
|
14
|
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
| |
|
15
|
Zeng WF, Zhou XX, Willems S, Ammar C, Wahle M, Bludau I, Voytik E, Strauss MT, Mann M. AlphaPeptDeep: a modular deep learning framework to predict peptide properties for proteomics. Nat Commun 2022; 13:7238. [PMID: 36433986 PMCID: PMC9700817 DOI: 10.1038/s41467-022-34904-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 07/14/2022] [Accepted: 11/10/2022] [Indexed: 11/27/2022] Open
Abstract
Machine learning and in particular deep learning (DL) are increasingly important in mass spectrometry (MS)-based proteomics. Recent DL models can predict the retention time, ion mobility and fragment intensities of a peptide just from the amino acid sequence with good accuracy. However, DL is a very rapidly developing field with new neural network architectures frequently appearing, which are challenging to incorporate for proteomics researchers. Here we introduce AlphaPeptDeep, a modular Python framework built on the PyTorch DL library that learns and predicts the properties of peptides ( https://github.com/MannLabs/alphapeptdeep ). It features a model shop that enables non-specialists to create models in just a few lines of code. AlphaPeptDeep represents post-translational modifications in a generic manner, even if only the chemical composition is known. Extensive use of transfer learning obviates the need for large data sets to refine models for particular experimental conditions. The AlphaPeptDeep models for predicting retention time, collisional cross sections and fragment intensities are at least on par with existing tools. Additional sequence-based properties can also be predicted by AlphaPeptDeep, as demonstrated with a HLA peptide prediction model to improve HLA peptide identification for data-independent acquisition ( https://github.com/MannLabs/PeptDeep-HLA ).
Affiliation(s)
- Wen-Feng Zeng
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Xie-Xuan Zhou
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Sander Willems
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Constantin Ammar
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Maria Wahle
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Isabell Bludau
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Eugenia Voytik
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Maximillian T Strauss
- Proteomics Program, NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Matthias Mann
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- Proteomics Program, NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark.
| |
|
16
|
Perez-Riverol Y. Proteomic repository data submission, dissemination, and reuse: key messages. Expert Rev Proteomics 2022; 19:297-310. [PMID: 36529941 PMCID: PMC7614296 DOI: 10.1080/14789450.2022.2160324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022]
Abstract
INTRODUCTION: The creation of ProteomeXchange data workflows in 2012 transformed the field of proteomics, consisting of the standardization of data submission and dissemination and enabling the widespread reanalysis of public MS proteomics data worldwide. ProteomeXchange has triggered a growing trend toward public dissemination of proteomics data, facilitating the assessment, reuse, comparative analyses, and extraction of new findings from public datasets. By 2022, the consortium is integrated by PRIDE, PeptideAtlas, MassIVE, jPOST, iProX, and Panorama Public.
AREAS COVERED: Here, we review and discuss the current ecosystem of resources, guidelines, and file formats for proteomics data dissemination and reanalysis. Special attention is drawn to new exciting quantitative and post-translational modification-oriented resources. The challenges and future directions on data depositions including the lack of metadata and cloud-based and high-performance software solutions for fast and reproducible reanalysis of the available data are discussed.
EXPERT OPINION: The success of ProteomeXchange and the amount of proteomics data available in the public domain have triggered the creation and/or growth of other protein knowledgebase resources. Data reuse is a leading, active, and evolving field; supporting the creation of new formats, tools, and workflows to rediscover and reshape the public proteomics data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
|
17
|
IntroSpect: Motif-Guided Immunopeptidome Database Building Tool to Improve the Sensitivity of HLA I Binding Peptide Identification by Mass Spectrometry. Biomolecules 2022; 12:biom12040579. [PMID: 35454168 PMCID: PMC9025654 DOI: 10.3390/biom12040579] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/07/2022] [Revised: 04/11/2022] [Accepted: 04/12/2022] [Indexed: 01/02/2023] Open
Abstract
Although database search tools originally developed for shotgun proteome have been widely used in immunopeptidomic mass spectrometry identifications, they have been reported to achieve undesirably low sensitivities or high false positive rates as a result of the hugely inflated search space caused by the lack of specific enzymic digestions in immunopeptidome. To overcome such a problem, we developed a motif-guided immunopeptidome database building tool named IntroSpect, which is designed to first learn the peptide motifs from high confidence hits in the initial search, and then build a targeted database for refined search. Evaluated on 18 representative HLA class I datasets, IntroSpect can improve the sensitivity by an average of 76%, compared to conventional searches with unspecific digestions, while maintaining a very high level of accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so that it performs equally well on both well-studied and poorly-studied HLA types, unlike the previously developed method SpectMHC. We have also designed IntroSpect to keep a global FDR that can be conveniently controlled, similar to a conventional database search. Finally, we demonstrate the practical value of IntroSpect by discovering neoepitopes from MS data directly, an important application in cancer immunotherapies. IntroSpect is freely available to download and use.
|
18
|
Nielsen M, Ternette N, Barra C. The interdependence of machine learning and LC-MS approaches for an unbiased understanding of the cellular immunopeptidome. Expert Rev Proteomics 2022; 19:77-88. [PMID: 35390265 DOI: 10.1080/14789450.2022.2064278] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022]
Abstract
INTRODUCTION: The comprehensive collection of peptides presented by Major Histocompatibility Complex (MHC) molecules on the cell surface is collectively known as the immunopeptidome. The analysis and interpretation of such data sets holds great promise for furthering our understanding of basic immunology and adaptive immune activation and regulation, and for direct rational discovery of T cell antigens and the design of T-cell based therapeutics and vaccines. These applications are however challenged by the complex nature of immunopeptidome data.
AREAS COVERED: Here, we describe the benefits and shortcomings of applying liquid chromatography-tandem mass spectrometry (MS) to obtain large scale immunopeptidome data sets and illustrate how the accurate analysis and optimal interpretation of such data is reliant on the availability of refined and highly optimized machine learning approaches.
EXPERT OPINION: Further we demonstrate how the accuracy of immunoinformatics prediction methods within the field of MHC antigen presentation has benefited greatly from the availability of MS-immunopeptidomics data, and exemplify how optimal antigen discovery is best performed in a synergistic combination of MS experiments and such in silico models trained on large scale immunopeptidomics data.
Affiliation(s)
- Morten Nielsen
- Department of Health technology, Technical University of Denmark, DK-2800 Lyngby, Denmark
| | - Nicola Ternette
- Centre for Cellular and Molecular Physiology, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
| | - Carolina Barra
- Department of Health technology, Technical University of Denmark, DK-2800 Lyngby, Denmark
| |
|
19
|
Becker JP, Riemer AB. The Importance of Being Presented: Target Validation by Immunopeptidomics for Epitope-Specific Immunotherapies. Front Immunol 2022; 13:883989. [PMID: 35464395 PMCID: PMC9018990 DOI: 10.3389/fimmu.2022.883989] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/25/2022] [Accepted: 03/16/2022] [Indexed: 11/26/2022] Open
Abstract
Presentation of tumor-specific or tumor-associated peptides by HLA class I molecules to CD8+ T cells is the foundation of epitope-centric cancer immunotherapies. While often in silico HLA binding predictions or in vitro immunogenicity assays are utilized to select candidates, mass spectrometry-based immunopeptidomics is currently the only method providing a direct proof of actual cell surface presentation. Despite much progress in the last decade, identification of such HLA-presented peptides remains challenging. Here we review typical workflows and current developments in the field of immunopeptidomics, highlight the challenges which remain to be solved and emphasize the importance of direct target validation for clinical immunotherapy development.
Affiliation(s)
- Jonas P. Becker
- Immunotherapy and Immunoprevention, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Angelika B. Riemer
- Immunotherapy and Immunoprevention, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Molecular Vaccine Design, German Center for Infection Research (DZIF), Partner Site Heidelberg, Heidelberg, Germany
| |
|
20
|
Yi X, Liao Y, Wen B, Li K, Dou Y, Savage SR, Zhang B. caAtlas: An immunopeptidome atlas of human cancer. iScience 2021; 24:103107. [PMID: 34622160 PMCID: PMC8479791 DOI: 10.1016/j.isci.2021.103107] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/18/2021] [Revised: 08/10/2021] [Accepted: 09/03/2021] [Indexed: 01/24/2023] Open
Abstract
Comprehensive characterization of tumor antigens is essential for the design of cancer immunotherapies, and mass spectrometry (MS)-based immunopeptidomics enables high-throughput identification of major histocompatibility complex (MHC)-bound peptide antigens in vivo. Here we construct an immunopeptidome atlas of human cancer through an extensive collection of 43 published immunopeptidomic datasets and standardized analysis of 81.6 million MS/MS spectra using an open search engine. Our analysis greatly expands the current knowledge of MHC-bound antigens, including an unprecedented characterization of post-translationally modified antigens and their cancer-association. We also perform systematic analysis of cancer-testis antigens, cancer-associated antigens, and neoantigens. We make all these data together with annotated MS/MS spectra supporting identification of each antigen in an easily browsable web portal named cancer antigen atlas (caAtlas). caAtlas provides a central resource for the selection and prioritization of MHC-bound peptides for in vitro HLA binding assay and immunogenicity testing, which will pave the way to eventual development of cancer immunotherapies.
Highlights:
- Extensive collection of 43 immunopeptidomic datasets with 1018 samples
- Standardized and rigorous identification of HLA-bound peptides, including PTM peptides
- Comprehensive annotation of CT antigens and cancer-associated antigens
- User-friendly data dissemination through the caAtlas web portal
Affiliation(s)
- Xinpei Yi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Kai Li
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yongchao Dou
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sara R Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
21
|
Kudriavtseva P, Kashkinov M, Kertész-Farkas A. Deep Convolutional Neural Networks Help Scoring Tandem Mass Spectrometry Data in Database-Searching Approaches. J Proteome Res 2021; 20:4708-4717. [PMID: 34449232 DOI: 10.1021/acs.jproteome.1c00315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
Spectrum annotation is a challenging task due to the presence of unexpected peptide fragmentation ions as well as the inaccuracy of the detectors of the spectrometers. We present a deep convolutional neural network, called Slider, which learns an optimal feature extraction in its kernels for scoring mass spectrometry (MS)/MS spectra to increase the number of spectrum annotations with high confidence. Experimental results using publicly available data sets show that Slider can annotate slightly more spectra than the state-of-the-art methods (BoltzMatch, Res-EV, Prosit), albeit 2-10 times faster. More interestingly, Slider provides only 2-4% fewer spectrum annotations with low-resolution fragmentation information than other methods with high-resolution information. This means that Slider can exploit nearly as much information from the context of low-resolution spectrum peaks as the high-resolution fragmentation information can provide for other scoring methods. Thus, Slider can be an optimal choice for practitioners using old spectrometers with low-resolution detectors.
Affiliation(s)
- Polina Kudriavtseva
- Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 11 Pokrovsky Bvld., Moscow 109028, Russian Federation
| | - Matvey Kashkinov
- Faculty of Computer Science, HSE University, 11 Pokrovsky Bvld., Moscow 109028, Russian Federation
| | - Attila Kertész-Farkas
- Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 11 Pokrovsky Bvld., Moscow 109028, Russian Federation
| |
|
22
|
Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021; 12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]
Abstract
There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.
Affiliation(s)
- Matthias Mann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
| | - Chanchal Kumar
- Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
| | - Wen-Feng Zeng
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
| | | |
|
23
|
Wilhelm M, Zolg DP, Graber M, Gessulat S, Schmidt T, Schnatbaum K, Schwencke-Westphal C, Seifert P, de Andrade Krätzig N, Zerweck J, Knaute T, Bräunlein E, Samaras P, Lautenbacher L, Klaeger S, Wenschuh H, Rad R, Delanghe B, Huhmer A, Carr SA, Clauser KR, Krackhardt AM, Reimer U, Kuster B. Deep learning boosts sensitivity of mass spectrometry-based immunopeptidomics. Nat Commun 2021; 12:3346. [PMID: 34099720 PMCID: PMC8184761 DOI: 10.1038/s41467-021-23713-9] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 11/10/2020] [Accepted: 05/11/2021] [Indexed: 12/30/2022] Open
Abstract
Characterizing the human leukocyte antigen (HLA) bound ligandome by mass spectrometry (MS) holds great promise for developing vaccines and drugs for immune-oncology. Still, the identification of non-tryptic peptides presents substantial computational challenges. To address these, we synthesized and analyzed >300,000 peptides by multi-modal LC-MS/MS within the ProteomeTools project representing HLA class I & II ligands and products of the proteases AspN and LysN. The resulting data enabled training of a single model using the deep learning framework Prosit, allowing the accurate prediction of fragment ion spectra for tryptic and non-tryptic peptides. Applying Prosit demonstrates that the identification of HLA peptides can be improved up to 7-fold, that 87% of the proposed proteasomally spliced HLA peptides may be incorrect and that dozens of additional immunogenic neo-epitopes can be identified from patient tumors in published data. Together, the provided peptides, spectra and computational tools substantially expand the analytical depth of immunopeptidomics workflows.
Affiliation(s)
- Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany.
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany.
| | - Daniel P Zolg
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
| | - Michael Graber
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
| | - Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
| | - Tobias Schmidt
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
| | | | - Celina Schwencke-Westphal
- Klinik und Poliklinik für Innere Medizin III, Klinikum rechts der Isar, School of Medicine, Technical University of Munich (TUM), Munich, Germany
- German Cancer Consortium (DKTK), partner site Munich; and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Center for Translational Cancer Research (TranslaTUM), TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
| | - Philipp Seifert
- Klinik und Poliklinik für Innere Medizin III, Klinikum rechts der Isar, School of Medicine, Technical University of Munich (TUM), Munich, Germany
- Center for Translational Cancer Research (TranslaTUM), TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
| | - Niklas de Andrade Krätzig
- Center for Translational Cancer Research (TranslaTUM), TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
- Institute of Molecular Oncology and Functional Genomics, TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
- Klinik und Poliklinik für Innere Medizin II, Klinikum rechts der Isar, School of Medicine, Technical University of Munich (TUM), Munich, Germany
| | | | | | - Eva Bräunlein
- Klinik und Poliklinik für Innere Medizin III, Klinikum rechts der Isar, School of Medicine, Technical University of Munich (TUM), Munich, Germany
- Center for Translational Cancer Research (TranslaTUM), TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
| | - Patroklos Samaras
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
| | - Ludwig Lautenbacher
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
| | - Susan Klaeger
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Roland Rad
- Center for Translational Cancer Research (TranslaTUM), TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
- Institute of Molecular Oncology and Functional Genomics, TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
- Klinik und Poliklinik für Innere Medizin II, Klinikum rechts der Isar, School of Medicine, Technical University of Munich (TUM), Munich, Germany
| | - Steven A Carr
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Angela M Krackhardt
- Klinik und Poliklinik für Innere Medizin III, Klinikum rechts der Isar, School of Medicine, Technical University of Munich (TUM), Munich, Germany
- German Cancer Consortium (DKTK), partner site Munich; and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Molecular Oncology and Functional Genomics, TUM School of Medicine, Technical University of Munich (TUM), Munich, Germany
| | - Ulf Reimer
- JPT Peptide Technologies GmbH, Berlin, Germany
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany.
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich (TUM), Freising, Germany.
|
24
|
Juanes-Velasco P, Landeira-Viñuela A, Acebes-Fernandez V, Hernández ÁP, Garcia-Vaquero ML, Arias-Hidalgo C, Bareke H, Montalvillo E, Gongora R, Fuentes M. Deciphering Human Leukocyte Antigen Susceptibility Maps From Immunopeptidomics Characterization in Oncology and Infections. Front Cell Infect Microbiol 2021; 11:642583. [PMID: 34123866 PMCID: PMC8195621 DOI: 10.3389/fcimb.2021.642583] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2020] [Accepted: 04/29/2021] [Indexed: 12/13/2022] Open
Abstract
Genetic variability across the three major histocompatibility complex (MHC) class I genes (human leukocyte antigen [HLA] A, B, and C) may affect susceptibility to many diseases, such as cancer and autoimmune or infectious diseases. Individual genetic variation may help to explain different immune responses to microorganisms across a population. HLA typing can be fast and inexpensive; however, deciphering the peptides loaded on MHC-I and MHC-II, which are presented to T cells, requires the design and development of high-sensitivity methodological approaches and, subsequently, databases. These novel strategies and databases could help in generating vaccines from these potentially immunogenic peptides and in identifying high-risk HLA types to be prioritized for vaccination programs. Herein, recent developments and approaches in this field that focus on the identification of immunogenic peptides are reviewed, and the next steps to promote their translation into biomedical and clinical practice are discussed.
Affiliation(s)
- Pablo Juanes-Velasco
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Alicia Landeira-Viñuela
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Vanessa Acebes-Fernandez
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Ángela-Patricia Hernández
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Marina L. Garcia-Vaquero
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Carlota Arias-Hidalgo
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Halin Bareke
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Enrique Montalvillo
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Rafael Gongora
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
| | - Manuel Fuentes
- Department of Medicine and Cytometry General Service-Nucleus, CIBERONC, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
- Proteomics Unit, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), Salamanca, Spain
|
25
|
Ding F, Liu Y, Zhuang Z, Wang Z. A Sawn Timber Tree Species Recognition Method Based on AM-SPPResNet. Sensors 2021; 21:s21113699. [PMID: 34073445 PMCID: PMC8198648 DOI: 10.3390/s21113699] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/05/2021] [Revised: 05/20/2021] [Accepted: 05/25/2021] [Indexed: 01/11/2023]
Abstract
Sawn timber is an important component material in furniture manufacturing, decoration, construction and other industries. The mechanical properties, surface colors, textures, uses and other properties of sawn timber differ between tree species. To meet the needs of reasonable timber use and product quality, sawn timber must be identified by tree species to ensure the best use of materials. In this study, an optimized convolutional neural network was proposed to process sawn timber image data and identify the tree species of the sawn timber. Spatial pyramid pooling and an attention mechanism were used to improve the convolution layer of ResNet101 to extract the feature vector of sawn timber images. The optimized ResNet (simply called “AM-SPPResNet”) was used to identify the sawn timber images, yielding the basic recognition model. Then, the weight parameters of the feature extraction layer of the basic model were frozen, the fully connected layer was removed, and support vector machine (SVM) and XGBoost classifiers, both commonly used in machine learning, were trained on the 21 × 1024 dimension feature vectors extracted by the feature extraction layer. A number of comparative experiments showed that the prediction model using an SVM with a linear kernel function to learn the feature vectors extracted from the improved convolution layer performed best, with the F1 score and overall accuracy for all sample classes above 99%. Compared with the traditional methods, the accuracy was improved by up to 12%.
|
26
|
Zolg DP, Gessulat S, Paschke C, Graber M, Rathke-Kuhnert M, Seefried F, Fitzemeier K, Berg F, Lopez-Ferrer D, Horn D, Henrich C, Huhmer A, Delanghe B, Frejno M. INFERYS rescoring: Boosting peptide identifications and scoring confidence of database search results. Rapid Communications in Mass Spectrometry 2021:e9128. [PMID: 34015160 DOI: 10.1002/rcm.9128] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/20/2021] [Revised: 04/14/2021] [Accepted: 05/17/2021] [Indexed: 06/12/2023]
Abstract
Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.
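As an illustration of what an intensity-based score can look like, the sketch below computes a normalized spectral angle between observed and predicted fragment ion intensities. This is an assumption chosen for illustration: the abstract does not state which similarity measures INFERYS rescoring uses, and the function name `spectral_angle` and the example intensity vectors are hypothetical.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle between two aligned intensity vectors.

    Hypothetical helper, not taken from INFERYS. Both inputs must list
    intensities for the same fragment ions in the same order. For
    non-negative intensities the result lies in [0, 1], where 1 means
    the two intensity patterns are identical up to scaling.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        # No signal in one of the spectra: treat as no similarity.
        return 0.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Made-up intensities for four matched fragment ions.
observed = [0.0, 0.8, 0.1, 1.0]
predicted = [0.05, 0.7, 0.2, 1.0]
score = spectral_angle(observed, predicted)
```

A score of this kind could be appended to the feature set passed to a tool such as Percolator, alongside the classical search engine scores.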
Affiliation(s)
- Frank Berg
- Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany
| | - David Horn
- Thermo Fisher Scientific, San Jose, CA, USA
|
27
|
Empirical Evaluation of the Use of Computational HLA Binding as an Early Filter to the Mass Spectrometry-Based Epitope Discovery Workflow. Cancers (Basel) 2021; 13:cancers13102307. [PMID: 34065814 PMCID: PMC8150281 DOI: 10.3390/cancers13102307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2021] [Revised: 05/06/2021] [Accepted: 05/06/2021] [Indexed: 12/22/2022] Open
Abstract
Immunopeptidomics is used to identify novel epitopes for (therapeutic) vaccination strategies in cancer and infectious disease. Various false discovery rates (FDRs) are applied in the field when converting liquid chromatography-tandem mass spectrometry (LC-MS/MS) spectra to peptides. Subsequently, large efforts have recently been made to rescue peptides of lower confidence. However, it remains unclear what the overall relation is between the FDR threshold and the percentage of obtained HLA-binders. We here directly evaluated the effect of varying FDR thresholds on the resulting immunopeptidomes of HLA-eluates from human cancer cell lines and primary hepatocyte isolates using HLA-binding algorithms. Additional peptides obtained using less stringent FDR thresholds, although generally derived from poorer spectra, still contained a high amount of HLA-binders and confirmed recently developed tools that tap into this pool of otherwise ignored peptides. Most of these peptides were identified with improved confidence when cell input was increased, supporting the validity and potential of these identifications. Altogether, our data suggest that increasing the FDR threshold for peptide identification, in conjunction with data filtering by HLA-binding prediction, is a valid and highly potent method for exhausting immunopeptidome datasets more efficiently for epitope discovery, and they reveal the extent of peptides to be rescued by recently developed algorithms.
|
28
|
Wen B, Zhang B. Computational Proteomics: Focus on Deep Learning. Proteomics 2020; 20:e2000258. [PMID: 33210458 DOI: 10.1002/pmic.202000258] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/14/2020] [Revised: 10/14/2020] [Indexed: 11/09/2022]
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
29
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|