1
Hamood F, Gabriel W, Pfeiffer P, Kuster B, Wilhelm M, The M. ProSIMSIt: The Best of Both Worlds in Data-Driven Rescoring and Identification Transfer. J Proteome Res 2025; 24:2173-2180. [PMID: 40119808 PMCID: PMC11976853 DOI: 10.1021/acs.jproteome.4c00967] [Received: 10/30/2024] [Revised: 02/19/2025] [Accepted: 03/10/2025] [Indexed: 03/24/2025]
Abstract
Multibatch isobaric labeling experiments are frequently applied for clinical and pharmaceutical studies of large sample cohorts. To tackle the critical issue of missing values in such studies, we introduce the ProSIMSIt pipeline. It combines the advantages of tandem mass spectrum clustering via SIMSI-Transfer and data-driven rescoring via Prosit and Oktoberfest. We demonstrate that these two tools are complementary and mutually beneficial. On large-scale cancer cohort data, ProSIMSIt increased the number of peptide spectrum matches (PSMs) by 40% on both global and phosphoproteome data sets. Furthermore, on data from proteome-wide drug-response profiling of post-translational modifications (decryptM), our pipeline substantially increased drug-PTM relations and revealed previously unseen downstream effects of drug target inhibition. ProSIMSIt is available as an open-source Python package with a simple command line interface that allows easy application to MaxQuant result files.
Affiliation(s)
- Firas Hamood
- Chair of Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Wassim Gabriel
- Assistant Professorship of Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Pia Pfeiffer
- Assistant Professorship of Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Munich Data Science Institute (MDSI), Technical University of Munich, 85748 Garching, Germany
- Mathias Wilhelm
- Assistant Professorship of Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Munich Data Science Institute (MDSI), Technical University of Munich, 85748 Garching, Germany
- Matthew The
- Chair of Proteomics and Bioanalytics, School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
2
Castaño JD, Beaudry F. Comparative Analysis of Data-Driven Rescoring Platforms for Improved Peptide Identification in HeLa Digest Samples. Proteomics 2025; 25:e202400225. [PMID: 39895169 PMCID: PMC11962579 DOI: 10.1002/pmic.202400225] [Received: 06/28/2024] [Revised: 09/16/2024] [Accepted: 01/21/2025] [Indexed: 02/04/2025]
Abstract
Mass spectrometry is a critical tool to understand complex changes in biological processes. Despite significant advances in search engine technology, many spectra remain unassigned. This research evaluates the performance of three rescoring platforms, Oktoberfest, MS2Rescore, and inSPIRE, using MaxQuant output. The results indicated a substantial increase in identifications at the peptide level (40%-53%) and PSM level (64%-67%). However, some peptides were lost due to limitations in processing posttranslational modifications (PTMs), with up to 75% of lost peptides exhibiting PTMs. Each platform displayed distinct strengths and weaknesses. For instance, inSPIRE performed best in terms of peptide identifications and unique peptides, while MS2Rescore performed better for PSMs at higher FDR values. Differences in platform performance stemmed from several sources: original search engine feature selection, type of ion series predicted, retention time predictor, and PTM compatibility. Overall, inSPIRE showed a superior ability to harness original search engine results. Taken together, rescoring platforms clearly outperformed original search results; however, they demanded additional computation time (up to 77%) and manual adjustments. The findings here underline the necessity of integrating rescoring platforms into current proteomics pipelines but also address some challenges in their implementation and optimization. Future integrated platforms may help enhance adoption.
Affiliation(s)
- Jesus D. Castaño
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint‐Hyacinthe, Canada
- Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Saint‐Hyacinthe, Canada
- Francis Beaudry
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint‐Hyacinthe, Canada
- Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Saint‐Hyacinthe, Canada
3
Movassaghi CS, Sun J, Jiang Y, Turner N, Chang V, Chung N, Chen RJ, Browne EN, Lin C, Schweppe DK, Malaker SA, Meyer JG. Recent Advances in Mass Spectrometry-Based Bottom-Up Proteomics. Anal Chem 2025; 97:4728-4749. [PMID: 40000226 DOI: 10.1021/acs.analchem.4c06750] [Indexed: 02/27/2025]
Abstract
Mass spectrometry-based proteomics is about 35 years old, and recent progress appears to be speeding up across all subfields. In this review, we focus on advances over the last two years in select areas within bottom-up proteomics, including approaches to high-throughput experiments, data analysis using machine learning, drug discovery, glycoproteomics, extracellular vesicle proteomics, and structural proteomics.
Affiliation(s)
- Cameron S Movassaghi
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Jie Sun
- Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996, United States
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Natalie Turner
- Departments of Molecular Medicine and Neurobiology, Scripps Research Institute, La Jolla, California 92037, United States
- Vincent Chang
- Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States
- Nara Chung
- Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States
- Ryan J Chen
- Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States
- Elizabeth N Browne
- Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States
- Chuwei Lin
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- Devin K Schweppe
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
- Stacy A Malaker
- Department of Chemistry, Yale University, 275 Prospect Street, New Haven, Connecticut 06511, United States
- Jesse G Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
4
Declercq A, Devreese R, Scheid J, Jachmann C, Van Den Bossche T, Preikschat A, Gomez-Zepeda D, Rijal JB, Hirschler A, Krieger JR, Srikumar T, Rosenberger G, Martelli C, Trede D, Carapito C, Tenzer S, Walz JS, Degroeve S, Bouwmeester R, Martens L, Gabriels R. TIMS2Rescore: A Data Dependent Acquisition-Parallel Accumulation and Serial Fragmentation-Optimized Data-Driven Rescoring Pipeline Based on MS2Rescore. J Proteome Res 2025; 24:1067-1076. [PMID: 39915959 PMCID: PMC11894666 DOI: 10.1021/acs.jproteome.4c00609] [Received: 08/09/2024] [Revised: 11/08/2024] [Accepted: 01/27/2025] [Indexed: 03/08/2025]
Abstract
The high throughput analysis of proteins with mass spectrometry (MS) is highly valuable for understanding human biology, discovering disease biomarkers, identifying therapeutic targets, and exploring pathogen interactions. To achieve these goals, specialized proteomics subfields, including plasma proteomics, immunopeptidomics, and metaproteomics, must tackle specific analytical challenges, such as an increased identification ambiguity compared to routine proteomics experiments. Technical advancements in MS instrumentation can mitigate these issues by acquiring more discerning information at higher sensitivity levels. This is exemplified by the incorporation of ion mobility and parallel accumulation and serial fragmentation (PASEF) technologies in timsTOF instruments. In addition, AI-based bioinformatics solutions can help overcome ambiguity issues by integrating more data into the identification workflow. Here, we introduce TIMS2Rescore, a data-driven rescoring workflow optimized for DDA-PASEF data from timsTOF instruments. This platform includes new timsTOF MS2PIP spectrum prediction models and IM2Deep, a new deep learning-based peptide ion mobility predictor. Furthermore, to fully streamline data throughput, TIMS2Rescore directly accepts Bruker raw mass spectrometry data and search results from ProteoScape and many other search engines, including Sage and PEAKS. We showcase TIMS2Rescore performance on plasma proteomics, immunopeptidomics (HLA class I and II), and metaproteomics data sets. TIMS2Rescore is open-source and freely available at https://github.com/compomics/tims2rescore.
Affiliation(s)
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Robbe Devreese
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Jonas Scheid
- Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Tübingen 72076, Germany
- Cluster of Excellence iFIT (ECX2180) Image-Guided and Functionally Instructed Tumor Therapies, University of Tuebingen, Tuebingen 72076, Germany
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Caroline Jachmann
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Annica Preikschat
- Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz 55131, Germany
- David Gomez-Zepeda
- Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz 55131, Germany
- German Cancer Research Center (DKFZ) Heidelberg, Division 191 & Immunopeptidomics Platform, Heidelberg 69120, Germany
- Jeewan Babu Rijal
- BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Aurélie Hirschler
- BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Dennis Trede
- Bruker Daltonics GmbH & Co. KG, Bremen 28359, Germany
- Christine Carapito
- BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Stefan Tenzer
- Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz 55131, Germany
- Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz 55131, Germany
- Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz 55131, Germany
- Juliane S Walz
- Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Tübingen 72076, Germany
- Cluster of Excellence iFIT (ECX2180) Image-Guided and Functionally Instructed Tumor Therapies, University of Tuebingen, Tuebingen 72076, Germany
- Clinical Collaboration Unit Translational Immunology, Department of Internal Medicine, University Hospital Tuebingen, Tuebingen 72076, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), partner site Tübingen, Tübingen 72076, Germany
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
5
Cormican JA, Medfai L, Wawrzyniuk M, Pašen M, Afrache H, Fourny C, Khan S, Gneiße P, Soh WT, Timelli A, Nolfi E, Pannekoek Y, Cope A, Urlaub H, Sijts AJAM, Mishto M, Liepe J. PEPSeek-Mediated Identification of Novel Epitopes From Viral and Bacterial Pathogens and the Impact on Host Cell Immunopeptidomes. Mol Cell Proteomics 2025; 24:100937. [PMID: 40044041 DOI: 10.1016/j.mcpro.2025.100937] [Received: 11/27/2024] [Revised: 02/11/2025] [Accepted: 03/02/2025] [Indexed: 04/07/2025]
Abstract
Here, we develop PEPSeek, a web-server-based software to allow higher performance in the identification of pathogen-derived epitope candidates detected via mass spectrometry in MHC class I immunopeptidomes. We apply it to human and mouse cell lines infected with SARS-CoV-2, Listeria monocytogenes, or Chlamydia trachomatis, thereby identifying a large number of novel antigens and epitopes that we prove to be recognized by CD8+ T cells. In infected cells, we identified antigenic peptide features that suggested how the processing and presentation of pathogenic antigens differ between pathogens. The quantitative tools of PEPSeek also helped to define how the C. trachomatis infection cycle could impact the antigenic landscape of the host human cell system, likely reflecting metabolic changes that occurred in the infected cells.
Affiliation(s)
- John A Cormican
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Lobna Medfai
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Magdalena Wawrzyniuk
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Martin Pašen
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Hassnae Afrache
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; Research group of Molecular Immunology, Francis Crick Institute, London, United Kingdom
- Constance Fourny
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; Research group of Molecular Immunology, Francis Crick Institute, London, United Kingdom
- Sahil Khan
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Pascal Gneiße
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Georg-August University School of Science (GAUSS), University of Göttingen, Göttingen, Germany
- Wai Tuck Soh
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany
- Arianna Timelli
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Emanuele Nolfi
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- Yvonne Pannekoek
- Department of Medical Microbiology and Infection Prevention, Amsterdam UMC Location University of Amsterdam, Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands
- Andrew Cope
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Centre for Rheumatic Diseases, King's College London, London, UK
- Henning Urlaub
- Research group of Bioanalytical Mass Spectrometry, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Bioanalytics, Department of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany; Göttingen Center for Molecular Biosciences, University of Göttingen, Göttingen, Germany
- Alice J A M Sijts
- Department of Biomolecular Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands; Chair T-cell Tolerance, Leibniz Institute for Immunotherapy, Regensburg, Germany.
- Michele Mishto
- Centre for Inflammation Biology and Cancer Immunology, King's College London, London, United Kingdom; Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom; Research group of Molecular Immunology, Francis Crick Institute, London, United Kingdom.
- Juliane Liepe
- Research group of Quantitative and Systems Biology, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany; Facility for Data Sciences and Biostatistics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen, Germany.
6
Gabriel W, González RM, Laposchan S, Riedel E, Dündar G, Poppenberger B, Wilhelm M, Lee CY. Deep Learning Enhances Precision of Citrullination Identification in Human and Plant Tissue Proteomes. Mol Cell Proteomics 2025; 24:100924. [PMID: 39921205 PMCID: PMC11925583 DOI: 10.1016/j.mcpro.2025.100924] [Received: 10/10/2024] [Revised: 01/17/2025] [Accepted: 01/28/2025] [Indexed: 02/10/2025]
Abstract
Citrullination is a critical yet understudied post-translational modification (PTM) implicated in various biological processes. Exploring its role in health and disease requires a comprehensive understanding of the prevalence of this PTM at a proteome-wide scale. Although mass spectrometry has enabled the identification of citrullination sites in complex biological samples, it faces significant challenges, including limited enrichment tools and a high rate of false positives because citrullination has the same mass shift as deamidation (+0.9840 Da) and because of errors in monoisotopic ion selection. These issues often necessitate manual spectrum inspection, reducing throughput in large-scale studies. In this work, we present a novel data analysis pipeline that incorporates the deep learning model Prosit-Cit into the MS database search workflow to improve both the sensitivity and the precision of citrullination site identification. Prosit-Cit, an extension of the existing Prosit model, has been trained on ∼53,000 spectra from ∼2500 synthetic citrullinated peptides and provides precise predictions for chromatographic retention time and fragment ion intensities of both citrullinated and deamidated peptides. This enhances the accuracy of identification and reduces false positives. Our pipeline demonstrated high precision on the evaluation dataset, recovering the majority of known citrullination sites in human tissue proteomes and improving sensitivity by identifying up to 14 times more citrullinated sites. Sequence motif analysis revealed consistency with previously reported findings, validating the reliability of our approach. Furthermore, extending the pipeline to a tissue proteome dataset of the model plant Arabidopsis thaliana enabled the identification of ∼200 citrullination sites across 169 proteins from 30 tissues, representing the first large-scale citrullination mapping in plants. This pipeline can be seamlessly applied to existing proteomics datasets, offering a robust tool for advancing biological discoveries and deepening our understanding of protein citrullination across species.
Affiliation(s)
- Wassim Gabriel
- Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany
- Rebecca Meelker González
- Young Investigator Group: Mass Spectrometry in Systems Neurosciences, School of Life Sciences, Technical University of Munich, Freising, Germany
- Sophia Laposchan
- Young Investigator Group: Mass Spectrometry in Systems Neurosciences, School of Life Sciences, Technical University of Munich, Freising, Germany
- Erik Riedel
- Young Investigator Group: Mass Spectrometry in Systems Neurosciences, School of Life Sciences, Technical University of Munich, Freising, Germany
- Gönül Dündar
- Biotechnology of Horticultural Crops, School of Life Sciences, Technical University of Munich, Freising, Germany
- Brigitte Poppenberger
- Biotechnology of Horticultural Crops, School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
- Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany.
- Chien-Yun Lee
- Young Investigator Group: Mass Spectrometry in Systems Neurosciences, School of Life Sciences, Technical University of Munich, Freising, Germany.
7
Wacholder A, Deutsch EW, Kok LW, van Dinter JT, Lee J, Wright JC, Leblanc S, Jayatissa AH, Jiang K, Arefiev I, Cao K, Bourassa F, Trifiro FA, Bassani-Sternberg M, Baranov PV, Bogaert A, Chothani S, Fierro-Monti I, Fijalkowska D, Gevaert K, Hubner N, Mudge JM, Ruiz-Orera J, Schulz J, Vizcaino JA, Prensner JR, Brunet MA, Martinez TF, Slavoff SA, Roucou X, Choudhary JS, van Heesch S, Moritz RL, Carvunis AR. Detection of human unannotated microproteins by mass spectrometry-based proteomics: a community assessment. bioRxiv 2025:2025.02.19.639069. [PMID: 40027765 PMCID: PMC11870587 DOI: 10.1101/2025.02.19.639069] [Indexed: 03/05/2025]
Abstract
Thousands of short open reading frames (sORFs) are translated outside of annotated coding sequences. Recent studies have pioneered searching for sORF-encoded microproteins in mass spectrometry (MS)-based proteomics and peptidomics datasets. Here, we assessed literature-reported MS-based identifications of unannotated human proteins. We find that studies vary by three orders of magnitude in the number of unannotated proteins they report. Of nearly 10,000 reported sORF-encoded peptides, 96% were unique to a single study, and 12% mapped to annotated proteins or proteoforms. Manual curation of a benchmark dataset of 406 manually evaluated spectra from 204 sORF-encoded proteins revealed large variation in peptide-spectrum match (PSM) quality between studies, with immunopeptidomics studies generally reporting higher quality PSMs than conventional enzymatic digests of whole cell lysates. We estimate that 65% of predicted sORF-encoded protein detections in immunopeptidomics studies were supported by high-quality PSMs versus 7.8% in non-immunopeptidomics datasets. Our work stresses the need for standardized protocols and analysis workflows to guide future advancements in microprotein detection by MS towards uncovering how many human microproteins exist.
8
Valdés-Mas R, Leshem A, Zheng D, Cohen Y, Kern L, Zmora N, He Y, Katina C, Eliyahu-Miller S, Yosef-Hevroni T, Richman L, Raykhel B, Allswang S, Better R, Shmueli M, Saftien A, Cullin N, Slamovitz F, Ciocan D, Ouyang KS, Mor U, Dori-Bachash M, Molina S, Levin Y, Atarashi K, Jona G, Puschhof J, Harmelin A, Stettner N, Chen M, Suez J, Honda K, Lieb W, Bang C, Kori M, Maharshak N, Merbl Y, Shibolet O, Halpern Z, Shouval DS, Shamir R, Franke A, Abdeen SK, Shapiro H, Savidor A, Elinav E. Metagenome-informed metaproteomics of the human gut microbiome, host, and dietary exposome uncovers signatures of health and inflammatory bowel disease. Cell 2025; 188:1062-1083.e36. [PMID: 39837331 DOI: 10.1016/j.cell.2024.12.016] [Received: 07/02/2024] [Revised: 10/08/2024] [Accepted: 12/11/2024] [Indexed: 01/23/2025]
Abstract
Host-microbiome-dietary interactions play crucial roles in regulating human health, yet their direct functional assessment remains challenging. We adopted metagenome-informed metaproteomics (MIM), in mice and humans, to non-invasively explore species-level microbiome-host interactions during commensal and pathogen colonization, nutritional modification, and antibiotic-induced perturbation. Simultaneously, fecal MIM accurately characterized the nutritional exposure landscape in multiple clinical and dietary contexts. Implementation of MIM in murine auto-inflammation and in human inflammatory bowel disease (IBD) characterized a "compositional dysbiosis" and a concomitant species-specific "functional dysbiosis" driven by suppressed commensal responses to inflammatory host signals. Microbiome transfers unraveled early-onset kinetics of these host-commensal cross-responsive patterns, while predictive analyses identified candidate fecal host-microbiome IBD biomarker protein pairs outperforming S100A8/S100A9 (calprotectin). Importantly, a simultaneous fecal nutritional MIM assessment enabled the determination of IBD-related consumption patterns, dietary treatment compliance, and small intestinal digestive aberrations. Collectively, a parallelized dietary-bacterial-host MIM assessment functionally uncovers trans-kingdom interactomes shaping gastrointestinal ecology while offering personalized diagnostic and therapeutic insights into microbiome-associated disease.
Affiliation(s)
- Rafael Valdés-Mas
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Avner Leshem
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel; Department of Surgery, Tel Aviv Sourasky Medical Center, Tel Aviv University, Tel Aviv, Israel
- Danping Zheng
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel; Department of Gastroenterology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Yotam Cohen
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Lara Kern
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Niv Zmora
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel; School of Medicine, Faculty of Medicine and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Research Center for Digestive Tract and Liver Diseases, Tel Aviv Sourasky Medical Center, Tel Aviv University, Tel Aviv, Israel
- Yiming He
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel; Department of Gastroenterology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Corine Katina
- de Botton Institute for Protein Profiling, The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM), Weizmann Institute of Science, Rehovot, Israel
- Tal Yosef-Hevroni
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Liron Richman
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Barbara Raykhel
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Shira Allswang
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Reut Better
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Merav Shmueli
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Nyssa Cullin
- Division of Microbiome & Cancer, DKFZ, Heidelberg, Germany
- Fernando Slamovitz
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Dragos Ciocan
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Uria Mor
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Mally Dori-Bachash
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Shahar Molina
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Yishai Levin
- de Botton Institute for Protein Profiling, The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM), Weizmann Institute of Science, Rehovot, Israel
- Koji Atarashi
- RIKEN Center for Integrative Medical Sciences (IMS), Tsurumi, Yokohama, Kanagawa, Japan; Department of Microbiology and Immunology, Keio University School of Medicine, Tokyo, Japan
- Ghil Jona
- Department of Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot, Israel
- Jens Puschhof
- Division of Microbiome & Cancer, DKFZ, Heidelberg, Germany
- Alon Harmelin
- Department of Veterinary Resources, Weizmann Institute of Science, Rehovot, Israel
- Noa Stettner
- Department of Veterinary Resources, Weizmann Institute of Science, Rehovot, Israel
- Minhu Chen
- Department of Gastroenterology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Jotham Suez
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel; W. Harry Feinstone Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kenya Honda
- RIKEN Center for Integrative Medical Sciences (IMS), Tsurumi, Yokohama, Kanagawa, Japan; Department of Microbiology and Immunology, Keio University School of Medicine, Tokyo, Japan
- Wolfgang Lieb
- Institute of Epidemiology and Biobank Popgen, University Hospital of Schleswig-Holstein (UKSH), Kiel, Germany
- Corinna Bang
- Institute of Clinical Molecular Biology, Christian-Albrechts-Universität Zu Kiel, Kiel, Germany; University Hospital of Schleswig-Holstein (UKSH), Kiel, Germany
- Michal Kori
- Pediatric Gastroenterology Unit, Kaplan Medical Center, Rehovot, Israel; Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Nitsan Maharshak
- School of Medicine, Faculty of Medicine and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Department of Gastroenterology and Hepatology, Tel Aviv Medical Center, Tel Aviv, Israel
- Yifat Merbl
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Oren Shibolet
- School of Medicine, Faculty of Medicine and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Department of Gastroenterology and Hepatology, Tel Aviv Medical Center, Tel Aviv, Israel
- Zamir Halpern
- School of Medicine, Faculty of Medicine and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Department of Gastroenterology and Hepatology, Tel Aviv Medical Center, Tel Aviv, Israel
- Dror S Shouval
- School of Medicine, Faculty of Medicine and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Institute of Gastroenterology, Nutrition, and Liver Diseases, Schneider Children's Medical Centre, Petach-Tikva, Israel
- Raanan Shamir
- School of Medicine, Faculty of Medicine and Health Sciences, Tel Aviv University, Tel Aviv, Israel; Institute of Gastroenterology, Nutrition, and Liver Diseases, Schneider Children's Medical Centre, Petach-Tikva, Israel
- Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-Universität Zu Kiel, Kiel, Germany; University Hospital of Schleswig-Holstein (UKSH), Kiel, Germany
- Suhaib K Abdeen
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Hagit Shapiro
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel
- Alon Savidor
- de Botton Institute for Protein Profiling, The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM), Weizmann Institute of Science, Rehovot, Israel
- Eran Elinav
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel; Division of Microbiome & Cancer, DKFZ, Heidelberg, Germany.
9
Wen B, Freestone J, Riffle M, MacCoss MJ, Noble WS, Keich U. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. bioRxiv 2025:2024.06.01.596967. [PMID: 38895431 PMCID: PMC11185562 DOI: 10.1101/2024.06.01.596967] [Indexed: 06/21/2024]
Abstract
A pressing statistical challenge in the field of mass spectrometry proteomics is how to assess whether a given software tool provides accurate error control. Each software tool for searching such data uses its own internally implemented methodology for reporting and controlling the error. Many of these software tools are closed source, with incompletely documented methodology, and the strategies for validating the error are inconsistent across tools. In this work, we identify three different methods for validating false discovery rate (FDR) control in use in the field, one of which is invalid, one of which can only provide a lower bound rather than an upper bound, and one of which is valid but under-powered. The result is that the field has a very poor understanding of how well we are doing with respect to FDR control, particularly for the analysis of data-independent acquisition (DIA) data. We therefore propose a theoretical formulation of entrapment experiments that allows us to rigorously characterize the behavior of the various entrapment methods. We also propose a more powerful method for evaluating FDR control, and we employ that method, along with other existing techniques, to characterize a variety of popular search tools. We empirically validate our entrapment analysis in the fairly well-understood DDA setup before applying it in the DIA setup. We find that none of the DIA search tools consistently controls the FDR at the peptide level, and the tools struggle particularly with analysis of single cell datasets.
Affiliation(s)
- Bo Wen
- Department of Genome Sciences, University of Washington
- Jack Freestone
- School of Mathematics and Statistics, University of Sydney
- William S. Noble
- Department of Genome Sciences, University of Washington
- Paul G. Allen School of Computer Science and Engineering, University of Washington
- Uri Keich
- School of Mathematics and Statistics, University of Sydney
10
Castaño JD, Beaudry F. Optimization of protein identifications through the use of different chromatographic approaches and bioinformatic pipelines. Rapid Communications in Mass Spectrometry 2025; 39:e9937. [PMID: 39496564 DOI: 10.1002/rcm.9937] [Received: 05/13/2024] [Revised: 10/17/2024] [Accepted: 10/18/2024] [Indexed: 11/06/2024]
Abstract
RATIONALE: Selection of proteomic workflows for a given project can be a daunting task. This research provides a guide outlining the impact on protein identification of different steps such as chromatographic separation, data acquisition strategies, and bioinformatic pipelines. The data presented here will help expert and nonexpert proteomic users to increase proteome coverage and peptide identification. METHODS: HeLa protein digests were analyzed through different C18 chromatographic columns (15 and 50 cm in length), using top 12 data-dependent acquisition (DDA), top 20 DDA, and data-independent acquisition (DIA) with a nanospray source in positive mode in a Thermo Q Exactive instrument. The raw data were analyzed using different search engines, rescoring approaches, and multi-engine searches. The results were analyzed in the context of peptide and protein identifications, precursor properties, and computation requirements to understand the differences between methods. RESULTS: Our results showed that longer columns and top N DDA approaches were able to significantly increase protein identifications. The use of multiple search engines yielded limited gains, whereas the use of rescoring methods clearly outperformed other strategies. Finally, DIA approaches, although successful at generating new identifications, had a limited performance influenced by the previous collection of DDA data, which could prohibitively increase instrument time. Nonetheless, the use of library-free methods showed promising results. CONCLUSIONS: Our results highlight the impact of different experimental approaches on proteome coverage. Changes in chromatographic columns, data acquisition, or bioinformatic analysis can significantly increase the number of protein identifications (>400%). Thus, this research provides a reference upon which to build a successful proteomic workflow with different considerations at every step.
Affiliation(s)
- Jesus D Castaño
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, Québec, Canada
- Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Montréal, Québec, Canada
- Francis Beaudry
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, Québec, Canada
- Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Montréal, Québec, Canada
11
Leduc A, Khoury L, Cantlon J, Khan S, Slavov N. Massively parallel sample preparation for multiplexed single-cell proteomics using nPOP. Nat Protoc 2024; 19:3750-3776. [PMID: 39117766 PMCID: PMC11614709 DOI: 10.1038/s41596-024-01033-8] [Received: 11/27/2023] [Accepted: 05/27/2024] [Indexed: 08/10/2024]
Abstract
Single-cell proteomics by mass spectrometry (MS) allows the quantification of proteins with high specificity and sensitivity. To increase its throughput, we developed nano-proteomic sample preparation (nPOP), a method for parallel preparation of thousands of single cells in nanoliter-volume droplets deposited on glass slides. Here, we describe its protocol with emphasis on its flexibility to prepare samples for different multiplexed MS methods. An implementation using the plexDIA MS multiplexing method, which uses non-isobaric mass tags to barcode peptides from different samples for data-independent acquisition, demonstrates accurate quantification of ~3,000-3,700 proteins per human cell. A separate implementation with isobaric mass tags and prioritized data acquisition demonstrates analysis of 1,827 single cells at a rate of >1,000 single cells per day at a depth of 800-1,200 proteins per human cell. The protocol is implemented by using a cell-dispensing and liquid-handling robot (the CellenONE instrument) and uses readily available consumables, which should facilitate broad adoption. nPOP can be applied to all samples that can be processed to a single-cell suspension. It takes 1 or 2 d to prepare >3,000 single cells. We provide metrics and software (the QuantQC R package) for quality control and data exploration. QuantQC supports the robust scaling of nPOP to higher plex reagents for achieving reliable and scalable single-cell proteomics.
Affiliation(s)
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA.
- Luke Khoury
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Saad Khan
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA.
- Parallel Squared Technology Institute, Watertown, MA, USA.
12
Zhu C, Liu LY, Ha A, Yamaguchi TN, Zhu H, Hugh-White R, Livingstone J, Patel Y, Kislinger T, Boutros PC. moPepGen: Rapid and Comprehensive Identification of Non-canonical Peptides. bioRxiv 2024:2024.03.28.587261. [PMID: 38585946 PMCID: PMC10996593 DOI: 10.1101/2024.03.28.587261] [Indexed: 04/09/2024]
Abstract
Gene expression is a multi-step transformation of biological information from its storage form (DNA) into functional forms (protein and some RNAs). Regulatory activities at each step of this transformation multiply a single gene into a myriad of proteoforms. Proteogenomics is the study of how genomic and transcriptomic variation creates this proteomic diversity, and is limited by the challenges of modeling the complexities of gene-expression. We therefore created moPepGen, a graph-based algorithm that comprehensively generates non-canonical peptides in linear time. moPepGen works with multiple technologies, in multiple species and on all types of genetic and transcriptomic data. In human cancer proteomes, it enumerates previously unobservable noncanonical peptides arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions and RNA circularization. By enabling efficient detection and quantitation of previously hidden proteins in both existing and new proteomic data, moPepGen facilitates all proteogenomics applications. It is available at: https://github.com/uclahs-cds/package-moPepGen.
Affiliation(s)
- Chenghao Zhu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
- Lydia Y. Liu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
- Annie Ha
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Takafumi N. Yamaguchi
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Helen Zhu
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
- Rupert Hugh-White
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Julie Livingstone
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Yash Patel
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Paul C. Boutros
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
13
Wen B, Hsu C, Zeng WF, Riffle M, Chang A, Mudge M, Nunn B, Berg MD, Villén J, MacCoss MJ, Noble WS. Carafe enables high quality in silico spectral library generation for data-independent acquisition proteomics. bioRxiv 2024:2024.10.15.618504. [PMID: 39463980 PMCID: PMC11507862 DOI: 10.1101/2024.10.15.618504] [Indexed: 10/29/2024]
Abstract
Data-independent acquisition (DIA)-based mass spectrometry is becoming an increasingly popular mass spectrometry acquisition strategy for carrying out quantitative proteomics experiments. Most of the popular DIA search engines make use of in silico generated spectral libraries. However, the generation of high-quality spectral libraries for DIA data analysis remains a challenge, particularly because most such libraries are generated directly from data-dependent acquisition (DDA) data or are from in silico prediction using models trained on DDA data. In this study, we developed Carafe, a tool that generates high-quality experiment-specific in silico spectral libraries by training deep learning models directly on DIA data. We demonstrate the performance of Carafe on a wide range of DIA datasets, where we observe improved fragment ion intensity prediction and peptide detection relative to existing pretrained DDA models.
Affiliation(s)
- Bo Wen
- Department of Genome Sciences, University of Washington
- Chris Hsu
- Department of Genome Sciences, University of Washington
- Wen-Feng Zeng
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Germany
- Alexis Chang
- Department of Genome Sciences, University of Washington
- Miranda Mudge
- Department of Genome Sciences, University of Washington
- Brook Nunn
- Department of Genome Sciences, University of Washington
- Judit Villén
- Department of Genome Sciences, University of Washington
- William S. Noble
- Department of Genome Sciences, University of Washington
- Paul G. Allen School of Computer Science and Engineering, University of Washington
14
Elhamraoui Z, Borràs E, Wilhelm M, Sabidó E. Theoretical Assessment of Indistinguishable Peptides in Mass Spectrometry-Based Proteomics. Anal Chem 2024; 96:15829-15833. [PMID: 39322219 PMCID: PMC11465223 DOI: 10.1021/acs.analchem.4c02803] [Received: 05/30/2024] [Revised: 08/28/2024] [Accepted: 08/28/2024] [Indexed: 09/27/2024]
Abstract
Mass-spectrometry-based proteomics has advanced with the integration of experimental and predicted spectral libraries, which have significantly improved peptide identification in complex search spaces. However, challenges persist in distinguishing some peptides with close retention times and nearly identical fragmentation patterns. In this study, we conducted a theoretical assessment to quantify the prevalence of indistinguishable peptides within the human canonical proteome and immunopeptidome using state-of-the-art retention time and spectrum prediction models. By quantifying the proportion of peptides posing challenges to unequivocal identification, we set the theoretical nonaccessible portion within a given proteome, and underscore the effectiveness of contemporary analytical methodologies in resolving the complexity of the human proteome and immunopeptidome via mass spectrometry.
Affiliation(s)
- Zahra Elhamraoui
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
- Eva Borràs
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
- Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich, D-85354 Freising, Germany
- Munich Data Science Institute (MDSI), Technical University of Munich, D-85748 Garching, Germany
- Eduard Sabidó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
15
Tsour S, Machne R, Leduc A, Widmer S, Guez J, Karczewski K, Slavov N. Alternate RNA decoding results in stable and abundant proteins in mammals. bioRxiv 2024:2024.08.26.609665. [PMID: 39253435 PMCID: PMC11383030 DOI: 10.1101/2024.08.26.609665] [Indexed: 09/11/2024]
Abstract
Amino acid substitutions may substantially alter protein stability and function, but the contribution of substitutions arising from alternate translation (deviations from the genetic code) is unknown. To explore it, we analyzed deep proteomic and transcriptomic data from over 1,000 human samples, including 6 cancer types and 26 healthy human tissues. This global analysis identified 60,024 high confidence substitutions corresponding to 8,801 unique sites in proteins derived from 1,990 genes. Some substitutions are shared across samples, while others exhibit strong tissue-type and cancer specificity. Surprisingly, products of alternate translation are more abundant than their canonical counterparts for hundreds of proteins, suggesting sense codon recoding. Recoded proteins include transcription factors, proteases, signaling proteins, and proteins associated with neurodegeneration. Mechanisms contributing to substitution abundance include protein stability, codon frequency, codon-anticodon mismatches, and RNA modifications. We characterize sequence motifs around alternatively translated amino acids and how substitution ratios vary across protein domains, tissue types and cancers. The substitution ratios are positively associated with intrinsically disordered regions and genetic polymorphisms in gnomAD, though the polymorphisms cannot account for the substitutions. Both the sequence and the tissue-specificity of alternatively translated proteins are conserved between human and mouse. These results demonstrate the contribution of alternate translation to diversifying mammalian proteomes, and its association with protein stability, tissue-specific proteomes, and diseases.
Affiliation(s)
- Shira Tsour
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Alnylam Pharmaceuticals, Cambridge, MA, USA
- Rainer Machne
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Simon Widmer
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Jeremy Guez
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Konrad Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Parallel Squared Technology Institute, Watertown, MA, USA
16
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrometry Reviews 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role in the functionality and these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
- Health Unit, VITO, Mol, Belgium
- Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
- Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- ImmuneSpec, Niel, Belgium
17
Buur LM, Declercq A, Strobl M, Bouwmeester R, Degroeve S, Martens L, Dorfer V, Gabriels R. MS2Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0. J Proteome Res 2024; 23:3200-3207. [PMID: 38491990 DOI: 10.1021/acs.jproteome.3c00785] [Indexed: 03/18/2024]
Abstract
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied on challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.
Affiliation(s)
- Louise M Buur
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Marina Strobl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
18
Abbas Q, Wilhelm M, Kuster B, Poppenberger B, Frishman D. Exploring crop genomes: assembly features, gene prediction accuracy, and implications for proteomics studies. BMC Genomics 2024; 25:619. [PMID: 38898442 PMCID: PMC11186247 DOI: 10.1186/s12864-024-10521-w] [Received: 03/04/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024] [Open Access]
Abstract
Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.
Affiliation(s)
- Qussai Abbas
  - Chair of Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Bernhard Kuster
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Brigitte Poppenberger
  - Biotechnology of Horticultural Crops, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Dmitrij Frishman
  - Chair of Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
19
Lautenbacher L, Yang KL, Kockmann T, Panse C, Chambers M, Kahl E, Yu F, Gabriel W, Bold D, Schmidt T, Li K, MacLean B, Nesvizhskii AI, Wilhelm M. Koina: Democratizing machine learning for proteomics research. bioRxiv 2024:2024.06.01.596953. [PMID: 38895358 PMCID: PMC11185529 DOI: 10.1101/2024.06.01.596953] [Indexed: 06/21/2024]
Abstract
Recent developments in machine learning (ML) and deep learning (DL) have immense potential for applications in proteomics, such as generating spectral libraries, improving peptide identification, and optimizing targeted acquisition modes. Although new ML/DL models for various applications and peptide properties are frequently published, the community adopts these models slowly, mostly because of technical challenges. We believe that, for the community to make better use of state-of-the-art models, more attention should be paid to making models easy to use and accessible. To facilitate this, we developed Koina, an open-source, containerized, decentralized, and online-accessible high-performance prediction service that enables ML/DL model usage in any pipeline. Using the widely used FragPipe computational platform as an example, we show how Koina can be easily integrated with existing proteomics software tools and how these integrations improve data analysis.
Affiliation(s)
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Kevin L. Yang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Tobias Kockmann
  - Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Christian Panse
  - Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, CH-1015 Lausanne, Switzerland
- Matthew Chambers
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Elias Kahl
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Fengchao Yu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Dulguun Bold
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Kai Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Brendan MacLean
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Alexey I. Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany
20
Adams C, Gabriel W, Laukens K, Picciani M, Wilhelm M, Bittremieux W, Boonen K. Fragment ion intensity prediction improves the identification rate of non-tryptic peptides in timsTOF. Nat Commun 2024; 15:3956. [PMID: 38730277 PMCID: PMC11087512 DOI: 10.1038/s41467-024-48322-0] [Received: 07/17/2023] [Accepted: 04/29/2024] [Indexed: 05/12/2024] [Open access]
Abstract
Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, known digestion patterns cannot be used; instead, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, which is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.
Affiliation(s)
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Kris Laukens
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Mario Picciani
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - Sustainable Health Department, Flemish Institute for Technological Research (VITO), Antwerp, Belgium
21
Bittremieux W. From data to discovery: The essential role of computational tools in proteomics. Proteomics 2024; 24:e2300081. [PMID: 38629976 DOI: 10.1002/pmic.202300081] [Received: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 04/19/2024]
Affiliation(s)
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerpen, Belgium
22
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium
23
Ferreira HJ, Stevenson BJ, Pak H, Yu F, Almeida Oliveira J, Huber F, Taillandier-Coindard M, Michaux J, Ricart-Altimiras E, Kraemer AI, Kandalaft LE, Speiser DE, Nesvizhskii AI, Müller M, Bassani-Sternberg M. Immunopeptidomics-based identification of naturally presented non-canonical circRNA-derived peptides. Nat Commun 2024; 15:2357. [PMID: 38490980 PMCID: PMC10943130 DOI: 10.1038/s41467-024-46408-3] [Received: 07/21/2023] [Accepted: 02/16/2024] [Indexed: 03/18/2024] [Open access]
Abstract
Circular RNAs (circRNAs) are covalently closed non-coding RNAs lacking the 5' cap and the poly-A tail. Nevertheless, it has been demonstrated that certain circRNAs can undergo active translation. Therefore, aberrantly expressed circRNAs in human cancers could be an unexplored source of tumor-specific antigens, potentially mediating anti-tumor T cell responses. This study presents an immunopeptidomics workflow with a specific focus on generating a circRNA-specific protein FASTA reference. The main goal of this workflow is to streamline the process of identifying and validating human leukocyte antigen (HLA)-bound peptides potentially originating from circRNAs. We increase the analytical stringency of our workflow by retaining peptides identified independently by two mass spectrometry search engines and/or by applying a group-specific FDR for canonical-derived and circRNA-derived peptides. A subset of circRNA-derived peptides specifically encoded by the region spanning the back-splice junction (BSJ) is validated with targeted MS and with direct Sanger sequencing of the respective source transcripts. Our workflow identifies 54 unique BSJ-spanning circRNA-derived peptides in the immunopeptidome of melanoma and lung cancer samples. Our approach enlarges the catalog of source proteins that can be explored for immunotherapy.
Affiliation(s)
- Humberto J Ferreira
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Brian J Stevenson
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- HuiSong Pak
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Jessica Almeida Oliveira
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Florian Huber
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Marie Taillandier-Coindard
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Justine Michaux
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Emma Ricart-Altimiras
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Anne I Kraemer
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Lana E Kandalaft
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - Center of Experimental Therapeutics, Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Daniel E Speiser
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Alexey I Nesvizhskii
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Markus Müller
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Michal Bassani-Sternberg
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - Center of Experimental Therapeutics, Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
24
The M, Picciani M, Jensen C, Gabriel W, Kuster B, Wilhelm M. AI-Assisted Processing Pipeline to Boost Protein Isoform Detection. Methods Mol Biol 2024; 2836:157-181. [PMID: 38995541 DOI: 10.1007/978-1-0716-4007-4_10] [Indexed: 07/13/2024]
Abstract
Proteomics, the study of proteins within biological systems, has seen remarkable advancements in recent years, with protein isoform detection emerging as one of the next major frontiers. One of the primary challenges is achieving the peptide and protein coverage needed to confidently differentiate isoforms, a difficulty that stems from the protein inference problem and from the challenge of estimating protein false discovery rates in large data sets. In this chapter, we describe the application of artificial intelligence-assisted peptide property prediction for database search engine rescoring with Oktoberfest, an approach that has proven effective, particularly for complex samples and extensive search spaces, and that can greatly increase peptide coverage. Further, we illustrate a method for increasing isoform coverage with the PickedGroupFDR approach, which is designed to excel on large data sets. Real-world examples are provided to illustrate the utility of the tools in the context of rescoring, protein grouping, and false discovery rate estimation. By implementing these techniques, researchers can achieve a substantial increase in both peptide and isoform coverage, thus unlocking the potential of protein isoform detection in their studies and shedding light on the roles and functions of isoforms in biological processes.
Affiliation(s)
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Cecilia Jensen
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany