1
Castaño JD, Beaudry F. Optimization of protein identifications through the use of different chromatographic approaches and bioinformatic pipelines. Rapid Commun Mass Spectrom 2025; 39:e9937. [PMID: 39496564 DOI: 10.1002/rcm.9937] [Received: 05/13/2024] [Revised: 10/17/2024] [Accepted: 10/18/2024] [Indexed: 11/06/2024]
Abstract
RATIONALE: Selection of proteomic workflows for a given project can be a daunting task. This research provides a guide outlining the impact on protein identification of different steps such as chromatographic separation, data acquisition strategies, and bioinformatic pipelines. The data presented here will help expert and nonexpert proteomic users to increase proteome coverage and peptide identification.
METHODS: HeLa protein digests were analyzed through different C18 chromatographic columns (15 and 50 cm in length), using top 12 data-dependent acquisition (DDA), top 20 DDA, and data-independent acquisition (DIA) with a nanospray source in positive mode in a Thermo Q Exactive instrument. The raw data were analyzed using different search engines, rescoring approaches, and multi-engine searches. The results were analyzed in the context of peptide and protein identifications, precursor properties, and computation requirements to understand the differences between methods.
RESULTS: Our results showed that longer columns and top N DDA approaches were able to significantly increase protein identifications. The use of multiple search engines yielded limited gains, whereas the use of rescoring methods clearly outperformed other strategies. Finally, DIA approaches, although successful at generating new identifications, had a limited performance influenced by the previous collection of DDA data, which could prohibitively increase instrument time. Nonetheless, the use of library-free methods showed promising results.
CONCLUSIONS: Our results highlight the impact of different experimental approaches on proteome coverage. Changes in chromatographic columns, data acquisition, or bioinformatic analysis can significantly increase the number of protein identifications (>400%). Thus, this research provides a reference upon which to build a successful proteomic workflow with different considerations at every step.
Affiliation(s)
- Jesus D Castaño
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, Québec, Canada
- Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Montréal, Québec, Canada
- Francis Beaudry
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, Québec, Canada
- Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Montréal, Québec, Canada
2
Leduc A, Khoury L, Cantlon J, Khan S, Slavov N. Massively parallel sample preparation for multiplexed single-cell proteomics using nPOP. Nat Protoc 2024; 19:3750-3776. [PMID: 39117766 PMCID: PMC11614709 DOI: 10.1038/s41596-024-01033-8] [Received: 11/27/2023] [Accepted: 05/27/2024] [Indexed: 08/10/2024]
Abstract
Single-cell proteomics by mass spectrometry (MS) allows the quantification of proteins with high specificity and sensitivity. To increase its throughput, we developed nano-proteomic sample preparation (nPOP), a method for parallel preparation of thousands of single cells in nanoliter-volume droplets deposited on glass slides. Here, we describe its protocol with emphasis on its flexibility to prepare samples for different multiplexed MS methods. An implementation using the plexDIA MS multiplexing method, which uses non-isobaric mass tags to barcode peptides from different samples for data-independent acquisition, demonstrates accurate quantification of ~3,000-3,700 proteins per human cell. A separate implementation with isobaric mass tags and prioritized data acquisition demonstrates analysis of 1,827 single cells at a rate of >1,000 single cells per day at a depth of 800-1,200 proteins per human cell. The protocol is implemented by using a cell-dispensing and liquid-handling robot (the CellenONE instrument) and uses readily available consumables, which should facilitate broad adoption. nPOP can be applied to all samples that can be processed to a single-cell suspension. It takes 1 or 2 d to prepare >3,000 single cells. We provide metrics and software (the QuantQC R package) for quality control and data exploration. QuantQC supports the robust scaling of nPOP to higher plex reagents for achieving reliable and scalable single-cell proteomics.
Affiliation(s)
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA.
- Luke Khoury
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Saad Khan
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA.
- Parallel Squared Technology Institute, Watertown, MA, USA.
3
Zhu C, Liu LY, Ha A, Yamaguchi TN, Zhu H, Hugh-White R, Livingstone J, Patel Y, Kislinger T, Boutros PC. moPepGen: Rapid and Comprehensive Identification of Non-canonical Peptides. bioRxiv 2024:2024.03.28.587261. [PMID: 38585946 PMCID: PMC10996593 DOI: 10.1101/2024.03.28.587261] [Indexed: 04/09/2024]
Abstract
Gene expression is a multi-step transformation of biological information from its storage form (DNA) into functional forms (protein and some RNAs). Regulatory activities at each step of this transformation multiply a single gene into a myriad of proteoforms. Proteogenomics is the study of how genomic and transcriptomic variation creates this proteomic diversity, and is limited by the challenges of modeling the complexities of gene-expression. We therefore created moPepGen, a graph-based algorithm that comprehensively generates non-canonical peptides in linear time. moPepGen works with multiple technologies, in multiple species and on all types of genetic and transcriptomic data. In human cancer proteomes, it enumerates previously unobservable noncanonical peptides arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions and RNA circularization. By enabling efficient detection and quantitation of previously hidden proteins in both existing and new proteomic data, moPepGen facilitates all proteogenomics applications. It is available at: https://github.com/uclahs-cds/package-moPepGen.
Affiliation(s)
- Chenghao Zhu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
- Lydia Y. Liu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
- Annie Ha
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Takafumi N. Yamaguchi
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Helen Zhu
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
- Rupert Hugh-White
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Julie Livingstone
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Yash Patel
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Paul C. Boutros
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
4
Wen B, Freestone J, Riffle M, MacCoss MJ, Noble WS, Keich U. Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment. bioRxiv 2024:2024.06.01.596967. [PMID: 38895431 PMCID: PMC11185562 DOI: 10.1101/2024.06.01.596967] [Indexed: 06/21/2024]
Abstract
A pressing statistical challenge in the field of mass spectrometry proteomics is how to assess whether a given software tool provides accurate error control. Each software tool for searching such data uses its own internally implemented methodology for reporting and controlling the error. Many of these software tools are closed source, with incompletely documented methodology, and the strategies for validating the error are inconsistent across tools. In this work, we identify three different methods for validating false discovery rate (FDR) control in use in the field, one of which is invalid, one of which can only provide a lower bound rather than an upper bound, and one of which is valid but under-powered. The result is that the field has a very poor understanding of how well we are doing with respect to FDR control, particularly for the analysis of data-independent acquisition (DIA) data. We therefore propose a theoretical formulation of entrapment experiments that allows us to rigorously characterize the behavior of the various entrapment methods. We also propose a more powerful method for evaluating FDR control, and we employ that method, along with other existing techniques, to characterize a variety of popular search tools. We empirically validate our entrapment analysis in the fairly well-understood DDA setup before applying it in the DIA setup. We find that none of the DIA search tools consistently controls the FDR at the peptide level, and the tools struggle particularly with analysis of single cell datasets.
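As a rough illustration of what an entrapment-based check computes, the sketch below derives two estimates of the false discovery proportion (FDP) from discovery counts. The abstract gives no formulas, so both expressions are assumptions: a lower bound that counts only entrapment hits, and a "combined" estimate that also extrapolates to undetected false target hits using the entrapment-to-target database size ratio `r`. The function and variable names are hypothetical.

```python
def entrapment_fdp_estimates(n_target, n_entrapment, r):
    """Return (lower_bound, combined) FDP estimates from entrapment counts.

    n_target     -- discoveries that map to the original (target) database
    n_entrapment -- discoveries that map to the entrapment database
    r            -- assumed size ratio of entrapment to target database
    """
    if r <= 0:
        raise ValueError("r must be positive")
    total = n_target + n_entrapment
    if total == 0:
        return 0.0, 0.0
    # Entrapment hits are known to be false, so this can only underestimate.
    lower_bound = n_entrapment / total
    # Assumed formulation: scale by (1 + 1/r) to account for false
    # discoveries that landed in the target database instead.
    combined = n_entrapment * (1.0 + 1.0 / r) / total
    return lower_bound, combined


# Illustrative counts only, not taken from any dataset.
lower, combined = entrapment_fdp_estimates(n_target=9900, n_entrapment=100, r=1.0)
print(f"lower bound: {lower:.4f}, combined estimate: {combined:.4f}")
```

Under these assumptions, a tool that reports 1% FDR would be expected to keep the combined estimate near or below 0.01; a lower bound above the reported threshold would be evidence that error control failed.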
Affiliation(s)
- Bo Wen
- Department of Genome Sciences, University of Washington
- Jack Freestone
- School of Mathematics and Statistics, University of Sydney
- William S. Noble
- Department of Genome Sciences, University of Washington
- Paul G. Allen School of Computer Science and Engineering, University of Washington
- Uri Keich
- School of Mathematics and Statistics, University of Sydney
5
Wen B, Hsu C, Zeng WF, Riffle M, Chang A, Mudge M, Nunn B, Berg MD, Villén J, MacCoss MJ, Noble WS. Carafe enables high quality in silico spectral library generation for data-independent acquisition proteomics. bioRxiv 2024:2024.10.15.618504. [PMID: 39463980 PMCID: PMC11507862 DOI: 10.1101/2024.10.15.618504] [Indexed: 10/29/2024]
Abstract
Data-independent acquisition (DIA)-based mass spectrometry is becoming an increasingly popular mass spectrometry acquisition strategy for carrying out quantitative proteomics experiments. Most of the popular DIA search engines make use of in silico generated spectral libraries. However, the generation of high-quality spectral libraries for DIA data analysis remains a challenge, particularly because most such libraries are generated directly from data-dependent acquisition (DDA) data or are from in silico prediction using models trained on DDA data. In this study, we developed Carafe, a tool that generates high-quality experiment-specific in silico spectral libraries by training deep learning models directly on DIA data. We demonstrate the performance of Carafe on a wide range of DIA datasets, where we observe improved fragment ion intensity prediction and peptide detection relative to existing pretrained DDA models.
Affiliation(s)
- Bo Wen
- Department of Genome Sciences, University of Washington
- Chris Hsu
- Department of Genome Sciences, University of Washington
- Wen-Feng Zeng
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Germany
- Alexis Chang
- Department of Genome Sciences, University of Washington
- Miranda Mudge
- Department of Genome Sciences, University of Washington
- Brook Nunn
- Department of Genome Sciences, University of Washington
- Judit Villén
- Department of Genome Sciences, University of Washington
- William S. Noble
- Department of Genome Sciences, University of Washington
- Paul G. Allen School of Computer Science and Engineering, University of Washington
6
Elhamraoui Z, Borràs E, Wilhelm M, Sabidó E. Theoretical Assessment of Indistinguishable Peptides in Mass Spectrometry-Based Proteomics. Anal Chem 2024; 96:15829-15833. [PMID: 39322219 PMCID: PMC11465223 DOI: 10.1021/acs.analchem.4c02803] [Received: 05/30/2024] [Revised: 08/28/2024] [Accepted: 08/28/2024] [Indexed: 09/27/2024]
Abstract
Mass-spectrometry-based proteomics has advanced with the integration of experimental and predicted spectral libraries, which have significantly improved peptide identification in complex search spaces. However, challenges persist in distinguishing some peptides with close retention times and nearly identical fragmentation patterns. In this study, we conducted a theoretical assessment to quantify the prevalence of indistinguishable peptides within the human canonical proteome and immunopeptidome using state-of-the-art retention time and spectrum prediction models. By quantifying the proportion of peptides posing challenges to unequivocal identification, we set the theoretical nonaccessible portion within a given proteome, and underscore the effectiveness of contemporary analytical methodologies in resolving the complexity of the human proteome and immunopeptidome via mass spectrometry.
Affiliation(s)
- Zahra Elhamraoui
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
- Eva Borràs
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
- Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich, D-85354 Freising, Germany
- Munich Data Science Institute (MDSI), Technical University of Munich, D-85748 Garching, Germany
- Eduard Sabidó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
7
Tsour S, Machne R, Leduc A, Widmer S, Guez J, Karczewski K, Slavov N. Alternate RNA decoding results in stable and abundant proteins in mammals. bioRxiv 2024:2024.08.26.609665. [PMID: 39253435 PMCID: PMC11383030 DOI: 10.1101/2024.08.26.609665] [Indexed: 09/11/2024]
Abstract
Amino acid substitutions may substantially alter protein stability and function, but the contribution of substitutions arising from alternate translation (deviations from the genetic code) is unknown. To explore it, we analyzed deep proteomic and transcriptomic data from over 1,000 human samples, including 6 cancer types and 26 healthy human tissues. This global analysis identified 60,024 high confidence substitutions corresponding to 8,801 unique sites in proteins derived from 1,990 genes. Some substitutions are shared across samples, while others exhibit strong tissue-type and cancer specificity. Surprisingly, products of alternate translation are more abundant than their canonical counterparts for hundreds of proteins, suggesting sense codon recoding. Recoded proteins include transcription factors, proteases, signaling proteins, and proteins associated with neurodegeneration. Mechanisms contributing to substitution abundance include protein stability, codon frequency, codon-anticodon mismatches, and RNA modifications. We characterize sequence motifs around alternatively translated amino acids and how substitution ratios vary across protein domains, tissue types and cancers. The substitution ratios are positively associated with intrinsically disordered regions and genetic polymorphisms in gnomAD, though the polymorphisms cannot account for the substitutions. Both the sequence and the tissue-specificity of alternatively translated proteins are conserved between human and mouse. These results demonstrate the contribution of alternate translation to diversifying mammalian proteomes, and its association with protein stability, tissue-specific proteomes, and diseases.
Affiliation(s)
- Shira Tsour
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Alnylam Pharmaceuticals, Cambridge, MA, USA
- Rainer Machne
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Simon Widmer
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Jeremy Guez
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Konrad Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Parallel Squared Technology Institute, Watertown, MA, USA
8
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrom Rev 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role in the functionality and these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
- Health Unit, VITO, Mol, Belgium
- Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
- Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- ImmuneSpec, Niel, Belgium
9
Buur LM, Declercq A, Strobl M, Bouwmeester R, Degroeve S, Martens L, Dorfer V, Gabriels R. MS2Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0. J Proteome Res 2024; 23:3200-3207. [PMID: 38491990 DOI: 10.1021/acs.jproteome.3c00785] [Indexed: 03/18/2024]
Abstract
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied on challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.
Affiliation(s)
- Louise M Buur
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Marina Strobl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
10
Abbas Q, Wilhelm M, Kuster B, Poppenberger B, Frishman D. Exploring crop genomes: assembly features, gene prediction accuracy, and implications for proteomics studies. BMC Genomics 2024; 25:619. [PMID: 38898442 PMCID: PMC11186247 DOI: 10.1186/s12864-024-10521-w] [Received: 03/04/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.
Affiliation(s)
- Qussai Abbas
- Chair of Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Bernhard Kuster
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Brigitte Poppenberger
- Biotechnology of Horticultural Crops, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Dmitrij Frishman
- Chair of Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany.
11
Lautenbacher L, Yang KL, Kockmann T, Panse C, Chambers M, Kahl E, Yu F, Gabriel W, Bold D, Schmidt T, Li K, MacLean B, Nesvizhskii AI, Wilhelm M. Koina: Democratizing machine learning for proteomics research. bioRxiv 2024:2024.06.01.596953. [PMID: 38895358 PMCID: PMC11185529 DOI: 10.1101/2024.06.01.596953] [Indexed: 06/21/2024]
Abstract
Recent developments in machine-learning (ML) and deep-learning (DL) have immense potential for applications in proteomics, such as generating spectral libraries, improving peptide identification, and optimizing targeted acquisition modes. Although new ML/DL models for various applications and peptide properties are frequently published, the rate at which these models are adopted by the community is slow, which is mostly due to technical challenges. We believe that, for the community to make better use of state-of-the-art models, more attention should be spent on making models easy to use and accessible by the community. To facilitate this, we developed Koina, an open-source containerized, decentralized and online-accessible high-performance prediction service that enables ML/DL model usage in any pipeline. Using the widely used FragPipe computational platform as example, we show how Koina can be easily integrated with existing proteomics software tools and how these integrations improve data analysis.
Affiliation(s)
- Ludwig Lautenbacher
- Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Kevin L. Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Tobias Kockmann
- Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Christian Panse
- Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, CH-1015 Lausanne, Switzerland
- Matthew Chambers
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Elias Kahl
- Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Fengchao Yu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Wassim Gabriel
- Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
| | - Dulguun Bold
- Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
| | | | - Kai Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Brendan MacLean
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Alexey I. Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
| | - Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich (TUM), Freising, Germany
- Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany
| |
12
Adams C, Gabriel W, Laukens K, Picciani M, Wilhelm M, Bittremieux W, Boonen K. Fragment ion intensity prediction improves the identification rate of non-tryptic peptides in timsTOF. Nat Commun 2024; 15:3956. [PMID: 38730277 PMCID: PMC11087512 DOI: 10.1038/s41467-024-48322-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 04/29/2024] [Indexed: 05/12/2024] Open
Abstract
Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, rather than being able to use known digestion patterns, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, that is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.
Affiliation(s)
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Kris Laukens
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Mario Picciani
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, 85354, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, 85748, Garching, Germany
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - Sustainable Health Department, Flemish Institute for Technological Research (VITO), Antwerp, Belgium
13
Bittremieux W. From data to discovery: The essential role of computational tools in proteomics. Proteomics 2024; 24:e2300081. [PMID: 38629976 DOI: 10.1002/pmic.202300081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 04/19/2024]
Affiliation(s)
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerpen, Belgium
14
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium
15
Ferreira HJ, Stevenson BJ, Pak H, Yu F, Almeida Oliveira J, Huber F, Taillandier-Coindard M, Michaux J, Ricart-Altimiras E, Kraemer AI, Kandalaft LE, Speiser DE, Nesvizhskii AI, Müller M, Bassani-Sternberg M. Immunopeptidomics-based identification of naturally presented non-canonical circRNA-derived peptides. Nat Commun 2024; 15:2357. [PMID: 38490980 PMCID: PMC10943130 DOI: 10.1038/s41467-024-46408-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 02/16/2024] [Indexed: 03/18/2024] Open
Abstract
Circular RNAs (circRNAs) are covalently closed non-coding RNAs lacking the 5' cap and the poly-A tail. Nevertheless, it has been demonstrated that certain circRNAs can undergo active translation. Therefore, aberrantly expressed circRNAs in human cancers could be an unexplored source of tumor-specific antigens, potentially mediating anti-tumor T cell responses. This study presents an immunopeptidomics workflow with a specific focus on generating a circRNA-specific protein fasta reference. The main goal of this workflow is to streamline the process of identifying and validating human leukocyte antigen (HLA) bound peptides potentially originating from circRNAs. We increase the analytical stringency of our workflow by retaining peptides identified independently by two mass spectrometry search engines and/or by applying a group-specific FDR for canonical-derived and circRNA-derived peptides. A subset of circRNA-derived peptides specifically encoded by the region spanning the back-splice junction (BSJ) are validated with targeted MS, and with direct Sanger sequencing of the respective source transcripts. Our workflow identifies 54 unique BSJ-spanning circRNA-derived peptides in the immunopeptidome of melanoma and lung cancer samples. Our approach enlarges the catalog of source proteins that can be explored for immunotherapy.
Affiliation(s)
- Humberto J Ferreira
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Brian J Stevenson
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- HuiSong Pak
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Jessica Almeida Oliveira
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Florian Huber
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Marie Taillandier-Coindard
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Justine Michaux
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Emma Ricart-Altimiras
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Anne I Kraemer
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
- Lana E Kandalaft
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - Center of Experimental Therapeutics, Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Daniel E Speiser
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Alexey I Nesvizhskii
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Markus Müller
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Michal Bassani-Sternberg
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
  - Center of Experimental Therapeutics, Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
16
The M, Picciani M, Jensen C, Gabriel W, Kuster B, Wilhelm M. AI-Assisted Processing Pipeline to Boost Protein Isoform Detection. Methods Mol Biol 2024; 2836:157-181. [PMID: 38995541 DOI: 10.1007/978-1-0716-4007-4_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Proteomics, the study of proteins within biological systems, has seen remarkable advancements in recent years, with protein isoform detection emerging as one of the next major frontiers. One of the primary challenges is achieving the necessary peptide and protein coverage to confidently differentiate isoforms as a result of the protein inference problem and protein false discovery rate estimation challenge in large data. In this chapter, we describe the application of artificial intelligence-assisted peptide property prediction for database search engine rescoring by Oktoberfest, an approach that has proven effective, particularly for complex samples and extensive search spaces, which can greatly increase peptide coverage. Further, it illustrates a method for increasing isoform coverage by the PickedGroupFDR approach that is designed to excel when applied on large data. Real-world examples are provided to illustrate the utility of the tools in the context of rescoring, protein grouping, and false discovery rate estimation. By implementing these cutting-edge techniques, researchers can achieve a substantial increase in both peptide and isoform coverage, thus unlocking the potential of protein isoform detection in their studies and shedding light on their roles and functions in biological processes.
Affiliation(s)
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Cecilia Jensen
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany