1
Bouwmeester R, Richardson K, Denny R, Wilson ID, Degroeve S, Martens L, Vissers JPC. Predicting ion mobility collision cross sections and assessing prediction variation by combining conventional and data driven modeling. Talanta 2024; 274:125970. [PMID: 38621320 DOI: 10.1016/j.talanta.2024.125970] [Received: 11/02/2023] [Revised: 03/01/2024] [Accepted: 03/20/2024] [Indexed: 04/17/2024]
Abstract
The use of collision cross section (CCS) values derived from ion mobility studies is proving to be an increasingly important tool in the characterization and identification of molecules detected in complex mixtures. Here, a novel method for predicting CCS that integrates molecular modeling (MM) and machine learning (ML) has been devised and shown to accurately predict CCS values for singly charged, low-molecular-weight molecules from a broad range of chemical classes. The model performed favorably compared to existing models, improving the ranking of compound identifications for isobaric analytes and assigning identification probability values to the annotations. Furthermore, charge localization was found to correlate with CCS prediction accuracy and with gas-phase proton affinity, demonstrating the potential to provide a proxy for prediction error based on chemical structural properties. The presented approach and findings represent a further step towards accurate prediction and application of computationally generated CCS values.
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
- Ian D Wilson
- Computational & Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College, United Kingdom
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
2
Siraj A, Bouwmeester R, Declercq A, Welp L, Chernev A, Wulf A, Urlaub H, Martens L, Degroeve S, Kohlbacher O, Sachsenberg T. Intensity and retention time prediction improves the rescoring of protein-nucleic acid cross-links. Proteomics 2024; 24:e2300144. [PMID: 38629965 DOI: 10.1002/pmic.202300144] [Received: 05/27/2023] [Revised: 12/29/2023] [Accepted: 01/05/2024] [Indexed: 04/19/2024]
Abstract
In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of peptides cross-linked to oligonucleotides of different lengths leads to a combinatorial increase in search space. We demonstrate that peptide retention time prediction can be transferred to the task of cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvements in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains and merit further research.
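The amino acid composition encoding and the use of retention time prediction error as a rescoring feature, as summarized in this abstract, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the appended nucleotide counts for the cross-linked oligonucleotide, and the toy predictor are hypothetical, and any trained retention time model could be passed in as `predict`.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Assumed alphabets: the 20 standard amino acids and the four RNA nucleotides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGU"


def composition_vector(peptide: str, oligo: str = "") -> List[int]:
    """Encode a (cross-linked) peptide by residue counts only (hypothetical helper).

    Sequence order is discarded: the vector holds how often each amino acid
    occurs in the peptide, followed by how often each nucleotide occurs in
    the attached oligonucleotide (an assumed extension for cross-links).
    """
    aa_counts = Counter(peptide)
    nt_counts = Counter(oligo)
    unknown = (set(aa_counts) - set(AMINO_ACIDS)) | (set(nt_counts) - set(NUCLEOTIDES))
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return [aa_counts[aa] for aa in AMINO_ACIDS] + [nt_counts[nt] for nt in NUCLEOTIDES]


def rt_error_feature(
    observed_rt: float,
    features: Sequence[int],
    predict: Callable[[Sequence[int]], float],
) -> float:
    """Absolute error between observed and predicted retention time (hypothetical helper).

    Correct matches tend to have small errors, so this value can be added
    to the other features a rescoring model uses to separate correct from
    incorrect matches.
    """
    return abs(observed_rt - predict(features))


if __name__ == "__main__":
    # Toy stand-in for a trained retention time model (made-up weights).
    def toy_predict(x: Sequence[int]) -> float:
        return 1.5 * sum(x)

    vec = composition_vector("PEPTIDEK", oligo="U")
    print(rt_error_feature(15.0, vec, toy_predict))  # 1.5
```

Because the vector ignores residue order, sequences with the same composition (for example PEPTIDEK and KEDITPEP) receive identical encodings, a trade-off of this simple representation.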
Affiliation(s)
- Arslan Siraj
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
- Robbin Bouwmeester
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Arthur Declercq
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Luisa Welp
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Bioanalytics, Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Aleksandar Chernev
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Alexander Wulf
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Henning Urlaub
- Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Bioanalytics, Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Lennart Martens
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Sven Degroeve
- Department of Biomolecular Medicine, Ghent University, Gent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Oliver Kohlbacher
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
- Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
3
Buur LM, Declercq A, Strobl M, Bouwmeester R, Degroeve S, Martens L, Dorfer V, Gabriels R. MS²Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0. J Proteome Res 2024. [PMID: 38491990 DOI: 10.1021/acs.jproteome.3c00785] [Indexed: 03/18/2024]
Abstract
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations for automatic rescoring. Among other new features, MS Amanda now provides additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when it is applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.
Affiliation(s)
- Louise M Buur
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Marina Strobl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
4
Koutrouli M, Nastou K, Piera Líndez P, Bouwmeester R, Rasmussen S, Martens L, Jensen LJ. FAVA: high-quality functional association networks inferred from scRNA-seq and proteomics data. Bioinformatics 2024; 40:btae010. [PMID: 38192003 PMCID: PMC10868155 DOI: 10.1093/bioinformatics/btae010] [Received: 09/16/2023] [Revised: 12/07/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024]
Abstract
MOTIVATION Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Due to data sparseness and redundancy, functional association analysis becomes complex. RESULTS To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real as well as simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's capability to offer novel perspectives on protein interactions. FAVA functions within the scverse ecosystem, employing AnnData as its input source. AVAILABILITY AND IMPLEMENTATION Source code, documentation, and tutorials for FAVA are accessible on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI as well as via the scverse ecosystem https://github.com/scverse/ecosystem-packages/tree/main/packages/favapy.
Affiliation(s)
- Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Pau Piera Líndez
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
5
Sakarika M, Kerckhof FM, Van Peteghem L, Pereira A, Van Den Bossche T, Bouwmeester R, Gabriels R, Van Haver D, Ulčar B, Martens L, Impens F, Boon N, Ganigué R, Rabaey K. The nutritional composition and cell size of microbial biomass for food applications are defined by the growth conditions. Microb Cell Fact 2023; 22:254. [PMID: 38072930 PMCID: PMC10712164 DOI: 10.1186/s12934-023-02265-1] [Received: 07/15/2023] [Accepted: 12/02/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND It is increasingly recognized that conventional food production systems are not able to meet the globally increasing protein needs, resulting in overexploitation and depletion of resources, and environmental degradation. In this context, microbial biomass has emerged as a promising sustainable protein alternative. Nevertheless, often no consideration is given to the fact that the cultivation conditions affect the composition of microbial cells, and hence their quality and nutritional value. Apart from the properties and nutritional quality of the produced microbial food (ingredient), this can also impact its sustainability. To qualitatively assess these aspects, we here investigated the link between substrate availability, growth rate, cell composition and size of Cupriavidus necator and Komagataella phaffii. RESULTS Biomass with decreased nucleic acid and increased protein content was produced at low growth rates. Conversely, high rates resulted in larger cells, which could enable more efficient biomass harvesting. The proteome allocation varied across the different growth rates, with more ribosomal proteins at higher rates, which could potentially affect the techno-functional properties of the biomass. Considering the distinct amino acid profiles established for the different cellular components, variations in their abundance impact the product quality, leading to higher cysteine and phenylalanine content at low growth rates. Therefore, we suggest that costly external amino acid supplementations, which are often required to meet nutritional needs, could be avoided by carefully applying conditions that enable targeted growth rates. CONCLUSION In summary, we demonstrate tradeoffs between nutritional quality and production rate, and we discuss the microbial biomass properties that vary according to the growth conditions.
Affiliation(s)
- Myrsini Sakarika
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium.
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium.
- Frederiek-Maarten Kerckhof
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
- Kytos BV, IIC UGent, Frieda Saeysstraat 1/B, Ghent, 9052, Belgium
- Lotte Van Peteghem
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
- Alexandra Pereira
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Delphi Van Haver
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Proteomics Core, VIB, Ghent, Belgium
- Barbara Ulčar
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Francis Impens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Proteomics Core, VIB, Ghent, Belgium
- Nico Boon
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
- Ramon Ganigué
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
- Korneel Rabaey
- Center for Microbial Ecology and Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, B-9000, Belgium
- Center for Advanced Process Technology for Urban Resource recovery (CAPTURE), Frieda Saeysstraat 1, Ghent, 9052, Belgium
6
Declercq A, Bouwmeester R, Chiva C, Sabidó E, Hirschler A, Carapito C, Martens L, Degroeve S, Gabriels R. Updated MS²PIP web server supports cutting-edge proteomics applications. Nucleic Acids Res 2023:7151340. [PMID: 37140039 DOI: 10.1093/nar/gkad335] [Received: 02/26/2023] [Revised: 04/04/2023] [Accepted: 04/25/2023] [Indexed: 05/05/2023]
Abstract
Interest in the use of machine learning for peptide fragmentation spectrum prediction has risen strongly in recent years, especially for challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data-independent acquisition (DIA) spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease of use, and broad applicability. Here we present a thoroughly updated version of the MS²PIP web server, which includes new and more performant prediction models for both tryptic and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides. Additionally, we have added new functionality to greatly facilitate the generation of proteome-wide predicted spectral libraries, requiring only a FASTA protein file as input. These libraries also include retention time predictions from DeepLC. Moreover, we now provide pre-built and ready-to-download spectral libraries for various model organisms in multiple DIA-compatible spectral library formats. Besides the upgraded back-end models, the user experience on the MS²PIP web server is also greatly enhanced, extending its applicability to new domains, including immunopeptidomics and MS3-based TMT quantification experiments. MS²PIP is freely available at https://iomics.ugent.be/ms2pip/.
Affiliation(s)
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium
- Department of Biomolecular Medicine, Ghent University, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium
- Department of Biomolecular Medicine, Ghent University, Belgium
- Cristina Chiva
- Proteomics Unit, Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Proteomics Unit, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003, Barcelona, Spain
- Eduard Sabidó
- Proteomics Unit, Universitat Pompeu Fabra, 08003, Barcelona, Spain
- Proteomics Unit, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003, Barcelona, Spain
- Aurélie Hirschler
- Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS, France
- Christine Carapito
- Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS, France
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium
- Department of Biomolecular Medicine, Ghent University, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium
- Department of Biomolecular Medicine, Ghent University, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium
- Department of Biomolecular Medicine, Ghent University, Belgium
7
Claeys T, Menu M, Bouwmeester R, Gevaert K, Martens L. Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins. J Proteome Res 2023; 22:1181-1192. [PMID: 36963412 PMCID: PMC10088018 DOI: 10.1021/acs.jproteome.2c00644] [Indexed: 03/26/2023]
Abstract
Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue- and cell-type-specific protein patterns. PRIDE projects were searched with ionbot, and tissue/cell type annotations were added manually. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model was able to predict tissues with 98% accuracy and cell types with 99% accuracy. The F-scores give a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies to studying the proteome of tissue-like samples such as organoids and cell lines.
Affiliation(s)
- Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Maxime Menu
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
8
Neely BA, Dorfer V, Martens L, Bludau I, Bouwmeester R, Degroeve S, Deutsch EW, Gessulat S, Käll L, Palczynski P, Payne SH, Rehfeldt TG, Schmidt T, Schwämmle V, Uszkoreit J, Vizcaíno JA, Wilhelm M, Palmblad M. Toward an Integrated Machine Learning Model of a Proteomics Experiment. J Proteome Res 2023; 22:681-696. [PMID: 36744821 PMCID: PMC9990124 DOI: 10.1021/acs.jproteome.2c00711] [Indexed: 02/07/2023]
Abstract
In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and inspiring future research.
Affiliation(s)
- Benjamin A Neely
- National Institute of Standards and Technology, Charleston, South Carolina 29412, United States
- Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
- Isabell Bludau
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000 Ghent, Belgium
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Lukas Käll
- Science for Life Laboratory, KTH - Royal Institute of Technology, 171 21 Solna, Sweden
- Pawel Palczynski
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
- Samuel H Payne
- Department of Biology, Brigham Young University, Provo, Utah 84602, United States
- Tobias Greisager Rehfeldt
- Institute for Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark
- Veit Schwämmle
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark
- Julian Uszkoreit
- Medical Proteome Analysis, Center for Protein Diagnostics (ProDi), Ruhr University Bochum, 44801 Bochum, Germany; Medizinisches Proteom-Center, Medical Faculty, Ruhr University Bochum, 44801 Bochum, Germany
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich (TUM), 85354 Freising, Germany
- Magnus Palmblad
- Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
9
Gabriels R, Declercq A, Bouwmeester R, Degroeve S, Martens L. psm_utils: A High-Level Python API for Parsing and Handling Peptide-Spectrum Matches and Proteomics Search Results. J Proteome Res 2023; 22:557-560. [PMID: 36508242 DOI: 10.1021/acs.jproteome.2c00609] [Indexed: 12/14/2022]
Abstract
A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists in a unified and user-friendly Python-, command line-, and web-interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at https://github.com/compomics/psm_utils.
Affiliation(s)
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
10
Rehfeldt T, Gabriels R, Bouwmeester R, Gessulat S, Neely BA, Palmblad M, Perez-Riverol Y, Schmidt T, Vizcaíno JA, Deutsch EW. ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics. J Proteome Res 2023; 22:632-636. [PMID: 36693629 PMCID: PMC9903315 DOI: 10.1021/acs.jproteome.2c00629] [Indexed: 01/26/2023]
Abstract
Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
Affiliation(s)
- Tobias G. Rehfeldt
- Institute for Mathematics and Computer Science, University of Southern Denmark, 5000 Odense, Denmark
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Benjamin A. Neely
- National Institute of Standards and Technology, Charleston, South Carolina 29412, United States
- Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom; Juan Antonio Vizcaíno: Phone: +44 (0) 1223 492686
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States; Eric Deutsch: Phone: 206-732-1200, Fax: 206-732-1299
11
Declercq A, Bouwmeester R, Hirschler A, Carapito C, Degroeve S, Martens L, Gabriels R. MS2Rescore: Data-driven rescoring dramatically boosts immunopeptide identification rates. Mol Cell Proteomics 2022; 21:100266. [PMID: 35803561 PMCID: PMC9411678 DOI: 10.1016/j.mcpro.2022.100266] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 12/08/2021] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 12/03/2022]
Abstract
Immunopeptidomics aims to identify the major histocompatibility complex (MHC)-presented peptides found on almost all cells, which can be used in anti-cancer vaccine development. However, existing immunopeptidomics data analysis pipelines suffer from the nontryptic nature of immunopeptides, which complicates their identification. Previously, peak intensity predictions by MS2PIP and retention time predictions by DeepLC have been shown to improve tryptic peptide identifications when rescoring peptide-spectrum matches with Percolator. Because MS2PIP was tailored toward tryptic peptides, we retrained it here to include nontryptic peptides. Interestingly, the new models not only greatly improve predictions for immunopeptides but also yield further improvements for tryptic peptides. We show that integrating the new MS2PIP models, DeepLC, and Percolator in one software package, MS2Rescore, increases the spectrum identification rate and the number of unique identified peptides by 46% and 36%, respectively, compared with standard Percolator rescoring at 1% FDR. MS2Rescore also outperforms the current state of the art in immunopeptide-specific identification approaches. Altogether, MS2Rescore thus allows substantially improved identification of novel epitopes from existing immunopeptidomics workflows.
Highlights
- MS2Rescore significantly boosts immunopeptide identification rates.
- Data-driven post-processing allows for a ten-fold increase in specificity.
- MS2PIP and DeepLC predictors are integrated with Percolator post-processing.
- MS2Rescore accepts identification results from MaxQuant, PEAKS, MS-GF+ and X!Tandem.
- MS2Rescore shows great promise to extend current neo- and xeno-epitope landscapes.
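The 1% FDR threshold mentioned above is conventionally estimated with a target-decoy approach. The sketch below shows only that final thresholding step. It assumes that higher scores are better and estimates FDR as decoys over targets above each score threshold. It does not reproduce Percolator's semi-supervised learning, nor the MS2PIP and DeepLC features that MS2Rescore would contribute to the score being thresholded.

```python
from __future__ import annotations


def q_values(scores: list[float], is_decoy: list[bool]) -> list[float]:
    """Target-decoy q-values, assuming higher scores are better.

    At each score threshold the FDR is estimated as decoys / targets among
    PSMs scoring at or above it; a PSM's q-value is the lowest FDR of any
    threshold that still accepts it. Ties are broken by input order here.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):  # walk from the worst score upwards
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def accepted_targets(scores, is_decoy, threshold=0.01):
    """Indices of target PSMs accepted at the given q-value threshold."""
    q = q_values(scores, is_decoy)
    return [i for i, (qi, d) in enumerate(zip(q, is_decoy)) if qi <= threshold and not d]
```

A better rescoring model separates targets from decoys more cleanly, so more target PSMs fall below the same q-value threshold. This is the quantity that identification-rate comparisons at 1% FDR measure.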
Affiliation(s)
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Aurélie Hirschler
- Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS
- Christine Carapito
- Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
12
Shiferaw GA, Gabriels R, Bouwmeester R, Van Den Bossche T, Vandermarliere E, Martens L, Volders PJ. Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator. J Proteome Res 2022; 21:1365-1370. [PMID: 35446579 DOI: 10.1021/acs.jproteome.2c00075] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we used 40 data sets from two different projects and searched them against the NIST and MassIVE spectral libraries. Searches were carried out with two spectral library search tools, COSS and MSPepSearch, each with and without Percolator postprocessing, and with the sequence database search engine MS-GF+ as a baseline comparator. Adding the Percolator rescoring step to COSS is effective and substantially improves the sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are available at https://github.com/compomics/COSS.
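Spectral library searching ranks library spectra by their similarity to a query spectrum. As a generic illustration only, the sketch below scores binned, square-root-scaled spectra with a normalized dot product (cosine similarity). This is a common textbook choice and an assumption here, not COSS's own scoring function, and the bin width and peak lists are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        binned[b] = binned.get(b, 0.0) + intensity
    return binned


def cosine_similarity(query, reference, bin_width=1.0):
    """Normalized dot product of binned, square-root-scaled spectra (0..1)."""
    q = {b: math.sqrt(v) for b, v in bin_spectrum(query, bin_width).items()}
    r = {b: math.sqrt(v) for b, v in bin_spectrum(reference, bin_width).items()}
    dot = sum(v * r[b] for b, v in q.items() if b in r)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0


def best_match(query, library, bin_width=1.0):
    """Return (peptide, score) for the most similar spectrum in a {peptide: peaks} library."""
    return max(
        ((pep, cosine_similarity(query, peaks, bin_width)) for pep, peaks in library.items()),
        key=lambda item: item[1],
    )
```

In a full pipeline, the best-match score and related features would be passed to a postprocessor such as Percolator instead of being thresholded directly.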
Affiliation(s)
- Genet Abay Shiferaw
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Elien Vandermarliere
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium; Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium
13
van Gils JHM, Gogishvili D, van Eck J, Bouwmeester R, van Dijk E, Abeln S. How sticky are our proteins? Quantifying hydrophobicity of the human proteome. Bioinform Adv 2022; 2:vbac002. [PMID: 36699344 PMCID: PMC9710682 DOI: 10.1093/bioadv/vbac002] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/30/2021] [Revised: 12/19/2021] [Accepted: 01/24/2022] [Indexed: 01/28/2023]
Abstract
Summary: Proteins tend to bury hydrophobic residues inside their core during the folding process to provide stability to the protein structure and to prevent aggregation. Nevertheless, proteins do expose some 'sticky' hydrophobic residues to the solvent. These residues can play an important functional role, e.g. in protein-protein and membrane interactions. Here, we first investigate how hydrophobic protein surfaces are by providing three measures for surface hydrophobicity: the total hydrophobic surface area (THSA), the relative hydrophobic surface area (RHSA) and, using our MolPatch method, the largest hydrophobic patch (LHP). Secondly, we analyze how difficult it is to predict these measures from sequence: by adapting solvent accessibility predictions from NetSurfP2.0, we obtain well-performing prediction methods for the THSA and RHSA, while predicting LHP is more challenging. Finally, we analyze implications of exposed hydrophobic surfaces: we show that hydrophobic proteins typically have low expression, suggesting cells avoid an overabundance of sticky proteins.
Availability and implementation: The data underlying this article are available in GitHub at https://github.com/ibivu/hydrophobic_patches.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
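The three surface measures can be sketched at residue level to make their definitions concrete. THSA is a sum, RHSA is a fraction of the total surface, and LHP is the largest connected hydrophobic region. The sketch assumes that per-residue absolute solvent-accessible surface areas (SASA) and a list of residue pairs in surface contact are already available. The hydrophobic residue set is a hypothetical choice, and this is not the MolPatch implementation.

```python
HYDROPHOBIC = set("AVLIMFWC")  # hypothetical residue-level choice, not MolPatch's definition


def thsa(residues, sasa):
    """Total hydrophobic surface area: summed SASA of hydrophobic residues."""
    return sum(a for r, a in zip(residues, sasa) if r in HYDROPHOBIC)


def rhsa(residues, sasa):
    """Relative hydrophobic surface area: THSA as a fraction of total SASA."""
    total = sum(sasa)
    return thsa(residues, sasa) / total if total else 0.0


def largest_hydrophobic_patch(residues, sasa, contacts, min_sasa=1.0):
    """Largest summed SASA over connected groups of exposed hydrophobic residues.

    contacts: (i, j) index pairs of residues assumed to touch on the surface.
    """
    nodes = {i for i, (r, a) in enumerate(zip(residues, sasa))
             if r in HYDROPHOBIC and a >= min_sasa}
    neighbors = {i: [] for i in nodes}
    for i, j in contacts:
        if i in nodes and j in nodes:
            neighbors[i].append(j)
            neighbors[j].append(i)
    best, seen = 0.0, set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, area = [start], 0.0
        while stack:  # depth-first search over one connected patch
            k = stack.pop()
            area += sasa[k]
            for n in neighbors[k]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        best = max(best, area)
    return best
```

THSA and RHSA depend only on per-residue areas, while LHP also depends on surface connectivity. Connectivity cannot be read directly from per-residue accessibility predictions, which helps explain why LHP is the hardest of the three to predict from sequence.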
Affiliation(s)
- Juami Hermine Mariama van Gils
- Computer Science Department, Center for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Noord-Holland, The Netherlands (corresponding author)
- Dea Gogishvili
- Computer Science Department, Center for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Noord-Holland, The Netherlands
- Jan van Eck
- Computer Science Department, Center for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Noord-Holland, The Netherlands
- Robbin Bouwmeester
- Computer Science Department, Center for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Noord-Holland, The Netherlands
- Erik van Dijk
- Computer Science Department, Center for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Noord-Holland, The Netherlands
- Sanne Abeln
- Computer Science Department, Center for Integrative Bioinformatics (IBIVU), Vrije Universiteit Amsterdam, 1081 HV Noord-Holland, The Netherlands (corresponding author)
14
Kensert A, Bouwmeester R, Efthymiadis K, Van Broeck P, Desmet G, Cabooter D. Graph Convolutional Networks for Improved Prediction and Interpretability of Chromatographic Retention Data. Anal Chem 2021; 93:15633-15641. [PMID: 34780168 DOI: 10.1021/acs.analchem.1c02988] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/29/2022]
Abstract
Machine learning is a popular technique to predict the retention times of molecules based on descriptors. Descriptors and associated labels (e.g., retention times) of a set of molecules can be used to train a machine learning algorithm. However, descriptors are fixed molecular features which are not necessarily optimized for the given machine learning problem (e.g., to predict retention times). Recent advances in molecular machine learning make use of so-called graph convolutional networks (GCNs) to learn molecular representations from atoms and their bonds to adjacent atoms to optimize the molecular representation for the given problem. In this study, two GCNs were implemented to predict the retention times of molecules for three different chromatographic data sets and compared to seven benchmarks (including two state-of-the-art machine learning models). Additionally, saliency maps were computed from trained GCNs to better interpret the importance of certain molecular sub-structures in the data sets. Overall, the GCNs performed better than all benchmarks, either significantly outperforming them (5-25% lower mean absolute error) or performing similarly to them (<5% difference). Saliency maps revealed a significant difference in the molecular sub-structures that are important for predictions on different chromatographic data sets (reversed-phase liquid chromatography vs hydrophilic interaction liquid chromatography).
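A graph convolution updates each atom's feature vector by mixing it with the vectors of its bonded neighbors. The pure-Python sketch below implements one layer of the widely used symmetric-normalized form, ReLU(D^-1/2 (A + I) D^-1/2 H W). Here A is the bond adjacency matrix, I the identity, D the degree matrix of A + I, H the atom features and W a learned weight matrix. A sum-pooling readout then turns the atom embeddings into one molecule vector. The abstract does not specify the two GCN architectures used, so this layer form, the pooling choice and all inputs are assumptions.

```python
import math


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a bond adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W), one output row per atom."""
    h = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in h]


def readout(node_features):
    """Sum-pool atom embeddings into a single molecule-level vector."""
    return [sum(column) for column in zip(*node_features)]
```

Because the weights W are learned for the task, the molecular representation is optimized for retention time prediction rather than fixed in advance, which is the contrast with descriptors drawn in the abstract. A regression head on top of `readout` would produce the retention time.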
Affiliation(s)
- Alexander Kensert
- Department for Pharmaceutical and Pharmacological Sciences, University of Leuven (KU Leuven), Pharmaceutical Analysis, Herestraat 49, Leuven 3000, Belgium; Department of Chemical Engineering, Vrije Universiteit Brussel, Pleinlaan 2, Brussel 1050, Belgium
- Robbin Bouwmeester
- VIB, VIB-UGent Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, Gent 9052, Belgium; Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, Gent 9052, Belgium
- Kyriakos Efthymiadis
- Department for Pharmaceutical and Pharmacological Sciences, University of Leuven (KU Leuven), Pharmaceutical Analysis, Herestraat 49, Leuven 3000, Belgium; Department of Computer Science, Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 9, Brussel 1050, Belgium
- Peter Van Broeck
- Department of Pharmaceutical Development and Manufacturing Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, Beerse 2340, Belgium
- Gert Desmet
- Department of Chemical Engineering, Vrije Universiteit Brussel, Pleinlaan 2, Brussel 1050, Belgium
- Deirdre Cabooter
- Department for Pharmaceutical and Pharmacological Sciences, University of Leuven (KU Leuven), Pharmaceutical Analysis, Herestraat 49, Leuven 3000, Belgium
15
Boone M, Ramasamy P, Zuallaert J, Bouwmeester R, Van Moer B, Maddelein D, Turan D, Hulstaert N, Eeckhaut H, Vandermarliere E, Martens L, Degroeve S, De Neve W, Vranken W, Callewaert N. Massively parallel interrogation of protein fragment secretability using SECRiFY reveals features influencing secretory system transit. Nat Commun 2021; 12:6414. [PMID: 34741024 PMCID: PMC8571348 DOI: 10.1038/s41467-021-26720-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/25/2020] [Accepted: 10/15/2021] [Indexed: 11/09/2022]
Abstract
While transcriptome- and proteome-wide technologies to assess processes in protein biogenesis are now widely available, we still lack global approaches to assay post-ribosomal biogenesis events, in particular those occurring in the eukaryotic secretory system. We here develop a method, SECRiFY, to simultaneously assess the secretability of >10⁵ protein fragments by two yeast species, S. cerevisiae and P. pastoris, using custom fragment libraries, surface display and a sequencing-based readout. Screening human proteome fragments with a median size of 50-100 amino acids, we generate datasets that enable data mining into protein features underlying secretability, revealing a striking role for intrinsic disorder and chain flexibility. The SECRiFY methodology generates sufficient amounts of annotated data for advanced machine learning methods to deduce secretability patterns. The finding that secretability is indeed a learnable feature of protein sequences provides a solid base for application-focused studies.
Affiliation(s)
- Morgane Boone
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium; Department of Biochemistry and Biophysics, UCSF, San Francisco, CA, USA
- Pathmanaban Ramasamy
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; Structural Biology Brussels, VUB, Brussels, Belgium; Structural Biology Research Center, VIB, Brussels, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB)², ULB-VUB, Brussels, Belgium
- Jasper Zuallaert
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium; Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon, South Korea; IDLab, ELIS, UGent, Ghent, Belgium
- Robbin Bouwmeester
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Berre Van Moer
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Davy Maddelein
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Demet Turan
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Niels Hulstaert
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Hannah Eeckhaut
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Elien Vandermarliere
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Lennart Martens
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Sven Degroeve
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Wesley De Neve
- Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon, South Korea; IDLab, ELIS, UGent, Ghent, Belgium
- Wim Vranken
- Structural Biology Brussels, VUB, Brussels, Belgium; Structural Biology Research Center, VIB, Brussels, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB)², ULB-VUB, Brussels, Belgium
- Nico Callewaert
- Center for Medical Biotechnology, VIB, Zwijnaarde, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
16
Bouwmeester R, Gabriels R, Hulstaert N, Martens L, Degroeve S. DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nat Methods 2021; 18:1363-1369. [PMID: 34711972 DOI: 10.1038/s41592-021-01301-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 04/15/2020] [Accepted: 09/13/2021] [Indexed: 11/09/2022]
Abstract
The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.
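The key idea in the abstract is to encode a peptide by its atomic composition, so that any modification with a known elemental composition can be represented. This can be sketched as a position-by-atom count matrix. The sketch is a simplified illustration under stated assumptions: the residue compositions are the standard ones (free amino acid minus H2O, termini not included), the modification table is illustrative, and DeepLC's actual input encoding is not reproduced here.

```python
# Atom order for every composition vector below.
ATOMS = ("C", "H", "N", "O", "S", "P")

# Standard residue compositions (free amino acid minus H2O), in ATOMS order.
RESIDUES = {
    "G": (2, 3, 1, 1, 0, 0), "A": (3, 5, 1, 1, 0, 0), "S": (3, 5, 1, 2, 0, 0),
    "P": (5, 7, 1, 1, 0, 0), "V": (5, 9, 1, 1, 0, 0), "T": (4, 7, 1, 2, 0, 0),
    "C": (3, 5, 1, 1, 1, 0), "L": (6, 11, 1, 1, 0, 0), "I": (6, 11, 1, 1, 0, 0),
    "N": (4, 6, 2, 2, 0, 0), "D": (4, 5, 1, 3, 0, 0), "Q": (5, 8, 2, 2, 0, 0),
    "K": (6, 12, 2, 1, 0, 0), "E": (5, 7, 1, 3, 0, 0), "M": (5, 9, 1, 1, 1, 0),
    "H": (6, 7, 3, 1, 0, 0), "F": (9, 9, 1, 1, 0, 0), "R": (6, 12, 4, 1, 0, 0),
    "Y": (9, 9, 1, 2, 0, 0), "W": (11, 10, 2, 1, 0, 0),
}

# Illustrative modification deltas; any other modification only needs its composition.
MODIFICATIONS = {
    "Oxidation": (0, 0, 0, 1, 0, 0),
    "Carbamidomethyl": (2, 3, 1, 1, 0, 0),
    "Phospho": (0, 1, 0, 3, 0, 1),
}


def encode(peptide, mods=None):
    """Position-by-atom count matrix; mods maps 0-based position -> composition delta."""
    mods = mods or {}
    no_change = (0,) * len(ATOMS)
    return [
        [base + delta for base, delta in zip(RESIDUES[aa], mods.get(pos, no_change))]
        for pos, aa in enumerate(peptide)
    ]


def total_composition(matrix):
    """Column sums: the residue-level elemental composition (termini not included)."""
    return tuple(sum(column) for column in zip(*matrix))
```

A modification absent from `MODIFICATIONS`, such as an acetylation passed as `{0: (2, 2, 0, 1, 0, 0)}`, still maps into the same six-column feature space. A model trained on such features can therefore be applied to it, which is the property the abstract highlights.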
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Niels Hulstaert
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
17
Van Puyvelde B, Van Uytfanghe K, Tytgat O, Van Oudenhove L, Gabriels R, Bouwmeester R, Daled S, Van Den Bossche T, Ramasamy P, Verhelst S, De Clerck L, Corveleyn L, Willems S, Debunne N, Wynendaele E, De Spiegeleer B, Judak P, Roels K, De Wilde L, Van Eenoo P, Reyns T, Cherlet M, Dumont E, Debyser G, t'Kindt R, Sandra K, Gupta S, Drouin N, Harms A, Hankemeier T, Jones DJL, Gupta P, Lane D, Lane CS, El Ouadi S, Vincendet JB, Morrice N, Oehrle S, Tanna N, Silvester S, Hannam S, Sigloch FC, Bhangu-Uhlmann A, Claereboudt J, Anderson NL, Razavi M, Degroeve S, Cuypers L, Stove C, Lagrou K, Martens GA, Deforce D, Martens L, Vissers JPC, Dhaenens M. Cov-MS: A Community-Based Template Assay for Mass-Spectrometry-Based Protein Detection in SARS-CoV-2 Patients. JACS Au 2021. [PMID: 34254058 DOI: 10.1101/2020.11.18.20231688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Rising population density and global mobility are among the reasons why pathogens such as SARS-CoV-2, the virus that causes COVID-19, spread so rapidly across the globe. The policy response to such pandemics will always have to include accurate monitoring of the spread, as this provides one of the few alternatives to total lockdown. However, COVID-19 diagnosis is currently performed almost exclusively by reverse transcription polymerase chain reaction (RT-PCR). Although this is efficient, automatable, and acceptably cheap, reliance on one type of technology comes with serious caveats, as illustrated by recurring reagent and test shortages. We therefore developed an alternative diagnostic test that detects proteolytically digested SARS-CoV-2 proteins using mass spectrometry (MS). We established the Cov-MS consortium, consisting of 15 academic laboratories and several industrial partners to increase applicability, accessibility, sensitivity, and robustness of this kind of SARS-CoV-2 detection. This, in turn, gave rise to the Cov-MS Digital Incubator that allows other laboratories to join the effort, navigate, and share their optimizations and translate the assay into their clinic. As this test relies on viral proteins instead of RNA, it provides an orthogonal and complementary approach to RT-PCR using other reagents that are relatively inexpensive and widely available, as well as orthogonally skilled personnel and different instruments. Data are available via ProteomeXchange with identifier PXD022550.
Affiliation(s)
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Katleen Van Uytfanghe
- Laboratory of Toxicology, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Olivier Tytgat
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Department of Life Science Technologies, Imec, 3000 Leuven, Belgium
| | | | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent Belgium
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent Belgium
| | - Simon Daled
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent Belgium
| | - Pathmanaban Ramasamy
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, 1050 Brussels, Belgium
| | - Sigrid Verhelst
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Laura De Clerck
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Laura Corveleyn
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Sander Willems
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Nathan Debunne
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Evelien Wynendaele
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Bart De Spiegeleer
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Peter Judak
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Kris Roels
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Laurie De Wilde
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Peter Van Eenoo
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Tim Reyns
- Department of Clinical Chemistry, Ghent University Hospital, 9000 Ghent, Belgium
| | - Marc Cherlet
- Department of Pharmacology, Toxicology, and Biochemistry, Faculty of Veterinary Medicine, Ghent University 9000 Ghent, Belgium
| | - Emmie Dumont
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Griet Debyser
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Ruben t'Kindt
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Koen Sandra
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Surya Gupta
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent Belgium
| | - Nicolas Drouin
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
| | - Amy Harms
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
- Thomas Hankemeier
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
- Donald J L Jones
- Leicester Cancer Research Centre, RKCSB, University of Leicester, U.K., and John and Lucille van Geest Biomarker Facility, Cardiovascular Research Centre, Glenfield Hospital, Leicester LE1 7RH, United Kingdom
- Pankaj Gupta
- The Department of Chemical Pathology and Metabolic Diseases, Level 4, Sandringham Building, Leicester Royal Infirmary, Leicester LE1 7RH, United Kingdom
- Dan Lane
- The Department of Chemical Pathology and Metabolic Diseases, Level 4, Sandringham Building, Leicester Royal Infirmary, Leicester LE1 7RH, United Kingdom
- Said El Ouadi
- AB Sciex, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- Nick Morrice
- AB Sciex, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- Stuart Oehrle
- Waters Corporation, Milford, Massachusetts 01757, United States
- Nikunj Tanna
- Waters Corporation, Milford, Massachusetts 01757, United States
- Steve Silvester
- Alderley Analytical, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- Sally Hannam
- Alderley Analytical, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- N Leigh Anderson
- SISCAPA Assay Technologies, Inc., Washington, D.C. 20009, United States
- Morteza Razavi
- SISCAPA Assay Technologies, Inc., Washington, D.C. 20009, United States
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lize Cuypers
- Clinical Department of Laboratory Medicine, UZ Leuven, KU Leuven, 3000 Leuven, Belgium
- Christophe Stove
- Laboratory of Toxicology, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
- Katrien Lagrou
- Clinical Department of Laboratory Medicine, UZ Leuven, KU Leuven, 3000 Leuven, Belgium
- Geert A Martens
- AZ Delta Medical Laboratories, AZ Delta General Hospital, 8800 Roeselare, Belgium
- Dieter Deforce
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Maarten Dhaenens
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
18
Van Puyvelde B, Van Uytfanghe K, Tytgat O, Van Oudenhove L, Gabriels R, Bouwmeester R, Daled S, Van Den Bossche T, Ramasamy P, Verhelst S, De Clerck L, Corveleyn L, Willems S, Debunne N, Wynendaele E, De Spiegeleer B, Judak P, Roels K, De Wilde L, Van Eenoo P, Reyns T, Cherlet M, Dumont E, Debyser G, t’Kindt R, Sandra K, Gupta S, Drouin N, Harms A, Hankemeier T, Jones DJL, Gupta P, Lane D, Lane CS, El Ouadi S, Vincendet JB, Morrice N, Oehrle S, Tanna N, Silvester S, Hannam S, Sigloch FC, Bhangu-Uhlmann A, Claereboudt J, Anderson NL, Razavi M, Degroeve S, Cuypers L, Stove C, Lagrou K, Martens GA, Deforce D, Martens L, Vissers JPC, Dhaenens M. Cov-MS: A Community-Based Template Assay for Mass-Spectrometry-Based Protein Detection in SARS-CoV-2 Patients. JACS Au 2021; 1:750-765. [PMID: 34254058 PMCID: PMC8230961 DOI: 10.1021/jacsau.1c00048] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 02/09/2021] [Indexed: 05/03/2023]
Abstract
Rising population density and global mobility are among the reasons why pathogens such as SARS-CoV-2, the virus that causes COVID-19, spread so rapidly across the globe. The policy response to such pandemics will always have to include accurate monitoring of the spread, as this provides one of the few alternatives to total lockdown. However, COVID-19 diagnosis is currently performed almost exclusively by reverse transcription polymerase chain reaction (RT-PCR). Although this is efficient, automatable, and acceptably cheap, reliance on one type of technology comes with serious caveats, as illustrated by recurring reagent and test shortages. We therefore developed an alternative diagnostic test that detects proteolytically digested SARS-CoV-2 proteins using mass spectrometry (MS). We established the Cov-MS consortium, consisting of 15 academic laboratories and several industrial partners to increase applicability, accessibility, sensitivity, and robustness of this kind of SARS-CoV-2 detection. This, in turn, gave rise to the Cov-MS Digital Incubator that allows other laboratories to join the effort, navigate, and share their optimizations and translate the assay into their clinic. As this test relies on viral proteins instead of RNA, it provides an orthogonal and complementary approach to RT-PCR using other reagents that are relatively inexpensive and widely available, as well as orthogonally skilled personnel and different instruments. Data are available via ProteomeXchange with identifier PXD022550.
Affiliation(s)
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Katleen Van Uytfanghe
- Laboratory of Toxicology, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
- Olivier Tytgat
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Department of Life Science Technologies, Imec, 3000 Leuven, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Simon Daled
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pathmanaban Ramasamy
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, 1050 Brussels, Belgium
- Sigrid Verhelst
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Laura De Clerck
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Laura Corveleyn
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Sander Willems
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Nathan Debunne
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
- Evelien Wynendaele
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
- Bart De Spiegeleer
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
- Peter Judak
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
- Kris Roels
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
- Laurie De Wilde
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
- Peter Van Eenoo
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
- Tim Reyns
- Department of Clinical Chemistry, Ghent University Hospital, 9000 Ghent, Belgium
- Marc Cherlet
- Department of Pharmacology, Toxicology, and Biochemistry, Faculty of Veterinary Medicine, Ghent University, 9000 Ghent, Belgium
- Emmie Dumont
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
- Griet Debyser
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
- Ruben t’Kindt
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
- Koen Sandra
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
- Surya Gupta
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Nicolas Drouin
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
- Amy Harms
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
- Thomas Hankemeier
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
- Donald J. L. Jones
- Leicester Cancer Research Centre, RKCSB, University of Leicester, U.K., and John and Lucille van Geest Biomarker Facility, Cardiovascular Research Centre, Glenfield Hospital, Leicester LE1 7RH, United Kingdom
- Pankaj Gupta
- The Department of Chemical Pathology and Metabolic Diseases, Level 4, Sandringham Building, Leicester Royal Infirmary, Leicester LE1 7RH, United Kingdom
- Dan Lane
- The Department of Chemical Pathology and Metabolic Diseases, Level 4, Sandringham Building, Leicester Royal Infirmary, Leicester LE1 7RH, United Kingdom
- Said El Ouadi
- AB Sciex, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- Nick Morrice
- AB Sciex, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- Stuart Oehrle
- Waters Corporation, Milford, Massachusetts 01757, United States
- Nikunj Tanna
- Waters Corporation, Milford, Massachusetts 01757, United States
- Steve Silvester
- Alderley Analytical, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- Sally Hannam
- Alderley Analytical, Alderley Park, Macclesfield SK10 4TG, United Kingdom
- N. Leigh Anderson
- SISCAPA Assay Technologies, Inc., Washington, D.C. 20009, United States
- Morteza Razavi
- SISCAPA Assay Technologies, Inc., Washington, D.C. 20009, United States
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lize Cuypers
- Clinical Department of Laboratory Medicine, UZ Leuven, KU Leuven, 3000 Leuven, Belgium
- Christophe Stove
- Laboratory of Toxicology, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
- Katrien Lagrou
- Clinical Department of Laboratory Medicine, UZ Leuven, KU Leuven, 3000 Leuven, Belgium
- Geert A. Martens
- AZ Delta Medical Laboratories, AZ Delta General Hospital, 8800 Roeselare, Belgium
- Dieter Deforce
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Maarten Dhaenens
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
19
Salz R, Bouwmeester R, Gabriels R, Degroeve S, Martens L, Volders PJ, 't Hoen PAC. Personalized Proteome: Comparing Proteogenomics and Open Variant Search Approaches for Single Amino Acid Variant Detection. J Proteome Res 2021; 20:3353-3364. [PMID: 33998808 PMCID: PMC8280751 DOI: 10.1021/acs.jproteome.1c00264] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/02/2021] [Indexed: 12/30/2022]
Abstract
Discovery of variant peptides such as a single amino acid variant (SAAV) in shotgun proteomics data is essential for personalized proteomics. Both the resolution of shotgun proteomics methods and the search engines have improved dramatically, allowing for confident identification of SAAV peptides. However, it is not yet known if these methods are truly successful in accurately identifying SAAV peptides without prior genomic information in the search database. We studied this in unprecedented detail by exploiting publicly available long-read RNA sequences and shotgun proteomics data from the gold standard reference cell line NA12878. Searching spectra from this cell line with the state-of-the-art open modification search engine ionbot against carefully curated search databases resulted in 96.7% false-positive SAAVs and an 85% lower true positive rate than searching with peptide search databases that incorporate prior genetic information. While adding genetic variants to the search database remains indispensable for correct peptide identification, inclusion of long-read RNA sequences in the search database contributes only 0.3% new peptide identifications. These findings reveal the differences in SAAV detection that result from various approaches, providing guidance to researchers studying SAAV peptides and developers of peptide spectrum identification tools.
Affiliation(s)
- Renee Salz
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Peter A C 't Hoen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
20
C Silva AS, Bouwmeester R, Martens L, Degroeve S. Accurate peptide fragmentation predictions allow data driven approaches to replace and improve upon proteomics search engine scoring functions. Bioinformatics 2020; 35:5243-5248. [PMID: 31077310 DOI: 10.1093/bioinformatics/btz383] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/08/2019] [Revised: 04/01/2019] [Accepted: 05/02/2019] [Indexed: 01/05/2023]
Abstract
MOTIVATION The use of post-processing tools to maximize the information gained from a proteomics search engine is widely accepted and used by the community, with the most notable example being Percolator, a semi-supervised machine learning model that learns a new scoring function for a given dataset. The usage of such tools is however bound to the search engine's scoring scheme, which doesn't always make full use of the intensity information present in a spectrum. We aim to show how this tool can be applied in such a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP. MS2PIP predicts fragment ion peak intensities. RESULTS We show how comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach allows using more information out of the data (compared with simpler intensity-based metrics, like peak counting or summing explained intensities) while maintaining control of statistics such as the false discovery rate. AVAILABILITY AND IMPLEMENTATION All of the code is available online at https://github.com/compomics/ms2rescore. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ana S C Silva
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium
- Bioinformatics Institute, Ghent University, Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium
- Bioinformatics Institute, Ghent University, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium
- Bioinformatics Institute, Ghent University, Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium
- Bioinformatics Institute, Ghent University, Ghent, Belgium
21
Bouwmeester R, Gabriels R, Van Den Bossche T, Martens L, Degroeve S. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows. Proteomics 2020; 20:e1900351. [PMID: 32267083 DOI: 10.1002/pmic.201900351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/26/2020] [Revised: 03/21/2020] [Indexed: 12/30/2022]
Abstract
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
22
Bouwmeester R, Martens L, Degroeve S. Generalized Calibration Across Liquid Chromatography Setups for Generic Prediction of Small-Molecule Retention Times. Anal Chem 2020; 92:6571-6578. [DOI: 10.1021/acs.analchem.0c00233] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/19/2022]
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000 Ghent, Belgium
23
Nye LC, Williams JP, Munjoma NC, Letertre MP, Coen M, Bouwmeester R, Martens L, Swann JR, Nicholson JK, Plumb RS, McCullagh M, Gethings LA, Lai S, Langridge JI, Vissers JP, Wilson ID. A comparison of collision cross section values obtained via travelling wave ion mobility-mass spectrometry and ultra high performance liquid chromatography-ion mobility-mass spectrometry: Application to the characterisation of metabolites in rat urine. J Chromatogr A 2019; 1602:386-396. [DOI: 10.1016/j.chroma.2019.06.056] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/29/2019] [Revised: 06/24/2019] [Accepted: 06/26/2019] [Indexed: 01/01/2023]
24
Bouwmeester R, Martens L, Degroeve S. Comprehensive and Empirical Evaluation of Machine Learning Algorithms for Small Molecule LC Retention Time Prediction. Anal Chem 2019; 91:3694-3703. [DOI: 10.1021/acs.analchem.8b05820] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 01/05/2023]
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
25
Schlueter R, Park G, Bouwmeester R, Shu L, Lotfalian M, Rastgoufard P, Shayanfar A. Simulation and Assessment of Wind Array Power Variations Based on Simultaneous Wind Speed Measurements. IEEE Trans Power Appar Syst 1984. [DOI: 10.1109/tpas.1984.318705] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]