1
|
Buur LM, Declercq A, Strobl M, Bouwmeester R, Degroeve S, Martens L, Dorfer V, Gabriels R. MS2Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0. J Proteome Res 2024; 23:3200-3207. [PMID: 38491990 DOI: 10.1021/acs.jproteome.3c00785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2024]
Abstract
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.
Affiliation(s)
- Louise M Buur
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
| | - Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| | - Marina Strobl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| | - Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| |
|
2
|
Naryzhny S. Puzzle of Proteoform Variety-Where Is a Key? Proteomes 2024; 12:15. [PMID: 38804277 PMCID: PMC11130821 DOI: 10.3390/proteomes12020015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/03/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024] Open
Abstract
One of the human proteome puzzles is an imbalance between the theoretically calculated and experimentally measured amounts of proteoforms. Considering the possibility of combinations of different post-translational modifications (PTMs), the quantity of possible proteoforms is huge. An estimation gives more than a million different proteoforms in each cell type. But, it seems that there is strict control over the production and maintenance of PTMs. Although the potential complexity of proteoforms due to PTMs is tremendous, available information indicates that only a small part of it is being implemented. As a result, a protein could have many proteoforms according to the number of modification sites, but because of different systems of personal regulation, the profile of PTMs for a given protein in each organism is slightly different.
Affiliation(s)
- Stanislav Naryzhny
- B. P. Konstantinov Petersburg Nuclear Physics Institute, National Research Center "Kurchatov Institute", Leningrad Region, Gatchina 188300, Russia
| |
|
3
|
Jeong K, Kaulich PT, Jung W, Kim J, Tholey A, Kohlbacher O. Precursor deconvolution error estimation: The missing puzzle piece in false discovery rate in top-down proteomics. Proteomics 2024; 24:e2300068. [PMID: 37997224 DOI: 10.1002/pmic.202300068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 11/09/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics (BUP) that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which has primarily been established for BUP. We present evidence that the TDA-based FDR estimation may not work at the proteoform-level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact protein-level FDR rather than proteoform-level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform-level FDR bias by combining TDA-based FDR and precursor deconvolution error rate.
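For readers unfamiliar with the target-decoy approach mentioned in this abstract, the following is a minimal sketch of the conventional TDA-based FDR estimate only. It does not reproduce the authors' correction for precursor deconvolution error, since that formula is not given in the abstract. The function name `tda_fdr`, the `(score, is_decoy)` input layout, and the example scores are all illustrative assumptions, and the simple decoys-over-targets estimator shown here is just one common variant.

```python
from typing import Iterable, Tuple


def tda_fdr(matches: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among matches scoring at or above `threshold`.

    Each match is a (score, is_decoy) pair. This uses the simple
    estimator (#decoy hits / #target hits); other variants exist.
    Hypothetical helper for illustration, not taken from the cited paper.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        # No accepted target hits, so there is nothing to be wrong about.
        return 0.0
    return decoys / targets


# Made-up scores: four targets and one decoy pass a threshold of 7.0.
example = [
    (9.1, False),
    (8.7, False),
    (8.2, True),
    (7.9, False),
    (7.5, False),
    (3.0, True),
]
print(tda_fdr(example, threshold=7.0))  # prints 0.25
```

In this sketch a decoy hit stands in for an incorrect sequence match. An incorrectly deconvolved precursor mass is a different kind of error that decoys do not model, which is the gap the abstract describes.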
Affiliation(s)
- Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Wonhyeuk Jung
- Department of Cell Biology, Yale School of Medicine, New Haven, Connecticut, USA
| | - Jihyung Kim
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
| | - Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
| | - Oliver Kohlbacher
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany
| |
|
4
|
Walzer M, Jeong K, Tabb DL, Vizcaíno JA. TopDownApp: An open and modular platform for analysis and visualisation of top-down proteomics data. Proteomics 2024; 24:e2200403. [PMID: 37787899 DOI: 10.1002/pmic.202200403] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2023] [Revised: 09/13/2023] [Accepted: 09/13/2023] [Indexed: 10/04/2023]
Abstract
Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.
Affiliation(s)
- Mathias Walzer
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
| | - David L Tabb
- Institut Pasteur, Université Paris Cité, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris, France
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| |
|
5
|
Gabriel W, Picciani M, The M, Wilhelm M. Deep Learning-Assisted Analysis of Immunopeptidomics Data. Methods Mol Biol 2024; 2758:457-483. [PMID: 38549030 DOI: 10.1007/978-1-0716-3646-6_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.
Affiliation(s)
- Wassim Gabriel
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Matthew The
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| |
|
6
|
Dowling P, Swandulla D, Ohlendieck K. Mass Spectrometry-Based Proteomic Technology and Its Application to Study Skeletal Muscle Cell Biology. Cells 2023; 12:2560. [PMID: 37947638 PMCID: PMC10649384 DOI: 10.3390/cells12212560] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/06/2023] [Revised: 10/27/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023] Open
Abstract
Voluntary striated muscles are characterized by a highly complex and dynamic proteome that efficiently adapts to changed physiological demands or alters considerably during pathophysiological dysfunction. The skeletal muscle proteome has been extensively studied in relation to myogenesis, fiber type specification, muscle transitions, the effects of physical exercise, disuse atrophy, neuromuscular disorders, muscle co-morbidities and sarcopenia of old age. Since muscle tissue accounts for approximately 40% of body mass in humans, alterations in the skeletal muscle proteome have considerable influence on whole-body physiology. This review outlines the main bioanalytical avenues taken in the proteomic characterization of skeletal muscle tissues, including top-down proteomics focusing on the characterization of intact proteoforms and their post-translational modifications, bottom-up proteomics, which is a peptide-centric method concerned with the large-scale detection of proteins in complex mixtures, and subproteomics that examines the protein composition of distinct subcellular fractions. Mass spectrometric studies over the last two decades have decisively improved our general cell biological understanding of protein diversity and the heterogeneous composition of individual myofibers in skeletal muscles. This detailed proteomic knowledge can now be integrated with findings from other omics-type methodologies to establish a systems biological view of skeletal muscle function.
Affiliation(s)
- Paul Dowling
- Department of Biology, Maynooth University, National University of Ireland, W23 F2H6 Maynooth, Co. Kildare, Ireland
- Kathleen Lonsdale Institute for Human Health Research, Maynooth University, W23 F2H6 Maynooth, Co. Kildare, Ireland
| | - Dieter Swandulla
- Institute of Physiology, Faculty of Medicine, University of Bonn, D53115 Bonn, Germany
| | - Kay Ohlendieck
- Department of Biology, Maynooth University, National University of Ireland, W23 F2H6 Maynooth, Co. Kildare, Ireland
- Kathleen Lonsdale Institute for Human Health Research, Maynooth University, W23 F2H6 Maynooth, Co. Kildare, Ireland
| |
|
7
|
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| |
|
8
|
Tabb DL, Jeong K, Druart K, Gant MS, Brown KA, Nicora C, Zhou M, Couvillion S, Nakayasu E, Williams JE, Peterson HK, McGuire MK, McGuire MA, Metz TO, Chamot-Rooke J. Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection. J Proteome Res 2023. [PMID: 37235544 DOI: 10.1021/acs.jproteome.2c00673] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 05/28/2023]
Abstract
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
Affiliation(s)
- David L Tabb
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
| | - Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen 72076, Germany
| | - Karen Druart
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
| | - Megan S Gant
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
| | - Kyle A Brown
- School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin 53705, United States
| | - Carrie Nicora
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Mowei Zhou
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Sneha Couvillion
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Ernesto Nakayasu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Janet E Williams
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
| | - Haley K Peterson
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
| | - Michelle K McGuire
- Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, Idaho 83844, United States
| | - Mark A McGuire
- Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
| | - Thomas O Metz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Julia Chamot-Rooke
- Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
| |
|
9
|
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | | | | | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
|
10
|
Gabriels R, Declercq A, Bouwmeester R, Degroeve S, Martens L. psm_utils: A High-Level Python API for Parsing and Handling Peptide-Spectrum Matches and Proteomics Search Results. J Proteome Res 2023; 22:557-560. [PMID: 36508242 DOI: 10.1021/acs.jproteome.2c00609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists in a unified and user-friendly Python-, command line-, and web-interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at https://github.com/compomics/psm_utils.
Affiliation(s)
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| |
|
11
|
Bittremieux W, Levitsky L, Pilz M, Sachsenberg T, Huber F, Wang M, Dorrestein PC. Unified and Standardized Mass Spectrometry Data Processing in Python Using spectrum_utils. J Proteome Res 2023; 22:625-631. [PMID: 36688502 DOI: 10.1021/acs.jproteome.2c00632] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023]
Abstract
spectrum_utils is a Python package for mass spectrometry data processing and visualization. Since its introduction, spectrum_utils has grown into a fundamental software solution that powers various applications in proteomics and metabolomics, ranging from spectrum preprocessing prior to spectrum identification and machine learning applications to spectrum plotting from online data repositories and assisting data analysis tasks for dozens of other projects. Here, we present updates to spectrum_utils, which include new functionality to integrate mass spectrometry community data standards, enhanced mass spectral data processing, and unified mass spectral data visualization in Python. spectrum_utils is freely available as open source at https://github.com/bittremieux/spectrum_utils.
Affiliation(s)
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| | - Lev Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Matteo Pilz
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Timo Sachsenberg
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Florian Huber
- Centre for Digitalisation and Digitality, University of Applied Sciences Düsseldorf, 40476 Düsseldorf, Germany
| | - Mingxun Wang
- Department of Computer Science, University of California-Riverside, Riverside, California 92507, United States
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California-San Diego, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California-San Diego, La Jolla, California 92093, United States
| |
|
12
|
Deutsch EW, Vizcaíno JA, Jones AR, Binz PA, Lam H, Klein J, Bittremieux W, Perez-Riverol Y, Tabb DL, Walzer M, Ricard-Blum S, Hermjakob H, Neumann S, Mak TD, Kawano S, Mendoza L, Van Den Bossche T, Gabriels R, Bandeira N, Carver J, Pullman B, Sun Z, Hoffmann N, Shofstahl J, Zhu Y, Licata L, Quaglia F, Tosatto SCE, Orchard SE. Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work. J Proteome Res 2023; 22:287-301. [PMID: 36626722 PMCID: PMC9903322 DOI: 10.1021/acs.jproteome.2c00637] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/07/2022] [Indexed: 01/11/2023]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Pierre-Alain Binz
- Clinical Chemistry Service, Lausanne University Hospital, 1011 976 Lausanne, Switzerland
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, P. R. China
| | - Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
| | - Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - David L. Tabb
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7602, South Africa
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Sylvie Ricard-Blum
- Univ. Lyon, Université Lyon 1, ICBMS, UMR 5246, 69622 Villeurbanne, France
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Steffen Neumann
- Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), 04103 Halle-Jena-Leipzig, Germany
| | - Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
| | - Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
| | - Benjamin Pullman
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
| | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
| | - Luana Licata
- Fondazione Human Technopole, 20157 Milan, Italy
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
| | - Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), 70126 Bari, Italy
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
| | | | - Sandra E. Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
|
13
|
Deutsch EW, Bandeira N, Perez-Riverol Y, Sharma V, Carver J, Mendoza L, Kundu DJ, Wang S, Bandla C, Kamatchinathan S, Hewapathirana S, Pullman B, Wertz J, Sun Z, Kawano S, Okuda S, Watanabe Y, MacLean B, MacCoss M, Zhu Y, Ishihama Y, Vizcaíno J. The ProteomeXchange consortium at 10 years: 2023 update. Nucleic Acids Res 2023; 51:D1539-D1548. [PMID: 36370099 PMCID: PMC9825490 DOI: 10.1093/nar/gkac1040] [Citation(s) in RCA: 199] [Impact Index Per Article: 199.0] [Received: 09/15/2022] [Revised: 10/20/2022] [Accepted: 10/23/2022] [Indexed: 11/13/2022] Open
Abstract
Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.
Affiliation(s)
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | | | - Jeremy J Carver
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Luis Mendoza
- Institute for Systems Biology, Seattle WA 98109, USA
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Benjamin S Pullman
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Julie Wertz
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Zhi Sun
- Institute for Systems Biology, Seattle WA 98109, USA
| | - Shin Kawano
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
| | - Yu Watanabe
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
| | | | | | - Yunping Zhu
- Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
14
|
Thakur M, Bateman A, Brooksbank C, Freeberg M, Harrison M, Hartley M, Keane T, Kleywegt G, Leach A, Levchenko M, Morgan S, McDonagh E, Orchard S, Papatheodorou I, Velankar S, Vizcaino J, Witham R, Zdrazil B, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022. Nucleic Acids Res 2023; 51:D9-D17. [PMID: 36477213 PMCID: PMC9825486 DOI: 10.1093/nar/gkac1098] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/31/2022] [Indexed: 12/13/2022] Open
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.
Affiliation(s)
| | - Alex Bateman
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Cath Brooksbank
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mallory Freeberg
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Melissa Harrison
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthew Hartley
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Thomas Keane
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gerard Kleywegt
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Andrew Leach
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mariia Levchenko
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sarah Morgan
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ellen M McDonagh
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- OpenTargets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sandra Orchard
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Irene Papatheodorou
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sameer Velankar
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Juan Antonio Vizcaino
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Rick Witham
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Barbara Zdrazil
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | | |
|
15
|
Seeing the complete picture: proteins in top-down mass spectrometry. Essays Biochem 2022; 67:283-300. [PMID: 36468679 DOI: 10.1042/ebc20220098] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/15/2022] [Revised: 11/11/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022]
Abstract
Top-down protein mass spectrometry can provide unique insights into protein sequence and structure, including precise proteoform identification and study of protein–ligand and protein–protein interactions. In contrast with the commonly applied bottom-up approach, top-down approaches do not include digestion of the protein of interest into small peptides, but instead rely on the ionization and subsequent fragmentation of intact proteins. As such, it is fundamentally the only way to fully characterize the composition of a proteoform. Here, we provide an overview of how a top-down protein mass spectrometry experiment is performed and point out recent applications from the literature to the reader. While some parts of the top-down workflow are broadly applicable, different research questions are best addressed with specific experimental designs. The most important divide is between studies that prioritize sequence information (i.e., proteoform identification) versus structural information (e.g., conformational studies, or mapping protein–protein or protein–ligand interactions). Another important consideration is whether to work under native or denaturing solution conditions, and the overall complexity of the sample also needs to be taken into account, as it determines whether (chromatographic) separation is required prior to MS analysis. In this review, we aim to provide enough information to support both newcomers and more experienced readers in the decision process of how to answer a potential research question most efficiently and to provide an overview of the methods that exist to answer these questions.
|