1
Madej D, Lam H. Query Mix-Max Method for FDR Estimation Supported by Entrapment Queries. J Proteome Res 2025; 24:1135-1147. [PMID: 39907052 PMCID: PMC11894652 DOI: 10.1021/acs.jproteome.4c00744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Revised: 01/24/2025] [Accepted: 01/28/2025] [Indexed: 02/06/2025]
Abstract
Estimating the false discovery rate (FDR) is one of the key steps in ensuring appropriate error control in the analysis of shotgun proteomics data. Traditional estimation methods typically rely on decoy sequence databases or spectral libraries, which may not always provide satisfactory results due to limitations of decoy construction methods. This study introduces the query mix-max (QMM) method, a decoy-free alternative for FDR estimation in proteomics. The QMM framework builds upon the existing mix-max procedure but replaces decoy matches with entrapment queries to estimate the number of false positive discoveries. Through simulations and real data set analyses, the QMM method was demonstrated to provide reasonably accurate FDR estimation across various scenarios, particularly when smaller sample-to-entrapment spectra ratios were achieved. The QMM method tends to be conservatively biased, particularly at higher FDR values, which can ensure stringent FDR control. While flexible, the protocol's effectiveness may vary depending on the evolutionary distance between the sample and entrapment organisms. It also requires a sufficient number of entrapment queries to provide stable FDR estimates, especially for low FDR values. Despite these limitations, the QMM method is a promising alternative as one of the first query-based FDR estimation approaches in shotgun proteomics.
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
2
Hackl S, Jachmann C, Witte Paz M, Harbig TA, Martens L, Nieselt K. PTMVision: An Interactive Visualization Webserver for Post-translational Modifications of Proteins. J Proteome Res 2025; 24:919-928. [PMID: 39772617 PMCID: PMC11812001 DOI: 10.1021/acs.jproteome.4c00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 11/19/2024] [Accepted: 12/19/2024] [Indexed: 01/11/2025]
Abstract
Recent improvements in methods and instruments used in mass spectrometry have greatly enhanced the detection of protein post-translational modifications (PTMs). On the computational side, the adoption of open modification search strategies now allows for the identification of a wide variety of PTMs, potentially revealing hundreds to thousands of distinct modifications in biological samples. While the observable part of the proteome is continuously growing, the visualization and interpretation of this vast amount of data in a comprehensive fashion is not yet possible. There is a clear need for methods to easily investigate the PTM landscape and to thoroughly examine modifications on proteins of interest from acquired mass spectrometry data. We present PTMVision, a web server providing an intuitive and simple way to interactively explore PTMs identified in mass spectrometry-based proteomics experiments and to analyze the modification sites of proteins within relevant context. It offers a variety of tools to visualize the PTM landscape from different angles and at different levels, such as 3D structures and contact maps, UniMod classification summaries, and site specific overviews. The web server's user-friendly interface ensures accessibility across diverse scientific backgrounds. PTMVision is available at https://ptmvision-tuevis.cs.uni-tuebingen.de/.
Affiliation(s)
- Simon Hackl
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
- Caroline Jachmann
- VIB-UGent Center for Medical Biotechnology, VIB, Suzanne Tassierstraat 1, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, Ghent 9052, Belgium
- Mathias Witte Paz
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
- Theresa Anisja Harbig
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Suzanne Tassierstraat 1, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, Ghent 9052, Belgium
- Kay Nieselt
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
3
Fierro-Monti I, Fröhlich K, Schori C, Schmidt A. Assessment of Data-Independent Acquisition Mass Spectrometry (DIA-MS) for the Identification of Single Amino Acid Variants. Proteomes 2024; 12:33. [PMID: 39585120 PMCID: PMC11587465 DOI: 10.3390/proteomes12040033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 10/25/2024] [Accepted: 10/30/2024] [Indexed: 11/26/2024] Open
Abstract
Proteogenomics integrates genomic and proteomic data to elucidate cellular processes by identifying variant peptides, including single amino acid variants (SAAVs). In this study, we assessed the capability of data-independent acquisition mass spectrometry (DIA-MS) to identify SAAV peptides in HeLa cells using various search engine pipelines. We developed a customised sequence database (DB) incorporating SAAV sequences from the HeLa genome and conducted searches using DIA-NN, Spectronaut, and Fragpipe-MSFragger. Our evaluation focused on identifying true positive SAAV peptides and false positives through entrapment DBs. This study revealed that DIA-MS provides reproducible and comprehensive coverage of the proteome, identifying a substantial proportion of SAAV peptides. Notably, the DIA-MS searches maintained consistent identification of SAAV peptides despite varying sizes of the entrapment DB. A comparative analysis showed that Fragpipe-MSFragger (FP-DIA) demonstrated the most conservative and effective performance, exhibiting the lowest false discovery match ratio (FDMR). Additionally, integrating DIA and data-dependent acquisition (DDA) MS data search outputs enhanced SAAV peptide identification, with a lower false discovery rate (FDR) observed in DDA searches. The validation using stable isotope dilution and parallel reaction monitoring (SID-PRM) confirmed the SAAV peptides identified by DIA-MS and DDA-MS searches, highlighting the reliability of our approach. Our findings underscore the effectiveness of DIA-MS in proteogenomic workflows for identifying SAAV peptides, offering insights into optimising search engine pipelines and DB construction for accurate proteomics analysis. These methodologies advance the understanding of proteome variability, contributing to cancer research and the identification of novel proteoform therapeutic targets.
Affiliation(s)
- Ivo Fierro-Monti
- European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, Cambridgeshire, UK
- Faculty of Science, Department Biozentrum, University of Basel, 4056 Basel, Switzerland
- Klemens Fröhlich
- Faculty of Science, Department Biozentrum, University of Basel, 4056 Basel, Switzerland
- Christian Schori
- Faculty of Science, Department Biozentrum, University of Basel, 4056 Basel, Switzerland
- Alexander Schmidt
- Faculty of Science, Department Biozentrum, University of Basel, 4056 Basel, Switzerland
4
Sun Y, Xing Z, Liang S, Miao Z, Zhuo LB, Jiang W, Zhao H, Gao H, Xie Y, Zhou Y, Yue L, Cai X, Chen YM, Zheng JS, Guo T. metaExpertPro: A Computational Workflow for Metaproteomics Spectral Library Construction and Data-Independent Acquisition Mass Spectrometry Data Analysis. Mol Cell Proteomics 2024; 23:100840. [PMID: 39278598 PMCID: PMC11795700 DOI: 10.1016/j.mcpro.2024.100840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 08/04/2024] [Accepted: 09/11/2024] [Indexed: 09/18/2024] Open
Abstract
Analysis of large-scale data-independent acquisition mass spectrometry metaproteomics data remains a computational challenge. Here, we present a computational pipeline called metaExpertPro for metaproteomics data analysis. This pipeline encompasses spectral library generation using data-dependent acquisition MS, protein identification and quantification using data-independent acquisition mass spectrometry, functional and taxonomic annotation, as well as quantitative matrix generation for both microbiota and hosts. By integrating FragPipe and DIA-NN, metaExpertPro offers compatibility with both Orbitrap and timsTOF MS instruments. To evaluate the depth and accuracy of identification and quantification, we conducted extensive assessments using human fecal samples and benchmark tests. Performance tests conducted on human fecal samples indicated that metaExpertPro quantified an average of 45,000 peptides in a 60-min diaPASEF injection. Notably, metaExpertPro outperformed three existing software tools by characterizing a higher number of peptides and proteins. Importantly, metaExpertPro maintained a low factual false discovery rate of approximately 5% for protein groups across four benchmark tests. Applying a filter of five peptides per genus, metaExpertPro achieved relatively high accuracy (F-score = 0.67-0.90) in genus diversity and showed a high correlation (rSpearman = 0.73-0.82) between the measured and true genus relative abundance in benchmark tests. Additionally, the quantitative results at the protein, taxonomy, and function levels exhibited high reproducibility and consistency across the commonly adopted public human gut microbial protein databases IGC and UHGP. In a metaproteomic analysis of dyslipidemia patients, metaExpertPro revealed characteristic alterations in microbial functions and potential interactions between the microbiota and the host.
Affiliation(s)
- Yingying Sun
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Ziyuan Xing
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Shuang Liang
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Zelei Miao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Lai-Bao Zhuo
- Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Wenhao Jiang
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Hui Zhao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Huanhuan Gao
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yuting Xie
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yan Zhou
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Liang Yue
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Xue Cai
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yu-Ming Chen
- Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-sen University, Guangzhou, China.
- Ju-Sheng Zheng
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China.
- Tiannan Guo
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China.
5
Langella O, Renne T, Balliau T, Davanture M, Brehmer S, Zivy M, Blein-Nicolas M, Rusconi F. Full Native timsTOF PASEF-Enabled Quantitative Proteomics with the i2MassChroQ Software Package. J Proteome Res 2024; 23:3353-3366. [PMID: 39016325 DOI: 10.1021/acs.jproteome.3c00732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
Ion mobility mass spectrometry has become popular in proteomics lately, in particular because the Bruker timsTOF instruments have found significant adoption in proteomics facilities. The Bruker's implementation of the ion mobility dimension generates massive amounts of mass spectrometric data that require carefully designed software both to extract meaningful information and to perform processing tasks at reasonable speed. In a historical move, the Bruker company decided to harness the skills of the scientific software development community by releasing to the public the timsTOF data file format specification. As a proteomics facility that has been developing Free Open Source Software (FOSS) solutions since decades, we took advantage of this opportunity to implement the very first FOSS proteomics complete solution to natively read the timsTOF data, low-level process them, and explore them in an integrated quantitative proteomics software environment. We dubbed our software i2MassChroQ because it implements a (peptide)identification-(protein)inference-mass-chromatogram-quantification processing workflow. The software benchmarking results reported in this paper show that i2MassChroQ performed better than competing software on two critical characteristics: (1) feature extraction capability and (2) protein quantitative dynamic range. Altogether, i2MassChroQ yielded better quantified protein numbers, both in a technical replicate MS runs setting and in a differential protein abundance analysis setting.
Affiliation(s)
- Olivier Langella
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- Thomas Renne
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- Thierry Balliau
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- Marlène Davanture
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- Sven Brehmer
- Bruker Software Development, Bruker Daltonics GmbH & Co. KG, Bremen D-28359, Germany
- Michel Zivy
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- Mélisande Blein-Nicolas
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- Filippo Rusconi
- GQE-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, IDEEV, 12, Route 128, Gif-sur-Yvette F-91272, France
- INSERM, UMR-S 1138, Centre de Recherche des Cordeliers, Paris F-75005, France
6
Madej D, Lam H. On the use of tandem mass spectra acquired from samples of evolutionarily distant organisms to validate methods for false discovery rate estimation. Proteomics 2024; 24:e2300398. [PMID: 38491400 DOI: 10.1002/pmic.202300398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/18/2024]
Abstract
Estimating the false discovery rate (FDR) of peptide identifications is a key step in proteomics data analysis, and many methods have been proposed for this purpose. Recently, an entrapment-inspired protocol to validate methods for FDR estimation appeared in articles showcasing new spectral library search tools. That validation approach involves generating incorrect spectral matches by searching spectra from evolutionarily distant organisms (entrapment queries) against the original target search space. Although this approach may appear similar to the solutions using entrapment databases, it represents a distinct conceptual framework whose correctness has not been verified yet. In this viewpoint, we first discussed the background of the entrapment-based validation protocols and then conducted a few simple computational experiments to verify the assumptions behind them. The results reveal that entrapment databases may, in some implementations, be a reasonable choice for validation, while the assumptions underpinning validation protocols based on entrapment queries are likely to be violated in practice. This article also highlights the need for well-designed frameworks for validating FDR estimation methods in proteomics.
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
7
Fröhlich K, Fahrner M, Brombacher E, Seredynska A, Maldacker M, Kreutz C, Schmidt A, Schilling O. Data-Independent Acquisition: A Milestone and Prospect in Clinical Mass Spectrometry-Based Proteomics. Mol Cell Proteomics 2024; 23:100800. [PMID: 38880244 PMCID: PMC11380018 DOI: 10.1016/j.mcpro.2024.100800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 06/08/2024] [Accepted: 06/13/2024] [Indexed: 06/18/2024] Open
Abstract
Data-independent acquisition (DIA) has revolutionized the field of mass spectrometry (MS)-based proteomics over the past few years. DIA stands out for its ability to systematically sample all peptides in a given m/z range, allowing an unbiased acquisition of proteomics data. This greatly mitigates the issue of missing values and significantly enhances quantitative accuracy, precision, and reproducibility compared to many traditional methods. This review focuses on the critical role of DIA analysis software tools, primarily focusing on their capabilities and the challenges they address in proteomic research. Advances in MS technology, such as trapped ion mobility spectrometry, or high field asymmetric waveform ion mobility spectrometry require sophisticated analysis software capable of handling the increased data complexity and exploiting the full potential of DIA. We identify and critically evaluate leading software tools in the DIA landscape, discussing their unique features, and the reliability of their quantitative and qualitative outputs. We present the biological and clinical relevance of DIA-MS and discuss crucial publications that paved the way for in-depth proteomic characterization in patient-derived specimens. Furthermore, we provide a perspective on emerging trends in clinical applications and present upcoming challenges including standardization and certification of MS-based acquisition strategies in molecular diagnostics. While we emphasize the need for continuous development of software tools to keep pace with evolving technologies, we advise researchers against uncritically accepting the results from DIA software tools. Each tool may have its own biases, and some may not be as sensitive or reliable as others. Our overarching recommendation for both researchers and clinicians is to employ multiple DIA analysis tools, utilizing orthogonal analysis approaches to enhance the robustness and reliability of their findings.
Affiliation(s)
- Klemens Fröhlich
- Proteomics Core Facility, Biozentrum Basel, University of Basel, Basel, Switzerland
- Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany
- Eva Brombacher
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Freiburg, Germany; Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany; Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Adrianna Seredynska
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Maximilian Maldacker
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Clemens Kreutz
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Freiburg, Germany; Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany
- Alexander Schmidt
- Proteomics Core Facility, Biozentrum Basel, University of Basel, Basel, Switzerland
- Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany.
8
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
- Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- ImmuneSpec BV, Niel, Belgium
9
Abstract
The growing complexity and volume of proteomics data necessitate the development of efficient software tools for peptide identification and quantification from mass spectra. Given their central role in proteomics, it is imperative that these tools are auditable and extensible, requirements that are best fulfilled by open-source and permissively licensed software. This work presents Sage, a high-performance, open-source, and freely available proteomics pipeline. Scalable and cloud-ready, Sage matches the performance of state-of-the-art software tools while running an order of magnitude faster.
Affiliation(s)
- Michael R Lazear
- Belharra Therapeutics, 3985 Sorrento Valley Boulevard Suite C, San Diego, California 92121, United States
10
Sun Z, Ning Z, Cheng K, Duan H, Wu Q, Mayne J, Figeys D. MetaPep: A core peptide database for faster human gut metaproteomics database searches. Comput Struct Biotechnol J 2023; 21:4228-4237. [PMID: 37692080 PMCID: PMC10491838 DOI: 10.1016/j.csbj.2023.08.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 09/12/2023] Open
Abstract
Metaproteomics has increasingly been applied to study functional changes in the human gut microbiome. Peptide identification is an important step in metaproteomics research, with sequence database search (SDS) and spectral library search (SLS) as the two main methods to identify peptides. However, the large search space in metaproteomics studies causes significant challenges for both identification methods. Moreover, with the development of mass spectrometry, it is now feasible to perform metaproteomic projects involving 100-1000 individual microbiomes. These large-scale projects create a conundrum for searching large databases. In this study, we constructed MetaPep, a core peptide database (including both collections of peptide sequences and tandem MS spectra) greatly accelerating the peptide identifications. Raw files from fifteen metaproteomics projects were re-analyzed and the identified peptide-spectrum matches (PSMs) were used to construct the MetaPep database. The constructed MetaPep database achieved rapid and accurate identification of peptides for human gut metaproteomics. MetaPep has a large collection of peptides and spectra that have been identified in published human gut metaproteomics datasets. MetaPep database can be used as an important resource in the current stage of human gut metaproteomics research. This study showed the possibility of applying a core peptide database as a generic metaproteomics workflow. MetaPep could also be an important resource for future human gut metaproteomics research, such as DIA (data-independent acquisition) analysis.
Affiliation(s)
- Zhongzhi Sun
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Kai Cheng
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Haonan Duan
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Qing Wu
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Janice Mayne
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
|
11
|
Debrie E, Malfait M, Gabriels R, Declerq A, Sticker A, Martens L, Clement L. Quality Control for the Target Decoy Approach for Peptide Identification. J Proteome Res 2023; 22:350-358. [PMID: 36648107 DOI: 10.1021/acs.jproteome.2c00423] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/18/2023]
Abstract
Reliable peptide identification is key in mass spectrometry (MS) based proteomics. To this end, the target decoy approach (TDA) has become the cornerstone for extracting a set of reliable peptide-to-spectrum matches (PSMs) that will be used in downstream analysis. Indeed, TDA is now the default method to estimate the false discovery rate (FDR) for a given set of PSMs, and users typically view it as a universal solution for assessing the FDR in the peptide identification step. However, the TDA also relies on a minimal set of assumptions, which are typically never verified in practice. We argue that a violation of these assumptions can lead to poor FDR control, which can be detrimental to any downstream data analysis. We here therefore first clearly spell out these TDA assumptions, and introduce TargetDecoy, a Bioconductor package with all the necessary functionality to control the TDA quality and its underlying assumptions for a given set of PSMs.
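As a rough illustration of the FDR estimate that the target decoy approach provides, the sketch below divides the number of decoy PSMs by the number of target PSMs at or above a score threshold. The function name `tda_fdr`, the example scores, and the plain decoy-to-target ratio are assumptions made for illustration; this is not the API of the TargetDecoy package, and it does not perform the assumption checks that the package is described as offering.

```python
def tda_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Assumes a concatenated target-decoy search in which an incorrect
    target match is as likely as a decoy match (a core TDA assumption).
    """
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    if n_targets == 0:
        return 0.0  # no accepted targets, so no false discoveries to estimate
    return min(1.0, n_decoys / n_targets)


# Hypothetical search-engine scores (higher is better)
targets = [9.1, 8.7, 8.2, 7.9, 6.5, 5.0, 4.2, 3.9]
decoys = [6.8, 4.1, 3.5, 2.2]
print(tda_fdr(targets, decoys, threshold=6.0))
```

If the decoy scores do not behave like the scores of incorrect target matches, this ratio can misestimate the FDR, which is the kind of violation the abstract argues should be checked.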
Affiliation(s)
- Elke Debrie
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Milan Malfait
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium; Statistics and Decision Sciences, Janssen Pharmaceutical Companies of Johnson and Johnson, 2340 Beerse, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Arthur Declerq
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Adriaan Sticker
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lieven Clement
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
|
12
|
Vu NQ, Yen HC, Fields L, Cao W, Li L. HyPep: An Open-Source Software for Identification and Discovery of Neuropeptides Using Sequence Homology Search. J Proteome Res 2023; 22:420-431. [PMID: 36696582 PMCID: PMC10160011 DOI: 10.1021/acs.jproteome.2c00597] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023]
Abstract
Neuropeptides are a class of endogenous peptides that have key regulatory roles in biochemical, physiological, and behavioral processes. Mass spectrometry analyses of neuropeptides often rely on protein informatics tools for database searching and peptide identification. As neuropeptide databases are typically experimentally built and comprised of short sequences with high sequence similarity to each other, we developed a novel database searching tool, HyPep, which utilizes sequence homology searching for peptide identification. HyPep aligns de novo sequenced peptides, generated through PEAKS software, with neuropeptide database sequences and identifies neuropeptides based on the alignment score. HyPep performance was optimized using LC-MS/MS measurements of peptide extracts from various Callinectes sapidus neuronal tissue types and compared with a commercial database searching software, PEAKS DB. HyPep identified more neuropeptides from each tissue type than PEAKS DB at 1% false discovery rate, and the false match rate from both programs was 2%. In addition to identification, this report describes how HyPep can aid in the discovery of novel neuropeptides.
Affiliation(s)
- Nhu Q Vu
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Hsu-Ching Yen
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, Wisconsin 53706, United States
- Lauren Fields
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Weifeng Cao
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States; School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
|
13
|
Yılmaz Ş, Busch F, Nagaraj N, Cox J. Accurate and Automated High-Coverage Identification of Chemically Cross-Linked Peptides with MaxLynx. Anal Chem 2022; 94:1608-1617. [PMID: 35014260 PMCID: PMC8792900 DOI: 10.1021/acs.analchem.1c03688] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
Abstract
Cross-linking combined with mass spectrometry (XL-MS) provides a wealth of information about the three-dimensional (3D) structure of proteins and their interactions. We introduce MaxLynx, a novel computational proteomics workflow for XL-MS integrated into the MaxQuant environment. It is applicable to noncleavable and MS-cleavable cross-linkers. For both, we have generalized the Andromeda peptide database search engine to efficiently identify cross-linked peptides. For noncleavable peptides, we implemented a novel dipeptide Andromeda score, which is the basis for a computationally efficient N-squared search engine. Additionally, partial scores summarize the evidence for the two constituents of the dipeptide individually. A posterior error probability (PEP) based on total and partial scores is used to control false discovery rates (FDRs). For MS-cleavable cross-linkers, a score of signature peaks is combined with the conventional Andromeda score on the cleavage products. The MaxQuant 3D peak detection was improved to ensure more accurate determination of the monoisotopic peak of isotope patterns for heavy molecules, which cross-linked peptides typically are. A wide selection of filtering parameters can replace the manual filtering of identifications, which is often necessary when using other pipelines. On benchmark data sets of synthetic peptides, MaxLynx outperforms all other tested software on data for both types of cross-linkers and on a proteome-wide data set of cross-linked Drosophila melanogaster cell lysate. The workflow also supports ion mobility-enhanced MS data. MaxLynx runs on Windows and Linux, contains an interactive viewer for displaying annotated cross-linked spectra, and is freely available at https://www.maxquant.org/.
Affiliation(s)
- Şule Yılmaz
- Computational Systems Biochemistry Research Group, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
- Florian Busch
- Bruker Daltonics GmbH & Co. KG, 28359 Bremen, Germany
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany; Department of Biological and Medical Psychology, University of Bergen, 5007 Bergen, Norway
|
14
|
Couté Y, Bruley C, Burger T. Beyond Target-Decoy Competition: Stable Validation of Peptide and Protein Identifications in Mass Spectrometry-Based Discovery Proteomics. Anal Chem 2020; 92:14898-14906. [PMID: 32970414 DOI: 10.1021/acs.analchem.0c00328] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022]
Abstract
In bottom-up discovery proteomics, target-decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability have already been empirically described, they may have been misinterpreted. This article provides evidence that TDC has become less reliable as the accuracy of modern mass spectrometers improved. We therefore propose to replace TDC by a totally different method to control the FDR at the spectrum, peptide, and protein levels, while benefiting from the theoretical guarantees of the Benjamini-Hochberg framework. As this method is simpler to use, faster to compute, and more stable than TDC, we argue that it is better adapted to the standardization and throughput constraints of current proteomic platforms.
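For readers unfamiliar with the Benjamini-Hochberg framework mentioned in the abstract, the following is a minimal sketch of the standard BH step-up procedure applied to a list of p-values. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions. The sketch shows only the generic procedure, not the authors' full method, which would also need well-calibrated p-values derived from PSM scores.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return the indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value passes its rank-dependent threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank
    # Step-up rule: reject everything up to rank k, even if an
    # intermediate p-value missed its own threshold
    return sorted(order[:k])


# Hypothetical p-values for four PSMs
print(benjamini_hochberg([0.03, 0.001, 0.04, 0.045], alpha=0.05))
```

In the second and third ranks of this example the p-values exceed their own thresholds, yet all four hypotheses are rejected because the largest p-value passes at rank four; that is the step-up behavior.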
Affiliation(s)
- Yohann Couté
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
- Christophe Bruley
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
- Thomas Burger
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
|
15
|
C Silva AS, Bouwmeester R, Martens L, Degroeve S. Accurate peptide fragmentation predictions allow data driven approaches to replace and improve upon proteomics search engine scoring functions. Bioinformatics 2020; 35:5243-5248. [PMID: 31077310 DOI: 10.1093/bioinformatics/btz383] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 01/08/2019] [Revised: 04/01/2019] [Accepted: 05/02/2019] [Indexed: 01/05/2023]
Abstract
MOTIVATION: The use of post-processing tools to maximize the information gained from a proteomics search engine is widely accepted and used by the community, with the most notable example being Percolator, a semi-supervised machine learning model which learns a new scoring function for a given dataset. The usage of such tools is however bound to the search engine's scoring scheme, which doesn't always make full use of the intensity information present in a spectrum. We aim to show how this tool can be applied in such a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP. MS2PIP predicts fragment ion peak intensities. RESULTS: We show how comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach allows using more information out of the data (compared with simpler intensity based metrics, like peak counting or explained intensities summing) while maintaining control of statistics such as the false discovery rate. AVAILABILITY AND IMPLEMENTATION: All of the code is available online at https://github.com/compomics/ms2rescore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ana S C Silva
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium; Bioinformatics Institute, Ghent University, Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium; Bioinformatics Institute, Ghent University, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium; Bioinformatics Institute, Ghent University, Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine, Ghent, Belgium; Bioinformatics Institute, Ghent University, Ghent, Belgium
|
16
|
Shiferaw GA, Vandermarliere E, Hulstaert N, Gabriels R, Martens L, Volders PJ. COSS: A Fast and User-Friendly Tool for Spectral Library Searching. J Proteome Res 2020; 19:2786-2793. [PMID: 32384242 DOI: 10.1021/acs.jproteome.9b00743] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/10/2023]
Abstract
Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly because of low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a threshold to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool supporting two scoring functions. COSS also includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral searching tools and a sequence database search tool. Our comparison showed that COSS more reliably identifies spectra, is capable of handling large data sets and libraries, and is an easy to use tool that can run on low computer specifications. COSS binaries and source code can be freely downloaded from https://github.com/compomics/COSS.
Affiliation(s)
- Genet Abay Shiferaw
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Elien Vandermarliere
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Niels Hulstaert
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium; Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium
|
17
|
Kumar P, Johnson JE, Easterly C, Mehta S, Sajulga R, Nunn B, Jagtap PD, Griffin TJ. A Sectioning and Database Enrichment Approach for Improved Peptide Spectrum Matching in Large, Genome-Guided Protein Sequence Databases. J Proteome Res 2020; 19:2772-2785. [DOI: 10.1021/acs.jproteome.0c00260] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/13/2022]
Affiliation(s)
- Praveen Kumar
- Bioinformatics and Computational Biology, University of Minnesota−Rochester, Rochester, Minnesota 55904, United States
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- James E. Johnson
- Minnesota Supercomputing Institute, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Caleb Easterly
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Subina Mehta
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Ray Sajulga
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Brook Nunn
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Pratik D. Jagtap
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Timothy J. Griffin
- Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
|
18
|
Liang X, Xia Z, Jian L, Wang Y, Niu X, Link AJ. A cost-sensitive online learning method for peptide identification. BMC Genomics 2020; 21:324. [PMID: 32334531 PMCID: PMC7183122 DOI: 10.1186/s12864-020-6693-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/27/2019] [Accepted: 03/24/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Post-database search is a key procedure in peptide identification with tandem mass spectrometry (MS/MS) strategies for refining peptide-spectrum matches (PSMs) generated by database search engines. Although many statistical and machine learning-based methods have been developed to improve the accuracy of peptide identification, the challenge remains on large-scale datasets and datasets with a distribution of unbalanced PSMs. A more efficient learning strategy is required for improving the accuracy of peptide identification on challenging datasets. While complex learning models have larger power of classification, they may cause overfitting problems and introduce computational complexity on large-scale datasets. Kernel methods map data from the sample space to high dimensional spaces where data relationships can be simplified for modeling. RESULTS: In order to tackle the computational challenge of using the kernel-based learning model for practical peptide identification problems, we present an online learning algorithm, OLCS-Ranker, which iteratively feeds only one training sample into the learning model at each round, and, as a result, the memory requirement for computation is significantly reduced. Meanwhile, we propose a cost-sensitive learning model for OLCS-Ranker by using a larger loss of decoy PSMs than that of target PSMs in the loss function. CONCLUSIONS: The new model can reduce its false discovery rate on datasets with a distribution of unbalanced PSMs. Experimental studies show that OLCS-Ranker outperforms other methods in terms of accuracy and stability, especially on datasets with a distribution of unbalanced PSMs. Furthermore, OLCS-Ranker is 15-85 times faster than CRanker.
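The two ideas in this abstract, feeding one training sample per round and penalizing decoy errors more heavily than target errors, can be sketched as follows. This is a deliberately simplified linear model with a weighted hinge loss; OLCS-Ranker itself is described as kernel-based, and the function name `train_online`, the cost values, the learning rate, and the toy feature vectors are all assumptions made for illustration.

```python
def train_online(samples, decoy_cost=2.0, target_cost=1.0, lr=0.1, epochs=5):
    """Train a linear scorer one sample at a time with a cost-weighted hinge loss.

    `samples` is a list of (features, label) pairs, where label is +1 for a
    target PSM and -1 for a decoy PSM. This is a linear simplification,
    not the kernel-based OLCS-Ranker algorithm.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:  # only one sample is held in the update at a time
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # hinge loss is active, so take a gradient step
                cost = target_cost if y > 0 else decoy_cost  # decoys cost more
                step = lr * cost * y
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
    return w, b


def score(w, b, x):
    """Linear score of a PSM feature vector; positive means target-like."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical two-feature PSMs: two targets (+1) and two decoys (-1)
data = [((2.0, 1.0), 1), ((1.5, 2.0), 1), ((-1.0, -0.5), -1), ((-2.0, -1.0), -1)]
w, b = train_online(data)
print([score(w, b, x) > 0 for x, _ in data])
```

Because each update touches a single sample, memory use does not grow with the size of the training set, which is the property the abstract highlights for large datasets.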
Affiliation(s)
- Xijun Liang
- College of Science, China University of Petroleum, Changjiang West Road, Qingdao, 266580 China
- Zhonghang Xia
- School of Engineering and Applied Science, Western Kentucky University, Bowling Green, 42101 KY USA
- Ling Jian
- School of Economics and Management, China University of Petroleum, Changjiang West Road, Qingdao, 266580 China
- Yongxiang Wang
- College of Science, China University of Petroleum, Changjiang West Road, Qingdao, 266580 China
- Xinnan Niu
- Dept. of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, 37232 TN USA
- Andrew J. Link
- Dept. of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, 37232 TN USA
|
19
|
Van Puyvelde B, Willems S, Gabriels R, Daled S, De Clerck L, Vande Casteele S, Staes A, Impens F, Deforce D, Martens L, Degroeve S, Dhaenens M. Removing the Hidden Data Dependency of DIA with Predicted Spectral Libraries. Proteomics 2020; 20:e1900306. [PMID: 31981311 DOI: 10.1002/pmic.201900306] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 09/19/2019] [Revised: 12/20/2019] [Indexed: 12/22/2022]
Abstract
Data-independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which imposes the use of data-dependent acquisition (DDA) libraries for deep peptide-centric detection. Here, it is shown that DIA can be redeemed from this dependency by combining predicted fragment intensities and retention times with narrow window DIA. This eliminates variation in library building and omits stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter-laboratory comparison.
Affiliation(s)
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
- Sander Willems
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Simon Daled
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
- Laura De Clerck
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
- Sofie Vande Casteele
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
- An Staes
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium; VIB Proteomics Core, 9000, Ghent, Belgium
- Francis Impens
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium; VIB Proteomics Core, 9000, Ghent, Belgium
- Dieter Deforce
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, 9000, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
- Maarten Dhaenens
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000, Ghent, Belgium
|
20
|
Hubler SL, Kumar P, Mehta S, Easterly C, Johnson JE, Jagtap PD, Griffin TJ. Challenges in Peptide-Spectrum Matching: A Robust and Reproducible Statistical Framework for Removing Low-Accuracy, High-Scoring Hits. J Proteome Res 2019; 19:161-173. [DOI: 10.1021/acs.jproteome.9b00478] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
|
21
|
Tay AP, Liang A, Hamey JJ, Hart‐Smith G, Wilkins MR. MS2‐Deisotoper: A Tool for Deisotoping High‐Resolution MS/MS Spectra in Normal and Heavy Isotope‐Labelled Samples. Proteomics 2019; 19:e1800444. [DOI: 10.1002/pmic.201800444] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/12/2018] [Revised: 07/05/2019] [Indexed: 01/09/2023]
Affiliation(s)
- Aidan P. Tay
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Angelita Liang
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Joshua J. Hamey
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gene Hart‐Smith
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Marc R. Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
|
22
|
Muth T, Renard BY. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief Bioinform 2019; 19:954-970. [PMID: 28369237 DOI: 10.1093/bib/bbx033] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 12/08/2016] [Indexed: 01/24/2023]
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments nowadays offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has been rarely evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation on the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets acquired from different instrument types from collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation mode using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated from peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, the same tool outpaced the commercial competitor PEAKS in terms of running time speedup by factors of around 12-17. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, taken as a whole, the evaluated algorithms perform moderately on experimental data but show a significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. 
Finally, we discuss the potential of de novo sequencing for now becoming more widely used in the field.
Affiliation(s)
- Thilo Muth
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
|
23
|
Kou Q, Wang Z, Lubeckyj RA, Wu S, Sun L, Liu X. A Markov Chain Monte Carlo Method for Estimating the Statistical Significance of Proteoform Identifications by Top-Down Mass Spectrometry. J Proteome Res 2019; 18:878-889. [PMID: 30638379 DOI: 10.1021/acs.jproteome.8b00562] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
Top-down mass spectrometry is capable of identifying whole proteoform sequences with multiple post-translational modifications because it generates tandem mass spectra directly from intact proteoforms. Many software tools, such as ProSightPC, MSPathFinder, and TopMG, have been proposed for identifying proteoforms with modifications. In these tools, various methods are employed to estimate the statistical significance of identifications. However, most existing methods are designed for proteoform identifications without modifications, and the challenge remains for accurately estimating the statistical significance of proteoform identifications with modifications. Here we propose TopMCMC, a method that combines a Markov chain random walk algorithm and a greedy algorithm for assigning statistical significance to matches between spectra and protein sequences with variable modifications. Experimental results showed that TopMCMC achieved high accuracy in estimating E-values and false discovery rates of identifications in top-down mass spectrometry. Coupled with TopMG, TopMCMC identified more spectra than the generating function method from an MCF-7 top-down mass spectrometry data set.
Affiliation(s)
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Zhe Wang
- Department of Chemistry and Biochemistry, The University of Oklahoma, Norman, Oklahoma 73019-5251, United States
- Rachele A Lubeckyj
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1332, United States
- Si Wu
- Department of Chemistry and Biochemistry, The University of Oklahoma, Norman, Oklahoma 73019-5251, United States
- Liangliang Sun
- Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1332, United States
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
|
24
|
Abstract
Recent advancements in mass spectrometry (MS) and data analysis software have enabled new strategies for biological discovery using proteomics. Proteomics has evolved from routine discovery and identification of proteins to integrated multi-omics projects relating specific proteins to their genes and metabolites. Using additional information, such as that contained in biological pathways, has enabled the use of targeted protein quantitation for monitoring fold changes in expression as well as biomarker discovery. Here we discuss a full proteomic workflow from discovery proteomics on a quadrupole Time-of-Flight (Q-TOF) MS to targeted proteomics using a triple quadrupole (QQQ) MS. A discovery proteomics workflow encompassing acquisition of data-dependent proteomics data on a Q-TOF and protein database searching will be described which uses the protein abundances from identified proteins for subsequent statistical analysis and pathway visualization. From the active pathways, a protein target list is created for use in a peptide-based QQQ assay. These peptides are used as surrogates for target protein quantitation. Peptide-based QQQ assays provide sensitivity and selectivity allowing rapid and robust analysis of large batches of samples. These quantitative results are then statistically compared and visualized on the original biological pathways with a more complete coverage of proteins in the studied pathways.
|
25
|
Feng XD, Li LW, Zhang JH, Zhu YP, Chang C, Shu KX, Ma J. Using the entrapment sequence method as a standard to evaluate key steps of proteomics data analysis process. BMC Genomics 2017; 18:143. [PMID: 28361671 PMCID: PMC5374549 DOI: 10.1186/s12864-017-3491-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/22/2022] Open
Abstract
Background: The mass spectrometry-based technical pipeline has provided a high-throughput, high-sensitivity and high-resolution platform for post-genomic biology. Varied models and algorithms are implemented by different tools to improve proteomics data analysis. The target-decoy search strategy has become the most popular strategy to control false identifications in peptide and protein identification. While this strategy can estimate the false discovery rate (FDR) within a dataset, it cannot directly evaluate the false positive matches in target identifications. Results: As a supplement to the target-decoy strategy, the entrapment sequence method was introduced to assess the key steps of the mass spectrometry data analysis process: database search engines and quality control methods. Using the entrapment sequences as the standard, we evaluated five database search engines for both the original scores and reprocessed scores, as well as four quality control methods in terms of quantity and quality. Our results showed that the most recently developed search engine, MS-GF+, and the Percolator-embedded quality control method, PepDistiller, performed best among the search engines and the quality control methods, respectively. Combined with efficient quality control methods, the search engines can improve the low sensitivity of their original scores. Moreover, based on the entrapment sequence method, we proved that filtering the identifications separately could increase the number of identified peptides while improving the confidence level. Conclusion: In this study, we have proved that the entrapment sequence method could be a useful strategy to assess the key steps of the mass spectrometry data analysis process. Its applications can be extended to all steps of the common workflow, such as protein assembly methods and data integration methods.
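As an illustration of the target-decoy FDR estimate that this abstract refers to, here is a minimal sketch. It is not taken from the cited paper: it assumes a concatenated target-decoy search with target and decoy databases of equal size, and the function name `target_decoy_fdr` and the `(score, is_decoy)` tuple layout are hypothetical choices for illustration only.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among target matches scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs (hypothetical layout).
    Assumes higher scores are better and that decoy hits above the
    threshold approximate the number of false target hits.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target matches, nothing to estimate
    # Cap at 1.0 because the ratio can exceed 1 when decoys dominate.
    return min(1.0, decoys / targets)


example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
fdr_at_7 = target_decoy_fdr(example_psms, 7.0)
```

The estimate only counts decoy hits; as the abstract notes, it says nothing about which individual target matches are false, which is the gap that entrapment sequences are meant to probe.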
Affiliation(s)
- Xiao-Dong Feng
- Chongqing University of Posts and Telecommunications, 2 Chong Wen Road of Nan'an District, Chongqing, 400065, China; Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China
| | - Li-Wei Li
- Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China
| | - Jian-Hong Zhang
- Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China
| | - Yun-Ping Zhu
- Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China
| | - Cheng Chang
- Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China
| | - Kun-Xian Shu
- Chongqing University of Posts and Telecommunications, 2 Chong Wen Road of Nan'an District, Chongqing, 400065, China.
| | - Jie Ma
- Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China.
|
26
|
Abstract
Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions are used to compare an experimentally acquired fragmentation (MS/MS) spectrum against two different types of target MS/MS spectra: either against a theoretical MS/MS spectrum derived from a peptide from a sequence database, or against another, previously acquired MS/MS spectrum. The former is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. The comparison of acquired versus theoretical MS/MS spectra is most commonly performed using cross-correlations or probability-derived scoring functions, while the comparison of two acquired MS/MS spectra typically makes use of a normalized dot product, especially in spectral library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regard to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired spectra.
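The normalized dot product mentioned in this abstract can be sketched as follows. This is a generic illustration, not the implementation evaluated in the cited work: the fixed-width binning, the default `bin_width` of 1.0 m/z, and the function names are assumptions made for the example.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed binning scheme)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def normalized_dot_product(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, in the range [0, 1]
    for non-negative intensities."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum has no defined direction
    return dot / (norm_a * norm_b)


# Each spectrum is a list of (m/z, intensity) pairs.
library_spectrum = [(100.2, 10.0), (200.7, 5.0)]
query_spectrum = [(100.4, 8.0), (300.1, 4.0)]
similarity = normalized_dot_product(library_spectrum, query_spectrum)
```

Because the score depends only on the direction of the intensity vectors, scaling all intensities of one spectrum by a constant leaves it unchanged; real tools often also apply intensity transforms (for example, square roots) before scoring, which this sketch omits.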
Affiliation(s)
- Şule Yilmaz
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
| | - Elien Vandermarliere
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
| | - Lennart Martens
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium.
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium.
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium.
|
27
|
Vandemoortele G, Staes A, Gonnelli G, Samyn N, De Sutter D, Vandermarliere E, Timmerman E, Gevaert K, Martens L, Eyckerman S. An extra dimension in protein tagging by quantifying universal proteotypic peptides using targeted proteomics. Sci Rep 2016; 6:27220. [PMID: 27264994 PMCID: PMC4893672 DOI: 10.1038/srep27220] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/18/2016] [Accepted: 05/11/2016] [Indexed: 11/16/2022] Open
Abstract
The use of protein tagging to facilitate detailed characterization of target proteins has not only revolutionized cell biology, but also enabled biochemical analysis through efficient recovery of the protein complexes wherein the tagged proteins reside. The endogenous use of these tags for detailed protein characterization is widespread in lower organisms that allow for efficient homologous recombination. With the recent advances in genome engineering, tagging of endogenous proteins is now within reach for most experimental systems, including mammalian cell line cultures. In this work, we describe the selection of peptides with ideal mass spectrometry characteristics for use in quantification of tagged proteins using targeted proteomics. We mined the proteome of the hyperthermophile Pyrococcus furiosus to obtain two peptides that are unique in the proteomes of all known model organisms (proteotypic) and allow sensitive quantification of target proteins in a complex background. By combining these 'Proteotypic peptides for Quantification by SRM' (PQS peptides) with epitope tags, we demonstrate their use in co-immunoprecipitation experiments upon transfection of protein pairs, or after introduction of these tags in the endogenous proteins through genome engineering. Endogenous protein tagging for absolute quantification provides a powerful extra dimension to protein analysis, allowing the detailed characterization of endogenous proteins.
Affiliation(s)
- Giel Vandemoortele
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - An Staes
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Giulia Gonnelli
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Noortje Samyn
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Delphine De Sutter
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Elien Vandermarliere
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Evy Timmerman
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Kris Gevaert
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Lennart Martens
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Sven Eyckerman
- VIB Medical Biotechnology Center, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
|
28
|
Vaudel M, Barsnes H, Ræder H, Berven FS. Using Proteomics Bioinformatics Tools and Resources in Proteogenomic Studies. Advances in Experimental Medicine and Biology 2016; 926:65-75. [DOI: 10.1007/978-3-319-42316-6_5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|
29
|
|
30
|
Pagel O, Loroch S, Sickmann A, Zahedi RP. Current strategies and findings in clinically relevant post-translational modification-specific proteomics. Expert Rev Proteomics 2015; 12:235-53. [PMID: 25955281 PMCID: PMC4487610 DOI: 10.1586/14789450.2015.1042867] [Citation(s) in RCA: 122] [Impact Index Per Article: 12.2] [Indexed: 12/13/2022]
Abstract
Mass spectrometry-based proteomics has considerably extended our knowledge about the occurrence and dynamics of protein post-translational modifications (PTMs). So far, quantitative proteomics has been mainly used to study PTM regulation in cell culture models, providing new insights into the role of aberrant PTM patterns in human disease. However, continuous technological and methodological developments have paved the way for an increasing number of PTM-specific proteomic studies using clinical samples, which are often limited in sample amount. Thus, quantitative proteomics holds great potential to discover, validate and accurately quantify biomarkers in body fluids and primary tissues. A major effort will be to improve the complete integration of robust but sensitive proteomics technology into clinical environments. Here, we discuss PTMs that are relevant for clinical research, with a focus on phosphorylation, glycosylation and proteolytic cleavage; furthermore, we give an overview of the current developments and novel findings in mass spectrometry-based PTM research.
Affiliation(s)
- Oliver Pagel
- Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
| | - Stefan Loroch
- Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
| | | | - René P Zahedi
- Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany
|
31
|
Muth T, Kolmeder CA, Salojärvi J, Keskitalo S, Varjosalo M, Verdam FJ, Rensen SS, Reichl U, de Vos WM, Rapp E, Martens L. Navigating through metaproteomics data: a logbook of database searching. Proteomics 2015; 15:3439-53. [PMID: 25778831 DOI: 10.1002/pmic.201400560] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 11/28/2014] [Revised: 02/13/2015] [Accepted: 03/06/2015] [Indexed: 11/12/2022]
Abstract
Metaproteomic research involves various computational challenges during the identification of fragmentation spectra acquired from the proteome of a complex microbiome. These issues are manifold and range from the construction of customized sequence databases, the optimal setting of search parameters to limitations in the identification search algorithms themselves. In order to assess the importance of these individual factors, we studied the effect of strategies to combine different search algorithms, explored the influence of chosen database search settings, and investigated the impact of the size of the protein sequence database used for identification. Furthermore, we applied de novo sequencing as a complementary approach to classic database searching. All evaluations were performed on a human intestinal metaproteome dataset. Pyrococcus furiosus proteome data were used to contrast database searching of metaproteomic data to a classic proteomic experiment. Searching against subsets of metaproteome databases and the use of multiple search engines increased the number of identifications. The integration of P. furiosus sequences in a metaproteomic sequence database showcased the limitation of the target-decoy-controlled false discovery rate approach in combination with large sequence databases. The selection of varying search engine parameters and the application of de novo sequencing represented useful methods to increase the reliability of the results. Based on our findings, we provide recommendations for the data analysis that help researchers to establish or improve analysis workflows in metaproteomics.
Affiliation(s)
- Thilo Muth
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
| | - Carolin A Kolmeder
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
| | - Jarkko Salojärvi
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
| | - Salla Keskitalo
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Markku Varjosalo
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Froukje J Verdam
- Department of General Surgery, NUTRIM, Maastricht University Medical Center, Maastricht, The Netherlands
| | - Sander S Rensen
- Department of General Surgery, NUTRIM, Maastricht University Medical Center, Maastricht, The Netherlands
| | - Udo Reichl
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany; Otto-von-Guericke University, Bioprocess Engineering, Magdeburg, Germany
| | - Willem M de Vos
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland; Department of Bacteriology and Immunology, University of Helsinki, Helsinki, Finland; Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands
| | - Erdmann Rapp
- Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
| | - Lennart Martens
- Department of Biochemistry, Ghent University, Ghent, Belgium; Department of Medical Protein Research, VIB, Ghent, Belgium
|
32
|
Quandt A, Espona L, Balasko A, Weisser H, Brusniak MY, Kunszt P, Aebersold R, Malmström L. Using synthetic peptides to benchmark peptide identification software and search parameters for MS/MS data analysis. EuPA Open Proteomics 2014. [DOI: 10.1016/j.euprot.2014.10.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
|
33
|
Leprevost FV, Valente RH, Lima DB, Perales J, Melani R, Yates JR, Barbosa VC, Junqueira M, Carvalho PC. PepExplorer: a similarity-driven tool for analyzing de novo sequencing results. Mol Cell Proteomics 2014; 13:2480-9. [PMID: 24878498 DOI: 10.1074/mcp.m113.037002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/06/2022] Open
Abstract
Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we were able to recover most of the identifications at a 1% false-discovery rate. 
Finally, we employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
Affiliation(s)
- Felipe V Leprevost
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
| | - Richard H Valente
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil; Instituto Nacional de Ciência e Tecnologia em Toxinas (INCTTox/CNPq), Brazil
| | - Diogo B Lima
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
| | - Jonas Perales
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil; Instituto Nacional de Ciência e Tecnologia em Toxinas (INCTTox/CNPq), Brazil
| | - Rafael Melani
- Proteomics Unit, Rio de Janeiro Proteomics Network, Department of Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - John R Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California
| | - Valmir C Barbosa
- Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Magno Junqueira
- Proteomics Unit, Rio de Janeiro Proteomics Network, Department of Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Paulo C Carvalho
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
|
34
|
Vaudel M, Barsnes H, Martens L, Berven FS. Bioinformatics for proteomics: opportunities at the interface between the scientists, their experiments, and the community. Methods Mol Biol 2014; 1156:239-48. [PMID: 24791993 DOI: 10.1007/978-1-4939-0685-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2023]
Abstract
Within the last decade, bioinformatics has moved from command line scripts dedicated to single experiments towards production grade software integrated in experimental workflows providing a rich environment for biological investigation. Located at the interface between the scientists, their experiments, and the community, bioinformatics acts as a gateway to a wide source of information. This chapter does not list tools and methods, but rather hints at how bioinformatics can help in improving biological projects, all the way from their initial design to the dissemination of the results.
Affiliation(s)
- Marc Vaudel
- Proteomics Unit, Department of Biomedicine, University of Bergen, Jonas Liesvei 91, Bergen, 5009, Norway
|
35
|
Vaudel M, Sickmann A, Martens L. Current methods for global proteome identification. Expert Rev Proteomics 2013. [PMID: 23194269 DOI: 10.1586/epr.12.51] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 12/21/2022]
Abstract
In a time frame of a few decades, protein identification went from laborious single protein identification to automated identification of entire proteomes. This shift was enabled by the emergence of peptide-centric, gel-free analyses, in particular the so-called shotgun approaches, which not only rely on extensive experiments, but also on cutting-edge data processing methods. The present review therefore provides an overview of a shotgun proteomics identification workflow, listing the state-of-the-art methods involved and software that implement these. The authors focus on freely available tools where possible. Finally, data analysis in the context of emerging across-omics studies will also be discussed briefly, where proteomics goes beyond merely delivering a list of protein accession numbers.
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Dortmund, Germany
|
36
|
Loroch S, Dickhut C, Zahedi RP, Sickmann A. Phosphoproteomics--more than meets the eye. Electrophoresis 2013; 34:1483-92. [PMID: 23576030 DOI: 10.1002/elps.201200710] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 12/31/2012] [Revised: 02/22/2013] [Accepted: 03/10/2013] [Indexed: 12/16/2022]
Abstract
PTMs enable cells to adapt to internal and external stimuli in the milliseconds to seconds time regime. Protein phosphorylation is probably the most important of these modifications as it affects protein structure and interactions, critically influencing the life cycle of a cell. In the last 15 years, new insights into phosphorylation have been provided by highly sensitive MS-based approaches combined with specific phosphopeptide enrichment strategies. Although so far research has mainly focused on the discovery and characterization of O-phosphorylation, this review also briefly outlines the current knowledge about N-phosphorylation depicting its ubiquitous relevance. Further, common pitfalls in sample preparation, LC-MS analysis, and subsequent data analysis are discussed as well as issues regarding quality and comparability of studies on protein phosphorylation.
Affiliation(s)
- Stefan Loroch
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
|
37
|
Zhang Y, Fonslow BR, Shan B, Baek MC, Yates JR. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013; 113:2343-94. [PMID: 23438204 PMCID: PMC3751594 DOI: 10.1021/cr3003533] [Citation(s) in RCA: 1017] [Impact Index Per Article: 84.8] [Indexed: 02/07/2023]
Affiliation(s)
- Yaoyang Zhang
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Bryan R. Fonslow
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Bing Shan
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Moon-Chang Baek
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Department of Molecular Medicine, Cell and Matrix Biology Research Institute, School of Medicine, Kyungpook National University, Daegu 700-422, Republic of Korea
| | - John R. Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
|
38
|
Ivanov AR, Colangelo CM, Dufresne CP, Friedman DB, Lilley KS, Mechtler K, Phinney BS, Rose KL, Rudnick PA, Searle BC, Shaffer SA, Weintraub ST. Interlaboratory studies and initiatives developing standards for proteomics. Proteomics 2013; 13:904-9. [PMID: 23319436 DOI: 10.1002/pmic.201200532] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 11/21/2012] [Revised: 12/18/2012] [Accepted: 12/19/2012] [Indexed: 01/02/2023]
Abstract
Proteomics is a rapidly transforming interdisciplinary field of research that embraces a diverse set of analytical approaches to tackle problems in fundamental and applied biology. This viewpoint article highlights the benefits of interlaboratory studies and standardization initiatives to enable investigators to address many of the challenges found in proteomics research. Among these initiatives, we discuss our efforts on a comprehensive performance standard for characterizing PTMs by MS that was recently developed by the Association of Biomolecular Resource Facilities (ABRF) Proteomics Standards Research Group (sPRG).
Affiliation(s)
- Alexander R Ivanov
- Barnett Institute of Chemical and Biological Analysis, Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA.
|
39
|
Vaudel M, Breiter D, Beck F, Rahnenführer J, Martens L, Zahedi RP. D-score: a search engine independent MD-score. Proteomics 2013; 13:1036-41. [PMID: 23307401 DOI: 10.1002/pmic.201200408] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 09/04/2012] [Revised: 11/11/2012] [Accepted: 12/04/2012] [Indexed: 01/29/2023]
Abstract
While peptides carrying PTMs are routinely identified in gel-free MS, the localization of the PTMs onto the peptide sequences remains challenging. Search engine scores of secondary peptide matches have been used in different approaches in order to infer the quality of site inference, by penalizing the localization whenever the search engine similarly scored two candidate peptides with different site assignments. In the present work, we show how the estimation of posterior error probabilities for peptide candidates allows the estimation of a PTM score called the D-score, for multiple search engine studies. We demonstrate the applicability of this score to three popular search engines: Mascot, OMSSA, and X!Tandem, and evaluate its performance using an already published high resolution data set of synthetic phosphopeptides. For those peptides with phosphorylation site inference uncertainty, the number of spectrum matches with correctly localized phosphorylation increased by up to 25.7% when compared to using Mascot alone, although the actual increase depended on the fragmentation method used. Since this method relies only on search engine scores, it can be readily applied to the scoring of the localization of virtually any modification at no additional experimental or in silico cost.
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
|
40
|
Vaudel M, Burkhart JM, Radau S, Zahedi RP, Martens L, Sickmann A. Integral Quantification Accuracy Estimation for Reporter Ion-based Quantitative Proteomics (iQuARI). J Proteome Res 2012; 11:5072-80. [DOI: 10.1021/pr300247u] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 01/18/2023]
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften − ISAS − e.V., Dortmund, Germany
| | - Julia M. Burkhart
- Leibniz-Institut für Analytische Wissenschaften − ISAS − e.V., Dortmund, Germany
| | - Sonja Radau
- Leibniz-Institut für Analytische Wissenschaften − ISAS − e.V., Dortmund, Germany
| | - René P. Zahedi
- Leibniz-Institut für Analytische Wissenschaften − ISAS − e.V., Dortmund, Germany
| | - Lennart Martens
- Department of Medical Protein Research, VIB, Ghent, Belgium
- Department of Biochemistry, Ghent University, Ghent, Belgium
| | - Albert Sickmann
- Leibniz-Institut für Analytische Wissenschaften − ISAS − e.V., Dortmund, Germany
- Medizinisches Proteom-Center (MPC), Ruhr-Universität, Bochum, Germany
|