1
Madej D, Lam H. On the use of tandem mass spectra acquired from samples of evolutionarily distant organisms to validate methods for false discovery rate estimation. Proteomics 2024; 24:e2300398. [PMID: 38491400 DOI: 10.1002/pmic.202300398] [Received: 10/23/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/18/2024]
Abstract
Estimating the false discovery rate (FDR) of peptide identifications is a key step in proteomics data analysis, and many methods have been proposed for this purpose. Recently, an entrapment-inspired protocol to validate methods for FDR estimation appeared in articles showcasing new spectral library search tools. That validation approach involves generating incorrect spectral matches by searching spectra from evolutionarily distant organisms (entrapment queries) against the original target search space. Although this approach may appear similar to the solutions using entrapment databases, it represents a distinct conceptual framework whose correctness has not been verified yet. In this viewpoint, we first discussed the background of the entrapment-based validation protocols and then conducted a few simple computational experiments to verify the assumptions behind them. The results reveal that entrapment databases may, in some implementations, be a reasonable choice for validation, while the assumptions underpinning validation protocols based on entrapment queries are likely to be violated in practice. This article also highlights the need for well-designed frameworks for validating FDR estimation methods in proteomics.
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
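As a rough illustration of the two ideas contrasted in the abstract above (decoy-based FDR estimation and validation with an entrapment database), the following minimal Python sketch computes both quantities on a toy list of matches. The `Match` record, its field names, and the example scores are hypothetical and are not taken from the article; the entrapment ratio shown is only the simple lower bound that treats every accepted entrapment hit as a known false discovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """One spectrum match (hypothetical record used only for this sketch)."""
    score: float
    is_decoy: bool        # matched a decoy entry
    is_entrapment: bool   # matched an entrapment (known-absent) entry


def decoy_fdr(matches: List[Match], threshold: float) -> float:
    """Simple target-decoy estimate: accepted decoys / accepted targets."""
    accepted = [m for m in matches if m.score >= threshold]
    targets = sum(1 for m in accepted if not m.is_decoy)
    decoys = sum(1 for m in accepted if m.is_decoy)
    return decoys / targets if targets else 0.0


def entrapment_fdp_lower_bound(matches: List[Match], threshold: float) -> float:
    """Fraction of accepted non-decoy matches that hit entrapment entries.

    Every such hit is false by construction, so this ratio is a lower
    bound on the false discovery proportion among accepted matches.
    """
    accepted = [m for m in matches if m.score >= threshold and not m.is_decoy]
    if not accepted:
        return 0.0
    return sum(1 for m in accepted if m.is_entrapment) / len(accepted)


# Toy data: scores and labels are invented for illustration.
example = [
    Match(10.0, False, False),
    Match(9.0, False, False),
    Match(8.0, False, True),
    Match(7.0, True, False),
    Match(6.0, False, False),
]
print(decoy_fdr(example, 7.0), entrapment_fdp_lower_bound(example, 7.0))
```

Comparing the two numbers at the same threshold is the basic idea of an entrapment check: a decoy-based estimate that falls well below the entrapment lower bound would suggest the FDR is being underestimated.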
2
Sun Z, Ning Z, Cheng K, Duan H, Wu Q, Mayne J, Figeys D. MetaPep: A core peptide database for faster human gut metaproteomics database searches. Comput Struct Biotechnol J 2023; 21:4228-4237. [PMID: 37692080 PMCID: PMC10491838 DOI: 10.1016/j.csbj.2023.08.025] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 09/12/2023] [Open access]
Abstract
Metaproteomics has increasingly been applied to study functional changes in the human gut microbiome. Peptide identification is an important step in metaproteomics research, with sequence database search (SDS) and spectral library search (SLS) as the two main methods to identify peptides. However, the large search space in metaproteomics studies causes significant challenges for both identification methods. Moreover, with the development of mass spectrometry, it is now feasible to perform metaproteomic projects involving 100-1000 individual microbiomes. These large-scale projects create a conundrum for searching large databases. In this study, we constructed MetaPep, a core peptide database (including both collections of peptide sequences and tandem MS spectra) greatly accelerating the peptide identifications. Raw files from fifteen metaproteomics projects were re-analyzed and the identified peptide-spectrum matches (PSMs) were used to construct the MetaPep database. The constructed MetaPep database achieved rapid and accurate identification of peptides for human gut metaproteomics. MetaPep has a large collection of peptides and spectra that have been identified in published human gut metaproteomics datasets. MetaPep database can be used as an important resource in the current stage of human gut metaproteomics research. This study showed the possibility of applying a core peptide database as a generic metaproteomics workflow. MetaPep could also be an important resource for future human gut metaproteomics research, such as DIA (data-independent acquisition) analysis.
Affiliation(s)
- Zhongzhi Sun
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Kai Cheng
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Haonan Duan
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Qing Wu
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Janice Mayne
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
3
Kuo TY, Wang JH, Huang YW, Sung TY, Chen CT. Improving quantitation accuracy in isobaric-labeling mass spectrometry experiments with spectral library searching and feature-based peptide-spectrum match filter. Sci Rep 2023; 13:14119. [PMID: 37644119 PMCID: PMC10465558 DOI: 10.1038/s41598-023-41124-2] [Received: 04/18/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023] [Open access]
Abstract
Isobaric labeling relative quantitation is one of the dominant proteomic quantitation technologies. Traditional quantitation pipelines for isobaric-labeled mass spectrometry data are based on sequence database searching. In this study, we present a novel quantitation pipeline that integrates sequence database searching, spectral library searching, and a feature-based peptide-spectrum-match (PSM) filter using various spectral features for filtering. The combined database and spectral library searching results in larger quantitation coverage, and the filter removes PSMs with larger quantitation errors, retaining those with higher quantitation accuracy. Quantitation results show that the proposed pipeline can improve the overall quantitation accuracy at the PSM and protein levels. To our knowledge, this is the first study that utilizes spectral library searching to improve isobaric labeling-based quantitation. To make the proposed pipeline convenient to run, we have implemented the feature-based filter as an executable for both Windows and Linux platforms; its executable files, user manual, and sample data sets are freely available at https://ms.iis.sinica.edu.tw/comics/Software_FPF.html. Furthermore, with the developed filter, the proposed pipeline is fully compatible with the Trans-Proteomic Pipeline.
Affiliation(s)
- Tzu-Yun Kuo
- Department of Biochemical Science and Technology, National Taiwan University, Taipei, 10617, Taiwan
- Jen-Hung Wang
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan
- Yung-Wen Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 10617, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Ching-Tai Chen
- Department of Bioinformatics and Biomedical Engineering, Asia University, Taichung, 41354, Taiwan
- Center for Precision Health Research, Asia University, Taichung, 41354, Taiwan
4
Geer LY, Lapin J, Slotta DJ, Mak TD, Stein SE. AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence. J Proteome Res 2023; 22:2246-2255. [PMID: 37232537 PMCID: PMC10542943 DOI: 10.1021/acs.jproteome.2c00807] [Indexed: 05/27/2023]
Abstract
The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Affiliation(s)
- Lewis Y. Geer
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Joel Lapin
- Department of Physics, Georgetown University, Washington, DC 20057, United States
- Associate, Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Douglas J. Slotta
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
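The abstract above describes spectral library searching as matching experimental spectra to library spectra. A minimal Python sketch of one widely used similarity measure, the normalized dot product (cosine) over binned peaks, is given here for orientation. It is an assumption for illustration only and is not stated to be the scoring function of the cited work; the function names, the 1.0 m/z bin width, and the peak lists are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(query: List[Peak], library: List[Peak],
                      bin_width: float = 1.0) -> float:
    """Normalized dot product between two binned spectra (0.0 to 1.0)."""
    q = bin_spectrum(query, bin_width)
    lib = bin_spectrum(library, bin_width)
    dot = sum(value * lib[key] for key, value in q.items() if key in lib)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in lib.values())))
    return dot / norm if norm else 0.0


# Invented peak lists: the query is scored against two library candidates.
query_peaks = [(175.1, 40.0), (262.2, 100.0), (375.3, 60.0)]
close_candidate = [(175.1, 35.0), (262.2, 100.0), (375.3, 70.0)]
unrelated_candidate = [(147.1, 100.0), (504.4, 80.0)]
print(cosine_similarity(query_peaks, close_candidate))
print(cosine_similarity(query_peaks, unrelated_candidate))
```

Real search engines typically add intensity transforms, precursor mass filtering, and finer mass tolerances on top of a score of this kind.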
5
Searle BC, Shannon AE, Wilburn DB. Scribe: Next Generation Library Searching for DDA Experiments. J Proteome Res 2023; 22:482-490. [PMID: 36695531 DOI: 10.1021/acs.jproteome.2c00672] [Indexed: 01/26/2023]
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Proteome Software Inc., Portland, Oregon 97219, United States
- Ariana E Shannon
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Damien Beau Wilburn
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
6
Arab I, Fondrie WE, Laukens K, Bittremieux W. Semisupervised Machine Learning for Sensitive Open Modification Spectral Library Searching. J Proteome Res 2023; 22:585-593. [PMID: 36688569 DOI: 10.1021/acs.jproteome.2c00616] [Indexed: 01/24/2023]
Abstract
A key analysis task in mass spectrometry proteomics is matching the acquired tandem mass spectra to their originating peptides by sequence database searching or spectral library searching. Machine learning is an increasingly popular postprocessing approach to maximize the number of confident spectrum identifications that can be obtained at a given false discovery rate threshold. Here, we have integrated semisupervised machine learning in the ANN-SoLo tool, an efficient spectral library search engine that is optimized for open modification searching to identify peptides with any type of post-translational modification. We show that machine learning rescoring boosts the number of spectra that can be identified for both standard searching and open searching, and we provide insights into relevant spectrum characteristics harnessed by the machine learning model. The semisupervised machine learning functionality has now been fully integrated into ANN-SoLo, which is available as open source under the permissive Apache 2.0 license on GitHub at https://github.com/bittremieux/ANN-SoLo.
Affiliation(s)
- Issar Arab
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Kris Laukens
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
7
Dorl S, Winkler S, Mechtler K, Dorfer V. MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search. J Proteome Res 2023; 22:462-470. [PMID: 36688604 PMCID: PMC9903325 DOI: 10.1021/acs.jproteome.2c00658] [Indexed: 01/24/2023]
Abstract
Spectral library search can enable more sensitive peptide identification in tandem mass spectrometry experiments. However, its drawbacks are the limited availability of high-quality libraries and the added difficulty of creating decoy spectra for result validation. We describe MS Ana, a new spectral library search engine that enables high sensitivity peptide identification using either curated or predicted spectral libraries as well as robust false discovery control through its own decoy library generation algorithm. MS Ana identifies on average 36% more spectrum matches and 4% more proteins than database search in a benchmark test on single-shot human cell-line data. Further, we demonstrate the quality of the result validation with tests on synthetic peptide pools and show the importance of library selection through a comparison of library search performance with different configurations of publicly available human spectral libraries.
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Phone: +43 (0) 50804 27145
- Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Dr. Bohr Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Phone: +43 (0) 50804 22740
8
Lee SY, Lee H, Yun SH, Park EC, Seo G, Kim HY, Jun S, Kim NH, Tark D, Lee JY, Lee CS, Kim SI. Proteomics-based diagnostic peptide discovery for severe fever with thrombocytopenia syndrome virus in patients. Clin Proteomics 2022; 19:28. [PMID: 35842602 PMCID: PMC9287713 DOI: 10.1186/s12014-022-09366-w] [Received: 08/23/2021] [Accepted: 07/04/2022] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Severe fever with thrombocytopenia syndrome (SFTS) virus is an emerging infectious virus that causes severe hemorrhage, thrombocytopenia, and leukopenia, with a high fatality rate. Since there are no approved therapeutics or vaccines for SFTS, early diagnosis is essential to manage this infectious disease. METHODS Here, we tried to detect SFTS virus in serum samples from SFTS patients by proteomic analysis. First, in order to obtain the reference MS/MS spectral data of SFTS virus, medium from infected Vero cell culture was used for shotgun proteomic analysis. Then, tryptic peptides in sera from SFTS patients were confirmed by comparative analysis with the reference MS/MS spectral data of SFTS virus. RESULTS Proteomic analysis of culture medium successfully discovered tryptic peptides from all five antigen proteins of SFTS virus. The comparative spectral analysis of sera of SFTS patients revealed that the N-terminal tryptic peptide of the nucleocapsid (N) protein is the major epitope of SFTS virus detected in the patient samples. The prevalence of the peptides was strongly correlated with the viral load in the clinical samples. CONCLUSIONS Proteomic analysis of SFTS patient samples revealed that the nucleocapsid (N) protein is the major antigen protein in sera of SFTS patients and that the N-terminal tryptic peptide of the N protein might be a useful proteomic target for direct detection of SFTS virus. These findings suggest that proteomic analysis could be an alternative tool for detection of pathogens in clinical samples and diagnosis of infectious diseases.
Affiliation(s)
- Sang-Yeop Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Hayoung Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Department of Bio-Analytical Science, University of Science and Technology, Daejeon, 34113, Republic of Korea
- Sung Ho Yun
- Center for Research Equipment, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Edmond Changkyun Park
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Department of Bio-Analytical Science, University of Science and Technology, Daejeon, 34113, Republic of Korea
- Giwan Seo
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Hye-Yeon Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Sangmi Jun
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Center for Research Equipment, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Nam Hoon Kim
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Dongseob Tark
- Korea Zoonosis Research Institute, Jeonbuk National University, Iksan, Jeonbuk, 54531, Republic of Korea
- Ju Yeon Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Chang-Seop Lee
- Department of Internal Medicine, Jeonbuk National University Medical School, Jeonju, 54986, Republic of Korea
- Biomedical Research Institute of Jeonbuk National University Hospital, Jeonju, 54907, Republic of Korea
- Seung Il Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute, Ochang, 28119, Republic of Korea
- Center for Convergent Research of Emerging Virus Infection, Korea Research Institute of Chemical Technology (KRICT), Daejeon, 34114, Republic of Korea
- Department of Bio-Analytical Science, University of Science and Technology, Daejeon, 34113, Republic of Korea
9
Shiferaw GA, Gabriels R, Bouwmeester R, Van Den Bossche T, Vandermarliere E, Martens L, Volders PJ. Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator. J Proteome Res 2022; 21:1365-1370. [PMID: 35446579 DOI: 10.1021/acs.jproteome.2c00075] [Indexed: 11/29/2022]
Abstract
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we have used 40 data sets from 2 different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using 2 spectral library search tools, COSS and MSPepSearch with and without Percolator postprocessing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compomics/COSS.
Affiliation(s)
- Genet Abay Shiferaw
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Elien Vandermarliere
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium
10
Na S, Choi H, Paek E. Deephos: Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation. Bioinformatics 2022; 38:2980-2987. [PMID: 35441674 DOI: 10.1093/bioinformatics/btac280] [Received: 12/12/2021] [Revised: 03/26/2022] [Accepted: 04/14/2022] [Indexed: 11/14/2022] [Open access]
Abstract
MOTIVATION Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8,000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia, and breast cancer phosphoproteomes. AVAILABILITY Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.
Affiliation(s)
- Seungjin Na
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
- Hyunjin Choi
- Department of Automotive Engineering, Hanyang University, Seoul, 04763, Republic of Korea
- Eunok Paek
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
11
Abstract
There are probably no biological samples that did more to spur interest in proteomics than serum and plasma. The belief was that comparing the proteomes of these samples obtained from healthy and disease-affected individuals would lead to biomarkers that could be used to diagnose conditions such as cancer. While the continuing development of mass spectrometers with greater sensitivity and resolution has been invaluable, the invention of strategies to separate circulatory proteins has been just as critical. Novel and creative separation techniques were required because serum and plasma probably have the greatest dynamic range of protein concentration of any biological sample. The concentrations of circulating proteins can range over twelve orders of magnitude, making it a challenge to identify low-abundance proteins, where the bulk of the useful biomarkers are believed to exist. The major goals of this article are to (i) provide a historical perspective on the rapid development of serum and plasma proteomics; (ii) describe various separation techniques that have made obtaining an in-depth view of the proteome of these biological samples possible; and (iii) describe applications where serum and plasma proteomics have been employed to discover potential biomarkers for pathological conditions.
12
Lee H, Kim SI. Review of Liquid Chromatography-Mass Spectrometry-Based Proteomic Analyses of Body Fluids to Diagnose Infectious Diseases. Int J Mol Sci 2022; 23:2187. [PMID: 35216306 PMCID: PMC8878692 DOI: 10.3390/ijms23042187] [Received: 12/13/2021] [Revised: 02/11/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023] [Open access]
Abstract
Rapid and precise diagnostic methods are required to control emerging infectious diseases effectively. Human body fluids are attractive clinical samples for discovering diagnostic targets because they reflect the clinical statuses of patients and most of them can be obtained with minimally invasive sampling processes. Body fluids are good reservoirs for infectious parasites, bacteria, and viruses. Therefore, recent clinical proteomics methods have focused on body fluids when aiming to discover human- or pathogen-originated diagnostic markers. Cutting-edge liquid chromatography-mass spectrometry (LC-MS)-based proteomics has been applied in this regard; it is considered one of the most sensitive and specific proteomics approaches. Here, the clinical characteristics of each body fluid, recent tandem mass spectrometry (MS/MS) data-acquisition methods, and applications of body fluids for proteomics regarding infectious diseases (including the coronavirus disease of 2019 [COVID-19]) are summarized and discussed.
Affiliation(s)
- Hayoung Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea;
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
| | - Seung Il Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea;
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
| |
|
13
|
Wang JH, Choong WK, Chen CT, Sung TY. Calibr improves spectral library search for spectrum-centric analysis of data independent acquisition proteomics. Sci Rep 2022; 12:2045. [PMID: 35132134 PMCID: PMC8821666 DOI: 10.1038/s41598-022-06026-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Accepted: 01/21/2022] [Indexed: 12/20/2022]
Abstract
For identifying peptides and proteins from mass spectrometry (MS) data, spectral library searching has emerged as a complementary approach to conventional database searching. However, for the spectrum-centric analysis of data-independent acquisition (DIA) data, spectral library searching has not been widely exploited because existing spectral library search tools are mainly designed and optimized for the analysis of data-dependent acquisition (DDA) data. We present Calibr, a spectral library search tool for spectrum-centric DIA data analysis. Calibr optimizes spectrum preprocessing for pseudo MS2 spectra, generating an 8.11% increase in spectrum–spectrum match (SSM) number and a 7.49% increase in peptide number over the traditional preprocessing approach. When searching against the DDA-based spectral library, Calibr improves SSM number by 17.6–26.65% and peptide number by 18.45–37.31% over two state-of-the-art tools on three different data sets. When searching against the public spectral library from MassIVE, Calibr exceeds state-of-the-art tools in SSM and peptide numbers by more than 31.49% and 25.24%, respectively, on two data sets. Our analyses indicate that the higher sensitivity of Calibr results from the use of various spectral similarity measures and statistical scores, coupled with machine learning-based statistical validation for FDR control. Calibr executable files, including a graphical user-interface application, are available at https://ms.iis.sinica.edu.tw/COmics/Software_CalibrWizard.html and https://sourceforge.net/projects/comics-calibr.
|
14
|
Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach to measure and compare peptides and proteins in a large variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, produce large amounts of data. This is the case for shotgun/bottom-up proteomics, which consists of the enzymatic digestion of proteins into peptides that are then measured by MS instruments in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to handle the increasing size and complexity of raw MS data that need to be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. A general description is provided of the data preprocessing options and the different search engines available, including practical suggestions on how to optimize the parameters for peptide search, based on hands-on experience.
Affiliation(s)
- Avinash Yadav
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Federica Marini
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Alessandro Cuomo
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| | - Tiziana Bonaldi
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
| |
|