1
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024. [PMID: 39514576 DOI: 10.1021/acs.analchem.4c04091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
- Tytus D Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Helge Hecht
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Benjamin A Neely
- National Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Tomi Suomi
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Thomas Payne
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Douwe Schulte
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584, CH, Utrecht, The Netherlands
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Steffen Neumann
- Computational Plant Biochemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Andrew R Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
- Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
2
Weaver C, Nam A, Settle C, Overton M, Giddens M, Richardson KP, Piver R, Mysona DP, Rungruang B, Ghamande S, McIndoe R, Purohit S. Serum Proteomic Signatures in Cervical Cancer: Current Status and Future Directions. Cancers (Basel) 2024; 16:1629. [PMID: 38730581 PMCID: PMC11083044 DOI: 10.3390/cancers16091629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 04/18/2024] [Accepted: 04/19/2024] [Indexed: 05/13/2024] Open
Abstract
In 2020, the World Health Organization (WHO) reported 604,000 new diagnoses of cervical cancer (CC) worldwide, and over 300,000 CC-related fatalities. The vast majority of CC cases are caused by persistent human papillomavirus (HPV) infections. HPV-related CC incidence and mortality rates have declined worldwide because of increased HPV vaccination and CC screening with the Papanicolaou test (PAP test). Despite these significant improvements, developing countries face difficulty implementing these programs, while developed nations are challenged with identifying HPV-independent cases. Molecular and proteomic information obtained from blood or tumor samples has strong potential to provide information on malignancy progression and response to therapy in CC. A large amount of published biomarker data related to CC is available, but the extensive validation required for FDA approval for clinical use is lacking. Using the big data obtained from clinical studies and drawing meaningful relationships from these data are two obstacles that must be overcome for implementation into clinical practice. We report on multimarker panels identified in serum proteomic studies of CC over the past 5 years, the potential for modern computational biology efforts, and the utilization of nationwide biobanks to bridge the gap between multivariate protein signature development and the prediction of clinically relevant CC patient outcomes.
Affiliation(s)
- Chaston Weaver
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (C.W.); (K.P.R.); (R.P.); (D.P.M.); (R.M.)
- Alisha Nam
- Department of Undergraduate Health Professions, College of Allied Health Sciences, Augusta University, Augusta, GA 30912, USA; (A.N.); (C.S.); (M.O.); (M.G.)
- Caitlin Settle
- Department of Undergraduate Health Professions, College of Allied Health Sciences, Augusta University, Augusta, GA 30912, USA; (A.N.); (C.S.); (M.O.); (M.G.)
- Madelyn Overton
- Department of Undergraduate Health Professions, College of Allied Health Sciences, Augusta University, Augusta, GA 30912, USA; (A.N.); (C.S.); (M.O.); (M.G.)
- Maya Giddens
- Department of Undergraduate Health Professions, College of Allied Health Sciences, Augusta University, Augusta, GA 30912, USA; (A.N.); (C.S.); (M.O.); (M.G.)
- Katherine P. Richardson
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (C.W.); (K.P.R.); (R.P.); (D.P.M.); (R.M.)
- Rachael Piver
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (C.W.); (K.P.R.); (R.P.); (D.P.M.); (R.M.)
- Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (B.R.); (S.G.)
- David P. Mysona
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (C.W.); (K.P.R.); (R.P.); (D.P.M.); (R.M.)
- Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (B.R.); (S.G.)
- Bunja Rungruang
- Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (B.R.); (S.G.)
- Sharad Ghamande
- Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (B.R.); (S.G.)
- Richard McIndoe
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (C.W.); (K.P.R.); (R.P.); (D.P.M.); (R.M.)
- Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (B.R.); (S.G.)
- Sharad Purohit
- Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (C.W.); (K.P.R.); (R.P.); (D.P.M.); (R.M.)
- Department of Undergraduate Health Professions, College of Allied Health Sciences, Augusta University, Augusta, GA 30912, USA; (A.N.); (C.S.); (M.O.); (M.G.)
- Department of Obstetrics and Gynecology, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA; (B.R.); (S.G.)
3
Palstrøm NB, Campbell AJ, Lindegaard CA, Cakar S, Matthiesen R, Beck HC. Spectral library search for improved TMTpro labelled peptide assignment in human plasma proteomics. Proteomics 2024; 24:e2300236. [PMID: 37706597 DOI: 10.1002/pmic.202300236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 09/15/2023]
Abstract
Clinical biomarker discovery is often based on the analysis of human plasma samples. However, the high dynamic range and complexity of plasma pose significant challenges to mass spectrometry-based proteomics. Current methods for improving protein identifications require laborious pre-analytical sample preparation. In this study, we developed and evaluated a TMTpro-specific spectral library for improved protein identification in human plasma proteomics. The library was constructed by LC-MS/MS analysis of highly fractionated TMTpro-tagged human plasma, human cell lysates, and relevant arterial tissues. The library was curated using several quality filters to ensure reliable peptide identifications. Our results show that spectral library searching using the TMTpro spectral library improves the identification of proteins in plasma samples compared to conventional sequence database searching. Protein identifications made by the spectral library search engine demonstrated a high degree of complementarity with the sequence database search engine, indicating the feasibility of increasing the number of protein identifications without additional pre-analytical sample preparation. The TMTpro-specific spectral library provides a resource for future plasma proteomics research and optimization of search algorithms for greater accuracy and speed in protein identifications in human plasma proteomics, and is made publicly available to the research community via ProteomeXchange with identifier PXD042546.
Affiliation(s)
- Nicolai B Palstrøm
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Amanda J Campbell
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Samir Cakar
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Rune Matthiesen
- Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal
- Hans C Beck
- Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
4
McGann CD, Barshop W, Canterbury J, Lin C, Gabriel W, Huang J, Bergen D, Zubraskov V, Melani R, Wilhelm M, McAlister G, Schweppe DK. Real-Time Spectral Library Matching for Sample Multiplexed Quantitative Proteomics. J Proteome Res 2023; 22:2836-2846. [PMID: 37557900 PMCID: PMC11554524 DOI: 10.1021/acs.jproteome.3c00085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]
Abstract
Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.
Affiliation(s)
- Will Barshop
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Jesse Canterbury
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Chuwei Lin
- University of Washington, Seattle, WA 98105
- Jingjing Huang
- Thermo Fisher Scientific, San Jose, California 95134, United States
- David Bergen
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Vlad Zubraskov
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Rafael Melani
- Thermo Fisher Scientific, San Jose, California 95134, United States
- Graeme McAlister
- Thermo Fisher Scientific, San Jose, California 95134, United States
5
Sun Z, Ning Z, Cheng K, Duan H, Wu Q, Mayne J, Figeys D. MetaPep: A core peptide database for faster human gut metaproteomics database searches. Comput Struct Biotechnol J 2023; 21:4228-4237. [PMID: 37692080 PMCID: PMC10491838 DOI: 10.1016/j.csbj.2023.08.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 08/25/2023] [Indexed: 09/12/2023] Open
Abstract
Metaproteomics has increasingly been applied to study functional changes in the human gut microbiome. Peptide identification is an important step in metaproteomics research, with sequence database search (SDS) and spectral library search (SLS) as the two main methods to identify peptides. However, the large search space in metaproteomics studies causes significant challenges for both identification methods. Moreover, with the development of mass spectrometry, it is now feasible to perform metaproteomic projects involving 100-1000 individual microbiomes. These large-scale projects create a conundrum for searching large databases. In this study, we constructed MetaPep, a core peptide database (including collections of both peptide sequences and tandem MS spectra) that greatly accelerates peptide identification. Raw files from fifteen metaproteomics projects were re-analyzed and the identified peptide-spectrum matches (PSMs) were used to construct the MetaPep database. The constructed MetaPep database achieved rapid and accurate identification of peptides for human gut metaproteomics. MetaPep has a large collection of peptides and spectra that have been identified in published human gut metaproteomics datasets. The MetaPep database can be used as an important resource in the current stage of human gut metaproteomics research. This study showed the possibility of applying a core peptide database as a generic metaproteomics workflow. MetaPep could also be an important resource for future human gut metaproteomics research, such as DIA (data-independent acquisition) analysis.
Affiliation(s)
- Zhongzhi Sun
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Kai Cheng
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Haonan Duan
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Qing Wu
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Janice Mayne
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
6
Kuo TY, Wang JH, Huang YW, Sung TY, Chen CT. Improving quantitation accuracy in isobaric-labeling mass spectrometry experiments with spectral library searching and feature-based peptide-spectrum match filter. Sci Rep 2023; 13:14119. [PMID: 37644119 PMCID: PMC10465558 DOI: 10.1038/s41598-023-41124-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 08/22/2023] [Indexed: 08/31/2023] Open
Abstract
Isobaric labeling relative quantitation is one of the dominant proteomic quantitation technologies. Traditional quantitation pipelines for isobaric-labeled mass spectrometry data are based on sequence database searching. In this study, we present a novel quantitation pipeline that integrates sequence database searching, spectral library searching, and a feature-based peptide-spectrum-match (PSM) filter using various spectral features for filtering. The combined database and spectral library searching results in larger quantitation coverage, and the filter removes PSMs with larger quantitation errors, retaining those with higher quantitation accuracy. Quantitation results show that the proposed pipeline can improve the overall quantitation accuracy at the PSM and protein levels. To our knowledge, this is the first study that utilizes spectral library searching to improve isobaric labeling-based quantitation. So that users can conveniently run the proposed pipeline, we have implemented the feature-based filter as an executable for both Windows and Linux platforms; its executable files, user manual, and sample data sets are freely available at https://ms.iis.sinica.edu.tw/comics/Software_FPF.html. Furthermore, with the developed filter, the proposed pipeline is fully compatible with the Trans-Proteomic Pipeline.
Affiliation(s)
- Tzu-Yun Kuo
- Department of Biochemical Science and Technology, National Taiwan University, Taipei, 10617, Taiwan
- Jen-Hung Wang
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, 11221, Taiwan
- Yung-Wen Huang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 10617, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan.
- Ching-Tai Chen
- Department of Bioinformatics and Biomedical Engineering, Asia University, Taichung, 41354, Taiwan.
- Center for Precision Health Research, Asia University, Taichung, 41354, Taiwan.
7
Geer LY, Lapin J, Slotta DJ, Mak TD, Stein SE. AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence. J Proteome Res 2023; 22:2246-2255. [PMID: 37232537 PMCID: PMC10542943 DOI: 10.1021/acs.jproteome.2c00807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Affiliation(s)
- Lewis Y. Geer
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Joel Lapin
- Department of Physics, Georgetown University, Washington, DC 20057, United States
- Associate, Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Douglas J. Slotta
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
8
Searle BC, Shannon AE, Wilburn DB. Scribe: Next Generation Library Searching for DDA Experiments. J Proteome Res 2023; 22:482-490. [PMID: 36695531 DOI: 10.1021/acs.jproteome.2c00672] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 01/26/2023]
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Proteome Software Inc., Portland, Oregon 97219, United States
- Ariana E Shannon
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Damien Beau Wilburn
- Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
- Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
- Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
9
Dorl S, Winkler S, Mechtler K, Dorfer V. MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search. J Proteome Res 2023; 22:462-470. [PMID: 36688604 PMCID: PMC9903325 DOI: 10.1021/acs.jproteome.2c00658] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/24/2023]
Abstract
Spectral library search can enable more sensitive peptide identification in tandem mass spectrometry experiments. However, its drawbacks are the limited availability of high-quality libraries and the added difficulty of creating decoy spectra for result validation. We describe MS Ana, a new spectral library search engine that enables high sensitivity peptide identification using either curated or predicted spectral libraries as well as robust false discovery control through its own decoy library generation algorithm. MS Ana identifies on average 36% more spectrum matches and 4% more proteins than database search in a benchmark test on single-shot human cell-line data. Further, we demonstrate the quality of the result validation with tests on synthetic peptide pools and show the importance of library selection through a comparison of library search performance with different configurations of publicly available human spectral libraries.
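The decoy-based result validation mentioned in this abstract can be illustrated with a generic target-decoy calculation. The sketch below is not MS Ana's algorithm; it assumes only that each match carries a score and a flag saying whether it hit a decoy library entry, and it estimates the false discovery rate as decoys divided by targets above a score cutoff. The function name `targets_at_fdr` and the example scores are hypothetical.

```python
def targets_at_fdr(matches, max_fdr=0.01):
    """Count target matches accepted at an estimated FDR of at most max_fdr.

    matches: iterable of (score, is_decoy) pairs, higher score = better match.
    The FDR at each cutoff is estimated as (#decoys / #targets) among all
    matches scoring at or above that cutoff. Score ties are not treated
    specially in this simplified sketch.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest cutoff whose estimated FDR is still acceptable.
        if targets and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


# Hypothetical search results: (match score, hit a decoy entry?)
example_matches = [
    (0.95, False),
    (0.90, False),
    (0.85, False),
    (0.80, True),
    (0.70, False),
    (0.60, True),
]

print(targets_at_fdr(example_matches, max_fdr=0.25))
print(targets_at_fdr(example_matches, max_fdr=0.0))
```

With a strict threshold only the matches ranked above the first decoy are kept; a looser threshold admits more targets at the cost of more expected false matches.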
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Phone: +43 (0) 50804 27145
- Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Dr. Bohr Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Phone: +43 (0) 50804 22740
10
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Mingxun Wang
  - Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
- Pieter C Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA

11
Dai Y, Millikin R, Rolfs Z, Shortreed MR, Smith LM. A Hybrid Spectral Library and Protein Sequence Database Search Strategy for Bottom-Up and Top-Down Proteomic Data Analysis. J Proteome Res 2022; 21:2609-2618. [PMID: 36206157 PMCID: PMC9869658 DOI: 10.1021/acs.jproteome.2c00305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
Tandem mass spectrometry (MS/MS) is widely employed for the analysis of complex proteomic samples. While protein sequence database searching and spectral library searching are both well-established peptide identification methods, each has shortcomings. Protein sequence databases lack fragment peak intensity information, which can result in poor discrimination between correct and incorrect spectrum assignments. Spectral libraries usually contain fewer peptides than protein sequence databases, which limits the number of peptides that can be identified. Notably, few post-translationally modified peptides are represented in spectral libraries. This is because few search engines can both identify a broad spectrum of PTMs and create corresponding spectral libraries. Also, programs that generate spectral libraries using deep learning approaches are not yet able to accurately predict spectra for the vast majority of PTMs. Here, we address these limitations through use of a hybrid search strategy that combines protein sequence database and spectral library searches to improve identification success rates and sensitivity. This software uses Global PTM Discovery (G-PTM-D) to produce spectral libraries for a wide variety of different PTMs. These features, along with a new spectrum annotation and visualization tool, have been integrated into the freely available and open-source search engine MetaMorpheus.
Affiliation(s)
- Yuling Dai
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Robert Millikin
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Zach Rolfs
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Michael R. Shortreed
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lloyd M. Smith
  - Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States

12
Shiferaw GA, Gabriels R, Bouwmeester R, Van Den Bossche T, Vandermarliere E, Martens L, Volders PJ. Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator. J Proteome Res 2022; 21:1365-1370. [PMID: 35446579 DOI: 10.1021/acs.jproteome.2c00075] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we have used 40 data sets from 2 different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using 2 spectral library search tools, COSS and MSPepSearch, with and without Percolator postprocessing, and using the sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache 2.0 license, and binaries and source code are found at https://github.com/compomics/COSS.
Affiliation(s)
- Genet Abay Shiferaw
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Elien Vandermarliere
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pieter-Jan Volders
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
  - Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium

13
Qin G, Zhang P, Sun M, Fu W, Cai C. Comprehensive spectral libraries for various rabbit eye tissue proteomes. Sci Data 2022; 9:111. [PMID: 35351915 PMCID: PMC8964796 DOI: 10.1038/s41597-022-01241-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/29/2021] [Accepted: 03/03/2022] [Indexed: 12/14/2022] Open
Abstract
Rabbits have been widely used for studying ocular physiology and pathology due to their relatively large eye size and the structural similarity of their eyes to human eyes. Various rabbit ocular disease models, such as dry eye, age-related macular degeneration, and glaucoma, have been established. Despite the growing application of proteomics in vision research using rabbit ocular models, there is no publicly available spectral assay library for the rabbit eye proteome. Here, we generated spectral assay libraries for rabbit eye compartments, including conjunctiva, cornea, iris, retina, sclera, vitreous humor, and tears, using fractionated samples and ion mobility separation enabling deep proteome coverage. The rabbit eye spectral assay library includes 9,830 protein groups and 113,593 peptides. We present the data as a freely available community resource for proteomic studies in the vision field. Instrument data and spectral libraries are available via ProteomeXchange with identifier PXD031194.
Measurement(s): database type spectral library
Technology Type(s): ion mobility spectrometry-mass spectrometry
Sample Characteristic - Organism: Oryctolagus cuniculus
Sample Characteristic - Environment: eye
Sample Characteristic - Location: United States of America

14
Wang JH, Choong WK, Chen CT, Sung TY. Calibr improves spectral library search for spectrum-centric analysis of data independent acquisition proteomics. Sci Rep 2022; 12:2045. [PMID: 35132134 PMCID: PMC8821666 DOI: 10.1038/s41598-022-06026-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Accepted: 01/21/2022] [Indexed: 12/20/2022] Open
Abstract
For identifying peptides and proteins from mass spectrometry (MS) data, spectral library searching has emerged as a complementary approach to conventional database searching. However, for the spectrum-centric analysis of data-independent acquisition (DIA) data, spectral library searching has not been widely exploited because existing spectral library search tools are mainly designed and optimized for the analysis of data-dependent acquisition (DDA) data. We present Calibr, a spectral library search tool for spectrum-centric DIA data analysis. Calibr optimizes spectrum preprocessing for pseudo MS2 spectra, generating an 8.11% increase in spectrum–spectrum match (SSM) number and a 7.49% increase in peptide number over the traditional preprocessing approach. When searching against the DDA-based spectral library, Calibr improves SSM number by 17.6–26.65% and peptide number by 18.45–37.31% over two state-of-the-art tools on three different data sets. Searching against the public spectral library from MassIVE, Calibr improves on state-of-the-art tools in SSM and peptide numbers by more than 31.49% and 25.24%, respectively, for two data sets. Our analyses indicate that the higher sensitivity of Calibr results from the use of various spectral similarity measures and statistical scores, coupled with machine learning-based statistical validation for FDR control. Calibr executable files including a graphical user-interface application are available at https://ms.iis.sinica.edu.tw/COmics/Software_CalibrWizard.html and https://sourceforge.net/projects/comics-calibr.

15
Zhang W, Liang Z, Chen X, Xin L, Shan B, Luo Z, Li M. ChimST: An Efficient Spectral Library Search Tool for Peptide Identification from Chimeric Spectra in Data-Dependent Acquisition. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1416-1425. [PMID: 31603795 DOI: 10.1109/tcbb.2019.2945954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Accurate and sensitive identification of peptides from MS/MS spectra is a very challenging problem in computational shotgun proteomics. To tackle this problem, spectral library search has been one of the competitive solutions. However, most existing library search tools were developed on the basis of one peptide per spectrum, which prevents them from working properly on chimeric spectra where two or more peptides are co-fragmented. In this work, we present a new library search tool called ChimST, which is particularly capable of reliably identifying multiple peptides from a chimeric spectrum. It starts by associating each query MS/MS spectrum with MS precursor features. For each precursor feature, there is a list of peptide candidates extracted from an input spectral library. Then, it takes one peptide candidate from each associated feature and scores how well they could collectively interpret the query spectrum. The highest-scoring set of peptide candidates is finally reported as the identification of the query spectrum. Our experimental tests show that ChimST could significantly outperform the three state-of-the-art library search tools, SpectraST, reSpect, and MSPLIT, in terms of the numbers of both peptide-spectrum matches and unique peptides, especially when the acquisition isolation window is broad.

16
Zolg DP, Gessulat S, Paschke C, Graber M, Rathke-Kuhnert M, Seefried F, Fitzemeier K, Berg F, Lopez-Ferrer D, Horn D, Henrich C, Huhmer A, Delanghe B, Frejno M. INFERYS rescoring: Boosting peptide identifications and scoring confidence of database search results. Rapid Commun Mass Spectrom 2021:e9128. [PMID: 34015160 DOI: 10.1002/rcm.9128] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/20/2021] [Revised: 04/14/2021] [Accepted: 05/17/2021] [Indexed: 06/12/2023]
Abstract
Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.
Affiliation(s)
- Frank Berg
  - Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany
- David Horn
  - Thermo Fisher Scientific, San Jose, CA, USA

17
Wang L, Liu K, Li S, Tang H. A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing. Proteomics 2020; 20:e2000002. [PMID: 32415809 PMCID: PMC7669687 DOI: 10.1002/pmic.202000002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/01/2020] [Revised: 04/17/2020] [Indexed: 01/07/2023]
Abstract
With the accumulation of MS/MS spectra collected in spectral libraries, spectral library searching has emerged as an important approach for peptide identification in proteomics, complementary to the commonly used protein database searching approach, in particular for the proteomic analyses of well-studied model organisms, such as human. Existing spectral library searching algorithms compare a query MS/MS spectrum with each spectrum in the library with matched precursor mass and charge state, which may become computationally intensive with the rapidly growing library size. Here, the software msSLASH, which implements a fast spectral library searching algorithm based on the Locality-Sensitive Hashing (LSH) technique, is presented. The algorithm first converts the library and query spectra into bit-strings using LSH functions, and then computes the similarity between the spectra with highly similar bit-strings. Using the spectral library searching of large real-world MS/MS spectra datasets, it is demonstrated that the algorithm significantly reduces the number of spectral comparisons and, as a result, achieves a 2-9X speedup in comparison with the existing spectral library searching algorithm SpectraST. The spectral searching algorithm is implemented in C/C++, and is ready to be used in proteomic data analyses.
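The two-step idea in this abstract (hash spectra to bit-strings, then score only spectra whose bit-strings agree) can be illustrated with a minimal sketch. It is not the msSLASH implementation: it assumes random-hyperplane hashing for cosine similarity, compares only spectra with identical bit-strings rather than "highly similar" ones, and every name and constant (`BIN_WIDTH`, `MAX_MZ`, `NUM_BITS`, `build_index`, `search`) is hypothetical.

```python
import math
import random
from collections import defaultdict

BIN_WIDTH = 1.0   # m/z bin width (hypothetical choice)
MAX_MZ = 2000     # upper m/z bound of the binned vector (hypothetical)
NUM_BITS = 8      # bits per LSH signature (hypothetical)


def bin_spectrum(peaks):
    """Turn a list of (m/z, intensity) peaks into a sparse binned vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        if 0 <= mz < MAX_MZ:
            vec[int(mz / BIN_WIDTH)] += intensity
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(i, 0.0) for i, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return "".join(
        "1" if sum(x * plane[i] for i, x in vec.items()) >= 0 else "0"
        for plane in hyperplanes
    )


def build_index(library, hyperplanes):
    """Group library spectra into buckets keyed by their bit-string."""
    index = defaultdict(list)
    for name, peaks in library.items():
        vec = bin_spectrum(peaks)
        index[signature(vec, hyperplanes)].append((name, vec))
    return index


def search(query_peaks, index, hyperplanes):
    """Score the query only against library spectra in the same bucket."""
    vec = bin_spectrum(query_peaks)
    candidates = index.get(signature(vec, hyperplanes), [])
    scored = [(cosine(vec, lib_vec), name) for name, lib_vec in candidates]
    return max(scored, default=None)


# Toy library with made-up peaks; real libraries hold many thousands of spectra.
library = {
    "pep_A": [(175.1, 50.0), (300.2, 100.0), (500.3, 80.0)],
    "pep_B": [(120.1, 90.0), (840.4, 100.0), (1200.6, 30.0)],
}
planes = make_hyperplanes(NUM_BITS, int(MAX_MZ / BIN_WIDTH))
index = build_index(library, planes)
best = search(library["pep_A"], index, planes)
```

Because similar spectra tend to fall on the same side of most random hyperplanes, they tend to share a bucket, so the expensive cosine computation runs on a small candidate set instead of the whole library. A production tool would also filter by precursor mass and charge and would use several hash tables to reduce missed matches.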
Affiliation(s)
- Lei Wang
  - School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
- Kaiyuan Liu
  - School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
- Sujun Li
  - School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
- Haixu Tang
  - School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA

18
Fernández-Costa C, Martínez-Bartolomé S, McClatchy DB, Saviola AJ, Yu NK, Yates JR. Impact of the Identification Strategy on the Reproducibility of the DDA and DIA Results. J Proteome Res 2020; 19:3153-3161. [PMID: 32510229 PMCID: PMC7898222 DOI: 10.1021/acs.jproteome.0c00153] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Indexed: 02/06/2023]
Abstract
Data-independent acquisition (DIA) is a promising technique for the proteomic analysis of complex protein samples. A number of studies have claimed that DIA experiments are more reproducible than data-dependent acquisition (DDA), but these claims are unsubstantiated since different data analysis methods are used for the two acquisition methods. Data analysis in most DIA workflows depends on spectral library searches, whereas DDA typically employs sequence database searches. In this study, we examined the reproducibility of the DIA and DDA results using both sequence database and spectral library search. The comparison was first performed using a cell lysate and then extended to an interactome study. Protein overlap among the technical replicates in both DDA and DIA experiments was 30% higher with library-based identifications than with sequence database identifications. The reproducibility of quantification was also improved with library search compared to database search, with the mean of the coefficient of variation decreasing more than 30% and a reduction in the number of missing values of more than 35%. Our results show that regardless of the acquisition method, higher identification and quantification reproducibility is observed when library search is used.
Affiliation(s)
- Carolina Fernández-Costa
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Daniel B. McClatchy
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Anthony J. Saviola
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Nam-Kyung Yu
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- John R. Yates
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA

19
Bouwmeester R, Gabriels R, Van Den Bossche T, Martens L, Degroeve S. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows. Proteomics 2020; 20:e1900351. [PMID: 32267083 DOI: 10.1002/pmic.201900351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/26/2020] [Revised: 03/21/2020] [Indexed: 12/30/2022]
Abstract
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.
Affiliation(s)
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium

20
Shiferaw GA, Vandermarliere E, Hulstaert N, Gabriels R, Martens L, Volders PJ. COSS: A Fast and User-Friendly Tool for Spectral Library Searching. J Proteome Res 2020; 19:2786-2793. [PMID: 32384242 DOI: 10.1021/acs.jproteome.9b00743] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/10/2023]
Abstract
Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly because of low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also raises the barrier to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool supporting two scoring functions. COSS also includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral searching tools and a sequence database search tool. Our comparison showed that COSS more reliably identifies spectra, is capable of handling large data sets and libraries, and is an easy-to-use tool that can run on computers with modest specifications. COSS binaries and source code can be freely downloaded from https://github.com/compomics/COSS.
Affiliation(s)
- Genet Abay Shiferaw
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Elien Vandermarliere
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Niels Hulstaert
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Pieter-Jan Volders
  - VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
  - Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium

21
Fernández-Costa C, Martínez-Bartolomé S, McClatchy D, Yates JR. Improving Proteomics Data Reproducibility with a Dual-Search Strategy. Anal Chem 2020; 92:1697-1701. [PMID: 31880919 DOI: 10.1021/acs.analchem.9b04955] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Mass spectrometry-based proteomics is an invaluable tool for addressing important biological questions. Data-dependent acquisition methods effectuate stochastic acquisition of data in complex mixtures, which results in missing identifications across replicates. We developed a search approach that improves the reproducibility of data acquired from any mass spectrometer. In our approach, a spectral library is built from the identification results from a database search, and then the library is used to re-search the same data files to obtain the final result. We showed that higher identification and quantification reproducibility is achieved with the dual-search approach than with a typical database search. Four datasets of differing complexity were compared: (1) data from a cell lysate study performed in our lab, (2) data from an interactome study performed in our lab, (3) a publicly available extracellular vesicles dataset, and (4) a publicly available phosphoproteomics dataset. Our results show that the dual-search approach can be widely and easily used to improve data quality in proteomics data.
Affiliation(s)
- Carolina Fernández-Costa
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- Salvador Martínez-Bartolomé
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- Daniel McClatchy
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- John R Yates
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States

22
Lin YM, Chen CT, Chang JM. MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks. BMC Genomics 2019; 20:906. [PMID: 31874640 PMCID: PMC6929458 DOI: 10.1186/s12864-019-6297-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/24/2019] [Accepted: 11/15/2019] [Indexed: 01/22/2023] Open
Abstract
Background: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search.
Results: We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split on the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep.
Conclusions: We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identifications. The results suggest that incorporating more data for the deep learning model may improve performance.
Affiliation(s)
- Yang-Ming Lin
  - Department of Computer Science, National Chengchi University, 11605, Taipei City, Taiwan
- Ching-Tai Chen
  - Institute of Information Science, Academia Sinica, 115, Taipei City, Taiwan
- Jia-Ming Chang
  - Department of Computer Science, National Chengchi University, 11605, Taipei City, Taiwan

23
den Ridder M, Daran-Lapujade P, Pabst M. Shot-gun proteomics: why thousands of unidentified signals matter. FEMS Yeast Res 2019; 20:5682490. [DOI: 10.1093/femsyr/foz088] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/01/2019] [Accepted: 12/19/2019] [Indexed: 12/14/2022] Open
Abstract
Mass spectrometry-based proteomics has become an integral part of the multi-omics toolbox in yeast research, advancing fundamental knowledge of molecular processes and guiding decisions in strain and product development pipelines. Nevertheless, post-translational protein modifications (PTMs) continue to challenge the field of proteomics. PTMs are not directly encoded in the genome; therefore, they require a sensitive analysis of the proteome itself. In yeast, the relevance of post-translational regulators has already been established, such as for phosphorylation, which can directly affect the reaction rates of metabolic enzymes. Whereas the selective analysis of single modifications has become a broadly employed technique, the sensitive analysis of a comprehensive set of modifications still remains a challenge. At the same time, a large number of fragmentation spectra in a typical shot-gun proteomics experiment remain unidentified. It has been estimated that a good proportion of those unidentified spectra originates from unexpected modifications or natural peptide variants. In this review, recent advancements in microbial proteomics for unrestricted protein modification discovery are reviewed, and recent research integrating this additional layer of information to elucidate protein interaction and regulation in yeast is briefly discussed.
Collapse
Affiliation(s)
- Maxime den Ridder
- Delft University of Technology, Department of Biotechnology, van der Maasweg 9, 2629 HZ Delft, The Netherlands
| | - Pascale Daran-Lapujade
- Delft University of Technology, Department of Biotechnology, van der Maasweg 9, 2629 HZ Delft, The Netherlands
| | - Martin Pabst
- Delft University of Technology, Department of Biotechnology, van der Maasweg 9, 2629 HZ Delft, The Netherlands
| |
|
24
|
O’Bryon I, Tucker AE, Kaiser BLD, Wahl KL, Merkley ED. Constructing a Tandem Mass Spectral Library for Forensic Ricin Identification. J Proteome Res 2019; 18:3926-3935. [DOI: 10.1021/acs.jproteome.9b00377] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
Affiliation(s)
- Isabelle O’Bryon
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Abigail E. Tucker
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Brooke L. D. Kaiser
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Karen L. Wahl
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Eric D. Merkley
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| |
|
25
|
Burke MC, Zhang Z, Mirokhin YA, Tchekovskoi DV, Liang Y, Stein SE. False Discovery Rate Estimation for Hybrid Mass Spectral Library Search Identifications in Bottom-up Proteomics. J Proteome Res 2019; 18:3223-3234. [PMID: 31364354 PMCID: PMC11566722 DOI: 10.1021/acs.jproteome.8b00863] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
Abstract
We present a method for FDR estimation of mass spectral library search identifications made by a recently developed method for peptide identification, the hybrid search, based on an extension of the target-decoy approach. In addition to estimating confidence for a given identification, this allows users to compare and integrate identifications from the hybrid mass spectral library search method with other peptide identification methods, such as a sequence database-based method. In addition to a score, each hybrid score is associated with a "DeltaMass" value, which is the difference in mass of the search and library peptide, which can correspond to the mass of a modification. We explored the relation between FDR and DeltaMass using 100 concatenated random decoy libraries and discovered that a small number of DeltaMass values were especially likely to result from decoy searches. Using these values, FDR values could be adjusted for these specific values and a reliable FDR generated for any DeltaMass value. Finally, using this method, we find and examine common, reliable identifications made by the hybrid search for a range of proteomic studies.
Affiliation(s)
- Meghan C. Burke
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Zheng Zhang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuri A. Mirokhin
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Dmitrii V. Tchekovskoi
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuxue Liang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| |
|
26
|
Applications and challenges of forensic proteomics. Forensic Sci Int 2019; 297:350-363. [DOI: 10.1016/j.forsciint.2019.01.022] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 11/16/2018] [Revised: 01/09/2019] [Accepted: 01/13/2019] [Indexed: 12/23/2022]
|
27
|
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Robert J. Chalkley
- University of California San Francisco, San Francisco, 94158, California, United States
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
| | | | - Timo Sachsenberg
- Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH − Royal Institute of Technology, Stockholm 114 28, Sweden
| | - Bernard Delanghe
- Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Paul Wilmes
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
| | | | - Nico Jehmlich
- Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
| | | | - Dennis W. Wolan
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Ana Y. Wang
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway San Jose, CA 95134
| | - Andrew W. Dowsey
- Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
| | - Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
| | - Reza M. Salek
- The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Pierre-Alain Binz
- Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
| | - Henry Lam
- Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
| |
|
28
|
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| | - Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| | - Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
| | - Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| |
|
29
|
Zhang Z, Burke M, Mirokhin YA, Tchekhovskoi DV, Markey SP, Yu W, Chaerkady R, Hess S, Stein SE. Reverse and Random Decoy Methods for False Discovery Rate Estimation in High Mass Accuracy Peptide Spectral Library Searches. J Proteome Res 2018; 17:846-857. [DOI: 10.1021/acs.jproteome.7b00614] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 01/01/2023]
Affiliation(s)
- Zheng Zhang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Meghan Burke
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuri A. Mirokhin
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Dmitrii V. Tchekhovskoi
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Sanford P. Markey
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Wen Yu
- Research Bioinformatics, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Raghothama Chaerkady
- Antibody Discovery and Protein Engineering, Protein Sciences, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Sonja Hess
- Antibody Discovery and Protein Engineering, Protein Sciences, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| |
|
30
|
Shao W, Lam H. Tandem mass spectral libraries of peptides and their roles in proteomics research. Mass Spectrom Rev 2017; 36:634-648. [PMID: 27403644 DOI: 10.1002/mas.21512] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 11/30/2015] [Accepted: 05/21/2016] [Indexed: 05/15/2023]
Abstract
Proteomics is a rapidly maturing field aimed at the high-throughput identification and quantification of all proteins in a biological system. The cornerstone of proteomic technology is tandem mass spectrometry of peptides resulting from the digestion of protein mixtures. The fragmentation pattern of each peptide ion is captured in its tandem mass spectrum, which enables its identification and acts as a fingerprint for the peptide. Spectral libraries are simply searchable collections of these fingerprints, which have taken on an increasingly prominent role in proteomic data analysis. This review describes the historical development of spectral libraries in proteomics, details the computational procedures behind library building and searching, surveys the current applications of spectral libraries, and discusses the outstanding challenges. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:634-648, 2017.
Affiliation(s)
- Wenguang Shao
- Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule (ETH) Zurich, Zurich, Switzerland
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Henry Lam
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| |
|
31
|
Hu H, Khatri K, Zaia J. Algorithms and design strategies towards automated glycoproteomics analysis. Mass Spectrom Rev 2017; 36:475-498. [PMID: 26728195 PMCID: PMC4931994 DOI: 10.1002/mas.21487] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 07/10/2015] [Accepted: 11/30/2015] [Indexed: 05/09/2023]
Abstract
Glycoproteomics involves the study of glycosylation events on protein sequences ranging from purified proteins to whole proteome scales. Understanding these complex post-translational modification (PTM) events requires elucidation of the glycan moieties (monosaccharide sequences and glycosidic linkages between residues), protein sequences, as well as site-specific attachment of glycan moieties onto protein sequences, in a spatial and temporal manner in a variety of biological contexts. Compared with proteomics, bioinformatics for glycoproteomics is immature and many researchers still rely on tedious manual interpretation of glycoproteomics data. As sample preparation protocols and analysis techniques have matured, the number of publications on glycoproteomics and bioinformatics has increased substantially; however, the lack of consensus on tool development and code reuse limits the dissemination of bioinformatics tools because it requires significant effort to migrate a computational tool tailored for one method design to alternative methods. This review discusses algorithms and methods in glycoproteomics, and refers to the general proteomics field for potential solutions. It also introduces general strategies for tool integration and pipeline construction in order to better serve the glycoproteomics community. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:475-498, 2017.
Affiliation(s)
- Han Hu
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
| | - Kshitij Khatri
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
| | - Joseph Zaia
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
| |
|
32
|
Burke MC, Mirokhin YA, Tchekhovskoi DV, Markey SP, Heidbrink Thompson J, Larkin C, Stein SE. The Hybrid Search: A Mass Spectral Library Search Method for Discovery of Modifications in Proteomics. J Proteome Res 2017; 16:1924-1935. [DOI: 10.1021/acs.jproteome.6b00988] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Indexed: 12/31/2022]
Affiliation(s)
- Meghan C. Burke
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuri A. Mirokhin
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Dmitrii V. Tchekhovskoi
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Sanford P. Markey
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Jenny Heidbrink Thompson
- Analytical Sciences, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Christopher Larkin
- Analytical Sciences, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| |
|
33
|
Zhang Z, Yang X, Mirokhin YA, Tchekhovskoi DV, Ji W, Markey SP, Roth J, Neta P, Hizal DB, Bowen MA, Stein SE. Interconversion of Peptide Mass Spectral Libraries Derivatized with iTRAQ or TMT Labels. J Proteome Res 2016; 15:3180-7. [DOI: 10.1021/acs.jproteome.6b00406] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Affiliation(s)
- Zheng Zhang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Xiaoyu Yang
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Yuri A. Mirokhin
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Dmitrii V. Tchekhovskoi
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Weihua Ji
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Sanford P. Markey
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Jeri Roth
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Pedatsur Neta
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Deniz Baycin Hizal
- Antibody Discovery and Protein Engineering Department, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Michael A. Bowen
- Antibody Discovery and Protein Engineering Department, MedImmune LLC, One MedImmune Way, Gaithersburg, Maryland 20878, United States
| | - Stephen E. Stein
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| |
|
34
|
Pai PJ, Hu Y, Lam H. Direct glycan structure determination of intact N-linked glycopeptides by low-energy collision-induced dissociation tandem mass spectrometry and predicted spectral library searching. Anal Chim Acta 2016; 934:152-62. [DOI: 10.1016/j.aca.2016.05.049] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/08/2016] [Revised: 05/24/2016] [Accepted: 05/30/2016] [Indexed: 11/24/2022]
|
35
|
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized and tools presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed that utilize the same spectrum-to-spectrum matching algorithms as spectral library search engines and allow novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
|
36
|
Shanmugam AK, Nesvizhskii AI. Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics. J Proteome Res 2015; 14:5169-78. [PMID: 26569054 DOI: 10.1021/acs.jproteome.5b00504] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]
Abstract
In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as Uniprot, Refseq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the global proteome machine database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential search space incompleteness in the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
Affiliation(s)
- Avinash K Shanmugam
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics and Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, United States
| |
|
37
|
Cho JY, Lee HJ, Jeong SK, Kim KY, Kwon KH, Yoo JS, Omenn GS, Baker MS, Hancock WS, Paik YK. Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project. J Proteome Res 2015; 14:4959-66. [DOI: 10.1021/acs.jproteome.5b00578] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
Affiliation(s)
- Jin-Young Cho
- Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
| | - Hyoung-Joo Lee
- Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
| | - Seul-Ki Jeong
- Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
| | - Kwang-Youl Kim
- Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
| | | | | | - Gilbert S. Omenn
- Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
| | - Mark S. Baker
- Department of Biomedical Science, Faculty of Medicine and Health Science, Macquarie University, New South Wales 2109, Australia
| | | | - Young-Ki Paik
- Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
| |
|
38
|
The Pacific Northwest National Laboratory library of bacterial and archaeal proteomic biodiversity. Sci Data 2015; 2:150041. [PMID: 26306205 PMCID: PMC4540001 DOI: 10.1038/sdata.2015.41] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/17/2015] [Accepted: 07/22/2015] [Indexed: 01/09/2023]
Abstract
This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations which allows for pathway oriented investigation. We present the data as a freely available community resource. A variety of data re-use options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
|
39
|
Wang J, Bourne PE, Bandeira N. MixGF: spectral probabilities for mixture spectra from more than one peptide. Mol Cell Proteomics 2014; 13:3688-97. [PMID: 25225354 DOI: 10.1074/mcp.o113.037218] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Abstract
In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data-independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.
Affiliation(s)
- Jian Wang
- From the ‡Bioinformatics Program, University of California, San Diego, La Jolla, California
- Philip E Bourne
- §Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
- Nuno Bandeira
- §Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California; ¶Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, California; ‖Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92092
|
40
|
Titz B, Elamin A, Martin F, Schneider T, Dijon S, Ivanov NV, Hoeng J, Peitsch MC. Proteomics for systems toxicology. Comput Struct Biotechnol J 2014; 11:73-90. [PMID: 25379146 PMCID: PMC4212285 DOI: 10.1016/j.csbj.2014.08.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/26/2022]
Abstract
Current toxicology studies frequently lack measurements at molecular resolution to enable a more mechanism-based and predictive toxicological assessment. Recently, a systems toxicology assessment framework has been proposed, which combines conventional toxicological assessment strategies with system-wide measurement methods and computational analysis approaches from the field of systems biology. Proteomic measurements are an integral component of this integrative strategy because protein alterations closely mirror biological effects, such as biological stress responses or global tissue alterations. Here, we provide an overview of the technical foundations and highlight select applications of proteomics for systems toxicology studies. With a focus on mass spectrometry-based proteomics, we summarize the experimental methods for quantitative proteomics and describe the computational approaches used to derive biological/mechanistic insights from these datasets. To illustrate how proteomics has been successfully employed to address mechanistic questions in toxicology, we summarized several case studies. Overall, we provide the technical and conceptual foundation for the integration of proteomic measurements in a more comprehensive systems toxicology assessment framework. We conclude that, owing to the critical importance of protein-level measurements and recent technological advances, proteomics will be an integral part of integrative systems toxicology approaches in the future.
|
41
|
Vu TN, Bittremieux W, Valkenborg D, Goethals B, Lemière F, Laukens K. Efficient reduction of candidate matches in peptide spectrum library searching using the top k most intense peaks. J Proteome Res 2014; 13:4175-83. [PMID: 25004400 DOI: 10.1021/pr401269z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Spectral library searching is a popular approach for MS/MS-based peptide identification. Because the size of spectral libraries continues to grow, the performance of searching algorithms is an important issue. This technical note introduces a strategy based on a minimum shared peak count between two spectra to reduce the set of admissible candidate spectra when issuing a query. A theoretical validation through time complexity analysis and an experimental validation based on an implementation of the candidate reduction strategy show that the approach can achieve a reduction of the set of candidate spectra by (at least) an order of magnitude, resulting in a significant improvement in the speed of the search. Meanwhile, more than 99% of the positive search results are retained. This efficient strategy to drastically improve the speed of spectral library searching with a negligible loss of sensitivity can be applied to any current spectral library search tool, irrespective of the employed similarity metric.
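As an illustration only, and not the authors' implementation, the minimum-shared-peak-count filter described in this abstract could be sketched as follows. The sketch assumes spectra are lists of (m/z, intensity) pairs, bins m/z with a hypothetical bin width, and uses an inverted index from bin to library spectrum id. The function names `top_k_bins`, `build_index`, and `candidates`, and all parameter defaults, are assumptions made for this example.

```python
from collections import Counter, defaultdict


def top_k_bins(peaks, k=10, bin_width=1.0):
    """Return the set of m/z bins occupied by the k most intense peaks."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    return {int(mz / bin_width) for mz, _ in strongest}


def build_index(library, k=10, bin_width=1.0):
    """Map each m/z bin to the ids of library spectra with a top-k peak there."""
    index = defaultdict(set)
    for spec_id, peaks in library.items():
        for b in top_k_bins(peaks, k, bin_width):
            index[b].add(spec_id)
    return index


def candidates(query_peaks, index, min_shared=2, k=10, bin_width=1.0):
    """Return library ids sharing at least min_shared top-k bins with the query."""
    shared = Counter()
    for b in top_k_bins(query_peaks, k, bin_width):
        for spec_id in index.get(b, ()):
            shared[spec_id] += 1
    return {spec_id for spec_id, n in shared.items() if n >= min_shared}


# Toy library (hypothetical values): two spectra with three peaks each.
library = {
    "lib_a": [(100.1, 50.0), (200.2, 80.0), (300.3, 10.0)],
    "lib_b": [(150.5, 90.0), (250.6, 40.0), (350.7, 70.0)],
}
index = build_index(library, k=3)
query = [(100.3, 60.0), (200.4, 75.0), (410.0, 5.0)]
print(candidates(query, index, min_shared=2, k=3))  # prints {'lib_a'}
```

Only the spectra returned by `candidates` would then be passed to the (more expensive) similarity scoring step, whichever metric is used.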
Affiliation(s)
- Trung Nghia Vu
- Department of Mathematics and Computer Science, University of Antwerp, B-2020 Antwerp, Belgium
|
42
|
Rudnick PA. Refining spectral library searching. Proteomics 2013; 13:3247-50. [PMID: 24123856 DOI: 10.1002/pmic.201300426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2013] [Revised: 09/30/2013] [Accepted: 10/02/2013] [Indexed: 11/09/2022]
Abstract
Spectral library searching has many advantages over sequence database searching, yet it has not been widely adopted. One possible reason for this is that users are unsure exactly how to interpret the similarity scores (e.g., "dot products" are not probability-based scores). Methods to create decoys have been proposed, but, as developers caution, may produce proxies that are not equivalent to reversed sequences. In this issue, Shao et al. (Proteomics 2013, 13, 3273-3283) report advances in spectral library searching where the focus is not on improving the performance of their search engine, SpectraST, but is instead on improving the statistical meaningfulness of its discriminant score and removing the need for decoys. The results in their paper indicate that by "standardizing" the input and library spectra, sensitivity is not lost but is, surprisingly, gained. Their tests also show that false discovery rate (FDR) estimates, derived from their new score, track better with "ground truth" than decoy searching. It is possible that their work strikes a good balance between the theory of library searching and its application. And as such, they hope to have removed a major entrance barrier for some researchers previously unwilling to try library searching.
Affiliation(s)
- Paul A Rudnick
- Spectragen Informatics, Rockville, MD, USA; Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, MD, USA
|
43
|
Ma CWM, Lam H. Hunting for unexpected post-translational modifications by spectral library searching with tier-wise scoring. J Proteome Res 2014; 13:2262-71. [PMID: 24661115 DOI: 10.1021/pr401006g] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/22/2022]
Abstract
Discovering novel post-translational modifications (PTMs) to proteins and detecting specific modification sites on proteins is one of the last frontiers of proteomics. At present, hunting for post-translational modifications remains challenging in widely practiced shotgun proteomics workflows due to the typically low abundance of modified peptides and the greatly inflated search space as more potential mass shifts are considered by the search engines. Moreover, most popular search methods require that the user specifies the modification(s) for which to search; therefore, unexpected and novel PTMs will not be detected. Here a new algorithm is proposed to apply spectral library searching to the problem of open modification searches, namely, hunting for PTMs without prior knowledge of what PTMs are in the sample. The proposed tier-wise scoring method intelligently looks for unexpected PTMs by allowing mass-shifted peak matches but only when the number of matches found is deemed statistically significant. This allows the search engine to search for unexpected modifications while maintaining its ability to identify unmodified peptides effectively at the same time. The utility of the method is demonstrated using three different data sets, in which the numbers of spectrum identifications to both unmodified and modified peptides were substantially increased relative to a regular spectral library search as well as to another open modification spectral search method, pMatch.
Affiliation(s)
- Chun Wai Manson Ma
- Division of Biomedical Engineering and ‡Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
|
44
|
Hu Y, Lam H. Expanding Tandem Mass Spectral Libraries of Phosphorylated Peptides: Advances and Applications. J Proteome Res 2013; 12:5971-7. [DOI: 10.1021/pr4007443] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/17/2022]
Affiliation(s)
- Yingwei Hu
- Department of Chemical and Biomolecular Engineering and ‡Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Henry Lam
- Department of Chemical and Biomolecular Engineering and ‡Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
|
45
|
Shao W, Zhu K, Lam H. Refining similarity scoring to enable decoy-free validation in spectral library searching. Proteomics 2013; 13:3273-83. [DOI: 10.1002/pmic.201300232] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 06/14/2013] [Revised: 08/06/2013] [Accepted: 09/10/2013] [Indexed: 12/30/2022]
Affiliation(s)
- Wenguang Shao
- Division of Biomedical Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
- Kan Zhu
- Department of Chemical and Biomolecular Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
- Henry Lam
- Division of Biomedical Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
- Department of Chemical and Biomolecular Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
|
46
|
Nagore LI, Nadeau RJ, Guo Q, Jadhav YLA, Jarrett HW, Haskins WE. Purification and characterization of transcription factors. Mass Spectrom Rev 2013; 32:386-398. [PMID: 23832591 PMCID: PMC3758410 DOI: 10.1002/mas.21369] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/31/2012] [Revised: 11/19/2012] [Accepted: 11/19/2012] [Indexed: 06/02/2023]
Abstract
Transcription factors (TFs) are essential for the expression of all proteins, including those involved in human health and disease. However, TFs are resistant to proteomic characterization because they are frequently masked by more abundant proteins due to the limited dynamic range of capillary liquid chromatography-tandem mass spectrometry and protein database searching. Purification methods, particularly strategies that exploit the high affinity of TFs for DNA response elements (REs) on gene promoters, can enrich TFs prior to proteomic analysis to improve dynamic range and penetrance of the TF proteome. For example, trapping of TF complexes specific for particular REs has been achieved by recovering the element DNA-protein complex on solid supports. Additional methods for improving dynamic range include two- and three-dimensional gel electrophoresis incorporating electrophoretic mobility shift assays and Southwestern blotting for detection. Here we review methods for TF purification and characterization. We fully expect that future investigations will apply these and other methods to illuminate this important but challenging proteome.
Affiliation(s)
- LI Nagore
- Department of Chemistry, University of Texas at San Antonio, San Antonio, TX, 78249
- RJ Nadeau
- Department of Chemistry, University of Texas at San Antonio, San Antonio, TX, 78249
- Protein Biomarkers Cores, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Interdisciplinary Health Research, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Research & Training in the Sciences, University of Texas at San Antonio, San Antonio, TX, 78249
- Q Guo
- Department of Chemistry, University of Texas at San Antonio, San Antonio, TX, 78249
- Protein Biomarkers Cores, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Interdisciplinary Health Research, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Research & Training in the Sciences, University of Texas at San Antonio, San Antonio, TX, 78249
- YLA Jadhav
- Pediatric Biochemistry Laboratory, University of Texas at San Antonio, San Antonio, TX, 78249
- RCMI Proteomics, University of Texas at San Antonio, San Antonio, TX, 78249
- Protein Biomarkers Cores, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Interdisciplinary Health Research, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Research & Training in the Sciences, University of Texas at San Antonio, San Antonio, TX, 78249
- HW Jarrett
- Department of Chemistry, University of Texas at San Antonio, San Antonio, TX, 78249
- Protein Biomarkers Cores, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Interdisciplinary Health Research, University of Texas at San Antonio, San Antonio, TX, 78249
- WE Haskins
- Pediatric Biochemistry Laboratory, University of Texas at San Antonio, San Antonio, TX, 78249
- Department of Chemistry, University of Texas at San Antonio, San Antonio, TX, 78249
- Departments of Biology, University of Texas at San Antonio, San Antonio, TX, 78249
- RCMI Proteomics, University of Texas at San Antonio, San Antonio, TX, 78249
- Protein Biomarkers Cores, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Interdisciplinary Health Research, University of Texas at San Antonio, San Antonio, TX, 78249
- Center for Research & Training in the Sciences, University of Texas at San Antonio, San Antonio, TX, 78249
- Departments of Medicine, Division of Hematology & Medical Oncology, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229
- Cancer Therapy & Research Center, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229
|
47
|
Shao W, Lam H. Denoising Peptide Tandem Mass Spectra for Spectral Libraries: A Bayesian Approach. J Proteome Res 2013; 12:3223-32. [DOI: 10.1021/pr400080b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Affiliation(s)
- Wenguang Shao
- Division of Biomedical Engineering, and ‡Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Henry Lam
- Division of Biomedical Engineering, and ‡Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
|
48
|
Dong NP, Liang YZ, Yi LZ, Lu HM. Investigation of scrambled ions in tandem mass spectra, part 2. On the influence of the ions on peptide identification. J Am Soc Mass Spectrom 2013; 24:857-867. [PMID: 23504644 DOI: 10.1007/s13361-013-0591-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2012] [Revised: 01/19/2013] [Accepted: 01/20/2013] [Indexed: 06/01/2023]
Abstract
A comprehensive investigation was performed to understand the influence of sequence scrambling in peptide ions on peptide identification results. To achieve this, four tandem mass spectrometry datasets, with scrambled ions included and with them excluded, were analyzed by Crux, X!Tandem, SpectraST, Lutefisk, and PepNovo. While the algorithms differed in their performance, an increase in the number of correctly identified peptides was generally observed when removing scrambled ions, with the exception of the SpectraST algorithm. However, the variation of the match scores upon removal was unpredictable. Following these investigations, an interpretation was given of how the scrambled ions affect peptide identification. Lastly, a simulated theoretical mass spectral library derived from the NIST peptide libraries was constructed and searched by SpectraST to study whether scrambled ions in predicted mass spectra could affect peptide identification. Consistent with the peptide library search results, no significant variations in dot product scores or peptide identification results were observed when these ions were included in the theoretical MS/MS spectra. Of the five adopted algorithms, SpectraST and Crux provided the most robust results, whereas X!Tandem, PepNovo, and Lutefisk were sensitive to the existence of the scrambled ions, especially the latter two de novo sequencing algorithms.
Affiliation(s)
- Nai-ping Dong
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
|
49
|
Cheng CY, Tsai CF, Chen YJ, Sung TY, Hsu WL. Spectrum-based Method to Generate Good Decoy Libraries for Spectral Library Searching in Peptide Identifications. J Proteome Res 2013; 12:2305-10. [DOI: 10.1021/pr301039b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Affiliation(s)
- Chia-Ying Cheng
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Chia-Feng Tsai
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Department of Chemistry, National Taiwan University, Taipei 106, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Department of Chemistry, National Taiwan University, Taipei 106, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
|
50
|
Ji C, Arnold RJ, Sokoloski KJ, Hardy RW, Tang H, Radivojac P. Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra. Proteomics 2013; 13:756-65. [PMID: 23303707 DOI: 10.1002/pmic.201100670] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/27/2011] [Revised: 10/19/2012] [Accepted: 11/11/2012] [Indexed: 01/10/2023]
Abstract
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
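The weighted K-nearest-neighbor step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: how neighbors are selected and how similarity is computed are not given here, so the sketch simply assumes each neighbor arrives with a precomputed positive similarity score and a mapping from fragment ion label to intensity. The function name `predict_intensities` and the ion labels are hypothetical.

```python
def predict_intensities(neighbors, k=3):
    """Weighted K-nearest-neighbor average of fragment-ion intensities.

    neighbors: list of (similarity, {ion_label: intensity}) pairs, where
    similarity is an assumed, precomputed score (higher means closer).
    """
    # Keep the k neighbors with the highest similarity to the target peptide.
    nearest = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    total_weight = sum(sim for sim, _ in nearest)
    if total_weight <= 0:
        raise ValueError("need at least one neighbor with positive similarity")
    # Ions missing from a neighbor spectrum contribute an intensity of 0.
    ions = set().union(*(spec.keys() for _, spec in nearest))
    return {
        ion: sum(sim * spec.get(ion, 0.0) for sim, spec in nearest) / total_weight
        for ion in sorted(ions)
    }


# Toy neighbors (hypothetical values); the third is dropped when k=2.
neighbors = [
    (0.5, {"b2": 1.0, "y3": 0.0}),
    (0.5, {"b2": 0.0, "y3": 1.0}),
    (0.1, {"b2": 1.0}),
]
predicted = predict_intensities(neighbors, k=2)
print(predicted)  # prints {'b2': 0.5, 'y3': 0.5}
```

Weighting by similarity lets closer neighbors dominate the prediction, while K limits how many dissimilar spectra can dilute it.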
Affiliation(s)
- Chao Ji
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
|