1
Chen YE, Ge X, Woyshner K, McDermott M, Manousopoulou A, Ficarro SB, Marto JA, Li K, Wang LD, Li JJ. APIR: Aggregating Universal Proteomics Database Search Algorithms for Peptide Identification with FDR Control. Genomics, Proteomics & Bioinformatics 2024; 22:qzae042. [PMID: 39198030 DOI: 10.1093/gpbjnl/qzae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 02/26/2024] [Accepted: 03/11/2024] [Indexed: 09/01/2024]
Abstract
Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which map mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms with both a guaranteed increase in the number of identified peptides and control of the false discovery rate (FDR). To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under an FDR threshold, APIR is guaranteed to identify at least as many peptides as each individual database search algorithm does. Evaluation of APIR on a complex proteomics standard dataset showed that APIR is more powerful than individual database search algorithms and empirically controls the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.
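Database search engines commonly control the FDR of their PSMs with a target-decoy strategy. As background for the FDR threshold mentioned above, and not as APIR's aggregation procedure, the sketch below accepts target PSMs from one engine using the standard estimate "decoys above the cutoff / targets above the cutoff", converted to q-values. The `PSM` record, its fields, and the function name are hypothetical, higher scores are assumed to be better, and tied scores are not treated specially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical minimal PSM record: peptide, engine score (higher is better), decoy flag."""
    peptide: str
    score: float
    is_decoy: bool


def target_decoy_accept(psms: list[PSM], alpha: float) -> set[str]:
    """Return peptides of target PSMs whose target-decoy q-value is <= alpha."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)

    # Estimated FDR at each cutoff: decoys seen so far / targets seen so far.
    fdrs = []
    n_target = n_decoy = 0
    for p in ranked:
        if p.is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / max(n_target, 1))

    # q-value: the smallest estimated FDR at this cutoff or any looser (lower) one.
    qvals = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running

    return {p.peptide for p, q in zip(ranked, qvals) if not p.is_decoy and q <= alpha}
```

Running this separately per engine and taking the union of the accepted sets is not guaranteed to keep the FDR at `alpha`, which is the kind of gap the abstract says APIR is designed to close.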
Affiliation(s)
- Yiling Elaine Chen
  - Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Xinzhou Ge
  - Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Kyla Woyshner
  - Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
- MeiLu McDermott
  - Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Antigoni Manousopoulou
  - Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
- Scott B Ficarro
  - Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02215, USA
- Jarrod A Marto
  - Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02215, USA
- Kexin Li
  - Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
- Leo David Wang
  - Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA
  - Department of Pediatrics, City of Hope National Medical Center, Duarte, CA 91010, USA
- Jingyi Jessica Li
  - Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
  - Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
2
Bugyi F, Szabó D, Szabó G, Révész Á, Pape VFS, Soltész-Katona E, Tóth E, Kovács O, Langó T, Vékey K, Drahos L. Influence of Post-Translational Modifications on Protein Identification in Database Searches. ACS Omega 2021; 6:7469-7477. [PMID: 33778259 PMCID: PMC7992065 DOI: 10.1021/acsomega.0c05997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Accepted: 03/02/2021] [Indexed: 06/12/2023]
Abstract
Comprehensive analysis of post-translational modifications (PTMs) is an important goal of proteomics. However, considering PTMs increases the search space and may therefore impair the efficiency of protein identification. Using thousands of proteomic searches, we investigated the practical aspects of considering multiple PTMs in Byonic searches to maximize protein and peptide hits. Including all PTMs that occur with at least 2% frequency in the sample has an advantageous effect on protein and peptide identification. A linear relationship was established between the number of considered PTMs and the number of reliably identified peptides and proteins. Although MASCOT (using the Percolator function) and Andromeda (the search engine included in MaxQuant) handle multiple modifications less efficiently, their results became comparable to those of Byonic when only a few PTMs were considered.
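The point that considering PTMs enlarges the search space can be made concrete by counting the modified forms a single candidate peptide can take. The sketch below is a hypothetical illustration: it assumes at most one modification per residue and a cap `max_mods` on modifications per peptide, and the function name, modification names, and example peptide are illustrative. It is not how Byonic, MASCOT, or Andromeda enumerate candidates internally.

```python
def count_variants(peptide: str, variable_mods: dict[str, set[str]], max_mods: int) -> int:
    """Count modified forms of `peptide`, including the unmodified one.

    variable_mods maps a (hypothetical) modification name to the residues it can occupy.
    Assumes at most one modification per residue and at most `max_mods` per peptide.
    """
    # ways[j] = number of ways to place exactly j modifications on the residues seen so far
    ways = [1] + [0] * max_mods
    for residue in peptide:
        # How many of the variable modifications could sit on this residue.
        options = sum(residue in sites for sites in variable_mods.values())
        if options == 0:
            continue
        # Iterate downwards so ways[j - 1] still holds the previous residue's counts.
        for j in range(max_mods, 0, -1):
            ways[j] += ways[j - 1] * options
    return sum(ways)
```

For `MSTK` with oxidation allowed on M and phosphorylation on S/T/Y, there are 8 forms with `max_mods=3`; also allowing acetylation on K raises this to 15, and each extra form is one more candidate that can compete for spectra within the precursor mass tolerance.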
Affiliation(s)
- Fanni Bugyi
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
  - Hevesy György PhD School of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary
- Dániel Szabó
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
  - Hevesy György PhD School of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary
- Győző Szabó
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
  - Faculty of Informatics, Eötvös Loránd University, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
- Ágnes Révész
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Veronika F. S. Pape
  - Department of Physiology, Faculty of Medicine, Semmelweis University, Tűzoltó utca 37-47, H-1094 Budapest, Hungary
- Eszter Soltész-Katona
  - Department of Physiology, Faculty of Medicine, Semmelweis University, Tűzoltó utca 37-47, H-1094 Budapest, Hungary
  - ELKH Supported Research Groups, Gellérthegy u. 30-32, H-1016 Budapest, Hungary
- Eszter Tóth
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
  - Institute of Enzymology, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Orsolya Kovács
  - Department of Physiology, Faculty of Medicine, Semmelweis University, Tűzoltó utca 37-47, H-1094 Budapest, Hungary
  - Department of Genetics, Cell- and Immunobiology, Semmelweis University, Nagyvárad tér 4, H-1089 Budapest, Hungary
- Tamás Langó
  - Institute of Enzymology, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Károly Vékey
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- László Drahos
  - Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
3
Rolfs Z, Solntsev SK, Shortreed MR, Frey BL, Smith LM. Global Identification of Post-Translationally Spliced Peptides with Neo-Fusion. J Proteome Res 2018; 18:349-358. [PMID: 30346791 DOI: 10.1021/acs.jproteome.8b00651] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 01/25/2023]
Abstract
Post-translationally spliced peptides have recently garnered significant interest as potential targets for cancer immunotherapy and as contributors to autoimmune diseases such as type 1 diabetes, but feasible methods for identifying them have yet to be developed. Here we present Neo-Fusion, a search program for discovering spliced peptides in tandem mass spectrometry data. Neo-Fusion uses two separated ion database searches to identify the two halves of each spliced peptide and then infers the full spliced sequence. This strategy allows spliced peptides to be identified without peptide length constraints, providing a broadly applicable tool for identifying spliced peptides in a variety of systems, such as the HLA-I and HLA-II immunopeptidomes and in vitro digested protein samples obtained from organelles, cells, or tissues of interest. In a benchmark using simulated spliced peptides, Neo-Fusion identified 25% of all simulated spliced peptides at a measured false-discovery rate of 5% for HLA-I. Neo-Fusion provides the research community with a powerful new tool to aid in the study of the prevalence and biological significance of post-translationally spliced peptides.
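The final step described above, combining two independently identified halves into a full spliced sequence, can be sketched as a precursor-mass check. This is a hypothetical illustration, not Neo-Fusion's implementation: it assumes the N- and C-terminal fragment candidates are already given, uses standard monoisotopic residue masses for unmodified residues and a 0.01 Da tolerance, and flags a joined sequence as spliced when it does not occur contiguously in any supplied protein sequence. It ignores modifications and treats I and L as distinct letters in the contiguity check.

```python
# Monoisotopic residue masses (Da) for the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def splice_candidates(n_terms, c_terms, precursor_mass, proteins, tol=0.01):
    """Join N- and C-terminal halves whose combined mass matches the precursor.

    Returns (sequence, is_spliced) pairs; a joined sequence is flagged as spliced
    when it does not occur contiguously in any of the given protein sequences.
    """
    hits = []
    for n in n_terms:
        for c in c_terms:
            seq = n + c
            if abs(peptide_mass(seq) - precursor_mass) <= tol:
                is_linear = any(seq in prot for prot in proteins)
                hits.append((seq, not is_linear))
    return hits
```

A joined sequence that does occur contiguously in a protein is reported with `is_spliced=False`, since it is then explained equally well by an ordinary linear peptide.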
Affiliation(s)
- Zach Rolfs
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Stefan K Solntsev
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Brian L Frey
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
4
Perez-Riverol Y, Vizcaíno JA, Griss J. Future Prospects of Spectral Clustering Approaches in Proteomics. Proteomics 2018; 18:e1700454. [PMID: 29882266 PMCID: PMC6099476 DOI: 10.1002/pmic.201700454] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/21/2018] [Revised: 05/23/2018] [Indexed: 12/14/2022]
Abstract
In this article, current and future applications of spectral clustering are discussed in the context of mass spectrometry-based proteomics. First, the main algorithms and tools that can currently be used to perform spectral clustering are introduced. Next, the main applications of spectral clustering and their use in current computational proteomics workflows are explained, including the generation of spectral libraries and spectral archives. Finally, possible future directions for spectral clustering are outlined, including its potential use to achieve deeper coverage of the proteome and to discover novel post-translational modifications and single amino acid variants.
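To illustrate what spectral clustering does, here is a minimal sketch of greedy clustering by cosine similarity. It assumes spectra are lists of (m/z, intensity) pairs, 1 Da bins, and a similarity threshold of 0.8; these are illustrative choices, and the function names are hypothetical. Each spectrum joins the first cluster whose seed spectrum is similar enough, otherwise it starts a new cluster. This is a generic technique, not the algorithm of any specific tool reviewed in the article; clustering tools commonly also restrict comparisons to spectra with similar precursor m/z and maintain consensus spectra, both omitted here.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Map (m/z, intensity) peaks to a sparse vector of summed intensity per m/z bin."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {bin: intensity} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_cluster(spectra, threshold=0.8, bin_width=1.0):
    """Return clusters as lists of spectrum indices, using each cluster's first member as seed."""
    seeds, clusters = [], []
    for idx, peaks in enumerate(spectra):
        vec = bin_spectrum(peaks, bin_width)
        for seed, members in zip(seeds, clusters):
            if cosine(vec, seed) >= threshold:
                members.append(idx)
                break
        else:  # no existing cluster was similar enough
            seeds.append(vec)
            clusters.append([idx])
    return clusters
```

Because assignment is greedy, the result depends on the order in which spectra are processed; this is one reason the seed-based design above is only a starting point.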
Affiliation(s)
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Johannes Griss
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
5
David M, Fertin G, Rogniaux H, Tessier D. SpecOMS: A Full Open Modification Search Method Performing All-to-All Spectra Comparisons within Minutes. J Proteome Res 2017; 16:3030-3038. [PMID: 28660767 DOI: 10.1021/acs.jproteome.7b00308] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/30/2022]
Abstract
The analysis of discovery proteomics experiments relies on algorithms that identify peptides from their tandem mass spectra. Interpreting these spectra almost exhaustively remains an unresolved problem. At present, a substantial share of the missing interpretations is probably due to peptides carrying post-translational modifications and variants, which yield spectra that are particularly difficult to interpret. However, the emergence of a new generation of mass spectrometers that provide high fragment ion accuracy has paved the way for more efficient algorithms. We present new software, SpecOMS, that can handle the computational complexity of pairwise comparisons of spectra for large data volumes. SpecOMS can compare the whole set of experimental spectra generated by a discovery proteomics experiment with the whole set of theoretical spectra deduced from a protein database in a few minutes on a standard workstation. SpecOMS exploits these capabilities to improve the peptide identification process, allowing strong competition between all possible peptides for spectrum interpretation. Notably, this software resolves the drawbacks (i.e., efficiency problems and decreased sensitivity) that usually accompany open modification searches. We highlight this approach using results obtained from the analysis of a public human data set downloaded from the PRIDE (PRoteomics IDEntification) database.
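SpecOMS compares experimental and theoretical spectra by counting the peaks they share. The sketch below shows one simple way to make such all-to-all counting cheap: an inverted index from discretized fragment m/z to the theoretical spectra that contain a peak there. SpecOMS uses its own dedicated data structure, so treat this as a stand-in technique. The function names, the 0.02 Da bin width (chosen to reflect high fragment ion accuracy), and the toy peak lists are assumptions, and peaks falling near a bin edge may be missed.

```python
from collections import defaultdict


def build_index(theoretical: dict[str, list[float]], bin_width: float = 0.02) -> dict[int, set[str]]:
    """Map each discretized fragment m/z to the ids of theoretical spectra with a peak there."""
    index: dict[int, set[str]] = defaultdict(set)
    for tid, peaks in theoretical.items():
        for mz in peaks:
            index[round(mz / bin_width)].add(tid)
    return index


def shared_peak_counts(experimental: list[float], index: dict[int, set[str]],
                       bin_width: float = 0.02) -> dict[str, int]:
    """Count, for one experimental spectrum, the bins it shares with each theoretical spectrum."""
    counts: dict[str, int] = defaultdict(int)
    # Deduplicate bins so several experimental peaks in one bin count once.
    for b in {round(mz / bin_width) for mz in experimental}:
        for tid in index.get(b, ()):
            counts[tid] += 1
    return dict(counts)
```

With this layout, the work per experimental spectrum is proportional to the number of (bin, theoretical spectrum) hits it touches rather than to the total number of theoretical spectra, and no modification list is needed to compute the counts.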
Affiliation(s)
- Matthieu David
  - LS2N UMR CNRS 6004, Université de Nantes, F-44300 Nantes, France
  - INRA UR1268 Biopolymères Interactions Assemblages, F-44316 Nantes, France
- Guillaume Fertin
  - LS2N UMR CNRS 6004, Université de Nantes, F-44300 Nantes, France
- Hélène Rogniaux
  - INRA UR1268 Biopolymères Interactions Assemblages, F-44316 Nantes, France
- Dominique Tessier
  - INRA UR1268 Biopolymères Interactions Assemblages, F-44316 Nantes, France