1
Yi X, Wen B, Ji S, Saltzman AB, Jaehnig EJ, Lei JT, Gao Q, Zhang B. Deep Learning Prediction Boosts Phosphoproteomics-Based Discoveries Through Improved Phosphopeptide Identification. Mol Cell Proteomics 2024; 23:100707. [PMID: 38154692 PMCID: PMC10831110 DOI: 10.1016/j.mcpro.2023.100707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 11/06/2023] [Accepted: 12/23/2023] [Indexed: 12/30/2023]
Abstract
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples. One of the primary challenges associated with this technology is the relatively low rate of phosphopeptide identification during data analysis. This limitation hampers the full realization of the potential offered by shotgun phosphoproteomics. Here we present DeepRescore2, a computational workflow that leverages deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. Using a state-of-the-art computational workflow as a benchmark, DeepRescore2 increases the number of correctly identified peptide-spectrum matches by 17% in a synthetic dataset and identifies 19% to 46% more phosphopeptides in biological datasets. In a liver cancer dataset, 30% of the significantly altered phosphosites between tumor and normal tissues and 60% of the prognosis-associated phosphosites identified from DeepRescore2-processed data could not be identified based on the state-of-the-art workflow. Notably, DeepRescore2-processed data uniquely identifies EGFR hyperactivation as a new target in poor-prognosis liver cancer, which is validated experimentally. Integration of deep learning prediction in DeepRescore2 improves phosphopeptide identification and facilitates biological discoveries.
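The abstract names two kinds of deep learning prediction, retention time and fragment ion intensity, as inputs to rescoring. As a minimal sketch of how such predictions could be turned into per-PSM features, the Python below computes a retention time deviation and a cosine similarity between observed and predicted intensities. The function names, the dictionary keys, and the choice of cosine similarity are illustrative assumptions; the abstract does not state which features or similarity measures DeepRescore2 actually uses.

```python
import math


def rt_deviation(observed_rt, predicted_rt):
    """Absolute difference between observed and predicted retention time (same units)."""
    return abs(observed_rt - predicted_rt)


def cosine_similarity(observed, predicted):
    """Cosine similarity between two equal-length fragment ion intensity vectors."""
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0  # no signal in one vector: treat as no agreement
    return dot / (norm_o * norm_p)


def rescoring_features(psm):
    """Build hypothetical rescoring features for one peptide-spectrum match.

    `psm` is an assumed dict layout, not a format defined by DeepRescore2.
    """
    return {
        "rt_deviation": rt_deviation(psm["observed_rt"], psm["predicted_rt"]),
        "intensity_similarity": cosine_similarity(
            psm["observed_intensities"], psm["predicted_intensities"]
        ),
    }
```

A PSM whose observed values agree closely with the predictions would get a small `rt_deviation` and an `intensity_similarity` near 1; a rescoring model could use such features alongside the search engine score.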
Affiliation(s)
- Xinpei Yi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Bo Wen
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Shuyi Ji
  - Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital and Key Laboratory of Carcinogenesis and Cancer Invasion of the Ministry of China, Fudan University, Shanghai, China
- Alexander B Saltzman
  - Mass Spectrometry Proteomics Core, Advanced Technology Cores, Baylor College of Medicine, Houston, Texas, USA
- Eric J Jaehnig
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Jonathan T Lei
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Qiang Gao
  - Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital and Key Laboratory of Carcinogenesis and Cancer Invasion of the Ministry of China, Fudan University, Shanghai, China
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
2
Yi X, Wen B, Ji S, Saltzman A, Jaehnig EJ, Lei JT, Gao Q, Zhang B. Deep learning prediction boosts phosphoproteomics-based discoveries through improved phosphopeptide identification. bioRxiv 2023:2023.01.11.523329. [PMID: 36711982 PMCID: PMC9882090 DOI: 10.1101/2023.01.11.523329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples, but low phosphopeptide identification rate in data analysis limits the potential of this technology. Here we present DeepRescore2, a computational workflow that leverages deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. Using a state-of-the-art computational workflow as a benchmark, DeepRescore2 increases the number of correctly identified peptide-spectrum matches by 17% in a synthetic dataset and identifies 19%-46% more phosphopeptides in biological datasets. In a liver cancer dataset, 30% of the significantly altered phosphosites between tumor and normal tissues and 60% of the prognosis-associated phosphosites identified from DeepRescore2-processed data could not be identified based on the state-of-the-art workflow. Notably, DeepRescore2-processed data uniquely identifies EGFR hyperactivation as a new target in poor-prognosis liver cancer, which is validated experimentally. Integration of deep learning prediction in DeepRescore2 improves phosphopeptide identification and facilitates biological discoveries.
3
Zeng X, Lan Y, Xiao J, Hu L, Tan L, Liang M, Wang X, Lu S, Peng T, Long F. Advances in phosphoproteomics and its application to COPD. Expert Rev Proteomics 2022; 19:311-324. [PMID: 36730079 DOI: 10.1080/14789450.2023.2176756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
INTRODUCTION: Chronic obstructive pulmonary disease (COPD) was the third leading cause of global death in 2019, causing a huge economic burden to society. Therefore, it is urgent to identify specific phenotypes of COPD patients through early detection, and to promptly treat exacerbations. The field of phosphoproteomics has advanced massively, driven by developments in mass spectrometry, enrichment strategies, algorithms, and tools. Modern mass spectrometry-based phosphoproteomics supports the study of disease pathobiology, biomarker discovery, and the prediction of new therapeutic modalities. AREAS COVERED: In this article, we present an overview of phosphoproteomic research and strategies for enrichment and fractionation of phosphopeptides, identification of phosphorylation sites, chromatographic separation and mass spectrometry detection strategies, and the potential application of phosphoproteomic analysis in the diagnosis, treatment, and prognosis of COPD. EXPERT OPINION: The role of phosphoproteomics in COPD is critical for understanding disease pathobiology, identifying potential biomarkers, and predicting new therapeutic approaches. However, the complexity of COPD requires a more comprehensive understanding that can be achieved through integrated multi-omics studies. Phosphoproteomics, as a part of these multi-omics approaches, can provide valuable insights into the underlying mechanisms of COPD.
Affiliation(s)
- Xiaoyin Zeng
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Yanting Lan
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Jing Xiao
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Longbo Hu
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Long Tan
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Mengdi Liang
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Xufei Wang
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Shaohua Lu
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
- Tao Peng
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China; Guangdong South China Vaccine Co. Ltd, Guangzhou, China
- Fei Long
  - Sino-French Hoffmann Institute, School of Basic Medical Science, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China
4
An Z, Zhai L, Ying W, Qian X, Gong F, Tan M, Fu Y. PTMiner: Localization and Quality Control of Protein Modifications Detected in an Open Search and Its Application to Comprehensive Post-translational Modification Characterization in Human Proteome. Mol Cell Proteomics 2019; 18:391-405. [PMID: 30420486 PMCID: PMC6356076 DOI: 10.1074/mcp.ra118.000812] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/21/2018] [Revised: 11/02/2018] [Indexed: 12/27/2022]
Abstract
The open (mass tolerant) search of tandem mass spectra of peptides shows great potential in the comprehensive detection of post-translational modifications (PTMs) in shotgun proteomics. However, this search strategy has not been widely used by the community, and one bottleneck is the lack of appropriate algorithms for automated and reliable post-processing of the coarse and error-prone search results. Here we present PTMiner, a software tool for confident filtering and localization of modifications (mass shifts) detected in an open search. After mass-shift-grouped false discovery rate (FDR) control of peptide-spectrum matches (PSMs), PTMiner uses an empirical Bayesian method to localize modifications through iterative learning of the prior probabilities of each type of modification occurring on different amino acids. The performance of PTMiner was evaluated on three data sets, including simulated data, chemically synthesized peptide library data, and modified-peptide spiked-in proteome data. The results showed that PTMiner can effectively control the PSM FDR and accurately localize the modification sites. At 1% real false localization rate (FLR), PTMiner localized 93%, 84%, and 83% of the modification sites in the three data sets, respectively, far higher than two open search engines we used and an extended version of the Ascore localization algorithm. We then used PTMiner to analyze a draft map of the human proteome containing 25 million spectra from 30 tissues, and confidently identified over 1.7 million modified PSMs at 1% FDR and 1% FLR, which provided a system-wide view of both known and unknown PTMs in the human proteome.
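The mass-shift-grouped FDR control mentioned in this abstract can be illustrated with a generic target-decoy sketch in Python. This is not PTMiner's implementation: the dictionary layout, the fixed-width binning of mass shifts (`bin_width`), and the simple decoys/targets estimate are all assumptions made for illustration, since the abstract does not give those details.

```python
from collections import defaultdict


def filter_by_group_fdr(psms, fdr_threshold=0.01, bin_width=0.01):
    """Keep target PSMs passing a target-decoy FDR cutoff within each mass-shift group.

    Each PSM is an assumed dict with keys "score" (higher is better),
    "is_decoy" (bool), and "mass_shift" (Da). Grouping by rounding the
    mass shift to `bin_width` is a simplification chosen for this sketch.
    """
    groups = defaultdict(list)
    for psm in psms:
        groups[round(psm["mass_shift"] / bin_width)].append(psm)

    accepted = []
    for group in groups.values():
        ranked = sorted(group, key=lambda p: p["score"], reverse=True)
        targets = decoys = 0
        cutoff = 0  # number of top-ranked PSMs to keep in this group
        for i, psm in enumerate(ranked, start=1):
            if psm["is_decoy"]:
                decoys += 1
            else:
                targets += 1
            # Estimated FDR at this rank: decoys / targets.
            if targets > 0 and decoys / targets <= fdr_threshold:
                cutoff = i  # deepest rank at which the estimate still passes
        accepted.extend(p for p in ranked[:cutoff] if not p["is_decoy"])
    return accepted
```

Estimating the FDR separately per group means that a mass shift with many decoy hits is filtered more strictly than one with few, rather than applying one global score cutoff to all mass shifts.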
Affiliation(s)
- Zhiwu An
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Linhui Zhai
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Wantao Ying
  - State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, Beijing 102206, China; Beijing Institute of Lifeomics, Beijing 100850, China
- Xiaohong Qian
  - State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, Beijing 102206, China; Beijing Institute of Lifeomics, Beijing 100850, China
- Fuzhou Gong
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Minjia Tan
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Yan Fu
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
5
Klement E, Medzihradszky KF. Extracellular Protein Phosphorylation, the Neglected Side of the Modification. Mol Cell Proteomics 2016; 16:1-7. [PMID: 27834735 DOI: 10.1074/mcp.o116.064188] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 09/24/2016] [Revised: 11/10/2016] [Indexed: 12/18/2022]
Abstract
The very existence of extracellular phosphorylation has been questioned for a long time, although casein phosphorylation was discovered a century ago. In addition, several modification sites localized on secreted proteins or on extracellular or lumenal domains of transmembrane proteins have been catalogued in large scale phosphorylation analyses, though in most such studies this aspect of cellular localization was not considered. Our review presents examples when additional analyses were performed on already public data sets that revealed a wealth of information about this "neglected side" of the modification. We also sum up accumulated knowledge about extracellular phosphorylation, including the discovery of Golgi-residing kinases and the special difficulties encountered in targeted analyses. We hope future phosphorylation studies will not ignore the existence of phosphorylation outside of the cell, and further discoveries will shed more light on its biological role.
Affiliation(s)
- Eva Klement
  - Laboratory of Proteomics Research, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary
- Katalin F Medzihradszky
  - Laboratory of Proteomics Research, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Hungary; Department of Pharmaceutical Chemistry, School of Pharmacy, University of California San Francisco, San Francisco, California
6
Awan MG, Saeed F. MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing. Bioinformatics 2016; 32:1518-26. [DOI: 10.1093/bioinformatics/btw023] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/27/2015] [Accepted: 01/12/2016] [Indexed: 12/16/2022]
7
Affiliation(s)
- Nicholas M. Riley
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Joshua J. Coon
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
8
Lee DCH, Jones AR, Hubbard SJ. Computational phosphoproteomics: from identification to localization. Proteomics 2015; 15:950-63. [PMID: 25475148 PMCID: PMC4384807 DOI: 10.1002/pmic.201400372] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/01/2014] [Revised: 10/31/2014] [Accepted: 11/26/2014] [Indexed: 01/08/2023]
Abstract
Analysis of the phosphoproteome by MS has become a key technology for the characterization of dynamic regulatory processes in the cell, since kinase and phosphatase action underlie many major biological functions. However, the addition of a phosphate group to a suitable side chain often confounds informatic analysis by generating product ion spectra that are more difficult to interpret (and consequently identify) relative to unmodified peptides. Collectively, these challenges have motivated bioinformaticians to create novel software tools and pipelines to assist in the identification of phosphopeptides in proteomic mixtures, and help pinpoint or "localize" the most likely site of modification in cases where there is ambiguity. Here we review the challenges to be met and the informatics solutions available to address them for phosphoproteomic analysis, as well as highlighting the difficulties associated with using them and the implications for data standards.
Affiliation(s)
- Dave C H Lee
  - Faculty of Life Sciences, University of Manchester, Manchester, UK
- Andrew R Jones
  - Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Simon J Hubbard
  - Faculty of Life Sciences, University of Manchester, Manchester, UK