1
Zhu S, Yang C, Wu W. MSPoisDM: A Novel Peptide Identification Algorithm Optimized for Tandem Mass Spectra. BIO WEB OF CONFERENCES 2022. [DOI: 10.1051/bioconf/20225501003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Tandem mass spectrometry (MS/MS) plays an extremely important role in proteomics research. Modern experiments can generate thousands of spectra, and interpreting LC-MS/MS data remains a challenging problem in tandem mass spectra analysis. Our peptide identification algorithm, MSPoisDM, integrates intensity information derived from target-decoy statistics, although intensity information is often undervalued. Furthermore, to make better use of the intensity information, we propose a novel scoring model based on the Poisson distribution. In comparisons with the commonly used commercial software Mascot and Sequest at 1% FDR, the results show that MSPoisDM is robust and versatile across datasets obtained from different instruments. We expect MSPoisDM to be broadly applied in proteomics studies.
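The abstract does not give the scoring formula, so the following is only a minimal sketch of the two ideas it names: a Poisson-based match score and target-decoy filtering at a fixed FDR. The function names, the use of a Poisson tail probability over matched-peak counts, and the decoys/targets FDR estimate are all illustrative assumptions, not the MSPoisDM implementation.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(1.0 - below, 0.0)


def match_score(matched_peaks, expected_random_matches):
    """Hypothetical score: -log10 of the chance of matching this many peaks at random."""
    p = poisson_tail(matched_peaks, expected_random_matches)
    return -math.log10(max(p, 1e-300))  # guard against log(0)


def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """psms: iterable of (score, is_decoy) pairs.

    Walk down the ranked list and return the lowest score at which the
    estimated FDR (decoys / targets seen so far) is still <= max_fdr.
    Returns None if no cutoff satisfies the bound.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

In this sketch, PSMs scoring at or above the returned cutoff would be reported as identifications; how intensity information enters the expected count is left open because the abstract does not specify it.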
2
Liang X, Xia Z, Jian L, Wang Y, Niu X, Link AJ. A cost-sensitive online learning method for peptide identification. BMC Genomics 2020; 21:324. [PMID: 32334531 PMCID: PMC7183122 DOI: 10.1186/s12864-020-6693-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2019] [Accepted: 03/24/2020] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Post-database search is a key procedure in peptide identification with tandem mass spectrometry (MS/MS) strategies for refining peptide-spectrum matches (PSMs) generated by database search engines. Although many statistical and machine learning-based methods have been developed to improve the accuracy of peptide identification, the challenge remains on large-scale datasets and datasets with a distribution of unbalanced PSMs. A more efficient learning strategy is required for improving the accuracy of peptide identification on challenging datasets. While complex learning models have larger power of classification, they may cause overfitting problems and introduce computational complexity on large-scale datasets. Kernel methods map data from the sample space to high dimensional spaces where data relationships can be simplified for modeling. RESULTS In order to tackle the computational challenge of using the kernel-based learning model for practical peptide identification problems, we present an online learning algorithm, OLCS-Ranker, which iteratively feeds only one training sample into the learning model at each round, and, as a result, the memory requirement for computation is significantly reduced. Meanwhile, we propose a cost-sensitive learning model for OLCS-Ranker by using a larger loss of decoy PSMs than that of target PSMs in the loss function. CONCLUSIONS The new model can reduce its false discovery rate on datasets with a distribution of unbalanced PSMs. Experimental studies show that OLCS-Ranker outperforms other methods in terms of accuracy and stability, especially on datasets with a distribution of unbalanced PSMs. Furthermore, OLCS-Ranker is 15-85 times faster than CRanker.
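As a rough illustration of the two ideas in this abstract (one training sample per round, and a larger loss for decoy PSMs than for target PSMs), here is a minimal sketch of a cost-sensitive online kernel learner. It is not OLCS-Ranker: the class name, the RBF kernel, the hinge-style update rule, and all parameter values are assumptions chosen for brevity.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OnlineCostSensitiveLearner:
    """Hypothetical online kernel learner with class-dependent costs."""

    def __init__(self, decoy_cost=2.0, target_cost=1.0, lr=0.5, gamma=0.5):
        self.decoy_cost = decoy_cost
        self.target_cost = target_cost
        self.lr = lr
        self.gamma = gamma
        self.support = []  # list of (coefficient, feature vector)

    def score(self, x):
        return sum(c * rbf(s, x, self.gamma) for c, s in self.support)

    def partial_fit(self, x, label):
        """Process a single PSM; label is +1 for target, -1 for decoy."""
        cost = self.target_cost if label > 0 else self.decoy_cost
        # Hinge-style check: update only when the margin is violated.
        if label * self.score(x) < 1.0:
            self.support.append((self.lr * cost * label, x))


model = OnlineCostSensitiveLearner()
stream = [((2.0, 2.0), 1), ((0.0, 0.0), -1), ((2.2, 1.8), 1), ((0.1, -0.1), -1)]
for features, label in stream:
    model.partial_fit(features, label)  # one sample per round
```

Because only the current sample and the stored support vectors are touched in each round, the full kernel matrix never has to be held in memory, which is the memory saving the abstract describes.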
Affiliation(s)
- Xijun Liang
- College of Science, China University of Petroleum, Changjiang West Road, Qingdao, 266580 China
- Zhonghang Xia
- School of Engineering and Applied Science, Western Kentucky University, Bowling Green, 42101 KY USA
- Ling Jian
- School of Economics and Management, China University of Petroleum, Changjiang West Road, Qingdao, 266580 China
- Yongxiang Wang
- College of Science, China University of Petroleum, Changjiang West Road, Qingdao, 266580 China
- Xinnan Niu
- Dept. of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, 37232 TN USA
- Andrew J. Link
- Dept. of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, 37232 TN USA
3
Link AJ, Niu X, Weaver CM, Jennings JL, Duncan DT, McAfee KJ, Sammons M, Gerbasi VR, Farley AR, Fleischer TC, Browne CM, Samir P, Galassie A, Boone B. Targeted Identification of Protein Interactions in Eukaryotic mRNA Translation. Proteomics 2020; 20:e1900177. [PMID: 32027465 DOI: 10.1002/pmic.201900177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2019] [Revised: 12/13/2019] [Indexed: 11/09/2022]
Abstract
To identify protein-protein interactions and phosphorylated amino acid sites in eukaryotic mRNA translation, replicate TAP-MudPIT and control experiments are performed targeting Saccharomyces cerevisiae genes previously implicated in eukaryotic mRNA translation by their genetic and/or functional roles in translation initiation, elongation, termination, or interactions with ribosomal complexes. Replicate tandem affinity purifications of each targeted yeast TAP-tagged mRNA translation protein coupled with multidimensional liquid chromatography and tandem mass spectrometry analysis are used to identify and quantify copurifying proteins. To improve sensitivity and minimize spurious, nonspecific interactions, a novel cross-validation approach is employed to identify the most statistically significant protein-protein interactions. Using experimental and computational strategies discussed herein, the previously described protein composition of the canonical eukaryotic mRNA translation initiation, elongation, and termination complexes is calculated. In addition, statistically significant unpublished protein interactions and phosphorylation sites for S. cerevisiae's mRNA translation proteins and complexes are identified.
Affiliation(s)
- Andrew J Link
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA; Department of Biochemistry, Vanderbilt University, Nashville, TN, 37232, USA; Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Xinnan Niu
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Connie M Weaver
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Jennifer L Jennings
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Dexter T Duncan
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- K Jill McAfee
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Morgan Sammons
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37232, USA
- Vince R Gerbasi
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Adam R Farley
- Department of Biochemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Tracey C Fleischer
- Department of Pathology, Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
- Parimal Samir
- Department of Biochemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Allison Galassie
- Department of Chemistry, Vanderbilt University, Nashville, TN, 37232, USA
- Braden Boone
- Department of Bioinformatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
4
Jian L, Xia Z, Niu X, Liang X, Samir P, Link AJ. l2 Multiple Kernel Fuzzy SVM-Based Data Fusion for Improving Peptide Identification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:804-809. [PMID: 26394437 DOI: 10.1109/tcbb.2015.2480084] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
SEQUEST is a database-searching engine that calculates the correlation score between an observed spectrum and a theoretical spectrum deduced from protein sequences stored in a flat text file, even though it is not a relational or object-oriented repository. Nevertheless, the SEQUEST score functions fail to discriminate accurately between true and false PSMs. Some approaches, such as PeptideProphet and Percolator, have been proposed to address the task of distinguishing true and false PSMs. However, most of these methods employ time-consuming learning algorithms to validate peptide assignments [1]. In this paper, we propose a fast algorithm for validating peptide identification by incorporating heterogeneous information from SEQUEST scores and peptide digestion knowledge. To automate the peptide identification process and incorporate additional information, we employ l2 multiple kernel learning (MKL) to implement the current peptide identification task. Results on experimental datasets indicate that, compared with state-of-the-art methods, i.e., PeptideProphet and Percolator, our data fusion strategy has comparable performance but reduces the running time significantly.
5
Gahoual R, Beck A, François YN, Leize-Wagner E. Independent highly sensitive characterization of asparagine deamidation and aspartic acid isomerization by sheathless CZE-ESI-MS/MS. JOURNAL OF MASS SPECTROMETRY : JMS 2016; 51:150-158. [PMID: 26889931 DOI: 10.1002/jms.3735] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 09/15/2015] [Revised: 11/02/2015] [Accepted: 11/15/2015] [Indexed: 06/05/2023]
Abstract
Amino acid residues commonly undergo various physicochemical modifications at physiological pH and temperature. Post-translational modifications (PTMs) require comprehensive characterization because of their major influence on protein structure and their involvement in numerous in vivo processes and signaling. Mass spectrometry (MS) has gradually become an analytical tool of choice to characterize PTMs; however, some modifications are still challenging because of faint modification levels in the sample or the difficulty of separating an intact peptide from its modified counterparts before their transfer to the ionization source. Here, we report the implementation of capillary zone electrophoresis coupled to electrospray ionization tandem mass spectrometry (CZE-ESI-MS/MS) through a sheathless interface for independent and highly sensitive characterization of asparagine deamidation (deaN) and aspartic acid isomerization (isoD). CZE selectivity regarding deaN and isoD was studied extensively using different sets of synthetic peptides based on actual tryptic peptides. Results demonstrated the ability of CZE to separate the unmodified peptide from modified homologues exhibiting deaN, isoD, or both independently, with a resolution systematically superior to 1.29. The developed CZE-ESI-MS/MS method was applied to the characterization of monoclonal antibodies and a complex protein mixture. Conserved CZE selectivity could be demonstrated even for complex samples, and the results obtained showed that CZE selectivity is similar regardless of the composition of the peptide. Separation of modified peptides prior to the MS analysis made it possible to characterize and estimate modification levels of the sample independently for deaN and isoD, even for peptides affected by both modifications, and, as a consequence, to distinguish the formation of l-aspartic acid or d-aspartic acid generated from deaN. Separation based on peptide modification, supported by the ESI efficiency of CZE-ESI-MS/MS, made it possible to characterize and estimate the studied PTMs with unprecedented sensitivity and demonstrated the relevance of implementing an electrophoretically driven separation for MS-based peptide analysis.
Affiliation(s)
- Rabah Gahoual
- Laboratoire de Spectrométrie de Masse des Interactions et des Systèmes (LSMIS), UMR 7140 (UdS-CNRS), Université de Strasbourg, Strasbourg, France
- Division of BioAnalytical Chemistry, AIMMS Research Group BioMolecular Analysis, VU University Amsterdam, De Boelelaan 1083, 1081 HV, Amsterdam, The Netherlands
- Alain Beck
- Centre d'Immunologie Pierre Fabre, Saint-Julien-en-Genevois, France
- Yannis-Nicolas François
- Laboratoire de Spectrométrie de Masse des Interactions et des Systèmes (LSMIS), UMR 7140 (UdS-CNRS), Université de Strasbourg, Strasbourg, France
- Emmanuelle Leize-Wagner
- Division of BioAnalytical Chemistry, AIMMS Research Group BioMolecular Analysis, VU University Amsterdam, De Boelelaan 1083, 1081 HV, Amsterdam, The Netherlands
6
Mayne J, Ning Z, Zhang X, Starr AE, Chen R, Deeke S, Chiang CK, Xu B, Wen M, Cheng K, Seebun D, Star A, Moore JI, Figeys D. Bottom-Up Proteomics (2013-2015): Keeping up in the Era of Systems Biology. Anal Chem 2015; 88:95-121. [PMID: 26558748 DOI: 10.1021/acs.analchem.5b04230] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Indexed: 02/07/2023]
Affiliation(s)
- Janice Mayne
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Zhibin Ning
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Xu Zhang
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Amanda E Starr
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Rui Chen
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Shelley Deeke
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Cheng-Kang Chiang
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Bo Xu
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Ming Wen
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Kai Cheng
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Deeptee Seebun
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Alexandra Star
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Jasmine I Moore
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
- Daniel Figeys
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, Ontario, Canada, K1H8M5
7
Liang X, Xia Z, Jian L, Niu X, Link A. An adaptive classification model for peptide identification. BMC Genomics 2015; 16 Suppl 11:S1. [PMID: 26578406 PMCID: PMC4652454 DOI: 10.1186/1471-2164-16-s11-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/14/2023] Open
Abstract
Background Peptide sequence assignment is the central task in protein identification with MS/MS-based strategies. Although a number of post-database search algorithms for filtering target peptide spectrum matches (PSMs) have been developed, the discrepancy among the output PSMs is usually significant, leaving a few disputable PSMs. Current studies show that a number of target PSMs that are close to decoy PSMs can hardly be separated from those decoys using the discrimination function alone. Results In this paper, we assign each target PSM a weight reflecting its likelihood of being correct. We employ an SVM-based learning model to search for the optimal weight for each target PSM and develop a new scoring system, CRanker, to rank all target PSMs. Because of the large PSM datasets generated in routine database searches, we use the Cholesky factorization technique for storing the kernel matrix to reduce the memory requirement. Conclusions Compared with PeptideProphet and Percolator, CRanker identified more PSMs at similar false discovery rates across different datasets. CRanker showed consistent performance on different test sets, supporting the reasonableness of the proposed model.