1. Seneviratne AJ, Peters S, Clarke D, Dausmann M, Hecker M, Tully B, Hains PG, Zhong Q. Improved identification and quantification of peptides in mass spectrometry data via chemical and random additive noise elimination (CRANE). Bioinformatics 2021; 37:4719-4726. [PMID: 34323970 PMCID: PMC8711017 DOI: 10.1093/bioinformatics/btab563] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/11/2020] [Revised: 06/15/2021] [Accepted: 07/28/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation: The output of electrospray ionization–liquid chromatography mass spectrometry (ESI-LC-MS) is influenced by multiple sources of noise, and the major contributors can be broadly categorized as baseline, random and chemical noise. Noise has a negative impact on the identification and quantification of peptides, which influences the reliability and reproducibility of MS-based proteomics data. Most attempts at denoising have been made on either spectra or chromatograms independently; thus, important 2D information is lost because the mass-to-charge ratio and retention time dimensions are not considered jointly.
Results: This article presents a novel technique for denoising raw ESI-LC-MS data via 2D undecimated wavelet transform, which is applied to proteomics data acquired by data-independent acquisition MS (DIA-MS). We demonstrate that denoising DIA-MS data results in the improvement of peptide identification and quantification in complex biological samples.
Availability and implementation: The software is available on GitHub (https://github.com/CMRI-ProCan/CRANE). The datasets were obtained from ProteomeXchange (identifiers: PXD002952 and PXD008651). Preliminary data and intermediate files are available via ProteomeXchange (identifiers: PXD020529 and PXD025103).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akila J Seneviratne
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sean Peters
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- David Clarke
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Michael Dausmann
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Michael Hecker
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Brett Tully
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Peter G Hains
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Qing Zhong
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
2. Chaerkady R, Zhou Y, Delmar JA, Weng SHS, Wang J, Awasthi S, Sims D, Bowen MA, Yu W, Cazares LH, Sims GP, Hess S. Characterization of Citrullination Sites in Neutrophils and Mast Cells Activated by Ionomycin via Integration of Mass Spectrometry and Machine Learning. J Proteome Res 2021; 20:3150-3164. [PMID: 34008986 DOI: 10.1021/acs.jproteome.1c00028] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/27/2022]
Abstract
Citrullination is an important post-translational modification implicated in many diseases including rheumatoid arthritis (RA), Alzheimer's disease, and cancer. Neutrophil and mast cells have different expression profiles for protein-arginine deiminases (PADs), and ionomycin-induced activation makes them an ideal cellular model to study proteins susceptible to citrullination. We performed high-resolution mass spectrometry and stringent data filtration to identify citrullination sites in neutrophil and mast cells treated with and without ionomycin. We identified a total of 833 validated citrullination sites on 395 proteins. Several of these citrullinated proteins are important components of pathways involved in innate immune responses. Using this benchmark primary sequence data set, we developed machine learning models to predict citrullination in neutrophil and mast cell proteins. We show that our models predict citrullination likelihood with 0.735 and 0.766 AUCs (area under the receiver operating characteristic curves), respectively, on independent validation sets. In summary, this study provides the largest number of validated citrullination sites in neutrophil and mast cell proteins. The use of our novel motif analysis approach to predict citrullination sites will facilitate the discovery of novel protein substrates of protein-arginine deiminases (PADs), which may be key to understanding immunopathologies of various diseases.
Affiliation(s)
- Michael A Bowen
- Antibody Discovery and Protein Engineering (ADPE), R&D AstraZeneca, Gaithersburg, Maryland 20878, United States
3.
Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach to measure and compare peptides and proteins in a large variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, provide large amounts of data output. This is the case for shotgun/bottom-up proteomics, which consists of the enzymatic digestion of proteins into peptides that are then measured by MS instruments in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to face the increasing size and complexity of raw MS data that need to be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. A general description will be provided of the data preprocessing options and the different search engines available, including practical suggestions on how to optimize the parameters for peptide search, based on hands-on experience.
Affiliation(s)
- Avinash Yadav
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Federica Marini
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Alessandro Cuomo
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Tiziana Bonaldi
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
4. Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE Access: Practical Innovations, Open Solutions 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues that are inherent in proteogenomic data analysis, where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
5. Deng Y, Ren Z, Pan Q, Qi D, Wen B, Ren Y, Yang H, Wu L, Chen F, Liu S. pClean: An Algorithm To Preprocess High-Resolution Tandem Mass Spectra for Database Searching. J Proteome Res 2019; 18:3235-3244. [PMID: 31364357 DOI: 10.1021/acs.jproteome.9b00141] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Database searches of MS/MS spectra are the main approach to peptide/protein identification in proteomics. Since most database search engines only utilize a small portion of the original MS/MS signals for peptide detection, how to improve the quality of MS/MS signals is a primary concern for enhancement of the peptide/protein identification rate. A fundamental issue is that some noise MS signals, informative or uninformative, have to be filtered out prior to database searching. Herein, an integrative preprocessing algorithm was designed, termed pClean, which incorporates three modules to preprocess MS/MS spectra, such as the removal of isobaric-labeling related ions, the reduction in isotopic peaks, the deconvolution of ions with higher charges, and the clearance of uninformative MS/MS signals. In contrast to the currently available approaches to MS/MS data preprocessing, pClean enables treatment of MS/MS spectra with high mass accuracy and favors filtering for the labeling or nonlabeling of peptides. Data sets at various scales gained from mass spectrometers with high resolution were used to assess the quality of peptides identified after pClean treatment and to compare the pClean improvement with those of other software programs. On the basis of the analysis of peptides identified and the Mascot ion score, pClean was proven to be effective in the removal of mass spectral noise and the reduction of random matching. Compared with other software programs, pClean appeared to be beneficial in terms of preprocessing performances for the enhancement of confidence scores and the increase in peptides identified. pClean is available at https://github.com/AimeeD90/pClean_release.
Affiliation(s)
- Yamei Deng
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of the Chinese Academy of Sciences, Beijing 100049, China; BGI-Shenzhen, Shenzhen 518083, China
- Zhe Ren
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
- Qingfei Pan
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of the Chinese Academy of Sciences, Beijing 100049, China; BGI-Shenzhen, Shenzhen 518083, China
- Da Qi
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
- Yan Ren
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
- Huanming Yang
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China; James D. Watson Institute of Genome Sciences, Hangzhou 310058, China
- Lin Wu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Fei Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Siqi Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
6. Tay AP, Liang A, Hamey JJ, Hart‐Smith G, Wilkins MR. MS2‐Deisotoper: A Tool for Deisotoping High‐Resolution MS/MS Spectra in Normal and Heavy Isotope‐Labelled Samples. Proteomics 2019; 19:e1800444. [DOI: 10.1002/pmic.201800444] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/12/2018] [Revised: 07/05/2019] [Indexed: 01/09/2023]
Affiliation(s)
- Aidan P. Tay
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Angelita Liang
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Joshua J. Hamey
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gene Hart‐Smith
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Marc R. Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
7. Awan MG, Saeed F. An Out-of-Core GPU based dimensionality reduction algorithm for Big Mass Spectrometry Data and its application in bottom-up Proteomics. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE. ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2017; 2017:550-555. [PMID: 28868521 DOI: 10.1145/3107411.3107466] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/19/2022]
Abstract
Modern high-resolution Mass Spectrometry instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of peaks actively contribute to the deduction of peptides. Therefore, pre-processing of MS data to detect noisy and non-useful peaks is an active area of research. Most sequential noise-reducing algorithms are impractical to use as a pre-processing step due to their high time complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm, called G-MSR, for MS2 spectra. Our proposed algorithm uses novel data structures which optimize the memory and computational operations inside the GPU. These novel data structures include Binary Spectra and Quantized Indexed Spectra (QIS). The former helps in communicating essential information between CPU and GPU using a minimum amount of data, while the latter enables us to store and process a complex 3-D data structure as a 1-D array structure while maintaining the integrity of MS data. Our proposed algorithm also takes into account the limited memory of GPUs and switches between in-core and out-of-core modes based upon the size of the input data. G-MSR achieves a peak speed-up of 386x over its sequential counterpart and is shown to process over a million spectra in just 32 seconds. The code for this algorithm is available as GPL open-source software on GitHub at the following link: https://github.com/pcdslab/G-MSR.
Affiliation(s)
- Muaaz Gul Awan
- Department of Computer Science, Western Michigan University, 4601 Campus Drive, Kalamazoo, Michigan 49009
- Fahad Saeed
- Department of Computer Science, Western Michigan University, 4601 Campus Drive, Kalamazoo, Michigan 49009
8. Gorshkov V, Hotta SYK, Verano-Braga T, Kjeldsen F. Peptide de novo sequencing of mixture tandem mass spectra. Proteomics 2016; 16:2470-9. [PMID: 27329701 PMCID: PMC5297990 DOI: 10.1002/pmic.201500549] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/26/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 02/02/2023]
Abstract
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co‐isolation and thus prone to false identifications. The deconvolution approach matched complementary b‐, y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
Affiliation(s)
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
- Thiago Verano-Braga
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark; Department of Physiology and Biophysics, Federal University of Minas Gerais Belo Horizonte - MG, Belo Horizonte, Brazil
- Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
9. Chang C, Zhang J, Xu C, Zhao Y, Ma J, Chen T, He F, Xie H, Zhu Y. Quantitative and In-Depth Survey of the Isotopic Abundance Distribution Errors in Shotgun Proteomics. Anal Chem 2016; 88:6844-51. [DOI: 10.1021/acs.analchem.6b01409] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Affiliation(s)
- Cheng Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Jiyang Zhang
- Department of Automatic Control, College of Mechanical Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Changming Xu
- Department of Automatic Control, College of Mechanical Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Yan Zhao
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Jie Ma
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Tao Chen
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
- Hongwei Xie
- Department of Automatic Control, College of Mechanical Engineering and Automation, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, Beijing, 102206, P.R. China
10. Awan MG, Saeed F. MS-REDUCE: an ultrafast technique for reduction of big mass spectrometry data for high-throughput processing. Bioinformatics 2016; 32:1518-26. [DOI: 10.1093/bioinformatics/btw023] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/27/2015] [Accepted: 01/12/2016] [Indexed: 12/16/2022] Open
11. Khemchyan LL, Khokhlova EA, Seitkalieva MM, Ananikov VP. Efficient Sustainable Tool for Monitoring Chemical Reactions and Structure Determination in Ionic Liquids by ESI-MS. ChemistryOpen 2013; 2:208-14. [PMID: 24551568 PMCID: PMC3892193 DOI: 10.1002/open.201300022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/10/2013] [Indexed: 01/28/2023] Open
Abstract
An easy and convenient procedure is described for monitoring chemical reactions and characterizing compounds dissolved in ionic liquids using the well-known tandem mass spectrometry (MS/MS) technique. Generation of wastes was avoided by utilizing an easy procedure for analysis of ionic liquid systems without preliminary isolation and purification. The described procedure also decreased the risk of plausible contamination and damage of the ESI-MS hardware and increased the sensitivity and accuracy of the measurements. ESI-MS detection in MS/MS mode was shown to be efficient in ionic liquid systems for structural and mechanistic studies, which are rather difficult otherwise. The developed ESI-MS/MS approach was applied to study samples corresponding to peptide systems in ionic liquids and to platform chemical directed biomass conversion in ionic liquids.
Affiliation(s)
- Levon L Khemchyan
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow 119991 (Russia)
- Elena A Khokhlova
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow 119991 (Russia)
- Marina M Seitkalieva
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow 119991 (Russia)
- Valentine P Ananikov
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospect 47, Moscow 119991 (Russia)
12. Miladi M, Harper B, Solouki T. Evidence for sequence scrambling in collision-induced dissociation of y-type fragment ions. Journal of the American Society for Mass Spectrometry 2013; 24:1755-1766. [PMID: 23982935 DOI: 10.1007/s13361-013-0714-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/10/2013] [Revised: 06/19/2013] [Accepted: 06/28/2013] [Indexed: 06/02/2023]
Abstract
Sequence scrambling from y-type fragment ions has not been previously reported. In a study designed to probe structural variations among b-type fragment ions, it was noted that y fragment ions might also yield sequence-scrambled ions. In this study, we examined the possibility and extent of sequence-scrambled fragment ion generation from collision-induced dissociation (CID) of y-type ions from four peptides (all containing basic residues near the C-terminus) including: AAAAHAA-NH2 (where "A" denotes carbon thirteen (¹³C₁) isotope on the alanine carbonyl group), des-acetylated-α-melanocyte (SYSMEHFRWGKPV-NH2), angiotensin II antipeptide (EGVYVHPV), and glu-fibrinopeptide b (EGVNDNEEGFFSAR). We investigated fragmentation patterns of 32 y-type fragment ions, including y fragment ions with different charge states (+1 to +3) and sizes (3 to 12 amino acids). Sequence-scrambled fragment ions were observed from ~50% (16 out of 32) of the studied y-type ions. However, observed sequence-scrambled ions had low relative intensities from ~0.1% to a maximum of ~12%. We present and discuss potential mechanisms for generation of sequence-scrambled fragment ions. To the best of our knowledge, results on y fragment dissociation presented here provide the first experimental evidence for generation of sequence-scrambled fragments from CID of y ions through intermediate cyclic "b-type" ions.
Affiliation(s)
- Mahsan Miladi
- Department of Chemistry and Biochemistry, Baylor University, Waco, TX, 76798, USA
13. Shao W, Lam H. Denoising Peptide Tandem Mass Spectra for Spectral Libraries: A Bayesian Approach. J Proteome Res 2013; 12:3223-32. [DOI: 10.1021/pr400080b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Affiliation(s)
- Wenguang Shao
- Division of Biomedical Engineering, and Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Henry Lam
- Division of Biomedical Engineering, and Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
14. Reiz B, Kertész-Farkas A, Pongor S, Myers MP. Chemical rule-based filtering of MS/MS spectra. Bioinformatics 2013; 29:925-32. [DOI: 10.1093/bioinformatics/btt061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/13/2023] Open
15. Sidoli S, Cheng L, Jensen ON. Proteomics in chromatin biology and epigenetics: Elucidation of post-translational modifications of histone proteins by mass spectrometry. J Proteomics 2012; 75:3419-33. [PMID: 22234360 DOI: 10.1016/j.jprot.2011.12.029] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.8] [Received: 09/15/2011] [Revised: 12/18/2011] [Accepted: 12/20/2011] [Indexed: 12/11/2022]
Affiliation(s)
- Simone Sidoli
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
16. Köcher T, Pichler P, Mazanek M, Swart R, Mechtler K. Altered Mascot search results by changing the m/z range of MS/MS spectra: analysis and potential applications. Anal Bioanal Chem 2010; 400:2339-47. [DOI: 10.1007/s00216-010-4572-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/14/2010] [Revised: 11/24/2010] [Accepted: 11/29/2010] [Indexed: 11/30/2022]
17. Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
18. Mujezinovic N, Schneider G, Wildpaner M, Mechtler K, Eisenhaber F. Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction. BMC Genomics 2010; 11 Suppl 1:S13. [PMID: 20158870 PMCID: PMC2822527 DOI: 10.1186/1471-2164-11-s1-s13] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
Background: Tandem mass spectrometry (MS/MS) has become a standard method for identification of proteins extracted from biological samples, but the huge number and the noise contamination of MS/MS spectra obstruct swift and reliable computer-aided interpretation. Typically, a minor fraction of the spectra per sample (most often, only a few percent) and about 10% of the peaks per spectrum contribute to the final result if protein identification is not prevented by the noise at all.
Results: Two fast preprocessing screens can substantially reduce the haystack of MS/MS data. (1) Simple sequence ladder rules remove spectra non-interpretable in peptide sequences. (2) Modified Fourier-transform-based criteria clear background in the remaining data. On average, only a remainder of 35% of the MS/MS spectra (each reduced in size by about one quarter) has to be handed over to the interpretation software for reliable protein identification essentially without loss of information, with a trend to improved sequence coverage and with a proportional decrease of computer resource consumption.
Conclusions: The search for sequence ladders in tandem MS/MS spectra with subsequent noise suppression is a promising strategy to reduce the number of MS/MS spectra from electro-spray instruments and to enhance the reliability of protein matches. Supplementary material and the software are available from an accompanying WWW-site with the URL http://mendel.bii.a-star.edu.sg/mass-spectrometry/MSCleaner-2.0/.
Collapse
Affiliation(s)
- Nedim Mujezinovic
- Sarajevo School of Science and Technology, Sarajevo, Bosnia-Herzegovina
|
19
|
Renard BY, Kirchner M, Monigatti F, Ivanov AR, Rappsilber J, Winter D, Steen JAJ, Hamprecht FA, Steen H. When less can yield more - Computational preprocessing of MS/MS spectra for peptide identification. Proteomics 2009; 9:4978-84. [PMID: 19743429 DOI: 10.1002/pmic.200900326] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 11/07/2022]
Abstract
The effectiveness of database search algorithms, such as Mascot, Sequest and ProteinPilot is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or reduce their score significantly. Consequently, an efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification at reduced file sizes and run time without compromising its specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta-files freely available from http://hci.iwr.uni-heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab.
Affiliation(s)
- Bernhard Y Renard
- Interdisciplinary Center for Scientific Computing, University of Heidelberg, Heidelberg, Germany
|
20
|
Merrell K, Thulin CD, Esplin MS, Graves SW. An integrated serum proteomic approach capable of monitoring the low molecular weight proteome with sequencing of intermediate to large peptides. Rapid Commun Mass Spectrom 2009; 23:2685-2696. [PMID: 19630037 DOI: 10.1002/rcm.4168] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/28/2023]
Abstract
The low-abundance, low molecular weight serum proteome has high potential for the discovery of new biomarkers using mass spectrometry (MS). Because the serum proteome is large and complex, defining relative quantitative differences for a molecular species between comparison groups requires an approach with robust separation capability, high sensitivity, as well as high mass resolution. Capillary liquid chromatography (cLC)/MS provides both the necessary separation technique and the sensitivity to observe many low-abundance peptides. Subsequent identification of potential serum peptide biomarkers observed in the cLC/MS step can in principle be accomplished by in series cLC/MS/MS without further sample preparation or additional instrumentation. In this report a novel cLC/MS/MS method for peptide sequencing is described that surpasses previously reported size limits for amino acid sequencing accomplished by collisional fragmentation using a tandem time-of-flight MS instrument. As a demonstration of the approach, two low-abundance peptides with masses of approximately 4000-5000 Da were selected for MS/MS sequencing. The multi-channel analyzer (MCA) was used in a novel way that allowed for summation of 120 fragmentation spectra for each of several customized collision energies, providing more thorough fragmentation coverage of each peptide with improved signal to noise. The peak list from this composite analysis was submitted to Mascot for identification. The two index peptides, 4279 Da and 5061 Da, were successfully identified. The peptides were a 39 amino acid immunoglobulin G heavy chain variable region fragment and a 47 amino acid fibrin alpha isoform C-terminal fragment. The method described here provides the ability both to survey thousands of serum molecules and to couple that with markedly enhanced cLC/MS/MS peptide sequencing capabilities, providing a promising technique for serum biomarker discovery.
Affiliation(s)
- Karen Merrell
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
|
21
|
Wong JWH, Schwahn AB, Downard KM. ETISEQ--an algorithm for automated elution time ion sequencing of concurrently fragmented peptides for mass spectrometry-based proteomics. BMC Bioinformatics 2009; 10:244. [PMID: 19664259 PMCID: PMC2731054 DOI: 10.1186/1471-2105-10-244] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/07/2009] [Accepted: 08/10/2009] [Indexed: 12/03/2022] Open
Abstract
Background: Concurrent peptide fragmentation (i.e. shotgun CID, parallel CID or MSE) has emerged as an alternative to data-dependent acquisition for generating peptide fragmentation data in LC-MS/MS proteomics experiments. Concurrent peptide fragmentation has been shown to be advantageous over data-dependent acquisition, providing greater detection dynamic range and more accurate quantitative information. Nevertheless, it has yet to be widely adopted because of the lack of published algorithms designed specifically to process or interpret such data acquired on any mass spectrometer. Results: An algorithm called Elution Time Ion Sequencing (ETISEQ) has been developed to enable automated conversion of concurrent peptide fragmentation data to LC-MS/MS data. ETISEQ generates MS/MS-like spectra based on the correlation of precursor and product ion elution profiles. The performance of ETISEQ is demonstrated using concurrent peptide fragmentation data from tryptic digests of standard proteins and whole influenza virus. The number of unique peptides identified from the digests is shown to be broadly comparable between ETISEQ-processed concurrent peptide fragmentation data and data-dependent acquired LC-MS/MS data. Conclusion: The ETISEQ algorithm has been designed for easy integration with existing MS/MS analysis platforms. It is anticipated that it will popularize concurrent peptide fragmentation data acquisition in proteomics laboratories.
Affiliation(s)
- Jason W H Wong
- UNSW Cancer Research Centre, University of New South Wales, Sydney, NSW 2052, Australia.
|
22
|
Salmi J, Nyman TA, Nevalainen OS, Aittokallio T. Filtering strategies for improving protein identification in high-throughput MS/MS studies. Proteomics 2009; 9:848-60. [PMID: 19160393 DOI: 10.1002/pmic.200800517] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
Despite the recent advances in streamlining high-throughput proteomic pipelines using tandem mass spectrometry (MS/MS), reliable identification of peptides and proteins on a larger scale has remained a challenging task, still involving a considerable degree of user interaction. Recently, a number of papers have proposed computational strategies both for distinguishing poor MS/MS spectra prior to database search (pre-filtering) as well as for verifying the peptide identifications made by the search programs (post-filtering). Both of these filtering approaches can be very beneficial to the overall protein identification pipeline, since they can remove a substantial part of the time consuming manual validation work and convert large sets of MS/MS spectra into more reliable and interpretable proteome information. The choice of the filtering method depends both on the properties of the data and on the goals of the experiment. This review discusses the different pre- and post-filtering strategies available to the researchers, together with their relative merits and potential pitfalls. We also highlight some additional research topics, such as spectral denoising and statistical assessment of the identification results, which aim at further improving the coverage and accuracy of high-throughput protein identification studies.
Affiliation(s)
- Jussi Salmi
- Department of Information Technology, University of Turku, Turku, Finland.
|
23
|
Ding J, Shi J, Poirier GG, Wu FX. A novel approach to denoising ion trap tandem mass spectra. Proteome Sci 2009; 7:9. [PMID: 19292921 PMCID: PMC2670284 DOI: 10.1186/1477-5956-7-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 11/24/2008] [Accepted: 03/17/2009] [Indexed: 12/04/2022] Open
Abstract
Background: Mass spectrometers can produce a large number of tandem mass spectra, but these spectra are contaminated by noise. Noise degrades the quality of tandem mass spectra and thus increases the false positives and false negatives in peptide identification. It is therefore appealing to develop an approach to denoising tandem mass spectra. Results: We propose a novel approach to denoising tandem mass spectra. The proposed approach consists of two modules: spectral peak intensity adjustment and intensity local maximum extraction. In the spectral peak intensity adjustment module, we introduce five features to describe the quality of each peak. Based on these features, a score is calculated for each peak and is used to adjust its intensity. As a result, the intensity is adjusted to a local maximum if the peak is a signal peak and is decreased if the peak is a noise peak. The second module uses a morphological reconstruction filter to remove the peaks whose intensities are not local maxima of the spectrum. Experiments were conducted on two ion trap tandem mass spectral datasets: ISB and TOV. Experimental results show that our algorithm can remove about 69% of the peaks of a spectrum. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31.23% and 14.12% for the two datasets, respectively. Conclusion: The proposed denoising algorithm can be integrated into current popular peptide identification algorithms such as Mascot to improve the reliability of assigning peptides to spectra. Availability of the software: The software created from this work is available upon request.
Affiliation(s)
- Jiarui Ding
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada.
|
24
|
Nie L, Wu G, Zhang W. Statistical Application and Challenges in Global Gel-Free Proteomic Analysis by Mass Spectrometry. Crit Rev Biotechnol 2008; 28:297-307. [DOI: 10.1080/07388550802543158] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 10/21/2022]
|
25
|
Interferences and contaminants encountered in modern mass spectrometry. Anal Chim Acta 2008; 627:71-81. [PMID: 18790129 DOI: 10.1016/j.aca.2008.04.043] [Citation(s) in RCA: 416] [Impact Index Per Article: 26.0] [Received: 02/12/2008] [Revised: 04/14/2008] [Accepted: 04/16/2008] [Indexed: 12/26/2022]
|
26
|
Cao X, Nesvizhskii AI. Improved sequence tag generation method for peptide identification in tandem mass spectrometry. J Proteome Res 2008; 7:4422-34. [PMID: 18785767 DOI: 10.1021/pr800400q] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal. Chem. 2005, 77, 4626-39), we present an improved sequence tag generation method that directly incorporates multicharged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control data sets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semiparametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.
Affiliation(s)
- Xia Cao
- Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
27
|
Nesvizhskii AI, Vitek O, Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods 2007; 4:787-97. [PMID: 17901868 DOI: 10.1038/nmeth1088] [Citation(s) in RCA: 438] [Impact Index Per Article: 25.8] [Indexed: 02/01/2023]
Abstract
The analysis of the large amount of data generated in mass spectrometry-based proteomics experiments represents a significant challenge and is currently a bottleneck in many proteomics projects. In this review we discuss critical issues related to data processing and analysis in proteomics and describe available methods and tools. We place special emphasis on the elaboration of results that are supported by sound statistical arguments.
Affiliation(s)
- Alexey I Nesvizhskii
- University of Michigan, Department of Pathology and Center for Computational Medicine and Biology, Ann Arbor, Michigan 48105, USA
|