1
Pane K, Quintavalle C, Nuzzo S, Ingenito F, Roscigno G, Affinito A, Scognamiglio I, Pattanayak B, Gallo E, Accardo A, Thomas G, Minic Z, Berezovski MV, Franzese M, Condorelli G. Comparative Proteomic Profiling of Secreted Extracellular Vesicles from Breast Fibroadenoma and Malignant Lesions: A Pilot Study. Int J Mol Sci 2022; 23:ijms23073989. [PMID: 35409352 PMCID: PMC8999736 DOI: 10.3390/ijms23073989] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/04/2022] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 02/01/2023] Open
Abstract
Extracellular vesicles (EVs) shuttle proteins, RNA, DNA, and lipids crucial for cell-to-cell communication. Recent findings have highlighted that EVs, by virtue of their cargo, may also contribute to breast cancer (BC) growth and metastatic dissemination. Indeed, EVs are gaining great interest as non-invasive cancer biomarkers. However, little is known about the biological and physical properties of EVs from malignant BC lesions, and even less is understood about EVs from non-malignant lesions, such as breast fibroadenoma (FAD), which are clinically managed using conservative approaches. Thus, for this pilot study, we attempted to purify and explore the proteomic profiles of EVs from benign breast lesions, HER2+ BCs, triple-negative BCs (TNBCs), and continuous BC cell lines (i.e., BT-549, MCF-10A, and MDA-MB-231), combining experimental and semi-quantitative approaches. Of note, proteome-wide analyses showed 49 common proteins across EVs harvested from FAD, HER2+ BCs, TNBCs, and model BC lines. This is the first feasibility study evaluating the physicochemical composition and proteome of EVs from benign breast cells and primary and immortalized BC cells. Our preliminary results hold promise for possible implications in precision medicine for BC.
Affiliation(s)
- Katia Pane
- IRCCS SYNLAB SDN, Via E. Gianturco 113, 80143 Naples, Italy; (K.P.); (S.N.); (E.G.)
| | - Cristina Quintavalle
- Institute for Experimental Endocrinology and Oncology (IEOS), National Research Council (CNR), Via Pansini 5, 80131 Naples, Italy;
| | - Silvia Nuzzo
- IRCCS SYNLAB SDN, Via E. Gianturco 113, 80143 Naples, Italy; (K.P.); (S.N.); (E.G.)
| | - Francesco Ingenito
- Percuros BV, Eerbeeklaan 42, 2573 HT Den Haag, The Netherlands; (F.I.); (G.R.); (A.A.)
- Department of Molecular Medicine and Medical Biotechnology, Federico II University of Naples, Via Pansini 15, 80131 Naples, Italy; (I.S.); (B.P.)
| | - Giuseppina Roscigno
- Percuros BV, Eerbeeklaan 42, 2573 HT Den Haag, The Netherlands; (F.I.); (G.R.); (A.A.)
- Department of Molecular Medicine and Medical Biotechnology, Federico II University of Naples, Via Pansini 15, 80131 Naples, Italy; (I.S.); (B.P.)
| | - Alessandra Affinito
- Percuros BV, Eerbeeklaan 42, 2573 HT Den Haag, The Netherlands; (F.I.); (G.R.); (A.A.)
- Department of Molecular Medicine and Medical Biotechnology, Federico II University of Naples, Via Pansini 15, 80131 Naples, Italy; (I.S.); (B.P.)
| | - Iolanda Scognamiglio
- Department of Molecular Medicine and Medical Biotechnology, Federico II University of Naples, Via Pansini 15, 80131 Naples, Italy; (I.S.); (B.P.)
| | - Birlipta Pattanayak
- Department of Molecular Medicine and Medical Biotechnology, Federico II University of Naples, Via Pansini 15, 80131 Naples, Italy; (I.S.); (B.P.)
| | - Enrico Gallo
- IRCCS SYNLAB SDN, Via E. Gianturco 113, 80143 Naples, Italy; (K.P.); (S.N.); (E.G.)
| | - Antonella Accardo
- Department of Pharmacy and Research Centre on Bioactive Peptides (CIRPeB), University of Naples “Federico II”, Via Mezzocannone 16, 80134 Naples, Italy;
| | - Guglielmo Thomas
- Breast Unit Clinica Mediterranea, Mediterranea Cardiocentro, Via Orazio 2, 80122 Naples, Italy;
| | - Zoran Minic
- John L. Holmes Mass Spectrometry Facility, Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, ON K1N 6N5, Canada; (Z.M.); (M.V.B.)
| | - Maxim V. Berezovski
- John L. Holmes Mass Spectrometry Facility, Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, ON K1N 6N5, Canada; (Z.M.); (M.V.B.)
| | - Monica Franzese
- IRCCS SYNLAB SDN, Via E. Gianturco 113, 80143 Naples, Italy; (K.P.); (S.N.); (E.G.)
- Correspondence: (M.F.); (G.C.)
| | - Gerolama Condorelli
- Department of Molecular Medicine and Medical Biotechnology, Federico II University of Naples, Via Pansini 15, 80131 Naples, Italy; (I.S.); (B.P.)
- IRCCS Istituto Neurologico Mediterraneo (INM) Neuromed, Via Atinense 18, 86077 Pozzilli, Italy
- Correspondence: (M.F.); (G.C.)
2
A systematic evaluation of yeast sample preparation protocols for spectral identifications, proteome coverage and post-isolation modifications. J Proteomics 2022; 261:104576. [DOI: 10.1016/j.jprot.2022.104576] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2022] [Revised: 03/17/2022] [Accepted: 03/17/2022] [Indexed: 11/20/2022]
3
Agten A, Van Houtven J, Askenazi M, Burzykowski T, Laukens K, Valkenborg D. Visualizing the agreement of peptide assignments between different search engines. J Mass Spectrom 2020; 55:e4471. [PMID: 31713933 DOI: 10.1002/jms.4471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/04/2019] [Revised: 10/23/2019] [Accepted: 10/28/2019] [Indexed: 06/10/2023]
Abstract
There is a trend in the analysis of shotgun proteomics data that aims to combine information from multiple search engines to increase the number of peptide annotations in an experiment. Typically, the degree of search engine complementarity and search engine agreement is visually illustrated by means of Venn diagrams that present the findings of a database search on the level of the nonredundant peptide annotations. We argue this practice to be not fit-for-purpose since the diagrams do not take into account and often conceal the information on complementarity and agreement at the level of the spectrum identification. We promote a new type of visualization that provides insight on the peptide sequence agreement at the level of the peptide-spectrum match (PSM) as a measure of consensus between two search engines with nominal outcomes. We applied the visualizations and percentage sequence agreement to an in-house data set of our benchmark organism, Caenorhabditis elegans, and illustrated that when assessing the agreement between search engine, one should disentangle the notion of PSM confidence and PSM identity. The visualizations presented in this manuscript provide a more informative assessment of pairs of search engines and are made available as an R function in the Supporting Information.
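The contrast drawn in this abstract, between overlap of nonredundant peptide lists and agreement at the level of the peptide-spectrum match, can be illustrated with a minimal Python sketch. This is not the R function from the paper's Supporting Information; the function names, the dictionary layout (spectrum identifier mapped to the assigned peptide sequence), and the toy data are all assumptions made for illustration.

```python
def psm_agreement(engine_a, engine_b):
    """Percentage of shared spectra to which both engines assign the same sequence.

    engine_a, engine_b: dict mapping spectrum id -> peptide sequence
    (hypothetical layout, not a format defined by the paper).
    """
    shared = engine_a.keys() & engine_b.keys()
    agree = sum(1 for spectrum in shared if engine_a[spectrum] == engine_b[spectrum])
    percent = 100.0 * agree / len(shared) if shared else 0.0
    return {"shared_spectra": len(shared), "agree": agree, "percent_agreement": percent}


def peptide_overlap(engine_a, engine_b):
    """Venn-style counts on the nonredundant peptide lists, ignoring spectra."""
    peptides_a, peptides_b = set(engine_a.values()), set(engine_b.values())
    return {
        "only_a": len(peptides_a - peptides_b),
        "both": len(peptides_a & peptides_b),
        "only_b": len(peptides_b - peptides_a),
    }


# Toy data: both engines report the same three peptides,
# but they swap the assignments for spectra s1 and s2.
a = {"s1": "PEPTIDEK", "s2": "LGEYGFQNALIVR", "s3": "AAAK"}
b = {"s1": "LGEYGFQNALIVR", "s2": "PEPTIDEK", "s3": "AAAK"}

print(peptide_overlap(a, b))  # the Venn view: all three peptides fall in "both"
print(psm_agreement(a, b))    # the PSM view: only one of three spectra agrees
```

In this toy case the Venn-style counts suggest full agreement, while the spectrum-level comparison shows that two of the three assignments differ, which is the kind of concealed disagreement the abstract describes.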
Affiliation(s)
- Annelies Agten
- Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, Belgium
| | - Joris Van Houtven
- Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, Belgium
- UA-VITO Center for Proteomics, University of Antwerp, Antwerp, Belgium
- Applied Bio and Molecular Systems, Flemish Institute for Technological Research (VITO), Mol, Belgium
| | - Tomasz Burzykowski
- Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, Belgium
| | - Kris Laukens
- Adrem Data Lab, Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
| | - Dirk Valkenborg
- Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Hasselt, Belgium
- UA-VITO Center for Proteomics, University of Antwerp, Antwerp, Belgium
- Applied Bio and Molecular Systems, Flemish Institute for Technological Research (VITO), Mol, Belgium
4
Alves G, Yu YK. Robust Accurate Identification and Biomass Estimates of Microorganisms via Tandem Mass Spectrometry. J Am Soc Mass Spectrom 2020; 31:85-102. [PMID: 32881514 PMCID: PMC10501333 DOI: 10.1021/jasms.9b00035] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Rapid and accurate identification of microorganisms and estimation of their biomasses are of extreme importance to public health. Mass spectrometry has become an important technique for these purposes. Previously we published a workflow named Microorganism Classification and Identification (MiCId v.12.26.2017) that was shown to perform no worse than other workflows. This manuscript presents MiCId v.12.13.2018 that, in comparison with the earlier version v.12.26.2017, allows for biomass estimates, provides more accurate microorganism identifications (better controls the number of false positives), and is robust against database size increase. This significant advance is made possible by several new ingredients introduced: first, we apply a modified expectation-maximization method to compute for each taxon considered a prior probability, which can be used for biomass estimate; second, we introduce a new concept called ownership, through which the participation ratio is computed and use it as the number of taxa to be kept within a cluster of closely related taxa; third, based on confidently identified peptides, we calculate for each taxon its degree of independence from the rest of taxa considered to determine whether or not to split this taxon off the cluster. Using 270 data files, each containing a large number of MS/MS spectra, we show that, in comparison with v.12.26.2017, version v.12.13.2018 yields superior retrieval results. We also show that MiCId v.12.13.2018 can estimate species biomass reasonably well. The new MiCId v.12.13.2018, designed to run in Linux environment, is freely available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
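The abstract does not spell out the modified expectation-maximization method, so the following Python sketch shows only the textbook, unmodified EM for mixture proportions as a rough illustration of how per-taxon prior probabilities could be estimated from peptide evidence. The binary membership matrix, the function name `em_taxon_priors`, and the toy counts are assumptions; this is not the MiCId algorithm.

```python
def em_taxon_priors(membership, n_iter=200):
    """Plain EM for mixture proportions (not the modified EM used by MiCId).

    membership: list of rows, one per identified peptide; row[t] is 1 if the
    peptide occurs in taxon t's proteome and 0 otherwise (assumed layout).
    Assumes at least one peptide matches at least one taxon.
    Returns one estimated prior probability per taxon.
    """
    n_taxa = len(membership[0])
    priors = [1.0 / n_taxa] * n_taxa
    for _ in range(n_iter):
        totals = [0.0] * n_taxa
        for row in membership:
            # E-step: share each peptide among the taxa that contain it,
            # in proportion to the current priors.
            weights = [p * m for p, m in zip(priors, row)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # peptide matches no taxon with non-zero prior
            for t, w in enumerate(weights):
                totals[t] += w / norm
        # M-step: new priors are the normalized expected peptide counts.
        total = sum(totals)
        priors = [x / total for x in totals]
    return priors


# Toy example: taxon 0 has six unique peptides, taxon 1 only shares three
# peptides with taxon 0, and taxon 2 has one unique peptide.
rows = [[1, 0, 0]] * 6 + [[1, 1, 0]] * 3 + [[0, 0, 1]]
priors = em_taxon_priors(rows)
print([round(p, 4) for p in priors])
```

Because taxon 1 has no peptide of its own in this toy input, its estimated prior shrinks toward zero and the shared peptides are attributed to taxon 0. Reading such priors as relative biomass is the interpretation suggested by the abstract, not something this sketch validates.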
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
| | - Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
5
Abstract
Mass spectrometry is extremely efficient for sequencing small peptides generated by, for example, a trypsin digestion of a complex mixture. Current instruments can generate 50-100 K MS/MS spectra in a single run. Of these, ~30-50% are typically assigned to peptide matches at a 1% FDR threshold. The remaining spectra require further research to explain. We address here whether the 30-50% of matched spectra receive consensus matches across different database-dependent search pipelines. Although the majority of the spectrum-to-peptide assignments concur across search engines, our conclusion is that database-dependent search engines still require improvements.
Affiliation(s)
- Rune Matthiesen
- Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisboa, Portugal.
| | - Gorka Prieto
- Department of Communications Engineering, Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Bilbao, Spain
| | - Hans Christian Beck
- Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, Odense C, Denmark
6
Chen ZL, Meng JM, Cao Y, Yin JL, Fang RQ, Fan SB, Liu C, Zeng WF, Ding YH, Tan D, Wu L, Zhou WJ, Chi H, Sun RX, Dong MQ, He SM. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat Commun 2019; 10:3404. [PMID: 31363125 PMCID: PMC6667459 DOI: 10.1038/s41467-019-11337-z] [Citation(s) in RCA: 247] [Impact Index Per Article: 49.4] [Received: 12/28/2018] [Accepted: 06/20/2019] [Indexed: 01/05/2023] Open
Abstract
We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is ~40 times faster than pLink 1 and 3~10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, 15N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms these methods in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development. The identification of cross-linked peptides at a proteome scale for interactome analyses represents a complex challenge. Here the authors report an efficient and reliable search engine pLink 2 for proteome-scale cross-linking mass spectrometry analyses, and demonstrate how to systematically evaluate the credibility of search engines.
Affiliation(s)
- Zhen-Lin Chen
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Jia-Ming Meng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yong Cao
- National Institute of Biological Sciences, Beijing, 102206, China
| | - Ji-Li Yin
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Run-Qian Fang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Sheng-Bo Fan
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Chao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yue-He Ding
- National Institute of Biological Sciences, Beijing, 102206, China
| | - Dan Tan
- National Institute of Biological Sciences, Beijing, 102206, China
| | - Long Wu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Wen-Jing Zhou
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Rui-Xiang Sun
- National Institute of Biological Sciences, Beijing, 102206, China
| | - Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, 102206, China
| | - Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; University of Chinese Academy of Sciences, Beijing, 100049, China
7
Maes E, Oeyen E, Boonen K, Schildermans K, Mertens I, Pauwels P, Valkenborg D, Baggerman G. The challenges of peptidomics in complementing proteomics in a clinical context. Mass Spectrom Rev 2019; 38:253-264. [PMID: 30372792 DOI: 10.1002/mas.21581] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/14/2016] [Accepted: 10/01/2018] [Indexed: 06/08/2023]
Abstract
Naturally occurring peptides, including growth factors, hormones, and neurotransmitters, represent an important class of biomolecules and have crucial roles in human physiology. The study of these peptides in clinical samples is therefore as relevant as ever. Compared to more routine proteomics applications in clinical research, peptidomics research questions are more challenging and have special requirements with regard to sample handling, experimental design, and bioinformatics. In this review, we describe the issues that confront peptidomics in a clinical context. After these hurdles are (partially) overcome, peptidomics will be ready for a successful translation into medical practice.
Affiliation(s)
- Evelyne Maes
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Food and Bio-Based Products, AgResearch Ltd., Lincoln, New Zealand
| | - Eline Oeyen
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
| | - Kurt Boonen
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
| | - Karin Schildermans
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
| | - Inge Mertens
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
| | - Patrick Pauwels
- Molecular Pathology Unit, Department of Pathology, Antwerp University Hospital, Edegem, Belgium
| | - Dirk Valkenborg
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Center for Statistics, Hasselt University, Diepenbeek, Belgium
| | - Geert Baggerman
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
8
Lin A, Howbert JJ, Noble WS. Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data. J Proteome Res 2018; 17:3644-3656. [PMID: 30221945 DOI: 10.1021/acs.jproteome.8b00206] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/19/2023]
Abstract
To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine's scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the "residue evidence" (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a "combined p value" score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p value to the score functions used by several existing search engines. Our results suggest that the combined p value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit ( http://crux.ms ).
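The abstract does not say how the calibrated res-ev and XCorr p values are merged into the "combined p value". As a stand-in, the Python sketch below uses Fisher's method for two p values, which has a simple closed form; it assumes the two p values are independent and well calibrated, and it should not be read as the formula implemented in Tide. The function name and the example inputs are hypothetical.

```python
import math


def fisher_combine_two(p1, p2):
    """Combine two independent p values with Fisher's method.

    Fisher's statistic -2 * (ln p1 + ln p2) follows a chi-square distribution
    with 4 degrees of freedom under the null hypothesis; for that case the
    tail probability reduces to x * (1 - ln x) with x = p1 * p2.
    This is a stand-in technique, not necessarily the one used in the paper.
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p values must lie in (0, 1]")
    x = p1 * p2
    return x * (1.0 - math.log(x))


# Hypothetical calibrated p values for one spectrum from two score functions.
p_res_ev = 0.01
p_xcorr = 0.01
print(fisher_combine_two(p_res_ev, p_xcorr))
```

Two moderately small p values combine into a smaller one, which is how pooling two calibrated scores can raise statistical power. The independence assumption matters here: if the two scores are strongly correlated, Fisher's method overstates the evidence.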
Affiliation(s)
- Andy Lin
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - J Jeffry Howbert
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States; Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
9
Alves G, Wang G, Ogurtsov AY, Drake SK, Gucek M, Sacks DB, Yu YK. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. J Am Soc Mass Spectrom 2018; 29:1721-1737. [PMID: 29873019 PMCID: PMC6061032 DOI: 10.1007/s13361-018-1986-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/13/2017] [Revised: 03/30/2018] [Accepted: 04/25/2018] [Indexed: 05/30/2023]
Abstract
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Guanghui Wang
- Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Steven K Drake
- Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Marjan Gucek
- Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - David B Sacks
- Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
10
Joyce B, Lee D, Rubio A, Ogurtsov A, Alves G, Yu YK. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics. BMC Res Notes 2018; 11:182. [PMID: 29544540 PMCID: PMC5856202 DOI: 10.1186/s13104-018-3289-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2017] [Accepted: 03/09/2018] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE: RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible to the proteomics community by developing a graphical user interface (GUI) is our main goal here. RESULTS: We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes it easy to run RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html.
Affiliation(s)
- Brendan Joyce
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Danny Lee
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Alex Rubio
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Aleksey Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA.
11
Maabreh M, Qolomany B, Alsmadi I, Gupta A. Deep Learning-based MSMS Spectra Reduction in Support of Running Multiple Protein Search Engines on Cloud. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2017; 2017:1909-1914. [PMID: 34430067 PMCID: PMC8382039 DOI: 10.1109/bibm.2017.8217951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
The diversity of available protein search engines with respect to their matching algorithms, the low overlap among their results, and the disparity in their coverage encourage the proteomics community to use ensemble solutions that combine different search engines. Advances in cloud computing technology and the availability of distributed processing clusters can support this task. However, data transfer and the combining of results could then become the major bottleneck. The flood of billions of observed mass spectra, amounting to hundreds of gigabytes or potentially terabytes of data, could easily cause congestion, increase the risk of failure, degrade performance, add computational cost, and waste available resources. Therefore, in this study, we propose a deep learning model to reduce traffic over the cloud network and thus reduce the cost of cloud computing. The model, which relies on the top 50 intensities of each spectrum and their m/z values, removes any spectrum that is predicted not to pass the majority vote of the participating search engines. Our results, obtained with three search engines (pFind, Comet, and X!Tandem) and four different datasets, are promising and encourage investment in deep learning to solve this type of big data problem.
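The input representation and the training label described in this abstract can be sketched in Python as follows. The abstract does not give the exact feature layout or the voting rule, so the m/z ordering, the zero padding, the reading of "majority" as more than half of the engines, and both function names are assumptions; the deep learning model itself is not shown.

```python
def top_k_features(peaks, k=50):
    """Flatten the k most intense peaks of one spectrum into a fixed-length vector.

    peaks: list of (mz, intensity) tuples.
    Returns 2 * k floats laid out as [mz_1, intensity_1, mz_2, ...]; spectra
    with fewer than k peaks are zero-padded (padding is an assumption).
    """
    top = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:k]
    top.sort(key=lambda peak: peak[0])  # order the kept peaks by m/z (assumption)
    flat = [value for mz, intensity in top for value in (mz, intensity)]
    flat.extend([0.0] * (2 * k - len(flat)))
    return flat


def passes_majority_vote(engine_hits):
    """Label a spectrum as worth keeping if more than half of the engines identified it.

    engine_hits: list of booleans, one per search engine (assumed encoding).
    """
    return sum(engine_hits) > len(engine_hits) / 2


# Toy spectrum with three peaks, and hits from three hypothetical engine runs.
spectrum = [(100.0, 5.0), (200.0, 9.0), (150.0, 7.0)]
print(top_k_features(spectrum, k=2))
print(passes_majority_vote([True, True, False]))
```

A classifier trained on such vectors with such labels could then filter spectra before they are sent over the network, which is the traffic reduction the abstract aims for.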
Affiliation(s)
- Majdi Maabreh
- Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
| | - Basheer Qolomany
- Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
| | - Izzat Alsmadi
- Department of Computing and Cyber Security, Texas A&M University, San Antonio, TX, USA
| | - Ajay Gupta
- Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA
12
Anderson DC, Lapp SA, Barnwell JW, Galinski MR. A large scale Plasmodium vivax-Saimiri boliviensis trophozoite-schizont transition proteome. PLoS One 2017; 12:e0182561. [PMID: 28829774 PMCID: PMC5567661 DOI: 10.1371/journal.pone.0182561] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/31/2016] [Accepted: 07/20/2017] [Indexed: 11/18/2022] Open
Abstract
Plasmodium vivax is a complex protozoan parasite with over 6,500 genes and stage-specific differential expression. Much of the unique biology of this pathogen remains unknown, including how it modifies and restructures the host reticulocyte. Using a recently published P. vivax reference genome, we report the proteome from two biological replicates of infected Saimiri boliviensis host reticulocytes undergoing transition from the late trophozoite to early schizont stages. Using five database search engines, we identified a total of 2000 P. vivax and 3487 S. boliviensis proteins, making this the most comprehensive P. vivax proteome to date. PlasmoDB GO-term enrichment analysis of proteins identified at least twice by a search engine highlighted core metabolic processes and molecular functions such as glycolysis, translation and protein folding, cell components such as ribosomes, proteasomes and the Golgi apparatus, and a number of vesicle and trafficking related clusters. Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 enriched functional annotation clusters of S. boliviensis proteins highlighted vesicle and trafficking-related clusters, elements of the cytoskeleton, oxidative processes and response to oxidative stress, macromolecular complexes such as the proteasome and ribosome, metabolism, translation, and cell death. Host and parasite proteins potentially involved in cell adhesion were also identified. Over 25% of the P. vivax proteins have no functional annotation; this group includes 45 VIR members of the large PIR family. A number of host and pathogen proteins contained highly oxidized or nitrated residues, extending prior trophozoite-enriched stage observations from S. boliviensis infections, and supporting the possibility of oxidative stress in relation to the disease. This proteome significantly expands the size and complexity of the known P. vivax and Saimiri host iRBC proteomes, and provides in-depth data that will be valuable for ongoing research on this parasite’s biology and pathogenesis.
Affiliation(s)
- D. C. Anderson: Bioscience Division, SRI International, Harrisonburg, VA, United States of America
- Stacey A. Lapp: Emory Vaccine Center, Yerkes National Primate Research Center, Emory University, Atlanta, GA, United States of America
- John W. Barnwell: Malaria Branch, Division of Parasitic Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Mary R. Galinski: Emory Vaccine Center, Yerkes National Primate Research Center, Emory University, Atlanta, GA, United States of America; Department of Medicine, Division of Infectious Diseases, Emory University School of Medicine, Atlanta, GA, United States of America
13
Tessier D, Lollier V, Larré C, Rogniaux H. Origin of Disagreements in Tandem Mass Spectra Interpretation by Search Engines. J Proteome Res 2016; 15:3481-3488. [PMID: 27571036 DOI: 10.1021/acs.jproteome.6b00024] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023]
Abstract
Several proteomic database search engines that interpret LC-MS/MS data do not identify the same set of peptides. These disagreements occur even when the scores of the peptide-to-spectrum matches suggest good confidence in the interpretation. Our study shows that these disagreements observed for the interpretations of a given spectrum are almost exclusively due to the variation of what we call the "peptide space", i.e., the set of peptides that are actually compared to the experimental spectra. We discuss the potential difficulties of precisely defining the "peptide space". Indeed, although several parameters that are generally reported in publications can easily be set to the same values, many additional parameters, with much less straightforward user access, might impact the "peptide space" used by each program. Moreover, in a configuration where each search engine identifies the same candidates for each spectrum, the inference of the proteins may remain quite different depending on the false discovery rate selected.
Affiliation(s)
- Dominique Tessier: INRA, UR 1268 Biopolymères Interactions Assemblages, F-44300 Nantes, France
- Virginie Lollier: INRA, UR 1268 Biopolymères Interactions Assemblages, F-44300 Nantes, France
- Colette Larré: INRA, UR 1268 Biopolymères Interactions Assemblages, F-44300 Nantes, France
- Hélène Rogniaux: INRA, UR 1268 Biopolymères Interactions Assemblages, F-44300 Nantes, France
14
Branson OE, Freitas MA. A multi-model statistical approach for proteomic spectral count quantitation. J Proteomics 2016; 144:23-32. [PMID: 27260494 DOI: 10.1016/j.jprot.2016.05.032] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 01/11/2016] [Revised: 05/23/2016] [Accepted: 05/24/2016] [Indexed: 01/16/2023]
Abstract
The rapid development of mass spectrometry (MS) technologies has solidified shotgun proteomics as the most powerful analytical platform for large-scale proteome interrogation. The ability to map and determine differential expression profiles of the entire proteome is the ultimate goal of shotgun proteomics. Label-free quantitation has proven to be a valid approach for discovery shotgun proteomics, especially when sample is limited. Label-free spectral count quantitation is an approach analogous to RNA sequencing whereby count data is used to determine differential expression. Here we show that statistical approaches developed to evaluate differential expression in RNA sequencing experiments can be applied to detect differential protein expression in label-free discovery proteomics. This approach, termed MultiSpec, utilizes open-source statistical platforms, namely edgeR, DESeq and baySeq, to statistically select protein candidates for further investigation. Furthermore, to remove bias associated with a single statistical approach, a single ranked list of differentially expressed proteins is assembled by comparing edgeR and DESeq q-values directly with the false discovery rate (FDR) calculated by baySeq. This statistical approach is then extended when applied to spectral count data derived from multiple proteomic pipelines. The individual statistical results from multiple proteomic pipelines are integrated and cross-validated by means of collapsing protein groups.
Biological significance: Spectral count data from shotgun proteomics experiments is semi-quantitative and semi-random, yet a robust way to estimate protein concentration. Tag-count approaches are routinely used to analyze RNA sequencing data sets. This approach, termed MultiSpec, utilizes multiple tag-count based statistical tests to determine differential protein expression from spectral counts. The statistical results from these tag-count approaches are combined in order to reach a final MultiSpec q-value to re-rank protein candidates. This re-ranking procedure is completed to remove bias associated with a single approach in order to better understand the true proteomic differences driving the biology in question. The MultiSpec approach can be extended to multiple proteomic pipelines. In such an instance, MultiSpec statistical results are integrated by collapsing protein groups across proteomic pipelines to provide a single ranked list of differentially expressed proteins. This integration mechanism is seamlessly integrated with the statistical analysis and provides the means to cross-validate protein inferences from multiple proteomic pipelines.
Affiliation(s)
- Owen E Branson: The Ohio State Biochemistry Graduate Program, The Ohio State University, Columbus, OH, USA; Department of Molecular Virology, Immunology and Medical Genetics, The Ohio State University, Columbus, OH, USA; Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
- Michael A Freitas: The Ohio State Biochemistry Graduate Program, The Ohio State University, Columbus, OH, USA; Department of Molecular Virology, Immunology and Medical Genetics, The Ohio State University, Columbus, OH, USA; Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
15
Mitchell CJ, Kim MS, Na CH, Pandey A. PyQuant: A Versatile Framework for Analysis of Quantitative Mass Spectrometry Data. Mol Cell Proteomics 2016; 15:2829-38. [PMID: 27231314 DOI: 10.1074/mcp.o115.056879] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/11/2015] [Indexed: 12/14/2022]
Abstract
Quantitative mass spectrometry data necessitates an analytical pipeline that captures the accuracy and comprehensiveness of the experiments. Currently, data analysis is often coupled to specific software packages, which restricts the analysis to a given workflow and precludes a more thorough characterization of the data by other complementary tools. To address this, we have developed PyQuant, a cross-platform mass spectrometry data quantification application that is compatible with existing frameworks and can be used as a stand-alone quantification tool. PyQuant supports most types of quantitative mass spectrometry data including SILAC, NeuCode, (15)N, (13)C, or (18)O and chemical methods such as iTRAQ or TMT and provides the option of adding custom labeling strategies. In addition, PyQuant can perform specialized analyses such as quantifying isotopically labeled samples where the label has been metabolized into other amino acids and targeted quantification of selected ions independent of spectral assignment. PyQuant is capable of quantifying search results from popular proteomic frameworks such as MaxQuant, Proteome Discoverer, and the Trans-Proteomic Pipeline in addition to several standalone search engines. We have found that PyQuant routinely quantifies a greater proportion of spectral assignments, with increases ranging from 25-45% in this study. Finally, PyQuant is capable of complementing spectral assignments between replicates to quantify ions missed because of lack of MS/MS fragmentation or that were omitted because of issues such as spectra quality or false discovery rates. This results in an increase of biologically useful data available for interpretation. In summary, PyQuant is a flexible mass spectrometry data quantification platform that is capable of interfacing with a variety of existing formats and is highly customizable, which permits easy configuration for custom analysis.
Affiliation(s)
- Christopher J Mitchell: McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205; Ginkgo Bioworks, 27 Drydock Ave, Boston, MA 02210, USA
- Min-Sik Kim: McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205; Department of Applied Chemistry, Kyung Hee University, Yongin, Gyeonggi, South Korea
- Chan Hyun Na: McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205
- Akhilesh Pandey: McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205; Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205
16
Alves G, Yu YK. Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 2016; 32:2642-9. [PMID: 27153659 DOI: 10.1093/bioinformatics/btw225] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/05/2015] [Accepted: 04/16/2016] [Indexed: 11/14/2022]
Abstract
Motivation: There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed. Results: We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases. Availability and implementation: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Contact: yyu@ncbi.nlm.nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gelio Alves: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yi-Kuo Yu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
17
Alves G, Wang G, Ogurtsov AY, Drake SK, Gucek M, Suffredini AF, Sacks DB, Yu YK. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 2016; 27:194-210. [PMID: 26510657 PMCID: PMC4723618 DOI: 10.1007/s13361-015-1271-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/29/2015] [Revised: 09/04/2015] [Accepted: 09/05/2015] [Indexed: 05/13/2023]
Abstract
Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Affiliation(s)
- Gelio Alves: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Guanghui Wang: Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Aleksey Y Ogurtsov: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Steven K Drake: Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Marjan Gucek: Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anthony F Suffredini: Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- David B Sacks: Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Yi-Kuo Yu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
18
Rahlouni F, Szarka S, Shulaev V, Prokai L. A Survey of the Impact of Deyolking on Biological Processes Covered by Shotgun Proteomic Analyses of Zebrafish Embryos. Zebrafish 2015; 12:398-407. [PMID: 26439676 DOI: 10.1089/zeb.2015.1121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Deyolking, the removal of the most abundant protein from the zebrafish (Danio rerio) embryo, is a common technique for in-depth exploration of proteome-level changes in vivo due to various environmental stressors or pharmacological impacts during the embryonic stage of development. However, the effect of this procedure on the remaining proteome has not been fully studied. Here, we report a label-free shotgun proteomics survey on proteome coverage and biological processes that are enriched and depleted as a result of deyolking. Enriched proteins are involved in cellular energetics and development pathways, specifically implicating enrichment related to mitochondrial function. Although few proteins were removed completely by deyolking, depleted molecular pathways were associated with calcium signaling and signaling events implicating immune system response.
Affiliation(s)
- Fatima Rahlouni: Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, Texas
- Szabolcs Szarka: Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, Texas
- Vladimir Shulaev: Department of Biological Sciences, University of North Texas, Denton, Texas
- Laszlo Prokai: Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, Texas
19
Affiliation(s)
- Gayatri Mohanty: Department of Zoology, School of Life Sciences, Ravenshaw University, Cuttack, Orissa, India
- Nirlipta Swain: Department of Zoology, School of Life Sciences, Ravenshaw University, Cuttack, Orissa, India
- Luna Samanta: Department of Zoology, School of Life Sciences, Ravenshaw University, Cuttack, Orissa, India
20
Alves G, Yu YK. Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 2014; 31:699-706. [PMID: 25362092 DOI: 10.1093/bioinformatics/btu717] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. Results: We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. Availability and implementation: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit.
Affiliation(s)
- Gelio Alves: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Yi-Kuo Yu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
21
Alves G, Yu YK. Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 2014; 9:e91225. [PMID: 24663491 PMCID: PMC3963868 DOI: 10.1371/journal.pone.0091225] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/03/2013] [Accepted: 02/09/2014] [Indexed: 01/29/2023]
Abstract
Meta-analysis methods that combine P-values into a single unified P-value are frequently employed to improve confidence in hypothesis testing. An assumption made by most meta-analysis methods is that the P-values to be combined are independent, which may not always be true. To investigate the accuracy of the unified P-value from combining correlated P-values, we have evaluated a family of statistical methods that combine independent, weighted independent, correlated, and weighted correlated P-values. Statistical accuracy evaluation by combining simulated correlated P-values showed that correlation among P-values can have a significant effect on the accuracy of the combined P-value obtained. Among the statistical methods evaluated, those that weight P-values compute more accurate combined P-values than those that do not. Also, statistical methods that utilize the correlation information have the best performance, producing significantly more accurate combined P-values. In our study we have demonstrated that statistical methods that combine P-values based on the assumption of independence can produce inaccurate P-values when combining correlated P-values, even when the P-values are only weakly correlated. Therefore, to prevent false conclusions from being drawn during hypothesis testing, our study advises caution when interpreting the P-value obtained from combining P-values of unknown correlation. However, when the correlation information is available, the weighting-capable statistical method, first introduced by Brown and recently modified by Hou, seems to perform the best amongst the methods investigated.
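As an illustration of what an independence-assuming combiner looks like, the sketch below implements Fisher's method, a standard textbook example. Whether it corresponds to any of the specific methods evaluated in the paper is not stated in the abstract, and the function name `fisher_combine` is hypothetical. The code relies on the closed-form chi-square tail for an even number of degrees of freedom, so it needs only the standard library.

```python
import math


def fisher_combine(p_values):
    """Combine P-values with Fisher's method (assumes independence).

    Illustrative sketch only; not taken from the paper summarized above.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher statistic X = -2 * sum(ln p_i); under the null hypothesis and
    # independence, X follows a chi-square distribution with 2k d.o.f.
    half_x = -sum(math.log(p) for p in p_values)

    # Chi-square survival function for even d.o.f. (2k), in closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Two modest P-values reinforce each other when treated as independent.
    print(fisher_combine([0.05, 0.05]))
```

If the inputs are in fact positively correlated, this combined value will tend to be too small, which is the failure mode the abstract warns about. Correlation-aware variants such as the Brown method it mentions adjust the reference distribution and are not shown here.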
Affiliation(s)
- Gelio Alves: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Yi-Kuo Yu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
22
Shteynberg D, Nesvizhskii AI, Moritz RL, Deutsch EW. Combining results of multiple search engines in proteomics. Mol Cell Proteomics 2013; 12:2383-93. [PMID: 23720762 DOI: 10.1074/mcp.r113.027797] [Citation(s) in RCA: 134] [Impact Index Per Article: 12.2] [Indexed: 12/21/2022]
Abstract
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
23
Milardi D, Grande G, Vincenzoni F, Castagnola M, Marana R. Proteomics of human seminal plasma: Identification of biomarker candidates for fertility and infertility and the evolution of technology. Mol Reprod Dev 2013; 80:350-7. [DOI: 10.1002/mrd.22178] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 07/19/2012] [Accepted: 03/22/2013] [Indexed: 12/13/2022]
Affiliation(s)
- Domenico Milardi: International Scientific Institute “Paolo VI”, Università Cattolica del S. Cuore, Rome, Italy
- Giuseppe Grande: Department of Endocrinology, Università Cattolica del S. Cuore, Rome, Italy
- Federica Vincenzoni: Institute of Biochemistry and Clinical Biochemistry, Università Cattolica del S. Cuore, Rome, Italy
- Massimo Castagnola: Institute of Biochemistry and Clinical Biochemistry, Università Cattolica del S. Cuore, Rome, Italy
24
Zhang Y, Fonslow BR, Shan B, Baek MC, Yates JR. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013; 113:2343-94. [PMID: 23438204 PMCID: PMC3751594 DOI: 10.1021/cr3003533] [Citation(s) in RCA: 979] [Impact Index Per Article: 89.0] [Indexed: 02/07/2023]
Affiliation(s)
- Yaoyang Zhang: Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bryan R. Fonslow: Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bing Shan: Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Moon-Chang Baek: Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA; Department of Molecular Medicine, Cell and Matrix Biology Research Institute, School of Medicine, Kyungpook National University, Daegu 700-422, Republic of Korea
- John R. Yates: Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
25
Fu Z, Yan K, Rosenberg A, Jin Z, Crain B, Athas G, Vander Heide RS, Howard T, Everett AD, Herrington D, Van Eyk JE. Improved protein extraction and protein identification from archival formalin-fixed paraffin-embedded human aortas. Proteomics Clin Appl 2013; 7:217-24. [PMID: 23339088 PMCID: PMC4340701 DOI: 10.1002/prca.201200064] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 07/18/2012] [Revised: 11/07/2012] [Accepted: 12/10/2012] [Indexed: 01/04/2023]
Abstract
Purpose: Evaluate combination of heat and elevated pressure to enhance protein extraction and quality of formalin-fixed (FF), and FF paraffin-embedded (FFPE) aorta for proteomics. Experimental design: Proteins were extracted from fresh frozen aorta at room temperature (RT). FF and FFPE aortas (3 months and 15 years) were extracted at RT, heat alone, or a combination of heat and high pressure. Protein yields were compared, and digested peptides from the extracts were analyzed with MS. Results: Combined heat and elevated pressure increased protein yield from human FF or FFPE aorta compared to matched tissues with heat alone (1.5-fold) or at RT (8.3-fold), resulting in more proteins identified and with more sequence coverage. The length of storage did adversely affect the quality of proteins from FF tissue. For long-term storage, aorta was preserved better with FFPE than FF alone. Periostin and MGF-E8 were demonstrated suitable for MRM assays from FFPE aorta. Conclusions and clinical relevance: Combination of heat and high pressure is an effective method to extract proteins from FFPE aorta for downstream proteomics. This method opens the possibility for use of archival and often rare FFPE aortas and possibly other tissues available to proteomics for biomarker discovery and quantification.
Affiliation(s)
- Zongming Fu: Johns Hopkins School of Medicine, Department of Pediatrics, Johns Hopkins University, Baltimore, Maryland, 21224
- Kun Yan: Johns Hopkins School of Medicine, Department of Medicine, Johns Hopkins University, Baltimore, Maryland, 21224
- Avraham Rosenberg: Johns Hopkins School of Medicine, Department of Medicine, Johns Hopkins University, Baltimore, Maryland, 21224
- Zhicheng Jin: Johns Hopkins School of Medicine, Department of Medicine, Johns Hopkins University, Baltimore, Maryland, 21224
- Barbara Crain: Johns Hopkins School of Medicine, Department of Pathology, Johns Hopkins University, Baltimore, Maryland, 21224
- Grace Athas: Louisiana State University School of Medicine, New Orleans, LA, 70112
- Richard S Vander Heide: Johns Hopkins School of Medicine, Department of Pathology, Johns Hopkins University, Baltimore, Maryland, 21224
- Timothy Howard: Wake Forest University School of Medicine, Winston-Salem, NC 27157
- Allen D. Everett: Johns Hopkins School of Medicine, Department of Pediatrics, Johns Hopkins University, Baltimore, Maryland, 21224
- David Herrington: Wake Forest University School of Medicine, Winston-Salem, NC 27157
- Jennifer E. Van Eyk: Johns Hopkins School of Medicine, Department of Medicine, Johns Hopkins University, Baltimore, Maryland, 21224
26
Putman DM, Liu KY, Broughton HC, Bell GI, Hess DA. Umbilical cord blood-derived aldehyde dehydrogenase-expressing progenitor cells promote recovery from acute ischemic injury. Stem Cells 2013; 30:2248-60. [PMID: 22899443 DOI: 10.1002/stem.1206] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/13/2022]
Abstract
Umbilical cord blood (UCB) represents a readily available source of hematopoietic and endothelial precursors at early ontogeny. Understanding the proangiogenic functions of these somatic progenitor subtypes after transplantation is integral to the development of improved cell-based therapies to treat ischemic diseases. We used fluorescence-activated cell sorting to purify a rare (<0.5%) population of UCB cells with high aldehyde dehydrogenase (ALDH(hi)) activity, a conserved stem/progenitor cell function. ALDH(hi) cells were depleted of mature monocytes and T- and B-lymphocytes and were enriched for early myeloid (CD33) and stem cell-associated (CD34, CD133, and CD117) phenotypes. Although these cells were primarily hematopoietic in origin, UCB ALDH(hi) cells demonstrated a proangiogenic transcription profile and were highly enriched for both multipotent myeloid and endothelial colony-forming cells in vitro. Coculture of ALDH(hi) cells in hanging transwells promoted the survival of human umbilical vein endothelial cells (HUVEC) under growth factor-free and serum-free conditions. On growth factor depleted matrigel, ALDH(hi) cells significantly increased tube-like cord formation by HUVEC. After induction of acute unilateral hind limb ischemia by femoral artery ligation, transplantation of ALDH(hi) cells significantly enhanced the recovery of perfusion in ischemic limbs. Despite transient engraftment in the ischemic hind limb, early recruitment of ALDH(hi) cells into ischemic muscle tissue correlated with increased murine von Willebrand factor blood vessel and CD31+ capillary densities. Thus, UCB ALDH(hi) cells represent a readily available population of proangiogenic progenitors that promote vascular regeneration. This work provides preclinical justification for the development of therapeutic strategies to treat ischemic diseases using UCB-derived ALDH(hi) mixed progenitor cells.
Collapse
Affiliation(s)
- David M Putman
- Krembil Centre for Stem Cell Biology, Robarts Research Institute, Department of Physiology and Pharmacology, The University of Western Ontario, London, Ontario, Canada
|
27
|
Van Riper SK, de Jong EP, Carlis JV, Griffin TJ. Mass Spectrometry-Based Proteomics: Basic Principles and Emerging Technologies and Directions. Adv Exp Med Biol 2013; 990:1-35. [DOI: 10.1007/978-94-007-5896-4_1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
|
28
|
Hoopmann MR, Moritz RL. Current algorithmic solutions for peptide-based proteomics data generation and identification. Curr Opin Biotechnol 2012; 24:31-8. [PMID: 23142544 DOI: 10.1016/j.copbio.2012.10.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 07/19/2012] [Revised: 10/08/2012] [Accepted: 10/18/2012] [Indexed: 12/28/2022]
Abstract
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics.
|
29
|
Blakeley P, Overton IM, Hubbard SJ. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J Proteome Res 2012; 11:5221-34. [PMID: 23025403 PMCID: PMC3703792 DOI: 10.1021/pr300411q] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 11/29/2022]
Abstract
Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported.
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
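The target:decoy idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest common estimator (decoy hits divided by target hits above a score threshold); the function name, the `(score, is_decoy)` tuple layout, and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Each PSM is a (score, is_decoy) pair. The estimate is the simple
    ratio decoys / targets, capped at 1.0 (an assumed, basic estimator).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target PSMs, so there is nothing to be falsely discovered.
        return 0.0
    return min(1.0, decoys / targets)


# Toy data (hypothetical): three target hits and two decoy hits.
toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
fdr_at_7 = target_decoy_fdr(toy_psms, threshold=7.0)
fdr_at_9 = target_decoy_fdr(toy_psms, threshold=9.0)
```

Under this estimator, a database inflated with implausible targets changes the balance of target and decoy hits at a given threshold, which is the kind of database-dependent effect the abstract describes.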
Affiliation(s)
- Paul Blakeley
- Faculty of Life Sciences, The University of Manchester, Manchester M13 9PT, UK
|
30
|
Wan C, Liu J, Fong V, Lugowski A, Stoilova S, Bethune-Waddell D, Borgeson B, Havugimana PC, Marcotte EM, Emili A. ComplexQuant: high-throughput computational pipeline for the global quantitative analysis of endogenous soluble protein complexes using high resolution protein HPLC and precision label-free LC/MS/MS. J Proteomics 2012; 81:102-11. [PMID: 23063720 DOI: 10.1016/j.jprot.2012.10.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/11/2012] [Revised: 10/01/2012] [Accepted: 10/04/2012] [Indexed: 12/29/2022]
Abstract
The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translation modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant. This analysis allows co-complex membership to be inferred based on the similarity of extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively. This article is part of a Special Issue entitled: From protein structures to clinical applications.
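As a rough illustration of comparing co-elution profiles, the sketch below normalizes per-fraction spectral counts and scores two proteins by Pearson correlation. Both the normalization (dividing by the total count) and the choice of Pearson correlation are assumptions made for illustration; the abstract does not specify the similarity measure used by ComplexQuant, and all names and numbers here are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_profile(counts: Sequence[float]) -> List[float]:
    """Scale per-fraction spectral counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical spectral counts across five chromatographic fractions.
protein_a = normalize_profile([0, 2, 8, 2, 0])
protein_b = normalize_profile([0, 1, 4, 1, 0])  # same elution peak as protein_a
protein_c = normalize_profile([8, 2, 0, 0, 0])  # elutes in earlier fractions

similar = pearson(protein_a, protein_b)
dissimilar = pearson(protein_a, protein_c)
```

Proteins whose normalized profiles peak in the same fractions score near 1, which is the intuition behind inferring co-complex membership from co-elution.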
Affiliation(s)
- Cuihong Wan
- Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, Ontario, Canada M5S 3E1
|
31
|
Guthals A, Bandeira N. Peptide identification by tandem mass spectrometry with alternate fragmentation modes. Mol Cell Proteomics 2012; 11:550-7. [PMID: 22595789 PMCID: PMC3434779 DOI: 10.1074/mcp.r112.018556] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 03/05/2012] [Revised: 05/04/2012] [Indexed: 11/06/2022] Open
Abstract
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
|
32
|
Yang C, He Z, Yang C, Yu W. Peptide reranking with protein-peptide correspondence and precursor peak intensity information. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1212-1219. [PMID: 22350209 DOI: 10.1109/tcbb.2012.29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. Improving peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) over their false counterparts has led to the development of various reranking algorithms. In peptide reranking, discriminative information is essential to distinguish true PSMs from false PSMs. Generally, most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models. Information in the protein database and MS1 spectra (i.e., single stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects on peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker only uses Protein-Peptide Map (PPM) information from the protein database, PPIRanker only uses Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PeptideProphet, PeptideProphet+NSP (number of sibling peptides) and a score regularization method SRPI. The source codes of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can also be downloaded from: http://sourceforge.net/projects/pepreranking/.
Affiliation(s)
- Chao Yang
- The Hong Kong University of Science and Technology, RM B007D, University Apartment Tower B, Clear Water Bay, Kowloon, Hong Kong.
|
33
|
|
34
|
Sheng Q, Dai J, Wu Y, Tang H, Zeng R. BuildSummary: Using a Group-Based Approach To Improve the Sensitivity of Peptide/Protein Identification in Shotgun Proteomics. J Proteome Res 2012; 11:1494-502. [DOI: 10.1021/pr200194p] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Affiliation(s)
- Quanhu Sheng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jie Dai
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yibo Wu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47406, United States
- Rong Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
35
|
Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N, Mendoza L, Moritz RL, Aebersold R, Nesvizhskii AI. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics 2011; 10:M111.007690. [PMID: 21876204 PMCID: PMC3237071 DOI: 10.1074/mcp.m111.007690] [Citation(s) in RCA: 412] [Impact Index Per Article: 31.7] [Received: 01/06/2011] [Revised: 08/03/2011] [Indexed: 11/06/2022] Open
Abstract
The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines have been developed that identify different, overlapping subsets of the sample peptides from a particular set of tandem mass spectrometry spectra. We present iProphet, the new addition to the widely used open-source suite of proteomic data analysis tools Trans-Proteomics Pipeline. Applied in tandem with PeptideProphet, it provides more accurate representation of the multilevel nature of shotgun proteomic data. iProphet combines the evidence from multiple identifications of the same peptide sequences across different spectra, experiments, precursor ion charge states, and modified states. It also allows accurate and effective integration of the results from multiple database search engines applied to the same data. The use of iProphet in the Trans-Proteomics Pipeline increases the number of correctly identified peptides at a constant false discovery rate as compared with both PeptideProphet and another state-of-the-art tool Percolator. As the main outcome, iProphet permits the calculation of accurate posterior probabilities and false discovery rate estimates at the level of sequence identical peptide identifications, which in turn leads to more accurate probability estimates at the protein level. Fully integrated with the Trans-Proteomics Pipeline, it supports all commonly used MS instruments, search engines, and computer platforms. 
The performance of iProphet is demonstrated on two publicly available data sets: data from a human whole cell lysate proteome profiling experiment representative of typical proteomic data sets, and from a set of Streptococcus pyogenes experiments more representative of organism-specific composite data sets.
Affiliation(s)
- Henry Lam
- Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Hong Kong
- Jimmy K. Eng
- Department of Genome Sciences, University of Washington, Seattle, WA
- Zhi Sun
- Institute for Systems Biology, Seattle, WA
- Luis Mendoza
- Institute for Systems Biology, Seattle, WA
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Faculty of Sciences, University of Zurich, Zurich, Switzerland
- Center for Systems Physiology and Metabolic Diseases, Zurich, Switzerland
- Alexey I. Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, MI
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
|
36
|
Alves G, Yu YK. Combining independent, weighted P-values: achieving computational stability by a systematic expansion with controllable accuracy. PLoS One 2011; 6:e22647. [PMID: 21912585 PMCID: PMC3166143 DOI: 10.1371/journal.pone.0022647] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 02/22/2011] [Accepted: 06/27/2011] [Indexed: 11/27/2022] Open
Abstract
Given the expanding availability of scientific data and tools to analyze them, combining different assessments of the same piece of information has become increasingly important for social, biological, and even physical sciences. This task demands, to begin with, a method-independent standard, such as the P-value, that can be used to assess the reliability of a piece of information. Good's formula and Fisher's method combine independent P-values with respectively unequal and equal weights. Both approaches may be regarded as limiting instances of a general case of combining P-values from groups; P-values within each group are weighted equally, while weight varies by group. When some of the weights become nearly degenerate, as cautioned by Good, numeric instability occurs in computation of the combined P-values. We deal explicitly with this difficulty by deriving a controlled expansion, in powers of differences in inverse weights, that provides both accurate statistics and stable numerics. We illustrate the utility of this systematic approach with a few examples. In addition, we also provide here an alternative derivation for the probability distribution function of the general case and show how the analytic formula obtained reduces to both Good's and Fisher's methods as special cases. A C++ program, which computes the combined P-values with equal numerical stability regardless of whether weights are (nearly) degenerate or not, is available for download at our group website http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/CoinedPValues.html.
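The equal-weight special case named in this abstract, Fisher's method, can be sketched as follows. This is only the textbook formula, not the authors' weighted expansion or their C++ program: the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom the survival function has the closed form exp(-X/2) Σ_{i<k} (X/2)^i / i!. The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent P-values with equal weights (Fisher's method)."""
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    # half_stat is X/2, where X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in p_values)
    k = len(p_values)
    # Chi-square survival function with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


combined = fisher_combined_p([0.05, 0.05])
```

With a single P-value the formula returns that value unchanged, which is a quick sanity check on the implementation.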
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
37
|
Guo T, Lee SS, Ng WH, Zhu Y, Gan CS, Zhu J, Wang H, Huang S, Sze SK, Kon OL. Global molecular dysfunctions in gastric cancer revealed by an integrated analysis of the phosphoproteome and transcriptome. Cell Mol Life Sci 2011; 68:1983-2002. [PMID: 20953656 PMCID: PMC11114721 DOI: 10.1007/s00018-010-0545-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/14/2010] [Revised: 09/27/2010] [Accepted: 09/28/2010] [Indexed: 12/30/2022]
Abstract
We integrated LC-MS/MS-based and protein antibody array-based proteomics with genomics approaches to investigate the phosphoproteome and transcriptome of gastric cancer cell lines and endoscopic gastric biopsies from normal subjects and patients with benign gastritis or gastric cancer. More than 3,000 non-redundant phosphorylation sites in over 1,200 proteins were identified in gastric cancer cells. We correlated phosphoproteome data with transcriptome data sets and reported the expression of 41 protein kinases, 5 phosphatases and 65 phosphorylated mitochondrial proteins in gastric cancer cells. Transcriptional expression levels of 190 phosphorylated proteins were >2-fold higher in gastric cancer cells compared to normal stomach tissue. Pathway analysis demonstrated over-representation of the DNA damage response pathway and underscored critical roles of phosphorylated p53 in gastric cancer. This is the first study to comprehensively report the gastric cancer phosphoproteome. Integrative analysis of the phosphoproteome and transcriptome provided an expansive view of molecular signaling pathways in gastric cancer.
Affiliation(s)
- Tiannan Guo
- Division of Medical Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 11 Hospital Drive, Singapore, 169610 Singapore
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
- Sze Sing Lee
- Division of Medical Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 11 Hospital Drive, Singapore, 169610 Singapore
- Wai Har Ng
- Division of Medical Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 11 Hospital Drive, Singapore, 169610 Singapore
- Yi Zhu
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
- Chee Sian Gan
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
- Jiang Zhu
- Center for Stem Cell Research and Application, Union Hospital, Huazhong University of Science and Technology, 430022 Wuhan, People’s Republic of China
- Haixia Wang
- Center for Stem Cell Research and Application, Union Hospital, Huazhong University of Science and Technology, 430022 Wuhan, People’s Republic of China
- Shiang Huang
- Center for Stem Cell Research and Application, Union Hospital, Huazhong University of Science and Technology, 430022 Wuhan, People’s Republic of China
- Siu Kwan Sze
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551 Singapore
- Oi Lian Kon
- Division of Medical Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 11 Hospital Drive, Singapore, 169610 Singapore
|
38
|
Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines. J Proteome Res 2011; 10:2949-58. [PMID: 21488652 DOI: 10.1021/pr2002116] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Indexed: 11/28/2022]
Abstract
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
Affiliation(s)
- Taejoon Kwon
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, USA
|
39
|
|
40
|
Vaudel M, Barsnes H, Berven FS, Sickmann A, Martens L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 2011; 11:996-9. [PMID: 21337703 DOI: 10.1002/pmic.201000595] [Citation(s) in RCA: 280] [Impact Index Per Article: 21.5] [Received: 09/18/2010] [Revised: 11/03/2010] [Accepted: 11/25/2010] [Indexed: 12/13/2022]
Abstract
The identification of proteins by mass spectrometry is a standard technique in the field of proteomics, relying on search engines to perform the identifications of the acquired spectra. Here, we present a user-friendly, lightweight and open-source graphical user interface called SearchGUI (http://searchgui.googlecode.com), for configuring and running the freely available OMSSA (open mass spectrometry search algorithm) and X!Tandem search engines simultaneously. Freely available under the permissive Apache2 license, SearchGUI is supported on Windows, Linux and OSX.
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
|
41
|
Huang X, Tolmachev AV, Shen Y, Liu M, Huang L, Zhang Z, Anderson GA, Smith RD, Chan WC, Hinrichs SH, Fu K, Ding SJ. UNiquant, a program for quantitative proteomics analysis using stable isotope labeling. J Proteome Res 2011; 10:1228-37. [PMID: 21158445 DOI: 10.1021/pr1010058] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
Stable isotope labeling (SIL) methods coupled with nanoscale liquid chromatography and high resolution tandem mass spectrometry are increasingly useful for elucidation of the proteome-wide differences between multiple biological samples. Development of more effective programs for the sensitive identification of peptide pairs and accurate measurement of the relative peptide/protein abundance are essential for quantitative proteomic analysis. We developed and evaluated the performance of a new program, termed UNiquant, for analyzing quantitative proteomics data using stable isotope labeling. UNiquant was compared with two other programs, MaxQuant and Mascot Distiller, using SILAC-labeled complex proteome mixtures having either known or unknown heavy/light ratios. For the SILAC-labeled Jeko-1 cell proteome digests with known heavy/light ratios (H/L = 1:1, 1:5, and 1:10), UNiquant quantified a similar number of peptide pairs as MaxQuant for the H/L = 1:1 and 1:5 mixtures. In addition, UNiquant quantified significantly more peptides than MaxQuant and Mascot Distiller in the H/L = 1:10 mixtures. UNiquant accurately measured relative peptide/protein abundance without the need for postmeasurement normalization of peptide ratios, which is required by the other programs.
Affiliation(s)
- Xin Huang
- Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA
|
42
|
Rosenqvist H, Ye J, Jensen ON. Analytical strategies in mass spectrometry-based phosphoproteomics. Methods Mol Biol 2011; 753:183-213. [PMID: 21604124 DOI: 10.1007/978-1-61779-148-2_13] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 05/25/2023]
Abstract
Phosphoproteomics, the systematic study of protein phosphorylation events and cell signaling networks in cells and tissues, is a rapidly evolving branch of functional proteomics. Current phosphoproteomics research provides a large toolbox of strategies and protocols that may assist researchers to reveal key regulatory events and phosphorylation-mediated processes in the cell and in whole organisms. We present an overview of sensitive and robust analytical methods for phosphopeptide analysis, including calcium phosphate precipitation and affinity enrichment methods such as IMAC and TiO(2). We then discuss various tandem mass spectrometry approaches for phosphopeptide sequencing and quantification, and we consider aspects of phosphoproteome data analysis and interpretation. Efficient integration of these stages of phosphoproteome analysis is highly important to ensure a successful outcome of large-scale experiments for studies of phosphorylation-mediated protein regulation.
Affiliation(s)
- Heidi Rosenqvist
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, Scotland, UK
|
43
|
RAId_aPS: MS/MS analysis with multiple scoring functions and spectrum-specific statistics. PLoS One 2010; 5:e15438. [PMID: 21103371 PMCID: PMC2982831 DOI: 10.1371/journal.pone.0015438] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 07/29/2010] [Accepted: 09/20/2010] [Indexed: 11/26/2022] Open
Abstract
Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Providing an E-value calibration protocol, we demonstrated earlier the feasibility of translating either the score or heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when changes in experimental setup occur. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific E-values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign E-values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given an MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given an MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page.
|
44
|
Alves G, Ogurtsov AY, Yu YK. Assigning statistical significance to proteotypic peptides via database searches. J Proteomics 2010; 74:199-211. [PMID: 21055489 DOI: 10.1016/j.jprot.2010.10.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/26/2010] [Revised: 10/18/2010] [Accepted: 10/21/2010] [Indexed: 11/19/2022]
Abstract
Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
45
|
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
|
46
|
Song L, Wang J, Liu J, Lu Z, Sui S, Jia W, Yang B, Chi H, Wang L, He S, Yu W, Meng L, Chen S, Peng X, Liang Y, Cai Y, Qian X. N-glycosylation proteome of endoplasmic reticulum in mouse liver by ConA affinity chromatography coupled with LTQ-FT mass spectrometry. Sci China Chem 2010. [DOI: 10.1007/s11426-010-0133-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
|
47
|
Kwon KH. Analytical methods for proteome data obtained from SDS-PAGE multi-dimensional separation and mass spectrometry. J Anal Sci Technol 2010. [DOI: 10.5355/jast.2010.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022] Open
|
48
|
Tharakan R, Edwards N, Graham DRM. Data maximization by multipass analysis of protein mass spectra. Proteomics 2010; 10:1160-71. [DOI: 10.1002/pmic.200900433] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
|
49
|
Rietschel B, Baeumlisberger D, Arrey TN, Bornemann S, Rohmer M, Schuerken M, Karas M, Meyer B. The Benefit of Combining nLC-MALDI-Orbitrap MS Data with nLC-MALDI-TOF/TOF Data for Proteomic Analyses Employing Elastase. J Proteome Res 2009; 8:5317-24. [DOI: 10.1021/pr900557k] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Affiliation(s)
- Benjamin Rietschel
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Dominic Baeumlisberger
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Tabiwang N. Arrey
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Sandra Bornemann
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Marion Rohmer
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Malte Schuerken
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Michael Karas
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
- Bjoern Meyer
- Cluster of Excellence “Macromolecular Complexes”, Institute for Pharmaceutical Chemistry, Goethe-University, Max-von-Laue-Strasse 9, D-60438 Frankfurt am Main, Germany
|
50
|
Sultana T, Jordan R, Lyons-Weiler J. Optimization of the Use of Consensus Methods for the Detection and Putative Identification of Peptides via Mass Spectrometry Using Protein Standard Mixtures. J Proteomics Bioinform 2009; 2:262-273. [PMID: 19779596 DOI: 10.4172/jpb.1000085] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Correct identification of peptides and proteins in complex biological samples from proteomic mass spectra is a challenging problem in bioinformatics. The sensitivity and specificity of identification algorithms depend on underlying scoring methods, some being more sensitive, and others more specific. For high-throughput, automated peptide identification, control over the algorithms' performance in terms of the trade-off between sensitivity and specificity is desirable. Combinations of algorithms, called 'consensus methods', have been shown to provide more accurate results than individual algorithms. However, due to the proliferation of algorithms and their varied internal settings, a systematic understanding of the relative performance of individual and consensus methods is lacking. We performed an in-depth analysis of various approaches to consensus scoring using known protein mixtures, and evaluated the performance of 2310 settings generated from consensus of three different search algorithms: Mascot, Sequest, and X!Tandem. Our findings indicate that the union of Mascot, Sequest, and X!Tandem performed well (considering overall accuracy), and methods using 80-99.9% protein probability and/or minimum 2 peptides and/or 0-50% minimum peptide probability for protein identification performed better (on average) among all consensus methods tested in terms of overall accuracy. The results also suggest method selection strategies to provide direct control over sensitivity and specificity.
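The consensus idea described above can be sketched as a simple vote over the identifications reported by each search engine. This is only an illustrative sketch, not the authors' procedure: the function name, the `min_engines` parameter, and the example peptide strings are hypothetical, and real consensus methods would also apply the probability and peptide-count thresholds mentioned in the abstract.

```python
from collections import Counter
from typing import Dict, Set


def consensus(ids_by_engine: Dict[str, Set[str]], min_engines: int) -> Set[str]:
    """Keep identifications reported by at least `min_engines` engines.

    min_engines=1 gives the union; min_engines=len(ids_by_engine) gives
    the intersection; values in between give a majority-style vote.
    """
    votes = Counter(
        peptide for ids in ids_by_engine.values() for peptide in set(ids)
    )
    return {peptide for peptide, n in votes.items() if n >= min_engines}


if __name__ == "__main__":
    # Hypothetical identifications, for illustration only.
    hits = {
        "Mascot": {"PEPTIDEA", "PEPTIDEB"},
        "Sequest": {"PEPTIDEB", "PEPTIDEC"},
        "X!Tandem": {"PEPTIDEA", "PEPTIDEB"},
    }
    print(sorted(consensus(hits, 1)))  # union
    print(sorted(consensus(hits, 3)))  # intersection
```

Lowering `min_engines` favors sensitivity and raising it favors specificity, which mirrors the trade-off the abstract discusses.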
Affiliation(s)
- Tamanna Sultana
- Bioinformatics Analysis Core, Genomics and Proteomics Core Laboratories and Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
|