301
Cheng CY, Tsai CF, Chen YJ, Sung TY, Hsu WL. Spectrum-based Method to Generate Good Decoy Libraries for Spectral Library Searching in Peptide Identifications. J Proteome Res 2013; 12:2305-10. [DOI: 10.1021/pr301039b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Affiliation(s)
- Chia-Ying Cheng
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Chia-Feng Tsai
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Department of Chemistry, National Taiwan University, Taipei 106, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Department of Chemistry, National Taiwan University, Taipei 106, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
302
Gonzalez-Galarza FF, Qi D, Fan J, Bessant C, Jones AR. A tutorial for software development in quantitative proteomics using PSI standard formats. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1844:88-97. [PMID: 23584085 PMCID: PMC4008935 DOI: 10.1016/j.bbapap.2013.04.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/07/2012] [Revised: 02/22/2013] [Accepted: 04/05/2013] [Indexed: 01/21/2023]
Abstract
The Human Proteome Organisation — Proteomics Standards Initiative (HUPO-PSI) has been working for ten years on the development of standardised formats that facilitate data sharing and public database deposition. In this article, we review three HUPO-PSI data standards — mzML, mzIdentML and mzQuantML, which can be used to design a complete quantitative analysis pipeline in mass spectrometry (MS)-based proteomics. In this tutorial, we briefly describe the content of each data model, sufficient for bioinformaticians to devise proteomics software. We also provide guidance on the use of recently released application programming interfaces (APIs) developed in Java for each of these standards, which makes it straightforward to read and write files of any size. We have produced a set of example Java classes and a basic graphical user interface to demonstrate how to use the most important parts of the PSI standards, available from http://code.google.com/p/psi-standard-formats-tutorial. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. A tutorial to help software developers use PSI standard formats. A description of programming interfaces and tools available. Code snippets and a basic graphical interface to assist understanding.
303
Mohimani H, Kim S, Pevzner PA. A new approach to evaluating statistical significance of spectral identifications. J Proteome Res 2013; 12:1560-8. [PMID: 23343606 DOI: 10.1021/pr300453t] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
While nonlinear peptide natural products such as Vancomycin and Daptomycin are among the most effective antibiotics, the computational techniques for sequencing such peptides are still in their infancy. Previous methods for sequencing peptide natural products are based on Nuclear Magnetic Resonance spectroscopy and require large amounts (milligrams) of purified materials. Recently, development of mass spectrometry-based methods has enabled accurate sequencing of nonlinear peptide natural products using picograms of material, but the question of evaluating statistical significance of Peptide Spectrum Matches (PSM) for these peptides remains open. Moreover, it is unclear how to decide whether a given spectrum is produced by a linear, cyclic, or branch-cyclic peptide. Surprisingly, all previous mass spectrometry studies overlooked the fact that a very similar problem has been successfully addressed in particle physics in 1951. In this work, we develop a method for estimating statistical significance of PSMs defined by any peptide (including linear and nonlinear). This method enables us to identify whether a peptide is linear, cyclic, or branch-cyclic, an important step toward identification of peptide natural products.
Affiliation(s)
- Hosein Mohimani
- Department of Electrical and Computer Engineering and Department of Computer Science and Engineering, University of California-San Diego, San Diego, California 92093
304
Serang O, Froehlich JW, Muntel J, McDowell G, Steen H, Lee RS, Steen JA. SweetSEQer, simple de novo filtering and annotation of glycoconjugate mass spectra. Mol Cell Proteomics 2013; 12:1735-40. [PMID: 23443135 DOI: 10.1074/mcp.o112.025940] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/21/2022] Open
Abstract
The past 15 years have seen significant progress in LC-MS/MS peptide sequencing, including the advent of successful de novo and database search methods; however, analysis of glycopeptide and, more generally, glycoconjugate spectra remains a much more open problem, and much annotation is still performed manually. This is partly because glycans, unlike peptides, need not be linear chains and are instead described by trees. In this study, we introduce SweetSEQer, an extremely simple open source tool for identifying potential glycopeptide MS/MS spectra. We evaluate SweetSEQer on manually curated glycoconjugate spectra and on negative controls, and we demonstrate high quality filtering that can be easily improved for specific applications. We also demonstrate a high overlap between peaks annotated by experts and peaks annotated by SweetSEQer, as well as demonstrate inferred glycan graphs consistent with canonical glycan tree motifs. This study presents a novel tool for annotating spectra and producing glycan graphs from LC-MS/MS spectra. The tool is evaluated and shown to perform similarly to an expert on manually curated data.
Affiliation(s)
- Oliver Serang
- Departments of Neurobiology, Harvard Medical School, Boston, Massachusetts 02119, USA.
305
Menschaert G, Van Criekinge W, Notelaers T, Koch A, Crappé J, Gevaert K, Van Damme P. Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol Cell Proteomics 2013; 12:1780-90. [PMID: 23429522 DOI: 10.1074/mcp.m113.027540] [Citation(s) in RCA: 134] [Impact Index Per Article: 12.2] [Indexed: 01/09/2023] Open
Abstract
An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing and highly sensitive mass spectrometry (MS) instrumentation. Recently, a strategy, termed ribosome profiling (or RIBO-seq), based on deep sequencing of ribosome-protected mRNA fragments, indirectly monitoring protein synthesis, has been described. We devised a proteogenomic approach constructing a custom protein sequence search space, built from both Swiss-Prot- and RIBO-seq-derived translation products, applicable for MS/MS spectrum identification. To record the impact of using the constructed deep proteome database, we performed two alternative MS-based proteomic strategies as follows: (i) a regular shotgun proteomic and (ii) an N-terminal combined fractional diagonal chromatography (COFRADIC) approach. Although the former technique gives an overall assessment on the protein and peptide level, the latter technique, specifically enabling the isolation of N-terminal peptides, is very appropriate in validating the RIBO-seq-derived (alternative) translation initiation site profile. We demonstrate that this proteogenomic approach increases the overall protein identification rate 2.5% (e.g. new protein products, new protein splice variants, single nucleotide polymorphism variant proteins, and N-terminally extended forms of known proteins) as compared with only searching UniProtKB-SwissProt. Furthermore, using this custom database, identification of N-terminal COFRADIC data resulted in detection of 16 alternative start sites giving rise to N-terminally extended protein variants besides the identification of four translated upstream ORFs. Notably, the characterization of these new translation products revealed the use of multiple near-cognate (non-AUG) start codons. 
As deep sequencing techniques are becoming more standard, less expensive, and widespread, we anticipate that mRNA sequencing and especially custom-tailored RIBO-seq will become indispensable in the MS-based protein or peptide identification process. The underlying mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the dataset identifier PXD000124.
Affiliation(s)
- Gerben Menschaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium.
306
Reiz B, Kertész-Farkas A, Pongor S, Myers MP. Chemical rule-based filtering of MS/MS spectra. Bioinformatics 2013; 29:925-32. [DOI: 10.1093/bioinformatics/btt061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/13/2023] Open
307
Vaudel M, Breiter D, Beck F, Rahnenführer J, Martens L, Zahedi RP. D-score: a search engine independent MD-score. Proteomics 2013; 13:1036-41. [PMID: 23307401 DOI: 10.1002/pmic.201200408] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 09/04/2012] [Revised: 11/11/2012] [Accepted: 12/04/2012] [Indexed: 01/29/2023]
Abstract
While peptides carrying PTMs are routinely identified in gel-free MS, the localization of the PTMs onto the peptide sequences remains challenging. Search engine scores of secondary peptide matches have been used in different approaches in order to infer the quality of site inference, by penalizing the localization whenever the search engine similarly scored two candidate peptides with different site assignments. In the present work, we show how the estimation of posterior error probabilities for peptide candidates allows the estimation of a PTM score called the D-score, for multiple search engine studies. We demonstrate the applicability of this score to three popular search engines: Mascot, OMSSA, and X!Tandem, and evaluate its performance using an already published high resolution data set of synthetic phosphopeptides. For those peptides with phosphorylation site inference uncertainty, the number of spectrum matches with correctly localized phosphorylation increased by up to 25.7% when compared to using Mascot alone, although the actual increase depended on the fragmentation method used. Since this method relies only on search engine scores, it can be readily applied to the scoring of the localization of virtually any modification at no additional experimental or in silico cost.
Affiliation(s)
- Marc Vaudel
- Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, Germany
308
Piehowski PD, Petyuk VA, Sandoval JD, Burnum KE, Kiebel GR, Monroe ME, Anderson GA, Camp DG, Smith RD. STEPS: a grid search methodology for optimized peptide identification filtering of MS/MS database search results. Proteomics 2013; 13:766-70. [PMID: 23303698 DOI: 10.1002/pmic.201200096] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 03/02/2012] [Revised: 10/11/2012] [Accepted: 11/20/2012] [Indexed: 11/11/2022]
Abstract
For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection, referred to as STEPS, utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
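The grid-search idea in this abstract can be illustrated with a minimal Python sketch. It is not the STEPS implementation: the PSM fields (`score`, `ppm`, `decoy`), the two filter parameters, and the decoy/target ratio used as the FDR estimate are all assumptions chosen for illustration.

```python
from itertools import product


def grid_search_filters(psms, score_grid, ppm_grid, max_fdr=0.01):
    """Return (n_targets, min_score, max_ppm) for the best filter pair.

    psms is a list of dicts with hypothetical keys "score", "ppm", "decoy".
    A parameter pair is accepted only if decoys / targets <= max_fdr;
    among accepted pairs, the one keeping the most targets wins.
    """
    best = None
    for min_score, max_ppm in product(score_grid, ppm_grid):
        kept = [p for p in psms
                if p["score"] >= min_score and abs(p["ppm"]) <= max_ppm]
        decoys = sum(p["decoy"] for p in kept)
        targets = len(kept) - decoys
        if targets == 0 or decoys / targets > max_fdr:
            continue  # this combination violates the FDR ceiling
        if best is None or targets > best[0]:
            best = (targets, min_score, max_ppm)
    return best


# Toy data, invented for this example only.
example_psms = [
    {"score": 30, "ppm": 1, "decoy": False},
    {"score": 25, "ppm": 2, "decoy": False},
    {"score": 22, "ppm": 8, "decoy": False},
    {"score": 21, "ppm": 9, "decoy": True},
    {"score": 15, "ppm": 3, "decoy": False},
    {"score": 14, "ppm": 1, "decoy": True},
]
best_filter = grid_search_filters(example_psms, [10, 20], [5, 10], max_fdr=0.25)
```

With real search output, the grids would be replaced by the user-defined parameter ranges the abstract mentions, and more than two parameters could be crossed in the same way.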
Affiliation(s)
- Paul D Piehowski
- Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352, USA
309
Chalkley RJ. When target-decoy false discovery rate estimations are inaccurate and how to spot instances. J Proteome Res 2013; 12:1062-4. [PMID: 23298186 DOI: 10.1021/pr301063v] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Abstract
To address problems with estimating the reliability of proteomic search engine results from mass spectrometry fragmentation data, the use of target-decoy database searching has become the de facto approach for estimating a false discovery rate. Several articles have been written about the effects of different ways of creating the decoy database, effects of the search engine scoring, or effects of search parameters on whether this approach provides an accurate estimate, not all agreeing with each other's conclusions. Hence, there may be some confusion about how effective this approach is and how broadly it can be applied. Although it is generally very effective, in this article I will try to emphasize some of the pitfalls and dangers of using the target-decoy approach and will indicate tell-tale signs that something may be amiss. This information will hopefully help researchers become more astute in their assessment of search results.
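For readers unfamiliar with the target-decoy approach discussed here, the following Python sketch shows one common form of the estimate: the number of decoy hits above a score threshold divided by the number of target hits above it. The `(score, is_decoy)` input format and the toy scores are assumptions for illustration; the article itself concerns when such estimates become unreliable, which this sketch does not address.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above threshold.

    psms is an iterable of (score, is_decoy) pairs, assumed to come from
    a search against a concatenated target + decoy database.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, decoys / targets)


# Toy scores, invented for this example only.
example_hits = [(9.1, False), (8.7, False), (8.2, True),
                (7.9, False), (7.5, False), (6.0, True)]
fdr_at_7 = target_decoy_fdr(example_hits, 7.0)  # 1 decoy / 4 targets
```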
Affiliation(s)
- Robert J Chalkley
- Department of Pharmaceutical Chemistry, University of California San Francisco, 600 16th Street, Genentech Hall Room N474A, San Francisco, California 94158, USA.
310
Guthals A, Watrous JD, Dorrestein PC, Bandeira N. The spectral networks paradigm in high throughput mass spectrometry. Molecular BioSystems 2013; 8:2535-44. [PMID: 22610447 DOI: 10.1039/c2mb25085c] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Indexed: 12/20/2022]
Abstract
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
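The first point above, matching spectra against other spectra rather than against sequences, can be made concrete with a small Python sketch. It uses a plain binned cosine similarity, which is a simpler stand-in and not the spectral alignment used by spectral networks algorithms (those also pair spectra of related peptides that differ by a mass shift). The peak-list format and the 1.0 m/z bin width are assumptions.

```python
import math
from collections import defaultdict


def binned(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    return vec


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned MS/MS spectra (0.0 to 1.0)."""
    a = binned(peaks_a, bin_width)
    b = binned(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Toy peak lists of (m/z, intensity), invented for this example only.
spectrum_1 = [(100.2, 10.0), (250.7, 40.0), (401.1, 25.0)]
spectrum_2 = [(150.3, 5.0), (300.9, 60.0)]
```

Pairs whose similarity exceeds a chosen cutoff could then be linked as edges of a network; the cutoff itself would have to be tuned on real data.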
Affiliation(s)
- Adrian Guthals
- Dept. Computer Science and Engineering, University of California, San Diego, USA
311
Serang O, Paulo J, Steen H, Steen JA. A non-parametric cutout index for robust evaluation of identified proteins. Mol Cell Proteomics 2013; 12:807-12. [PMID: 23292186 DOI: 10.1074/mcp.o112.022863] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023] Open
Abstract
This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems.
Affiliation(s)
- Oliver Serang
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
312
Abstract
Peptides and proteins are routinely identified from peptide fragmentation spectra acquired in a mass spectrometer, analyzed by database search engines. The types of fragments that can be formed are known, and it is also well appreciated that certain fragment types are more common or more informative than others. However, most search engines do not use detailed knowledge of peptide fragmentation, but rather consider a limited range of fragments, giving each an equivalent weighting in their scoring system that decides which results are likely to be correct. This chapter discusses efforts to make use of information about the frequency of observation of different fragment ion types in order to produce more sophisticated and sensitive scoring systems and demonstrates how these new scoring systems are particularly powerful for analysis of electron capture or electron transfer dissociation data.
313
Van Riper SK, de Jong EP, Carlis JV, Griffin TJ. Mass Spectrometry-Based Proteomics: Basic Principles and Emerging Technologies and Directions. Advances in Experimental Medicine and Biology 2013; 990:1-35. [DOI: 10.1007/978-94-007-5896-4_1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
314
Granholm V, Navarro JF, Noble WS, Käll L. Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics. J Proteomics 2012; 80:123-31. [PMID: 23268117 DOI: 10.1016/j.jprot.2012.12.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 07/14/2012] [Revised: 11/30/2012] [Accepted: 12/11/2012] [Indexed: 01/10/2023]
Abstract
The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.
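Of the three procedures compared in this abstract, the classical Fisher's method is easy to show in code. The Python sketch below combines k independent p-values (for example, the PSM-level p-values of one peptide) using the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Applying it to PSM p-values in this way is an illustrative assumption, not a reproduction of the paper's calibration protocol.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method.

    Uses the closed-form chi-square survival function for an even number
    of degrees of freedom (2k), so no external libraries are needed:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_x = -sum(math.log(p) for p in p_values)  # equals x / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


combined = fisher_combined_p([0.1, 0.1])
```

For a single p-value the function returns that p-value unchanged, and for two p-values it reduces to p1·p2·(1 − ln(p1·p2)).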
Affiliation(s)
- Viktor Granholm
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- José Fernández Navarro
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology (KTH), Solna, Sweden
- William Stafford Noble
- Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, USA
- Lukas Käll
- Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology (KTH), Solna, Sweden.
315
Choi H, Liu G, Mellacheruvu D, Tyers M, Gingras AC, Nesvizhskii AI. Analyzing protein-protein interactions from affinity purification-mass spectrometry data with SAINT. Curr Protoc Bioinformatics 2012; Chapter 8:8.15.1-8.15.23. [PMID: 22948729 DOI: 10.1002/0471250953.bi0815s39] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.8] [Indexed: 11/09/2022]
Abstract
Significance Analysis of INTeractome (SAINT) is a software package for scoring protein-protein interactions based on label-free quantitative proteomics data (e.g., spectral count or intensity) in affinity purification-mass spectrometry (AP-MS) experiments. SAINT allows bench scientists to select bona fide interactions and remove nonspecific interactions in an unbiased manner. However, there is no 'one-size-fits-all' statistical model for every dataset, since the experimental design varies across studies. Key variables include the number of baits, the number of biological replicates per bait, and control purifications. Here we give a detailed account of input data format, control data, selection of high-confidence interactions, and visualization of filtered data. We explain additional options for customizing the statistical model for optimal filtering in specific datasets. We also discuss a graphical user interface of SAINT in connection to the LIMS system ProHits, which can be installed as a virtual machine on Mac OS X or Windows computers.
Affiliation(s)
- Hyungwon Choi
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore
316
Eng JK, Jahan TA, Hoopmann MR. Comet: An open-source MS/MS sequence database search tool. Proteomics 2012; 13:22-4. [DOI: 10.1002/pmic.201200439] [Citation(s) in RCA: 873] [Impact Index Per Article: 72.8] [Received: 09/21/2012] [Revised: 10/03/2012] [Accepted: 10/05/2012] [Indexed: 01/19/2023]
Affiliation(s)
- Jimmy K. Eng
- Department of Genome Sciences; University of Washington; Seattle WA USA
- Tahmina A. Jahan
- Department of Genome Sciences; University of Washington; Seattle WA USA
317
Evans VC, Barker G, Heesom KJ, Fan J, Bessant C, Matthews DA. De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nat Methods 2012; 9:1207-11. [PMID: 23142869 PMCID: PMC3581816 DOI: 10.1038/nmeth.2227] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Received: 05/16/2012] [Accepted: 09/21/2012] [Indexed: 11/30/2022]
Abstract
Identification of proteins by tandem mass spectrometry requires a reference protein database, but these are only available for model species. Here we demonstrate that, for a non-model species, the sequencing of expressed mRNA can generate a protein database for mass spectrometry-based identification. This combination of high-throughput sequencing and protein identification technologies allows detection of genes and proteins. We use human cells infected with human adenovirus as a complex and dynamic model to demonstrate the robustness of this approach. Our proteomics informed by transcriptomics (PIT) technique identifies >99% of over 3,700 distinct proteins identified using traditional analysis that relies on comprehensive human and adenovirus protein lists. We show that this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
Affiliation(s)
- Vanessa C. Evans
- School of Cellular and Molecular Medicine, University of Bristol, University Walk, Bristol. BS8 1TD. UK
- Gary Barker
- School of Biological Sciences, University of Bristol, University Walk, Bristol. BS8 1TD. UK
- Kate J. Heesom
- School of Biochemistry, University of Bristol, University Walk, Bristol. BS8 1TD. UK
- Jun Fan
- Bioinformatics Group, Cranfield Health, Cranfield University, Cranfield, Bedfordshire. MK43 0AL. UK
- Conrad Bessant
- Bioinformatics Group, Cranfield Health, Cranfield University, Cranfield, Bedfordshire. MK43 0AL. UK
- David A. Matthews
- School of Cellular and Molecular Medicine, University of Bristol, University Walk, Bristol. BS8 1TD. UK
318
Yadav AK, Kumar D, Dash D. Learning from decoys to improve the sensitivity and specificity of proteomics database search results. PLoS One 2012. [PMID: 23189209 PMCID: PMC3506577 DOI: 10.1371/journal.pone.0050651] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023] Open
Abstract
The statistical validation of database search results is a complex issue in bottom-up proteomics. The correct and incorrect peptide spectrum match (PSM) scores overlap significantly, making an accurate assessment of true peptide matches challenging. Since complete separation between true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. Modelling a simple linear regression on the decoy hits for different charge states, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and Yeast) obtained from a wide variety of MS platforms. The net positive gain for correct spectral and peptide identifications was up to 14.81% and 6.2%, respectively. The approach is applicable to different search methodologies: separate as well as concatenated database searches, high mass accuracy searches, and semi-tryptic and modification searches. FlexiFDR was also applied to Mascot results and showed better performance than before. We have shown that an appropriate threshold learnt from decoys can be very effective in improving the database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms. It learns from the decoy hits and sets a flexible threshold that automatically aligns itself to the underlying variables of data quality and size.
Affiliation(s)
- Amit Kumar Yadav
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
- Dhirendra Kumar
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
- Debasis Dash
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
319
Serang O, Moruz L, Hoopmann MR, Käll L. Recognizing uncertainty increases robustness and reproducibility of mass spectrometry-based protein inferences. J Proteome Res 2012; 11:5586-91. [PMID: 23148905 DOI: 10.1021/pr300426s] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/18/2023]
Abstract
Parsimony and protein grouping are widely employed to enforce economy in the number of identified proteins, with the goal of increasing the quality and reliability of protein identifications; however, in a counterintuitive manner, parsimony and protein grouping may actually decrease the reproducibility and interpretability of protein identifications. We present a simple illustration demonstrating ways in which parsimony and protein grouping may lower the reproducibility or interpretability of results. We then provide an example of a data set where a probabilistic method increases the reproducibility and interpretability of identifications made on replicate analyses of Human Du145 prostate cancer cell lines.
Affiliation(s)
- Oliver Serang
- Department of Neurobiology, Harvard Medical School Children's Hospital Boston, Boston, Massachusetts, United States.
320
Hoopmann MR, Moritz RL. Current algorithmic solutions for peptide-based proteomics data generation and identification. Curr Opin Biotechnol 2012; 24:31-8. [PMID: 23142544 DOI: 10.1016/j.copbio.2012.10.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 07/19/2012] [Revised: 10/08/2012] [Accepted: 10/18/2012] [Indexed: 12/28/2022]
Abstract
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics.
|
321
|
Ma K, Vitek O, Nesvizhskii AI. A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet. BMC Bioinformatics 2012; 13 Suppl 16:S1. [PMID: 23176103 PMCID: PMC3489532 DOI: 10.1186/1471-2105-13-s16-s1] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet is discussed from a statistical model-building perspective followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification.
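As a rough illustration of the mixture-modeling idea, the sketch below computes the posterior probability that a match is correct under a two-component mixture. It assumes both score distributions are Gaussian with made-up parameters; this is a generic simplification for illustration and not PeptideProphet's actual model or code.

```python
import math
from typing import Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(
    score: float,
    prior_correct: float,
    correct: Tuple[float, float] = (3.0, 1.0),    # assumed (mean, sd)
    incorrect: Tuple[float, float] = (0.0, 1.0),  # assumed (mean, sd)
) -> float:
    """Bayes' rule for a two-component mixture of score distributions."""
    weighted_correct = prior_correct * normal_pdf(score, *correct)
    weighted_incorrect = (1.0 - prior_correct) * normal_pdf(score, *incorrect)
    return weighted_correct / (weighted_correct + weighted_incorrect)


for s in (0.0, 1.5, 4.0):
    print(s, round(posterior_correct(s, prior_correct=0.5), 4))
```

In a real analysis the mixture parameters and the prior would be estimated from the data, for example by expectation-maximization, rather than fixed by hand.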
Affiliation(s)
- Kelvin Ma
- Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, Indiana, USA
|
322
|
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, Indiana 47405, USA
|
323
|
Abstract
Automated database search engines are one of the fundamental engines of high-throughput proteomics enabling daily identifications of hundreds of thousands of peptides and proteins from tandem mass (MS/MS) spectrometry data. Nevertheless, this automation also makes it humanly impossible to manually validate the vast lists of resulting identifications from such high-throughput searches. This challenge is usually addressed by using a Target-Decoy Approach (TDA) to impose an empirical False Discovery Rate (FDR) at a pre-determined threshold x% with the expectation that at most x% of the returned identifications would be false positives. But despite the fundamental importance of FDR estimates in ensuring the utility of large lists of identifications, there is surprisingly little consensus on exactly how TDA should be applied to minimize the chances of biased FDR estimates. In fact, since less rigorous TDA/FDR estimates tend to result in more identifications (at higher 'true' FDR), there is often little incentive to enforce strict TDA/FDR procedures in studies where the major metric of success is the size of the list of identifications and there are no follow up studies imposing hard cost constraints on the number of reported false positives. Here we address the problem of the accuracy of TDA estimates of empirical FDR. Using MS/MS spectra from samples where we were able to define a factual FDR estimator of 'true' FDR we evaluate several popular variants of the TDA procedure in a variety of database search contexts. We show that the fraction of false identifications can sometimes be over 10× higher than reported and may be unavoidably high for certain types of searches. In addition, we further report that the two-pass search strategy seems the most promising database search strategy. 
While unavoidably constrained by the particulars of any specific evaluation dataset, our observations support a series of recommendations towards maximizing the number of resulting identifications while controlling database searches with robust and reproducible TDA estimation of empirical FDR.
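The basic target-decoy estimate referred to above can be sketched as follows. The sketch assumes the simple ratio of decoy to target matches above a score threshold, which is only one of the TDA variants in use; the example scores are invented, and the function names are illustrative rather than taken from any search engine.

```python
from typing import List, Optional, Tuple

# Each match is (score, is_decoy); higher scores are assumed to be better.
Psm = Tuple[float, bool]


def estimate_fdr(psms: List[Psm], threshold: float) -> float:
    """Empirical FDR at a threshold: decoy matches / target matches."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_at_fdr(psms: List[Psm], max_fdr: float) -> Optional[float]:
    """Lowest observed score whose estimated FDR does not exceed max_fdr."""
    passing = [score for score, _ in psms if estimate_fdr(psms, score) <= max_fdr]
    return min(passing) if passing else None


# Invented example: four target matches and two decoy matches.
example_psms: List[Psm] = [
    (9.0, False), (8.0, False), (7.0, False),
    (6.0, True), (5.0, False), (4.0, True),
]

print(estimate_fdr(example_psms, 5.0))
print(lowest_threshold_at_fdr(example_psms, 0.25))
```

Because the estimate is not monotone in the threshold, other conventions for choosing the cut-off give different lists of accepted identifications, which is one reason the choice of TDA procedure matters.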
Affiliation(s)
- Kyowon Jeong
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
|
324
|
Cooper B. The problem with peptide presumption and the downfall of target-decoy false discovery rates. Anal Chem 2012; 84:9663-7. [PMID: 23106481 DOI: 10.1021/ac303051s] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
In proteomics, peptide-tandem mass spectrum match scores and target-decoy database derived false discovery rates (FDR) are confidence indicators describing the quality of individual and sets of tandem mass spectrum matches. A user can impose a standard by prescribing a limit to these values, equivalent to drawing a line that separates better from poorer quality matches. As a result of setting narrower parent ion mass tolerances to reflect the better resolution of modern mass spectrometers, target-decoy derived FDRs can diminish. FDRs lowered this way consequently drive down the lower-limit for peptide-spectrum match score acceptance. Hence, data quality confidence appears to improve even while fragmentation evidence for some spectra remains weak. One negative outcome can be the presumed identification of peptides that do not exist. The options researchers have to improve proteomics data confidence are not panaceas, and there may be no satisfying solution as long as peptides are identified from a circumscribed list of proteins scientists wish to find.
|
325
|
Pichler P, Mazanek M, Dusberger F, Weilnböck L, Huber CG, Stingl C, Luider TM, Straube WL, Köcher T, Mechtler K. SIMPATIQCO: a server-based software suite which facilitates monitoring the time course of LC-MS performance metrics on Orbitrap instruments. J Proteome Res 2012; 11:5540-7. [PMID: 23088386 PMCID: PMC3558011 DOI: 10.1021/pr300163u] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
While the performance of liquid chromatography (LC) and mass spectrometry (MS) instrumentation continues to increase, applications such as analyses of complete or near-complete proteomes and quantitative studies require constant and optimal system performance. For this reason, research laboratories and core facilities alike are recommended to implement quality control (QC) measures as part of their routine workflows. Many laboratories perform sporadic quality control checks. However, successive and systematic longitudinal monitoring of system performance would be facilitated by dedicated automatic or semiautomatic software solutions that aid an effortless analysis and display of QC metrics over time. We present the software package SIMPATIQCO (SIMPle AuTomatIc Quality COntrol) designed for evaluation of data from LTQ Orbitrap, Q-Exactive, LTQ FT, and LTQ instruments. A centralized SIMPATIQCO server can process QC data from multiple instruments. The software calculates QC metrics supervising every step of data acquisition from LC and electrospray to MS. For each QC metric the software learns the range indicating adequate system performance from the uploaded data using robust statistics. Results are stored in a database and can be displayed in a comfortable manner from any computer in the laboratory via a web browser. QC data can be monitored for individual LC runs as well as plotted over time. SIMPATIQCO thus assists the longitudinal monitoring of important QC metrics such as peptide elution times, peak widths, intensities, total ion current (TIC) as well as sensitivity, and overall LC–MS system performance; in this way the software also helps identify potential problems. The SIMPATIQCO software package is available free of charge.
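One way to learn an acceptable range for a QC metric with robust statistics is sketched below. The abstract does not say which estimator SIMPATIQCO uses, so the median and median absolute deviation (MAD) here are an assumption chosen for illustration, as are the multiplier `k` and the example values.

```python
from statistics import median
from typing import List, Tuple


def robust_range(values: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (low, high) limits as median +/- k * scaled MAD.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    spread = 1.4826 * mad
    return center - k * spread, center + k * spread


# Hypothetical peak widths in seconds from past runs, with one bad run (30).
peak_widths = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 30.0]
low, high = robust_range(peak_widths)
flagged = [w for w in peak_widths if not low <= w <= high]
print(flagged)
```

Unlike a mean and standard deviation, the median and MAD are barely shifted by the outlying run, so the learned range still flags it.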
Affiliation(s)
- Peter Pichler
- Research Institute of Molecular Pathology, Vienna, Austria.
|
326
|
Nesvizhskii AI. Computational and informatics strategies for identification of specific protein interaction partners in affinity purification mass spectrometry experiments. Proteomics 2012; 12:1639-55. [PMID: 22611043 DOI: 10.1002/pmic.201100537] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is a large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.
|
327
|
Blakeley P, Overton IM, Hubbard SJ. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J Proteome Res 2012; 11:5221-34. [PMID: 23025403 PMCID: PMC3703792 DOI: 10.1021/pr300411q] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
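To make the six-frame database concrete, the sketch below translates a nucleotide sequence in all six reading frames using the standard genetic code. It is a minimal illustration, not the pipeline used in the paper; the input sequence is invented, and codons containing characters other than A, C, G, T are rendered as `X` by assumption.

```python
from itertools import product
from typing import List

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


for frame in six_frame("ATGGCCTAA"):
    print(frame)
```

Only one of the six outputs corresponds to the frame that actually encodes the protein; the other five enlarge the search space with sequences that are unlikely to be expressed, which is the inflation the abstract describes.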
Affiliation(s)
- Paul Blakeley
- Faculty of Life Sciences, The University of Manchester, Manchester M13 9PT, UK
|
328
|
Saha S, Dazard JE, Xu H, Ewing RM. Computational framework for analysis of prey-prey associations in interaction proteomics identifies novel human protein-protein interactions and networks. J Proteome Res 2012; 11:4476-87. [PMID: 22845868 DOI: 10.1021/pr300227y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.
Affiliation(s)
- Sudipto Saha
- Center for Proteomics and Bioinformatics, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA
|
329
|
Guthals A, Bandeira N. Peptide identification by tandem mass spectrometry with alternate fragmentation modes. Mol Cell Proteomics 2012; 11:550-7. [PMID: 22595789 PMCID: PMC3434779 DOI: 10.1074/mcp.r112.018556] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2012] [Revised: 05/04/2012] [Indexed: 11/06/2022] Open
Abstract
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
|
330
|
Cappadona S, Baker PR, Cutillas PR, Heck AJR, van Breukelen B. Current challenges in software solutions for mass spectrometry-based quantitative proteomics. Amino Acids 2012. [PMID: 22821268 DOI: 10.1007/s00726-012-1289-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/25/2023]
Abstract
Mass spectrometry-based proteomics has evolved as a high-throughput research field over the past decade. Significant advances in instrumentation, and the ability to produce huge volumes of data, have emphasized the need for adequate data analysis tools, which are nowadays often considered the main bottleneck for proteomics development. This review highlights important issues that directly impact the effectiveness of proteomic quantitation and educates software developers and end-users on available computational solutions to correct for the occurrence of these factors. Potential sources of errors specific for stable isotope-based methods or label-free approaches are explicitly outlined. The overall aim focuses on a generic proteomic workflow.
Affiliation(s)
- Salvatore Cappadona
- Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Centre for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, Utrecht, The Netherlands
|
331
|
Cappadona S, Baker PR, Cutillas PR, Heck AJR, van Breukelen B. Current challenges in software solutions for mass spectrometry-based quantitative proteomics. Amino Acids 2012; 43:1087-108. [PMID: 22821268 PMCID: PMC3418498 DOI: 10.1007/s00726-012-1289-8] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2010] [Accepted: 04/03/2012] [Indexed: 10/31/2022]
Abstract
Mass spectrometry-based proteomics has evolved as a high-throughput research field over the past decade. Significant advances in instrumentation, and the ability to produce huge volumes of data, have emphasized the need for adequate data analysis tools, which are nowadays often considered the main bottleneck for proteomics development. This review highlights important issues that directly impact the effectiveness of proteomic quantitation and educates software developers and end-users on available computational solutions to correct for the occurrence of these factors. Potential sources of errors specific for stable isotope-based methods or label-free approaches are explicitly outlined. The overall aim focuses on a generic proteomic workflow.
Affiliation(s)
- Salvatore Cappadona
- Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Centre for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Peter R. Baker
- Department of Pharmaceutical Chemistry, Mass Spectrometry Facility, University of California San Francisco, San Francisco, USA
| | - Pedro R. Cutillas
- Analytical Signalling Group, Centre for Cell Signalling, Barts Cancer Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ UK
| | - Albert J. R. Heck
- Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Centre for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Bas van Breukelen
- Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Centre for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Netherlands Bioinformatics Centre, Padualaan 8, 3584 CH Utrecht, The Netherlands
|
332
|
Baker ES, Liu T, Petyuk VA, Burnum-Johnson KE, Ibrahim YM, Anderson GA, Smith RD. Mass spectrometry for translational proteomics: progress and clinical implications. Genome Med 2012; 4:63. [PMID: 22943415 PMCID: PMC3580401 DOI: 10.1186/gm364] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
The utility of mass spectrometry (MS)-based proteomic analyses and their clinical applications have been increasingly recognized over the past decade due to their high sensitivity, specificity and throughput. MS-based proteomic measurements have been used in a wide range of biological and biomedical investigations, including analysis of cellular responses and disease-specific post-translational modifications. These studies greatly enhance our understanding of the complex and dynamic nature of the proteome in biology and disease. Some MS techniques, such as those for targeted analysis, are being successfully applied for biomarker verification, whereas others, including global quantitative analysis (for example, for biomarker discovery), are more challenging and require further development. However, recent technological improvements in sample processing, instrumental platforms, data acquisition approaches and informatics capabilities continue to advance MS-based applications. Improving the detection of significant changes in proteins through these advances shows great promise for the discovery of improved biomarker candidates that can be verified pre-clinically using targeted measurements, and ultimately used in clinical studies - for example, for early disease diagnosis or as targets for drug development and therapeutic intervention. Here, we review the current state of MS-based proteomics with regard to its advantages and current limitations, and we highlight its translational applications in studies of protein biomarkers.
Affiliation(s)
- Erin Shammel Baker
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
| | - Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
| | - Vladislav A Petyuk
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
| | | | - Yehia M Ibrahim
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
| | - Gordon A Anderson
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
| | - Richard D Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
|
333
|
Abstract
Discovery or shotgun proteomics has emerged as the most powerful technique to comprehensively map out a proteome. Reconstruction of protein identities from the raw mass spectrometric data constitutes a cornerstone of any shotgun proteomics workflow. The inherent uncertainty of mass spectrometric data and the complexity of a proteome render protein inference and the statistical validation of protein identifications a non-trivial task, still being a subject of ongoing research. This review aims to survey the different conceptual approaches to the different tasks of inferring and statistically validating protein identifications and to discuss their implications on the scope of proteome exploration.
Affiliation(s)
- Manfred Claassen
- Computer Science Department, Stanford University, Stanford, CA 94305-9010, USA.
|
334
|
Baliban RC, Dimaggio PA, Plazas-Mayorca MD, Garcia BA, Floudas CA. PILOT_PROTEIN: identification of unmodified and modified proteins via high-resolution mass spectrometry and mixed-integer linear optimization. J Proteome Res 2012; 11:4615-29. [PMID: 22788846 DOI: 10.1021/pr300418j] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
A novel protein identification framework, PILOT_PROTEIN, has been developed to construct a comprehensive list of all unmodified proteins that are present in a living sample. It uses the peptide identification results from the PILOT_SEQUEL algorithm to initially determine all unmodified proteins within the sample. Using a rigorous biclustering approach that groups incorrect peptide sequences with other homologous sequences, the number of false positives reported is minimized. A sequence tag procedure is then incorporated along with the untargeted PTM identification algorithm, PILOT_PTM, to determine a list of all modification types and sites for each protein. The unmodified protein identification algorithm, PILOT_PROTEIN, is compared to the methods SEQUEST, InsPecT, X!Tandem, VEMS, and ProteinProspector using both prepared protein samples and a more complex chromatin digest. The algorithm demonstrates superior protein identification accuracy with a lower false positive rate. All materials are freely available to the scientific community at http://pumpd.princeton.edu.
Affiliation(s)
- Richard C Baliban
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, USA
|
335
|
Taverner T, Karpievitch YV, Polpitiya AD, Brown JN, Dabney AR, Anderson GA, Smith RD. DanteR: an extensible R-based tool for quantitative analysis of -omics data. Bioinformatics 2012; 28:2404-6. [PMID: 22815360 DOI: 10.1093/bioinformatics/bts449] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION: The size and complex nature of mass spectrometry-based proteomics datasets motivate development of specialized software for statistical data analysis and exploration. We present DanteR, a graphical R package that features extensive statistical and diagnostic functions for quantitative proteomics data analysis, including normalization, imputation, hypothesis testing, interactive visualization and peptide-to-protein rollup. More importantly, users can easily extend the existing functionality by including their own algorithms under the Add-On tab. AVAILABILITY: DanteR and its associated user guide are available for download free of charge at http://omics.pnl.gov/software/. An updated binary of the DanteR package is available on the website together with a vignettes document. For Windows, a single click automatically installs DanteR along with the R programming environment. For Linux and Mac OS X, users must install R and then follow instructions on the DanteR website for package installation. CONTACT: rds@pnnl.gov.
Affiliation(s)
- Tom Taverner
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
|
336
|
Gonzalez-Galarza FF, Lawless C, Hubbard SJ, Fan J, Bessant C, Hermjakob H, Jones AR. A critical appraisal of techniques, software packages, and standards for quantitative proteomic analysis. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2012; 16:431-42. [PMID: 22804616 DOI: 10.1089/omi.2012.0022] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
New methods for performing quantitative proteome analyses based on differential labeling protocols or label-free techniques are reported in the literature on an almost monthly basis. In parallel, a correspondingly vast number of software tools for the analysis of quantitative proteomics data has also been described in the literature and produced by private companies. In this article we focus on the review of some of the most popular techniques in the field and present a critical appraisal of several software packages available to process and analyze the data produced. We also describe the importance of community standards to support the wide range of software, which may assist researchers in the analysis of data using different platforms and protocols. It is intended that this review will serve bench scientists both as a useful reference and a guide to the selection and use of different pipelines to perform quantitative proteomics data analysis. We have produced a web-based tool ( http://www.proteosuite.org/?q=other_resources ) to help researchers find appropriate software for their local instrumentation, available file formats, and quantitative methodology.
|
337
|
Li N, Wu S, Zhang C, Chang C, Zhang J, Ma J, Li L, Qian X, Xu P, Zhu Y, He F. PepDistiller: A quality control tool to improve the sensitivity and accuracy of peptide identifications in shotgun proteomics. Proteomics 2012; 12:1720-5. [PMID: 22623377 DOI: 10.1002/pmic.201100167] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]
Affiliation(s)
- Ning Li
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Songfeng Wu
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Chengpu Zhang
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Cheng Chang
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Jiyang Zhang
- College of Mechanical and Electronic Engineering and Automatization; National University of Defense Technology; Changsha P. R. China
| | - Jie Ma
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Liwei Li
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Xiaohong Qian
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Ping Xu
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Yunping Zhu
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
| | - Fuchu He
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; Beijing Institute of Radiation Medicine; Beijing P. R. China
- Institutes of Biomedical Sciences; Fudan University; Shanghai P. R. China
|
338
|
Abstract
High-throughput identification of proteins with the latest generation of hybrid high-resolution mass spectrometers is opening new perspectives in microbiology. I present, here, an overview of tandem mass spectrometry technology and bioinformatics for shotgun proteomics that make 2D-PAGE approaches obsolete. Non-labelling quantitative approaches have become more popular than labelling techniques on most proteomic platforms because they are easier to carry out while their quantitative outcome is rather robust. Parameters for recording mass spectrometry data, however, need to be chosen carefully and statistics to assess the confidence of the results should not be neglected. Interestingly, next-generation sequencing methodologies make any microbial model quickly amenable to proteomics, leading to the documentation of a wide range of organisms from diverse environments. Some recent discoveries made using microbial proteomics have challenged some biological dogma, such as: (i) initiation of the translation does not occur predominantly from ATG codons in some microorganisms, (ii) non-canonical initiation codons are used to regulate the production of specific but important proteins and (iii) a gene may code for multiple polypeptide species, heterogeneous in terms of sequences. Microbial diversity and microbial physiology can now be revisited by means of exhaustive comparative proteomic surveys where thousands of proteins are detected and quantified. Proteogenomics, consisting of better annotating of genomes with the help of proteomic evidence, is paving the way for integrated multi-omic approaches in microbiology. Finally, meta-proteomic tools and approaches are emerging for tackling the high complexity of the microbial world as a whole, opening new perspectives for assessing how microbial communities function.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, F-30207 Bagnols-sur-Cèze, France.
|
339
|
Milloy JA, Faherty BK, Gerber SA. Tempest: GPU-CPU computing for high-throughput database spectral matching. J Proteome Res 2012; 11:3581-91. [PMID: 22640374 DOI: 10.1021/pr300338p] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022]
Abstract
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphical processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and its comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
Affiliation(s)
- Jeffrey A Milloy
- Norris Cotton Cancer Center, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire 03756, USA
|
341
|
Starkey JM, Tilton RG. Proteomics and systems biology for understanding diabetic nephropathy. J Cardiovasc Transl Res 2012; 5:479-90. [PMID: 22581264 DOI: 10.1007/s12265-012-9372-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 02/02/2012] [Accepted: 05/01/2012] [Indexed: 01/07/2023]
Abstract
Like many diseases, diabetic nephropathy is defined in a histopathological context and studied using reductionist approaches that attempt to ameliorate structural changes. Novel technologies in mass spectrometry-based proteomics have the ability to provide a deeper understanding of the disease beyond classical histopathology, redefine the characteristics of the disease state, and identify novel approaches to reduce renal failure. The goal is to translate these new definitions into improved patient outcomes through diagnostic, prognostic, and therapeutic tools. Here, we review progress made in studying the proteomics of diabetic nephropathy and provide an introduction to the informatics tools used in the analysis of systems biology data, while pointing out statistical issues for consideration. Novel bioinformatics methods may increase biomarker identification, and other tools, including selective reaction monitoring, may hasten clinical validation.
Affiliation(s)
- Jonathan M Starkey
- Department of Internal Medicine, University of Texas Medical Branch, Galveston, TX 77555-1060, USA
|
342
|
Angel TE, Aryal UK, Hengel SM, Baker ES, Kelly RT, Robinson EW, Smith RD. Mass spectrometry-based proteomics: existing capabilities and future directions. Chem Soc Rev 2012; 41:3912-28. [PMID: 22498958 DOI: 10.1039/c2cs15331a] [Citation(s) in RCA: 263] [Impact Index Per Article: 21.9] [Indexed: 12/14/2022]
Abstract
Mass spectrometry (MS)-based proteomics is emerging as a broadly effective means for identification, characterization, and quantification of proteins that are integral components of the processes essential for life. Characterization of proteins at the proteome and sub-proteome (e.g., the phosphoproteome, proteoglycome, or degradome/peptidome) levels provides a foundation for understanding fundamental aspects of biology. Emerging technologies such as ion mobility separations coupled with MS and microchip-based-proteome measurements combined with MS instrumentation and chromatographic separation techniques, such as nanoscale reversed phase liquid chromatography and capillary electrophoresis, show great promise for both broad undirected and targeted highly sensitive measurements. MS-based proteomics increasingly contribute to our understanding of the dynamics, interactions, and roles that proteins and peptides play, advancing our understanding of biology on a systems wide level for a wide range of applications including investigations of microbial communities, bioremediation, and human health.
Affiliation(s)
- Thomas E Angel
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
|
343
|
Brusniak MYK, Chu CS, Kusebauch U, Sartain MJ, Watts JD, Moritz RL. An assessment of current bioinformatic solutions for analyzing LC-MS data acquired by selected reaction monitoring technology. Proteomics 2012; 12:1176-84. [PMID: 22577019 PMCID: PMC3857306 DOI: 10.1002/pmic.201100571] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 11/01/2011] [Accepted: 01/10/2012] [Indexed: 12/18/2022]
Abstract
Selected reaction monitoring (SRM) is an accurate quantitative technique, typically used for small-molecule mass spectrometry (MS). SRM has emerged as an important technique for targeted and hypothesis-driven proteomic research, and is becoming the reference method for protein quantification in complex biological samples. SRM offers high selectivity, a lower limit of detection and improved reproducibility, compared to conventional shot-gun-based tandem MS (LC-MS/MS) methods. Unlike LC-MS/MS, which requires computationally intensive informatic postanalysis, SRM requires preacquisition bioinformatic analysis to determine proteotypic peptides and optimal transitions to uniquely identify and to accurately quantitate proteins of interest. Extensive arrays of bioinformatics software tools, both web-based and stand-alone, have been published to assist researchers to determine optimal peptides and transition sets. The transitions are oftentimes selected based on preferred precursor charge state, peptide molecular weight, hydrophobicity, fragmentation pattern at a given collision energy (CE), and instrumentation chosen. Validation of the selected transitions for each peptide is critical since peptide performance varies depending on the mass spectrometer used. In this review, we provide an overview of open source and commercial bioinformatic tools for analyzing LC-MS data acquired by SRM.
Affiliation(s)
| | - Caroline S. Chu
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA, 98109 USA
| | - Ulrike Kusebauch
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA, 98109 USA
| | - Mark J. Sartain
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA, 98109 USA
| | - Julian D. Watts
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA, 98109 USA
| | - Robert L. Moritz
- Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA, 98109 USA
|
344
|
A support for the identification of non-tryptic peptides based on low resolution tandem and sequential mass spectrometry data: The INSPIRE software. Anal Chim Acta 2012; 718:70-7. [DOI: 10.1016/j.aca.2012.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/29/2011] [Revised: 12/28/2011] [Accepted: 01/02/2012] [Indexed: 11/17/2022]
|
345
|
Ning K, Fermin D, Nesvizhskii AI. Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data. J Proteome Res 2012; 11:2261-71. [PMID: 22329341 DOI: 10.1021/pr201052x] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 02/06/2023]
Abstract
An increasing number of studies involve integrative analysis of gene and protein expression data taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Thus, it becomes interesting to revisit the correlative analysis of gene and protein expression data using more recently generated data sets. Furthermore, within the proteomics community there is a substantial interest in comparing the performance of different label-free quantitative proteomic strategies. Gene expression data can be used as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained on the same organism. First, we perform a comparative analysis of different label-free protein quantification methods (intensity based and spectral count based and using various associated data normalization steps) using several software tools on the proteomic side. Similarly, we perform correlative analysis of gene expression data derived using microarray and RNA-Seq methods on the genomic side. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. It is observed that spectral count based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies.
Affiliation(s)
- Kang Ning
- Department of Pathology and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, United States
|
346
|
|
347
|
Benk AS, Roesli C. Label-free quantification using MALDI mass spectrometry: considerations and perspectives. Anal Bioanal Chem 2012; 404:1039-56. [DOI: 10.1007/s00216-012-5832-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 12/22/2011] [Revised: 01/27/2012] [Accepted: 02/01/2012] [Indexed: 01/17/2023]
|
348
|
Chalkley RJ, Clauser KR. Modification site localization scoring: strategies and performance. Mol Cell Proteomics 2012; 11:3-14. [PMID: 22328712 DOI: 10.1074/mcp.r111.015305] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Indexed: 12/20/2022] Open
Abstract
Using enrichment strategies many research groups are routinely producing large data sets of post-translationally modified peptides for proteomic analysis using tandem mass spectrometry. Although search engines are relatively effective at identifying these peptides with a defined measure of reliability, their localization of site/s of modification is often arbitrary and unreliable. The field continues to be in need of a widely accepted metric for false localization rate that accurately describes the certainty of site localization in published data sets and allows for consistent measurement of differences in performance of emerging scoring algorithms. In this article are discussed the main strategies currently used by software for modification site localization and ways of assessing the performance of these different tools. Methods for representing ambiguity are reviewed and a discussion of how the approaches transfer to different data types and modifications is presented.
Affiliation(s)
- Robert J Chalkley
- University of California San Francisco, 600 16th Street, Genentech Hall N-474A, San Francisco, California 94158, USA.
|
349
|
Schaab C, Geiger T, Stoehr G, Cox J, Mann M. Analysis of high accuracy, quantitative proteomics data in the MaxQB database. Mol Cell Proteomics 2012; 11:M111.014068. [PMID: 22301388 PMCID: PMC3316731 DOI: 10.1074/mcp.m111.014068] [Citation(s) in RCA: 155] [Impact Index Per Article: 12.9] [Indexed: 11/29/2022] Open
Abstract
MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.
Affiliation(s)
- Christoph Schaab
- Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, D-82152 Martinsried, Germany
|
350
|
Kirchner M, Selbach M. In vivo quantitative proteome profiling: planning and evaluation of SILAC experiments. Methods Mol Biol 2012; 893:175-199. [PMID: 22665302 DOI: 10.1007/978-1-61779-885-6_13] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 06/01/2023]
Abstract
Mass spectrometry-based quantitative proteomics can identify and quantify thousands of proteins in complex biological samples. Improved instrumentation, quantification strategies and data analysis tools now enable protein analysis on a genome-wide scale. Particularly, quantification based on stable isotope labeling with amino acids (SILAC) has emerged as a robust, reliable and simple method for accurate large-scale protein quantification. The spectrum of applications ranges from bacteria and eukaryotic cell culture systems to multicellular organisms. Here, we provide a step-by-step protocol on how to plan and perform large-scale quantitative proteome analysis using SILAC, from sample preparation to final data analysis.
Affiliation(s)
- Marieluise Kirchner
- Cell Signalling and Mass Spectrometry Group, Max Delbrueck Center for Molecular Medicine, Berlin, Germany
|