51
|
Peng L, Dasari S, Tabb DL, Turesky RJ. Correction to Mapping Serum Albumin Adducts of the Food-Borne Carcinogen 2-Amino-1-methyl-6-phenylimidazo[4,5-b]pyridine by Data-Dependent Tandem Mass Spectrometry. Chem Res Toxicol 2013. [DOI: 10.1021/tx400006a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
52
|
Holman JD, Dasari S, Tabb DL. Informatics of protein and posttranslational modification detection via shotgun proteomics. Methods Mol Biol 2013; 1002:167-79. [PMID: 23625403 DOI: 10.1007/978-1-62703-360-2_14] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/18/2023]
Abstract
Frequently, proteomic LC-MS/MS data may contain sets of modifications that evade identification during standard database search. For many laboratories, the standard technique to seek posttranslational modifications (PTMs) adds a short list of specified mass shifts to database search configuration. This technique provides information for only the specified PTMs, takes substantial time to run, and drives false discoveries upward through an exponential expansion of search space. This protocol describes a more structured approach to blind PTM discovery through reducing protein lists, targeting attention to a data-driven list of mass shifts, and seeking the resulting short list of modifications through targeted search.
|
53
|
Tabb DL. Quality assessment for clinical proteomics. Clin Biochem 2012; 46:411-20. [PMID: 23246537 DOI: 10.1016/j.clinbiochem.2012.12.003] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 08/31/2012] [Revised: 12/01/2012] [Accepted: 12/03/2012] [Indexed: 12/21/2022]
Abstract
Proteomics has emerged from the labs of technologists to enter widespread application in clinical contexts. This transition, however, has been hindered by overstated early claims of accuracy, concerns about reproducibility, and the challenges of handling batch effects properly. New efforts have produced sets of performance metrics and measurements of variability that establish sound expectations for experiments in clinical proteomics. As researchers begin incorporating these metrics in a quality by design paradigm, the variability of individual steps in experimental pipelines will be reduced, regularizing overall outcomes. This review discusses the evolution of quality assessment in 2D gel electrophoresis, mass spectrometry-based proteomic profiling, tandem mass spectrometry-based protein inventories, and proteomic quantitation. Taken together, the advances in each of these technologies are establishing databases that will be increasingly useful for decision-making in clinical experimentation.
|
54
|
Baycin-Hizal D, Tabb DL, Chaerkady R, Chen L, Lewis NE, Nagarajan H, Sarkaria V, Kumar A, Wolozny D, Colao J, Jacobson E, Tian Y, O'Meally RN, Krag SS, Cole RN, Palsson BO, Zhang H, Betenbaugh M. Proteomic analysis of Chinese hamster ovary cells. J Proteome Res 2012; 11:5265-76. [PMID: 22971049 DOI: 10.1021/pr300476w] [Citation(s) in RCA: 135] [Impact Index Per Article: 11.3] [Indexed: 02/07/2023]
Abstract
To complement the recent genomic sequencing of Chinese hamster ovary (CHO) cells, proteomic analysis was performed on CHO cells including the cellular proteome, secretome, and glycoproteome using tandem mass spectrometry (MS/MS) of multiple fractions obtained from gel electrophoresis, multidimensional liquid chromatography, and solid phase extraction of glycopeptides (SPEG). From the 120 different mass spectrometry analyses generating 682,097 MS/MS spectra, 93,548 unique peptide sequences were identified with at most 0.02 false discovery rate (FDR). A total of 6164 grouped proteins were identified from both glycoproteome and proteome analysis, representing an 8-fold increase in the number of proteins currently identified in the CHO proteome. Furthermore, this is the first proteomic study done using the CHO genome exclusively, which provides for more accurate identification of proteins. From this analysis, the CHO codon frequency was determined and found to be distinct from humans, which will facilitate expression of human proteins in CHO cells. Analysis of the combined proteomic and mRNA data sets indicated the enrichment of a number of pathways including protein processing and apoptosis but depletion of proteins involved in steroid hormone and glycosphingolipid metabolism. Five-hundred four of the detected proteins included N-acetylation modifications, and 1292 different proteins were observed to be N-glycosylated. This first large-scale proteomic analysis will enhance the knowledge base about CHO capabilities for recombinant expression and provide information useful in cell engineering efforts aimed at modifying CHO cellular functions.
|
55
|
Wang X, Slebos RJC, Wang D, Halvey PJ, Tabb DL, Liebler DC, Zhang B. Correction to Protein Identification Using Customized Protein Sequence Databases Derived from RNA-Seq Data. J Proteome Res 2012. [DOI: 10.1021/pr300713g] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
|
56
|
|
57
|
Peng L, Dasari S, Tabb DL, Turesky RJ. Mapping serum albumin adducts of the food-borne carcinogen 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine by data-dependent tandem mass spectrometry. Chem Res Toxicol 2012; 25:2179-93. [PMID: 22827630 DOI: 10.1021/tx300253j] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022]
Abstract
2-Amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) is a heterocyclic aromatic amine that is formed during the cooking of meats. PhIP is a potential human carcinogen: it undergoes metabolic activation to form electrophilic metabolites that bind to DNA and proteins, including serum albumin (SA). The structures of PhIP-SA adducts formed in vivo are unknown and require elucidation before PhIP protein adducts can be implemented as biomarkers in human studies. We previously examined the reaction of genotoxic N-oxidized metabolites of PhIP with human SA in vitro and identified covalent adducts formed at cysteine³⁴ (Cys³⁴); however, other adduction products were thought to occur. We have now identified adducts of PhIP formed at multiple sites of SA reacted with isotopic mixtures of electrophilic metabolites of PhIP and 2-amino-1-methyl-6-[²H₅]-phenylimidazo[4,5-b]pyridine ([²H₅]-PhIP). The metabolites used for study were 2-nitro-1-methyl-6-phenylimidazo[4,5-b]pyridine (NO₂-PhIP), 2-hydroxyamino-1-methyl-6-phenylimidazo[4,5-b]pyridine (HONH-PhIP), or N-acetyloxy-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (N-acetoxy-PhIP). Following proteolytic digestion, PhIP-adducted peptides were separated by ultra performance liquid chromatography and characterized by ion trap mass spectrometry, employing isotopic data-dependent scanning. Analysis of the tryptic or tryptic/chymotryptic digests of SA modified with NO₂-PhIP revealed that adduction occurred at Cys³⁴, Lys¹⁹⁵, Lys¹⁹⁹, Lys³⁵¹, Lys⁵⁴¹, Tyr¹³⁸, Tyr¹⁵⁰, Tyr⁴⁰¹, and Tyr⁴¹¹, whereas the only site of HONH-PhIP adduction was detected at Cys³⁴. N-Acetoxy-PhIP, a penultimate metabolite of PhIP that reacts with DNA to form covalent adducts, did not appear to form stable adducts with SA; instead, PhIP and 2-amino-1-methyl-6-(5-hydroxy)-phenylimidazo[4,5-b]pyridine, an aqueous reaction product of the proposed nitrenium ion of PhIP, were recovered during the proteolysis of N-acetoxy-PhIP-modified SA. 
Some of these SA adduction products of PhIP may be implemented in molecular epidemiology studies to assess the role of well-done cooked meat, PhIP, and the risk of cancer.
|
58
|
Ma ZQ, Polzin KO, Dasari S, Chambers MC, Schilling B, Gibson BW, Tran BQ, Vega-Montoto L, Liebler DC, Tabb DL. QuaMeter: multivendor performance metrics for LC-MS/MS proteomics instrumentation. Anal Chem 2012; 84:5845-50. [PMID: 22697456 PMCID: PMC3730131 DOI: 10.1021/ac300629p] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 01/19/2023]
Abstract
LC-MS/MS-based proteomics studies rely on stable analytical system performance that can be evaluated by objective criteria. The National Institute of Standards and Technology (NIST) introduced the MSQC software to compute diverse metrics from experimental LC-MS/MS data, enabling quality analysis and quality control (QA/QC) of proteomics instrumentation. In practice, however, several attributes of the MSQC software prevent its use for routine instrument monitoring. Here, we present QuaMeter, an open-source tool that improves MSQC in several aspects. QuaMeter can directly read raw data from instruments manufactured by different vendors. The software can work with a wide variety of peptide identification software for improved reliability and flexibility. Finally, QC metrics implemented in QuaMeter are rigorously defined and tested. The source code and binary versions of QuaMeter are available under Apache 2.0 License at http://fenchurch.mc.vanderbilt.edu.
|
59
|
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam principles). Proteomics Clin Appl 2012; 5:580-9. [PMID: 22213554 DOI: 10.1002/prca.201100097] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (i) an evolving list of comprehensive quality metrics and (ii) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in Proteomics, Proteomics Clinical Applications, Journal of Proteome Research, and Molecular and Cellular Proteomics, as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
|
60
|
Gibbons JG, Salichos L, Slot JC, Rinker DC, McGary KL, King JG, Klich MA, Tabb DL, McDonald WH, Rokas A. The evolutionary imprint of domestication on genome variation and function of the filamentous fungus Aspergillus oryzae. Curr Biol 2012; 22:1403-9. [PMID: 22795693 DOI: 10.1016/j.cub.2012.05.033] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.8] [Received: 04/04/2012] [Revised: 05/13/2012] [Accepted: 05/15/2012] [Indexed: 10/28/2022]
Abstract
The domestication of animals, plants, and microbes fundamentally transformed the lifestyle and demography of the human species [1]. Although the genetic and functional underpinnings of animal and plant domestication are well understood, little is known about microbe domestication [2-6]. Here, we systematically examined genome-wide sequence and functional variation between the domesticated fungus Aspergillus oryzae, whose saccharification abilities humans have harnessed for thousands of years to produce sake, soy sauce, and miso from starch-rich grains, and its wild relative A. flavus, a potentially toxigenic plant and animal pathogen [7]. We discovered dramatic changes in the sequence variation and abundance profiles of genes and wholesale primary and secondary metabolic pathways between domesticated and wild relative isolates during growth on rice. Our data suggest that, through selection by humans, an atoxigenic lineage of A. flavus gradually evolved into a "cell factory" for enzymes and metabolites involved in the saccharification process. These results suggest that whereas animal and plant domestication was largely driven by Neolithic "genetic tinkering" of developmental pathways, microbe domestication was driven by extensive remodeling of metabolism.
|
61
|
Chen YY, Dasari S, Ma ZQ, Vega-Montoto LJ, Li M, Tabb DL. Refining comparative proteomics by spectral counting to account for shared peptides and multiple search engines. Anal Bioanal Chem 2012; 404:1115-25. [PMID: 22552787 DOI: 10.1007/s00216-012-6011-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/30/2012] [Revised: 03/22/2012] [Accepted: 04/02/2012] [Indexed: 11/26/2022]
Abstract
Spectral counting has become a widely used approach for measuring and comparing protein abundance in label-free shotgun proteomics. However, when analyzing complex samples, the ambiguity of matching between peptides and proteins greatly affects the assessment of peptide and protein inventories, differentiation, and quantification. Meanwhile, the configuration of database searching algorithms that assign peptides to MS/MS spectra may produce different results in comparative proteomic analysis. Here, we present three strategies to improve comparative proteomics through spectral counting. We show that comparing spectral counts for peptide groups rather than for protein groups forestalls problems introduced by shared peptides. We demonstrate the advantage and flexibility of this new method in two datasets. We present four models to combine four popular search engines that lead to significant gains in spectral counting differentiation. Among these models, we demonstrate a powerful vote counting model that scales well for multiple search engines. We also show that semi-tryptic searching outperforms tryptic searching for comparative proteomics. Overall, these techniques considerably improve protein differentiation on the basis of spectral count tables.
|
62
|
Holman JD, Ma ZQ, Tabb DL. Identifying proteomic LC-MS/MS data sets with Bumbershoot and IDPicker. Curr Protoc Bioinformatics 2012; Chapter 13:Unit13.17. [PMID: 22389012 DOI: 10.1002/0471250953.bi1317s37] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/18/2022]
Abstract
The identification of peptides and proteins by LC-MS/MS requires the use of bioinformatics. Tools developed in the Tabb Laboratory contribute significant flexibility and discrimination to this process. The Bumbershoot tools (MyriMatch, DirecTag, TagRecon, and Pepitome) enable the identification of peptides represented by MS/MS scans. All of these tools can work directly from instrument capture files of multiple vendors, such as Thermo RAW format, or from standard XML-based formats, such as mzML or mzXML. Peptide identifications are written to mzIdentML or pepXML format. Protein assembly is handled by the IDPicker algorithm. Raw identifications are filtered to a confident set by use of the target-decoy strategy. IDPicker arranges large sets of input files into a hierarchy for reporting, and the software applies a parsimony algorithm to report the smallest possible number of proteins to explain the observed peptides. This protocol details the use of these tools for new users.
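As a rough illustration of the parsimony step mentioned in this abstract, the sketch below uses a greedy set-cover heuristic to pick a small set of proteins that explains every observed peptide. It is not IDPicker's implementation: the function name, the input layout (protein accession mapped to a set of peptide sequences), and the alphabetical tie-break are assumptions made for the example, and a greedy pass is only an approximation that does not guarantee the smallest possible protein list.

```python
def greedy_parsimony(protein_to_peptides):
    """Return a small list of proteins that together explain all peptides.

    protein_to_peptides: dict mapping a protein accession (hypothetical
    identifiers) to the set of peptide sequences matched to that protein.
    """
    # Every peptide seen for any protein still needs an explanation.
    unexplained = set()
    for peptides in protein_to_peptides.values():
        unexplained |= peptides

    chosen = []
    while unexplained:
        # Pick the protein covering the most still-unexplained peptides.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & unexplained),
        )
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


evidence = {
    "P1": {"AAK", "LLR", "GGK"},
    "P2": {"LLR", "GGK"},  # every peptide is shared with P1, so P2 is not needed
    "P3": {"TTK"},
}
minimal = greedy_parsimony(evidence)
```

Here `P2` is dropped because all of its peptides are already explained by `P1`, which is the kind of redundancy a parsimony filter is meant to remove.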
|
63
|
Martinez MN, Emfinger CH, Overton M, Hill S, Ramaswamy TS, Cappel DA, Wu K, Fazio S, McDonald WH, Hachey DL, Tabb DL, Stafford JM. Obesity and altered glucose metabolism impact HDL composition in CETP transgenic mice: a role for ovarian hormones. J Lipid Res 2012; 53:379-389. [PMID: 22215797 PMCID: PMC3276461 DOI: 10.1194/jlr.m019752] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 08/26/2011] [Revised: 12/15/2011] [Indexed: 01/03/2023] Open
Abstract
Mechanisms underlying changes in HDL composition caused by obesity are poorly defined, partly because mice lack expression of cholesteryl ester transfer protein (CETP), which shuttles triglyceride and cholesteryl ester between lipoproteins. Because menopause is associated with weight gain, altered glucose metabolism, and changes in HDL, we tested the effect of feeding a high-fat diet (HFD) and ovariectomy (OVX) on glucose metabolism and HDL composition in CETP transgenic mice. After OVX, female CETP-expressing mice had accelerated weight gain with HFD-feeding and impaired glucose tolerance by hyperglycemic clamp techniques, compared with OVX mice fed a low-fat diet (LFD). Sham-operated mice (SHAM) did not show HFD-induced weight gain and had less glucose intolerance than OVX mice. Using shotgun HDL proteomics, HFD-feeding in OVX mice had a large effect on HDL composition, including increased levels of apoA2, apoA4, apoC2, and apoC3, proteins involved in TG metabolism. These changes were associated with decreased hepatic expression of SR-B1, ABCA1, and LDL receptor, proteins involved in modulating the lipid content of HDL. In SHAM mice, there were minimal changes in HDL composition with HFD feeding. These studies suggest that the absence of ovarian hormones negatively influences the response to high-fat feeding in terms of glucose tolerance and HDL composition. CETP-expressing mice may represent a useful model to define how metabolic changes affect HDL composition and function.
|
64
|
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam Principles). J Proteome Res 2012; 11:1412-9. [PMID: 22053864 PMCID: PMC3272102 DOI: 10.1021/pr201071t] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/18/2023]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (1) an evolving list of comprehensive quality metrics and (2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
|
65
|
Dasari S, Chambers MC, Martinez MA, Carpenter KL, Ham AJL, Vega-Montoto LJ, Tabb DL. Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment. J Proteome Res 2012; 11:1686-95. [PMID: 22217208 DOI: 10.1021/pr200874e] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/21/2022]
Abstract
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
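For readers unfamiliar with the dot product score that this abstract criticizes, the following is a minimal sketch of a normalized dot product between two binned spectra. It shows the baseline approach, not Pepitome's statistical scoring; the function names, the 1.0 m/z bin width, and the (m/z, intensity) peak-list layout are assumptions chosen for illustration.

```python
import math


def bin_peaks(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    bins = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def dot_product_score(query_peaks, library_peaks, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned peak lists."""
    query = bin_peaks(query_peaks, bin_width)
    library = bin_peaks(library_peaks, bin_width)
    # Only bins populated in both spectra contribute to the numerator.
    shared = sum(query[k] * library[k] for k in query.keys() & library.keys())
    norm = math.sqrt(sum(v * v for v in query.values())) * math.sqrt(
        sum(v * v for v in library.values())
    )
    return shared / norm if norm else 0.0


# Two peaks 0.8 m/z apart fall into the same bin and score as a perfect match.
same_bin = dot_product_score([(100.1, 1.0)], [(100.9, 1.0)])
# Peaks in different bins contribute nothing.
different_bin = dot_product_score([(100.1, 1.0)], [(250.4, 1.0)])
```

The first comparison illustrates the limitation noted in the abstract: once peaks are binned, the size of the fragment m/z discrepancy within a bin has no effect on the score.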
|
66
|
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam principles). Proteomics 2011; 12:11-20. [PMID: 22069307 DOI: 10.1002/pmic.201100562] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 10/27/2011] [Accepted: 10/27/2011] [Indexed: 11/10/2022]
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the U.S. National Cancer Institute (NCI) convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: (i) an evolving list of comprehensive quality metrics and (ii) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in Proteomics, Proteomics Clinical Applications, Journal of Proteome Research, and Molecular and Cellular Proteomics, as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
|
67
|
Wang X, Slebos RJC, Wang D, Halvey PJ, Tabb DL, Liebler DC, Zhang B. Protein identification using customized protein sequence databases derived from RNA-Seq data. J Proteome Res 2011; 11:1009-17. [PMID: 22103967 DOI: 10.1021/pr200766z] [Citation(s) in RCA: 132] [Impact Index Per Article: 10.2] [Indexed: 11/28/2022]
Abstract
The standard shotgun proteomics data analysis strategy relies on searching MS/MS spectra against a context-independent protein sequence database derived from the complete genome sequence of an organism. Because transcriptome sequence analysis (RNA-Seq) promises an unbiased and comprehensive picture of the transcriptome, we reason that a sample-specific protein database derived from RNA-Seq data can better approximate the real protein pool in the sample and thus improve protein identification. In this study, we have developed a two-step strategy for building sample-specific protein databases from RNA-Seq data. First, the database size is reduced by eliminating unexpressed or lowly expressed genes according to transcript quantification. Second, high-quality nonsynonymous coding single nucleotide variations (SNVs) are identified based on RNA-Seq data, and corresponding protein variants are added to the database. Using RNA-Seq and shotgun proteomics data from two colorectal cancer cell lines SW480 and RKO, we demonstrated that customized protein sequence databases could significantly increase the sensitivity of peptide identification, reduce ambiguity in protein assembly, and enable the detection of known and novel peptide variants. Thus, sample-specific databases from RNA-Seq data can enable more sensitive and comprehensive protein discovery in shotgun proteomics studies.
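The two-step strategy described in this abstract can be sketched roughly as follows: first drop proteins whose genes fall below an expression cutoff, then append variant sequences for single amino acid substitutions. This is only an illustration under stated assumptions, not the authors' pipeline: the function name, the dictionary inputs, the cutoff value of 1.0, and the variant entry naming are all hypothetical.

```python
def customize_database(proteins, expression, variants, min_expression=1.0):
    """Build a sample-specific protein database (illustrative sketch only).

    proteins:   dict of gene -> reference protein sequence
    expression: dict of gene -> transcript abundance (units are an assumption)
    variants:   list of (gene, 1-based position, reference residue, variant residue)
    """
    # Step 1: keep only proteins whose gene meets the expression cutoff.
    database = {
        gene: seq
        for gene, seq in proteins.items()
        if expression.get(gene, 0.0) >= min_expression
    }

    # Step 2: add one variant entry per substitution in a retained protein.
    for gene, position, ref, alt in variants:
        seq = database.get(gene)
        if seq is None or not 1 <= position <= len(seq):
            continue  # gene was filtered out, or the position is off the sequence
        if seq[position - 1] != ref:
            continue  # variant call disagrees with the reference; skip it
        variant_seq = seq[: position - 1] + alt + seq[position:]
        database[f"{gene}_{ref}{position}{alt}"] = variant_seq
    return database


custom_db = customize_database(
    proteins={"G1": "MKTAY", "G2": "MSSLL"},
    expression={"G1": 12.5, "G2": 0.2},
    variants=[("G1", 3, "T", "A"), ("G2", 2, "S", "P")],
)
```

In this toy input, `G2` is removed for low expression, so its variant is never added, while `G1` contributes both its reference sequence and one variant entry.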
|
68
|
Tabb DL, Liebler DC. Bioinformatic challenges for proteomic biomarkers of cancer. BMC Bioinformatics 2011. [PMCID: PMC3194207 DOI: 10.1186/1471-2105-12-s7-a17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
69
|
Kinsinger CR, Apffel J, Baker M, Bian X, Borchers CH, Bradshaw R, Brusniak MY, Chan DW, Deutsch EW, Domon B, Gorman J, Grimm R, Hancock W, Hermjakob H, Horn D, Hunter C, Kolar P, Kraus HJ, Langen H, Linding R, Moritz RL, Omenn GS, Orlando R, Pandey A, Ping P, Rahbar A, Rivers R, Seymour SL, Simpson RJ, Slotta D, Smith RD, Stein SE, Tabb DL, Tagle D, Yates JR, Rodriguez H. Recommendations for mass spectrometry data quality metrics for open access data (corollary to the Amsterdam Principles). Mol Cell Proteomics 2011; 10:O111.015446. [PMID: 22052993 DOI: 10.1074/mcp.o111.015446] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022] Open
Abstract
Policies supporting the rapid and open sharing of proteomic data are being implemented by the leading journals in the field. The proteomics community is taking steps to ensure that data are made publicly accessible and are of high quality, a challenging task that requires the development and deployment of methods for measuring and documenting data quality metrics. On September 18, 2010, the United States National Cancer Institute convened the "International Workshop on Proteomic Data Quality Metrics" in Sydney, Australia, to identify and address issues facing the development and use of such methods for open access proteomics data. The stakeholders at the workshop enumerated the key principles underlying a framework for data quality assessment in mass spectrometry data that will meet the needs of the research community, journals, funding agencies, and data repositories. Attendees discussed and agreed upon two primary needs for the wide use of quality metrics: 1) an evolving list of comprehensive quality metrics and 2) standards accompanied by software analytics. Attendees stressed the importance of increased education and training programs to promote reliable protocols in proteomics. This workshop report explores the historic precedents, key discussions, and necessary next steps to enhance the quality of open access data. By agreement, this article is published simultaneously in the Journal of Proteome Research, Molecular and Cellular Proteomics, Proteomics, and Proteomics Clinical Applications as a public service to the research community. The peer review process was a coordinated effort conducted by a panel of referees selected by the journals.
|
70
|
Eng JK, Searle BC, Clauser KR, Tabb DL. A face in the crowd: recognizing peptides through database search. Mol Cell Proteomics 2011; 10:R111.009522. [PMID: 21876205 PMCID: PMC3226415 DOI: 10.1074/mcp.r111.009522] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Received: 03/09/2011] [Revised: 07/19/2011] [Indexed: 12/31/2022] Open
Abstract
Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.
|
71
|
Ma ZQ, Tabb DL, Burden J, Chambers MC, Cox MB, Cantrell MJ, Ham AJL, Litton MD, Oreto MR, Schultz WC, Sobecki SM, Tsui TY, Wernke GR, Liebler DC. Supporting tool suite for production proteomics. Bioinformatics 2011; 27:3214-5. [PMID: 21965817 DOI: 10.1093/bioinformatics/btr544] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/28/2023] Open
Abstract
SUMMARY The large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion. We also describe a queuing system to coordinate identification pipelines and the File Collector tool for batch copying analytical results. These tools are individually useful but collectively reinforce each other. They are particularly valuable for proteomics core facilities or research institutions that need to manage multiple mass spectrometers. With minor changes, they could support other types of biomolecular resource facilities.
|
72
|
McConnell RE, Benesh AE, Mao S, Tabb DL, Tyska MJ. Proteomic analysis of the enterocyte brush border. Am J Physiol Gastrointest Liver Physiol 2011; 300:G914-26. [PMID: 21330445 PMCID: PMC3094140 DOI: 10.1152/ajpgi.00005.2011] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 01/31/2023]
Abstract
The brush border domain at the apex of intestinal epithelial cells is the primary site of nutrient absorption in the intestinal tract and the primary surface of interaction with microbes that reside in the lumen. Because the brush border is positioned at such a critical physiological interface, we set out to create a comprehensive list of the proteins that reside in this domain using shotgun mass spectrometry. The resulting proteome contains 646 proteins with diverse functions. In addition to the expected collection of nutrient processing and transport components, we also identified molecules expected to function in the regulation of actin dynamics, membrane bending, and extracellular adhesion. These results provide a foundation for future studies aimed at defining the molecular mechanisms underpinning brush border assembly and function.
|
73
|
Ma ZQ, Chambers MC, Ham AJL, Cheek KL, Whitwell CW, Aerni HR, Schilling B, Miller AW, Caprioli RM, Tabb DL. ScanRanker: Quality assessment of tandem mass spectra via sequence tagging. J Proteome Res 2011; 10:2896-904. [PMID: 21520941 DOI: 10.1021/pr200118r] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 12/13/2022]
Abstract
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.
|
74
|
Li J, Su Z, Ma ZQ, Slebos RJC, Halvey P, Tabb DL, Liebler DC, Pao W, Zhang B. A bioinformatics workflow for variant peptide detection in shotgun proteomics. Mol Cell Proteomics 2011; 10:M110.006536. [PMID: 21389108 DOI: 10.1074/mcp.m110.006536] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Indexed: 12/22/2022] Open
Abstract
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
|
75
|
Dasari S, Chambers MC, Codreanu SG, Liebler DC, Collins BC, Pennington SR, Gallagher WM, Tabb DL. Sequence tagging reveals unexpected modifications in toxicoproteomics. Chem Res Toxicol 2011; 24:204-16. [PMID: 21214251 DOI: 10.1021/tx100275t] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 12/17/2022]
Abstract
Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here, we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty-five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications.
|
76
|
Dasari S, Chambers MC, Slebos RJ, Zimmerman LJ, Ham AJL, Tabb DL. TagRecon: high-throughput mutation identification through sequence tagging. J Proteome Res 2010; 9:1716-26. [PMID: 20131910 DOI: 10.1021/pr900850m] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Indexed: 12/29/2022]
Abstract
Shotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates. Here we describe TagRecon, an algorithm that leverages inferred sequence tags to identify unanticipated mutations in clinical proteomic data sets. TagRecon identifies unmodified peptides as sensitively as the related MyriMatch database search engine. In both LTQ and Orbitrap data sets, TagRecon outperformed state of the art software in recognizing sequence mismatches from data sets with known variants. We developed guidelines for filtering putative mutations from clinical samples, and we applied them in an analysis of cancer cell lines and an examination of colon tissue. Mutations were found in up to 6% of identified peptides, and only a small fraction corresponded to dbSNP entries. The RKO cell line, which is DNA mismatch repair deficient, yielded more mutant peptides than the mismatch repair proficient SW480 line. Analysis of colon cancer tumor and adjacent tissue revealed hydroxyproline modifications associated with extracellular matrix degradation. These results demonstrate the value of using sequence tagging algorithms to fully interrogate clinical proteomic data sets.
|
77
|
Li J, Ma Z, Slebos RJC, Tabb DL, Liebler DC, Zhang B. Enabling proteomics-based identification of human cancer variations. BMC Bioinformatics 2010. [PMCID: PMC3290083 DOI: 10.1186/1471-2105-11-s4-p29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
78
|
Tabb DL, Vega-Montoto L, Rudnick PA, Variyath AM, Ham AJL, Bunk DM, Kilpatrick LE, Billheimer DD, Blackman RK, Cardasis HL, Carr SA, Clauser KR, Jaffe JD, Kowalski KA, Neubert TA, Regnier FE, Schilling B, Tegeler TJ, Wang M, Wang P, Whiteaker JR, Zimmerman LJ, Fisher SJ, Gibson BW, Kinsinger CR, Mesri M, Rodriguez H, Stein SE, Tempst P, Paulovich AG, Liebler DC, Spiegelman C. Repeatability and reproducibility in proteomic identifications by liquid chromatography-tandem mass spectrometry. J Proteome Res 2010; 9:761-76. [PMID: 19921851 DOI: 10.1021/pr9006365] [Citation(s) in RCA: 409] [Impact Index Per Article: 29.2] [Indexed: 11/28/2022]
Abstract
The complexity of proteomic instrumentation for LC-MS/MS introduces many possible sources of variability. Data-dependent sampling of peptides constitutes a stochastic element at the heart of discovery proteomics. Although this variation impacts the identification of peptides, proteomic identifications are far from completely random. In this study, we analyzed interlaboratory data sets from the NCI Clinical Proteomic Technology Assessment for Cancer to examine repeatability and reproducibility in peptide and protein identifications. Included data spanned 144 LC-MS/MS experiments on four Thermo LTQ and four Orbitrap instruments. Samples included yeast lysate, the NCI-20 defined dynamic range protein mix, and the Sigma UPS 1 defined equimolar protein mix. Some of our findings reinforced conventional wisdom, such as repeatability and reproducibility being higher for proteins than for peptides. Most lessons from the data, however, were more subtle. Orbitraps proved capable of higher repeatability and reproducibility, but aberrant performance occasionally erased these gains. Even the simplest protein digestions yielded more peptide ions than LC-MS/MS could identify during a single experiment. We observed that peptide lists from pairs of technical replicates overlapped by 35-60%, giving a range for peptide-level repeatability in these experiments. Sample complexity did not appear to affect peptide identification repeatability, even as numbers of identified spectra changed by an order of magnitude. Statistical analysis of protein spectral counts revealed greater stability across technical replicates for Orbitraps, making them superior to LTQ instruments for biomarker candidate discovery. The most repeatable peptides were those corresponding to conventional tryptic cleavage sites, those that produced intense MS signals, and those that resulted from proteins generating many distinct peptides. Reproducibility among different instruments of the same type lagged behind repeatability of technical replicates on a single instrument by several percent. These findings reinforce the importance of evaluating repeatability as a fundamental characteristic of analytical technologies.
|
79
|
MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler DC, MacCoss MJ. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010; 26:966-8. [PMID: 20147306 DOI: 10.1093/bioinformatics/btq054] [Citation(s) in RCA: 3311] [Impact Index Per Article: 236.5] [Indexed: 12/14/2022]
Abstract
SUMMARY Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools. AVAILABILITY Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.
|
80
|
Baucum AJ, Jalan-Sakrikar N, Jiao Y, Gustin RM, Carmody LC, Tabb DL, Ham AJL, Colbran RJ. Identification and validation of novel spinophilin-associated proteins in rodent striatum using an enhanced ex vivo shotgun proteomics approach. Mol Cell Proteomics 2010; 9:1243-59. [PMID: 20124353 DOI: 10.1074/mcp.m900387-mcp200] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022] Open
Abstract
Spinophilin regulates excitatory postsynaptic function and morphology during development by virtue of its interactions with filamentous actin, protein phosphatase 1, and a plethora of additional signaling proteins. To provide insight into the roles of spinophilin in mature brain, we characterized the spinophilin interactome in subcellular fractions solubilized from adult rodent striatum by using a shotgun proteomics approach to identify proteins in spinophilin immune complexes. Initial analyses of samples generated using a mouse spinophilin antibody detected 23 proteins that were not present in an IgG control sample; however, 12 of these proteins were detected in complexes isolated from spinophilin knock-out tissue. A second screen using two different spinophilin antibodies and either knock-out or IgG controls identified a total of 125 proteins. The probability of each protein being specifically associated with spinophilin in each sample was calculated, and proteins were ranked according to a χ² analysis of the probabilities from analyses of multiple samples. Spinophilin and the known associated proteins neurabin and multiple isoforms of protein phosphatase 1 were specifically detected. Multiple, novel, spinophilin-associated proteins (myosin Va, calcium/calmodulin-dependent protein kinase II, neurofilament light polypeptide, postsynaptic density 95, alpha-actinin, and densin) were then shown to interact with GST fusion proteins containing fragments of spinophilin. Additional biochemical and transfected cell imaging studies showed that alpha-actinin and densin directly interact with residues 151-300 and 446-817, respectively, of spinophilin. Taken together, we have developed a multi-antibody, shotgun proteomics approach to characterize protein interactomes in native tissues, delineating the importance of knock-out tissue controls and providing novel insights into the nature and function of the spinophilin interactome in mature striatum.
|
81
|
Benesh AE, Nambiar R, McConnell RE, Mao S, Tabb DL, Tyska MJ. Differential localization and dynamics of class I myosins in the enterocyte microvillus. Mol Biol Cell 2010; 21:970-8. [PMID: 20089841 PMCID: PMC2836977 DOI: 10.1091/mbc.e09-07-0638] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 02/06/2023] Open
Abstract
These data establish myosin-1d as a component of the brush border cytoskeleton that demonstrates microvillar tip localization. Epithelial cells lining the intestinal tract build an apical array of microvilli known as the brush border. Each microvillus is a cylindrical membrane protrusion that is linked to a supporting actin bundle by myosin-1a (Myo1a). Mice lacking Myo1a demonstrate no overt physiological symptoms, suggesting that other myosins may compensate for the loss of Myo1a in these animals. To investigate changes in the microvillar myosin population that may limit the Myo1a KO phenotype, we performed proteomic analysis on WT and Myo1a KO brush borders. These studies revealed that WT brush borders also contain the short-tailed class I myosin, myosin-1d (Myo1d). Myo1d localizes to the terminal web and striking puncta at the tips of microvilli. In the absence of Myo1a, Myo1d peptide counts increase twofold; this motor also redistributes along the length of microvilli, into compartments normally occupied by Myo1a. FRAP studies demonstrate that Myo1a is less dynamic than Myo1d, providing a mechanistic explanation for the observed differential localization. These data suggest that Myo1d may be the primary compensating class I myosin in the Myo1a KO model; they also suggest that dynamics govern the localization and function of different yet closely related myosins that target common actin structures.
|
82
|
Paulovich AG, Billheimer D, Ham AJL, Vega-Montoto L, Rudnick PA, Tabb DL, Wang P, Blackman RK, Bunk DM, Cardasis HL, Clauser KR, Kinsinger CR, Schilling B, Tegeler TJ, Variyath AM, Wang M, Whiteaker JR, Zimmerman LJ, Fenyo D, Carr SA, Fisher SJ, Gibson BW, Mesri M, Neubert TA, Regnier FE, Rodriguez H, Spiegelman C, Stein SE, Tempst P, Liebler DC. Interlaboratory study characterizing a yeast performance standard for benchmarking LC-MS platform performance. Mol Cell Proteomics 2009; 9:242-54. [PMID: 19858499 DOI: 10.1074/mcp.m900222-mcp200] [Citation(s) in RCA: 140] [Impact Index Per Article: 9.3] [Indexed: 11/06/2022] Open
Abstract
Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize pre-analytical and analytical variation in comparative proteomics experiments.
|
83
|
Rudnick PA, Clauser KR, Kilpatrick LE, Tchekhovskoi DV, Neta P, Blonder N, Billheimer DD, Blackman RK, Bunk DM, Cardasis HL, Ham AJL, Jaffe JD, Kinsinger CR, Mesri M, Neubert TA, Schilling B, Tabb DL, Tegeler TJ, Vega-Montoto L, Variyath AM, Wang M, Wang P, Whiteaker JR, Zimmerman LJ, Carr SA, Fisher SJ, Gibson BW, Paulovich AG, Regnier FE, Rodriguez H, Spiegelman C, Tempst P, Liebler DC, Stein SE. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Mol Cell Proteomics 2009; 9:225-41. [PMID: 19837981 PMCID: PMC2830836 DOI: 10.1074/mcp.m900223-mcp200] [Citation(s) in RCA: 158] [Impact Index Per Article: 10.5] [Indexed: 11/30/2022] Open
Abstract
A major unmet need in LC-MS/MS-based proteomics analyses is a set of tools for quantitative assessment of system performance and evaluation of technical variability. Here we describe 46 system performance metrics for monitoring chromatographic performance, electrospray source stability, MS1 and MS2 signals, dynamic sampling of ions for MS/MS, and peptide identification. Applied to data sets from replicate LC-MS/MS analyses, these metrics displayed consistent, reasonable responses to controlled perturbations. The metrics typically displayed variations less than 10% and thus can reveal even subtle differences in performance of system components. Analyses of data from interlaboratory studies conducted under a common standard operating procedure identified outlier data and provided clues to specific causes. Moreover, interlaboratory variation reflected by the metrics indicates which system components vary the most between laboratories. Application of these metrics enables rational, quantitative quality assessment for proteomics and other LC-MS/MS analytical applications.
|
84
|
Ma ZQ, Dasari S, Chambers MC, Litton MD, Sobecki SM, Zimmerman LJ, Halvey PJ, Schilling B, Drake PM, Gibson BW, Tabb DL. IDPicker 2.0: Improved protein assembly with high discrimination peptide identification filtering. J Proteome Res 2009; 8:3872-81. [PMID: 19522537 DOI: 10.1021/pr900360j] [Citation(s) in RCA: 274] [Impact Index Per Article: 18.3] [Indexed: 11/29/2022]
Abstract
Tandem mass spectrometry-based shotgun proteomics has become a widespread technology for analyzing complex protein mixtures. A number of database searching algorithms have been developed to assign peptide sequences to tandem mass spectra. Assembling the peptide identifications to proteins, however, is a challenging issue because many peptides are shared among multiple proteins. IDPicker is an open-source protein assembly tool that derives a minimum protein list from peptide identifications filtered to a specified False Discovery Rate. Here, we update IDPicker to increase confident peptide identifications by combining multiple scores produced by database search tools. By segregating peptide identifications for thresholding using both the precursor charge state and the number of tryptic termini, IDPicker retrieves more peptides for protein assembly. The new version is more robust against false positive proteins, especially in searches using multispecies databases, by requiring additional novel peptides in the parsimony process. IDPicker has been designed for incorporation in many identification workflows by the addition of a graphical user interface and the ability to read identifications from the pepXML format. These advances position IDPicker for high peptide discrimination and reliable protein assembly in large-scale proteomics studies. The source code and binaries for the latest version of IDPicker are available from http://fenchurch.mc.vanderbilt.edu/.
|
85
|
Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, Spiegelman CH, Zimmerman LJ, Ham AJL, Keshishian H, Hall SC, Allen S, Blackman RK, Borchers CH, Buck C, Cardasis HL, Cusack MP, Dodder NG, Gibson BW, Held JM, Hiltke T, Jackson A, Johansen EB, Kinsinger CR, Li J, Mesri M, Neubert TA, Niles RK, Pulsipher TC, Ransohoff D, Rodriguez H, Rudnick PA, Smith D, Tabb DL, Tegeler TJ, Variyath AM, Vega-Montoto LJ, Wahlander Å, Waldemarson S, Wang M, Whiteaker JR, Zhao L, Anderson NL, Fisher SJ, Liebler DC, Paulovich AG, Regnier FE, Tempst P, Carr SA. Erratum: Corrigendum: Multi-site assessment of the precision and reproducibility of multiple reaction monitoring–based measurements of proteins in plasma. Nat Biotechnol 2009. [DOI: 10.1038/nbt0909-864b] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
|
86
|
Li J, Zimmerman LJ, Park BH, Tabb DL, Liebler DC, Zhang B. Network-assisted protein identification and data interpretation in shotgun proteomics. Mol Syst Biol 2009; 5:303. [PMID: 19690572 PMCID: PMC2736651 DOI: 10.1038/msb.2009.54] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 02/18/2009] [Accepted: 07/07/2009] [Indexed: 11/30/2022] Open
Abstract
Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence, that is, confident proteins, are reported, whereas many possible proteins of biological interest are eliminated. We have developed a clique-enrichment approach (CEA) to rescue eliminated proteins by incorporating the relationship among proteins as embedded in a protein interaction network. In several data sets tested, CEA increased protein identification by 8–23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at similar levels as confident proteins and at a significantly higher level than abandoned ones. Applied to a breast cancer data set, CEA rescued proteins encoded by well-known breast cancer genes. In addition, CEA generated a network view of the proteins and helped show the modular organization of proteins that may underpin the molecular mechanisms of the disease.
|
87
|
McConnell RE, Higginbotham JN, Shifrin DA, Tabb DL, Coffey RJ, Tyska MJ. The enterocyte microvillus is a vesicle-generating organelle. J Cell Biol 2009; 185:1285-98. [PMID: 19564407 PMCID: PMC2712962 DOI: 10.1083/jcb.200902147] [Citation(s) in RCA: 178] [Impact Index Per Article: 11.9] [Received: 02/27/2009] [Accepted: 06/03/2009] [Indexed: 01/03/2023] Open
Abstract
For decades, enterocyte brush border microvilli have been viewed as passive cytoskeletal scaffolds that serve to increase apical membrane surface area. However, recent studies revealed that in the in vitro context of isolated brush borders, myosin-1a (myo1a) powers the sliding of microvillar membrane along core actin bundles. This activity also leads to the shedding of small vesicles from microvillar tips, suggesting that microvilli may function as vesicle-generating organelles in vivo. In this study, we present data in support of this hypothesis, showing that enterocyte microvilli release unilamellar vesicles into the intestinal lumen; these vesicles retain the right side out orientation of microvillar membrane, contain catalytically active brush border enzymes, and are specifically enriched in intestinal alkaline phosphatase. Moreover, myo1a knockout mice demonstrate striking perturbations in vesicle production, clearly implicating this motor in the in vivo regulation of this novel activity. In combination, these data show that microvilli function as vesicle-generating organelles, which enable enterocytes to deploy catalytic activities into the intestinal lumen.
|
88
|
Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, Spiegelman CH, Zimmerman LJ, Ham AJL, Keshishian H, Hall SC, Allen S, Blackman RK, Borchers CH, Buck C, Cardasis HL, Cusack MP, Dodder NG, Gibson BW, Held JM, Hiltke T, Jackson A, Johansen EB, Kinsinger CR, Li J, Mesri M, Neubert TA, Niles RK, Pulsipher TC, Ransohoff D, Rodriguez H, Rudnick PA, Smith D, Tabb DL, Tegeler TJ, Variyath AM, Vega-Montoto LJ, Wahlander A, Waldemarson S, Wang M, Whiteaker JR, Zhao L, Anderson NL, Fisher SJ, Liebler DC, Paulovich AG, Regnier FE, Tempst P, Carr SA. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 2009; 27:633-41. [PMID: 19561596 DOI: 10.1038/nbt.1546] [Citation(s) in RCA: 819] [Impact Index Per Article: 54.6] [Received: 05/05/2009] [Accepted: 05/31/2009] [Indexed: 01/13/2023]
Abstract
Verification of candidate biomarkers relies upon specific, quantitative assays optimized for selective detection of target proteins, and is increasingly viewed as a critical step in the discovery pipeline that bridges unbiased biomarker discovery to preclinical validation. Although individual laboratories have demonstrated that multiple reaction monitoring (MRM) coupled with isotope dilution mass spectrometry can quantify candidate protein biomarkers in plasma, reproducibility and transferability of these assays between laboratories have not been demonstrated. We describe a multilaboratory study to assess reproducibility, recovery, linear dynamic range and limits of detection and quantification of multiplexed, MRM-based assays, conducted by NCI-CPTAC. Using common materials and standardized protocols, we demonstrate that these assays can be highly reproducible within and across laboratories and instrument platforms, and are sensitive to low µg/ml protein concentrations in unfractionated plasma. We provide data and benchmarks against which individual laboratories can compare their performance and evaluate new technologies for biomarker verification in plasma.
|
89
|
Loecken EM, Dasari S, Hill S, Tabb DL, Guengerich FP. The bis-electrophile diepoxybutane cross-links DNA to human histones but does not result in enhanced mutagenesis in recombinant systems. Chem Res Toxicol 2009; 22:1069-76. [PMID: 19364102 PMCID: PMC2696559 DOI: 10.1021/tx900037u] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
Abstract
1,2-Dibromoethane and 1,3-butadiene are cancer suspects present in the environment and have been used widely in industry. The mutagenic properties of 1,2-dibromoethane and the 1,3-butadiene oxidation product diepoxybutane are thought to be related to the bis-electrophilic character of these chemicals. The discovery that overexpression of O⁶-alkylguanine alkyltransferase (AGT) enhances bis-electrophile-induced mutagenesis prompted a search for other proteins that may act by a similar mechanism. A human liver screen for nuclear proteins that cross-link with DNA in the presence of 1,2-dibromoethane identified histones H2b and H3 as candidate proteins. Treatment of isolated histones H2b and H3 with diepoxybutane resulted in DNA-protein cross-links and produced protein adducts, and DNA-histone H2b cross-links were identified (immunochemically) in Escherichia coli cells expressing histone H2b. However, heterologous expression of histone H2b in E. coli failed to enhance bis-electrophile-induced mutagenesis. These results are similar to those found with the cross-link candidate glyceraldehyde 3-phosphate dehydrogenase (GAPDH) [Loecken, E. M. and Guengerich, F. P. (2008) Chem. Res. Toxicol. 21, 453-458], but in contrast to GAPDH, histone H2b bound DNA with even higher affinity than AGT. The extent of DNA cross-linking of isolated histone H2b was similar to that of AGT, suggesting that differences in post-cross-linking events explain the difference in mutagenesis.
|
90
|
Slebos RJC, Brock JWC, Winters NF, Stuart SR, Martinez MA, Li M, Chambers MC, Zimmerman LJ, Ham AJ, Tabb DL, Liebler DC. Evaluation of strong cation exchange versus isoelectric focusing of peptides for multidimensional liquid chromatography-tandem mass spectrometry. J Proteome Res 2009; 7:5286-94. [PMID: 18939861 DOI: 10.1021/pr8004666] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 11/28/2022]
Abstract
Shotgun proteome analysis platforms based on multidimensional liquid chromatography-tandem mass spectrometry (LC-MS/MS) provide a powerful means to discover biomarker candidates in tissue specimens. Analysis platforms must balance sensitivity for peptide detection, reproducibility of detected peptide inventories and analytical throughput for protein amounts commonly present in tissue biospecimens (<100 µg), such that platform stability is sufficient to detect modest changes in complex proteomes. We compared shotgun proteomics platforms by analyzing tryptic digests of whole cell and tissue proteomes using strong cation exchange (SCX) and isoelectric focusing (IEF) separations of peptides prior to LC-MS/MS analysis on an LTQ-Orbitrap hybrid instrument. IEF separations provided superior reproducibility and resolution for peptide fractionation from samples corresponding to both large (100 µg) and small (10 µg) protein inputs. SCX generated more peptide and protein identifications than did IEF with small (10 µg) samples, whereas the two platforms yielded similar numbers of identifications with large (100 µg) samples. In nine replicate analyses of tryptic peptides from 50 µg colon adenocarcinoma protein, overlap in protein detection by the two platforms was 77% of all proteins detected by both methods combined. IEF more quickly approached maximal detection, with 90% of IEF-detectable medium abundance proteins (those detected with a total of 3-4 peptides) detected within three replicate analyses. In contrast, the SCX platform required six replicates to detect 90% of SCX-detectable medium abundance proteins. High reproducibility and efficient resolution of IEF peptide separations make the IEF platform superior to the SCX platform for biomarker discovery via shotgun proteomic analyses of tissue specimens.
|
91
|
Burgess EF, Ham AJL, Tabb DL, Billheimer D, Roth BJ, Chang SS, Cookson MS, Hinton TJ, Cheek KL, Hill S, Pietenpol JA. Prostate cancer serum biomarker discovery through proteomic analysis of alpha-2 macroglobulin protein complexes. Proteomics Clin Appl 2008; 2:1223. [PMID: 20107526 DOI: 10.1002/prca.200780073] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 01/28/2023]
Abstract
Alpha-2 macroglobulin (A2M) functions as a universal protease inhibitor in serum and is capable of binding various cytokines and growth factors. In this study, we investigated if immunoaffinity enrichment and proteomic analysis of A2M protein complexes from human serum could improve detection of biologically relevant and novel candidate protein biomarkers in prostate cancer. Serum samples from six patients with androgen-independent, metastatic prostate cancer and six control patients without malignancy were analyzed by immunoaffinity enrichment of A2M protein complexes and MS identification of associated proteins. Known A2M substrates were reproducibly identified from patient serum in both cohorts, as well as proteins previously undetected in human serum. One example is heat shock protein 90 alpha (HSP90α), which was identified only in the serum of cancer patients in this study. Using an ELISA, the presence of HSP90α in human serum was validated on expanded test cohorts and found to exist in higher median serum concentrations in prostate cancer (n = 18) relative to control (n = 13) patients (median concentrations 50.7 versus 27.6 ng/mL, respectively, p = 0.001). Our results demonstrate the technical feasibility of this approach and support the analysis of A2M protein complexes for proteomic-based serum biomarker discovery.
|
92
|
Tabb DL, Ma ZQ, Martin DB, Ham AJL, Chambers MC. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J Proteome Res 2008; 7:3838-46. [PMID: 18630943 DOI: 10.1021/pr800154p] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Abstract
In shotgun proteomics, tandem mass spectra of peptides are typically identified through database search algorithms such as Sequest. We have developed DirecTag, an open-source algorithm to infer partial sequence tags directly from observed fragment ions. This algorithm is unique in its implementation of three separate scoring systems to evaluate each tag on the basis of peak intensity, m/z fidelity, and complementarity. In data sets from several types of mass spectrometers, DirecTag reproducibly exceeded the accuracy and speed of InsPecT and GutenTag, two previously published algorithms for this purpose. The source code and binaries for DirecTag are available from http://fenchurch.mc.vanderbilt.edu.
|
93
|
Arnett DR, Jennings JL, Tabb DL, Link AJ, Weil PA. A proteomics analysis of yeast Mot1p protein-protein associations: insights into mechanism. Mol Cell Proteomics 2008; 7:2090-106. [PMID: 18596064 DOI: 10.1074/mcp.m800221-mcp200] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]
Abstract
Yeast Mot1p, a member of the Snf2 ATPase family of proteins, is a transcriptional regulator that has the unusual ability to both repress and activate mRNA gene transcription. To identify interactions with other proteins that may assist Mot1p in its regulatory processes, Mot1p was purified from replicate yeast cell extracts, and Mot1p-associated proteins were identified by coupled multidimensional liquid chromatography and tandem mass spectrometry. Using this approach we generated a catalog of Mot1p-interacting proteins. Mot1p interacts with a range of transcriptional co-regulators as well as proteins involved in chromatin remodeling. We propose that interaction with such a wide range of proteins may be one mechanism through which Mot1p subserves its roles as a transcriptional activator and repressor.
|
94
|
Cao Z, Li C, Higginbotham JN, Franklin JL, Tabb DL, Graves-Deal R, Hill S, Cheek K, Jerome WG, Lapierre LA, Goldenring JR, Ham AJL, Coffey RJ. Use of fluorescence-activated vesicle sorting for isolation of Naked2-associated, basolaterally targeted exocytic vesicles for proteomics analysis. Mol Cell Proteomics 2008; 7:1651-67. [PMID: 18504258 DOI: 10.1074/mcp.m700155-mcp200] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022]
Abstract
By interacting with the cytoplasmic tail of a Golgi-processed form of transforming growth factor-α (TGFα), Naked2 coats TGFα-containing exocytic vesicles and directs them to the basolateral corner of polarized epithelial cells where the vesicles dock and fuse in a Naked2 myristoylation-dependent manner. These TGFα-containing Naked2-associated vesicles are not directed to the subapical Sec6/8 exocyst complex as has been reported for other basolateral cargo, and thus they appear to represent a distinct set of basolaterally targeted vesicles. To identify constituents of these vesicles, we exploited our finding that myristoylation-deficient Naked2 G2A vesicles are unable to fuse at the plasma membrane. Isolation of a population of myristoylation-deficient, green fluorescent protein-tagged G2A Naked2-associated vesicles was achieved by biochemical enrichment followed by flow cytometric fluorescence-activated vesicle sorting. The protein content of these plasma membrane de-enriched, flow-sorted fluorescent G2A Naked2 vesicles was determined by LC/LC-MS/MS analysis. Three independent isolations were performed, and 389 proteins were found in all three sets of G2A Naked2 vesicles. Rab10 and myosin IIA were identified as core machinery, and Na⁺/K⁺-ATPase α1 was identified as an additional cargo within these vesicles. As an initial validation step, we confirmed their presence and that of three additional proteins tested (annexin A1, annexin A2, and IQGAP1) in wild-type Naked2 vesicles. To our knowledge, this is the first large scale protein characterization of a population of basolaterally targeted exocytic vesicles and supports the use of fluorescence-activated vesicle sorting as a useful tool for isolation of cellular organelles for comprehensive proteomics analysis.
|
95
|
Cociorva D, Tabb DL, Yates JR. Validation of tandem mass spectrometry database search results using DTASelect. Curr Protoc Bioinformatics 2008; Chapter 13:Unit 13.4. [PMID: 18428785 DOI: 10.1002/0471250953.bi1304s16] [Citation(s) in RCA: 154] [Impact Index Per Article: 9.6] [Indexed: 11/11/2022]
Abstract
DTASelect provides a means by which complex SEQUEST results can be filtered, organized, and viewed. A single sample may produce tens of thousands of tandem mass spectra. Manually perusing and selecting SEQUEST matches among such a mass of data carries a risk of inconsistency. DTASelect allows the user to set complex criteria for acceptance or rejection of individual spectrum results. It also features rules for dealing with multiple, identical peptide matches and for removing proteins that are insufficiently evidenced. It provides its sorted and filtered summary as HTML and text documents for easy review and also offers several auxiliary reports. DTASelect is a powerful tool for automatic analysis of complex mixture tandem mass spectrometry.
|
96
|
Padliya ND, Garrett WM, Campbell KB, Tabb DL, Cooper B. Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences. Proteomics 2008; 7:3932-42. [PMID: 17922518 DOI: 10.1002/pmic.200700419] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.
|
97
|
Abstract
The "Paris Guidelines" have begun the process of standardizing reporting for proteomics. New bioinformatics tools have improved the process for estimating error rates of peptide identifications. This perspective seeks to consider these advances in the context of proteomics' short history. As increasing numbers of proteomics papers come from biologists rather than technologists, developing consensus standards for estimating error will be increasingly necessary. Standardizing this assessment should be welcomed as a reflection of the growing impact of proteomic technologies.
|
98
|
Tabb DL, Friedman DB, Ham AJL. Verification of automated peptide identifications from proteomic tandem mass spectra. Nat Protoc 2007; 1:2213-22. [PMID: 17406459 PMCID: PMC2819013 DOI: 10.1038/nprot.2006.330] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
Shotgun proteomics yields tandem mass spectra of peptides that can be identified by database search algorithms. When only a few observed peptides suggest the presence of a protein, establishing the accuracy of the peptide identifications is necessary for accepting or rejecting the protein identification. In this protocol, we describe the properties of peptide identifications that can differentiate legitimately identified peptides from spurious ones. The chemistry of fragmentation, as embodied in the 'mobile proton' and 'pathways in competition' models, informs the process of confirming or rejecting each spectral match. Examples of ion-trap and tandem time-of-flight (TOF/TOF) mass spectra illustrate these principles of fragmentation.
|
99
|
Zhang B, Chambers MC, Tabb DL. Proteomic parsimony through bipartite graph analysis improves accuracy and transparency. J Proteome Res 2007; 6:3549-57. [PMID: 17676885 PMCID: PMC2810678 DOI: 10.1021/pr070230d] [Citation(s) in RCA: 269] [Impact Index Per Article: 15.8] [Indexed: 11/29/2022]
Abstract
Assembling peptides identified from LC-MS/MS spectra into a list of proteins is a critical step in analyzing shotgun proteomics data. As one peptide sequence can be mapped to multiple proteins in a database, naïve protein assembly can substantially overstate the number of proteins found in samples. We model the peptide-protein relationships in a bipartite graph and use efficient graph algorithms to identify protein clusters with shared peptides and to derive the minimal list of proteins. We test the effects of this parsimony analysis approach using MS/MS data sets generated from a defined human protein mixture, a yeast whole cell extract, and a human serum proteome after MARS column depletion. The results demonstrate that the bipartite parsimony technique not only simplifies protein lists but also improves the accuracy of protein identification. We use bipartite graphs for the visualization of the protein assembly results to render the parsimony analysis process transparent to users. Our approach also groups functionally related proteins together and improves the comprehensibility of the results. We have implemented the tool in the IDPicker package. The source code and binaries for this protein assembly pipeline are available under Mozilla Public License at the following URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
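The parsimony idea in this abstract can be illustrated with a minimal sketch. It assumes a greedy set-cover heuristic, which approximates a minimal protein list but does not guarantee one, and it is not necessarily the algorithm implemented in IDPicker. The function name `parsimonious_proteins` and the toy protein and peptide labels are hypothetical.

```python
from collections import defaultdict


def parsimonious_proteins(protein_to_peptides):
    """Return a short list of protein groups that explains every peptide.

    protein_to_peptides maps a protein name to the set of peptides
    identified for it (one side of the bipartite graph to the other).
    """
    # Proteins with identical peptide sets are indistinguishable,
    # so collapse them into a single group.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    # Sort groups by member names so that ties are broken deterministically.
    candidates = sorted(groups.items(), key=lambda item: sorted(item[1]))

    uncovered = set().union(*groups)  # every peptide that needs explaining
    chosen = []
    while uncovered:
        # Greedy step (an assumption of this sketch): take the group
        # that explains the most still-unexplained peptides.
        peptides, names = max(candidates, key=lambda item: len(item[0] & uncovered))
        chosen.append(tuple(sorted(names)))
        uncovered -= peptides
    return chosen


if __name__ == "__main__":
    toy = {
        "P1": {"a", "b", "c"},
        "P2": {"b", "c"},   # subsumed by P1
        "P3": {"c", "d"},
        "P4": {"c", "d"},   # indistinguishable from P3
        "P5": {"a"},        # subsumed by P1
    }
    print(parsimonious_proteins(toy))
```

On the toy input, five naive protein entries reduce to two groups, `('P1',)` and `('P3', 'P4')`, which together account for all four peptides.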
|
100
|
Pan C, Kora G, Tabb DL, Pelletier DA, McDonald WH, Hurst GB, Hettich RL, Samatova NF. Robust estimation of peptide abundance ratios and rigorous scoring of their variability and bias in quantitative shotgun proteomics. Anal Chem 2007; 78:7110-20. [PMID: 17037910 DOI: 10.1021/ac0606554] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
The abundance ratio between the light and heavy isotopologues of an isotopically labeled peptide can be estimated from their selected ion chromatograms. However, quantitative shotgun proteomics measurements yield selected ion chromatograms at highly variable signal-to-noise ratios for tens of thousands of peptides. This challenge calls for algorithms that not only robustly estimate the abundance ratios of different peptides but also rigorously score each abundance ratio for the expected estimation bias and variability. Scoring of the abundance ratios, much like scoring of sequence assignment for tandem mass spectra by peptide identification algorithms, enables filtering of unreliable peptide quantification and use of formal statistical inference in the subsequent protein abundance ratio estimation. In this study, a parallel paired covariance algorithm is used for robust peak detection in selected ion chromatograms. A peak profile is generated for each peptide, which is a scatterplot of ion intensities measured for the two isotopologues within their chromatographic peaks. Principal component analysis of the peak profile is proposed to estimate the peptide abundance ratio and to score the estimation with the signal-to-noise ratio of the peak profile (profile signal-to-noise ratio). We demonstrate that the profile signal-to-noise ratio is inversely correlated with the variability and bias of peptide abundance ratio estimation.
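The principal component step described above can be sketched for a single peptide. This is an illustration under stated assumptions, not the published implementation: the peak profile is mean-centered before the analysis, the ratio is read off as the slope of the first principal axis, and the profile signal-to-noise ratio is taken to be the square root of the ratio of the two eigenvalues. The function name `pca_abundance_ratio` is hypothetical.

```python
import math


def pca_abundance_ratio(light, heavy):
    """Estimate the light:heavy ratio from paired ion intensities.

    light and heavy are equal-length sequences of intensities measured
    at the same chromatographic time points (the peak profile).
    Returns (ratio, profile_snr).
    """
    n = len(light)
    if n != len(heavy) or n < 2:
        raise ValueError("need at least two paired intensity measurements")

    mean_h = sum(heavy) / n
    mean_l = sum(light) / n
    # Entries of the 2x2 sample covariance matrix of (heavy, light).
    s_hh = sum((h - mean_h) ** 2 for h in heavy) / (n - 1)
    s_ll = sum((l - mean_l) ** 2 for l in light) / (n - 1)
    s_hl = sum((h - mean_h) * (l - mean_l) for h, l in zip(heavy, light)) / (n - 1)
    if s_hl == 0:
        raise ValueError("light and heavy intensities are uncorrelated")

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (s_hh + s_ll) / 2
    spread = math.hypot((s_hh - s_ll) / 2, s_hl)
    lam1 = half_trace + spread  # variance along the first principal axis
    lam2 = half_trace - spread  # variance perpendicular to it

    # The first eigenvector is proportional to (s_hl, lam1 - s_hh),
    # so its slope gives light intensity per unit heavy intensity.
    ratio = (lam1 - s_hh) / s_hl

    # Assumed definition of the profile signal-to-noise ratio.
    snr = math.sqrt(lam1 / lam2) if lam2 > 1e-12 * lam1 else math.inf
    return ratio, snr
```

A profile whose points lie tightly along a line gives a large signal-to-noise value, while a diffuse scatter gives a value near 1, which is the property that lets unreliable ratios be filtered out.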
|