51
Thierolf M, Hagmann ML, Pfeffer M, Berntenis N, Wild N, Roeßler M, Palme S, Karl J, Bodenmüller H, Rüschoff J, Rossol S, Rohr G, Rösch W, Friess H, Eickhoff A, Jauch KW, Langen H, Zolg W, Tacke M. Towards a comprehensive proteome of normal and malignant human colon tissue by 2-D-LC-ESI-MS and 2-DE proteomics and identification of S100A12 as potential cancer biomarker. Proteomics Clin Appl 2007; 2:11-22. [PMID: 21136775 DOI: 10.1002/prca.200780046] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 04/10/2007] [Indexed: 01/02/2023]
Affiliation(s)
- Michael Thierolf
- Roche Diagnostics GmbH, Roche Professional Diagnostics, Penzberg, Germany
52
Tanner S, Payne SH, Dasari S, Shen Z, Wilmarth PA, David LL, Loomis WF, Briggs SP, Bafna V. Accurate annotation of peptide modifications through unrestrictive database search. J Proteome Res 2007; 7:170-81. [PMID: 18034453 DOI: 10.1021/pr070444v] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]
Abstract
Proteins are extensively modified after translation due to cellular regulation, signal transduction, or chemical damage. Peptide tandem mass spectrometry can discover post-translational modifications, as well as sequence polymorphisms. Recent efforts have studied modifications at the proteomic scale. In this context, it becomes crucial to assess the accuracy of modification discovery. We discuss methods to quantify the false discovery rate from a search and demonstrate how several features can be used to distinguish valid modifications from search artifacts. We present a tool, PTMFinder, which implements these methods. We summarize the corpus of post-translational modifications identified on large data sets. Thousands of known and novel modification sites are identified, including site-specific modifications conserved over vast evolutionary distances.
Affiliation(s)
- Stephen Tanner
- Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
53
Tabb DL, Friedman DB, Ham AJL. Verification of automated peptide identifications from proteomic tandem mass spectra. Nat Protoc 2007; 1:2213-22. [PMID: 17406459 PMCID: PMC2819013 DOI: 10.1038/nprot.2006.330] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
Shotgun proteomics yields tandem mass spectra of peptides that can be identified by database search algorithms. When only a few observed peptides suggest the presence of a protein, establishing the accuracy of the peptide identifications is necessary for accepting or rejecting the protein identification. In this protocol, we describe the properties of peptide identifications that can differentiate legitimately identified peptides from spurious ones. The chemistry of fragmentation, as embodied in the 'mobile proton' and 'pathways in competition' models, informs the process of confirming or rejecting each spectral match. Examples of ion-trap and tandem time-of-flight (TOF/TOF) mass spectra illustrate these principles of fragmentation.
Affiliation(s)
- David L Tabb
- Department of Biochemistry, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8340, USA.
54
Kolker E, Hogan JM, Higdon R, Kolker N, Landorf E, Yakunin AF, Collart FR, van Belle G. Development of BIATECH-54 standard mixtures for assessment of protein identification and relative expression. Proteomics 2007; 7:3693-8. [PMID: 17890649 DOI: 10.1002/pmic.200700088] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Mixtures of known proteins have been very useful in the assessment and validation of methods for high-throughput (HTP) MS (MS/MS) proteomics experiments. However, these test mixtures have generally consisted of few proteins at near equal concentration or of a single protein at varied concentrations. Such mixtures are too simple to effectively assess the validity of error rates for protein identification and differential expression in HTP MS/MS studies. This work aimed at overcoming these limitations and simulating studies of complex biological samples. We introduced a pair of 54-protein standard mixtures of variable concentrations with up to a 1000-fold dynamic range in concentration and up to ten-fold expression ratios with additional negative controls (infinite expression ratios). These test mixtures comprised 16 off-the-shelf Sigma-Aldrich proteins and 38 Shewanella oneidensis proteins produced in-house. The standard proteins were systematically distributed into three main concentration groups (high, medium, and low) and then the concentrations were varied differently for each mixture within the groups to generate different expression ratios. The mixtures were analyzed with both low mass accuracy LCQ and high mass accuracy FT-LTQ instruments. In addition, these 54 standard proteins closely follow the molecular weight distributions of both bacterial and human proteomes. As a result, these new standard mixtures allow for a much more realistic assessment of approaches for protein identification and label-free differential expression than previous mixtures. Finally, methodology and experimental design developed in this work can be readily applied in future to development of more complex standard mixtures for HTP proteomics studies.
55
Fournier ML, Gilmore JM, Martin-Brown SA, Washburn MP. Multidimensional Separations-Based Shotgun Proteomics. Chem Rev 2007; 107:3654-86. [PMID: 17649983 DOI: 10.1021/cr068279a] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.1] [Indexed: 11/29/2022]
56
Foettinger A, Leitner A, Lindner W. Selective Enrichment of Tryptophan-Containing Peptides from Protein Digests Employing a Reversible Derivatization with Malondialdehyde and Solid-Phase Capture on Hydrazide Beads. J Proteome Res 2007; 6:3827-34. [PMID: 17655347 DOI: 10.1021/pr0702767] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
A method for the selective enrichment of tryptophan-containing peptides from complex peptide mixtures such as protein digests is presented. It is based on the reversible reaction of tryptophan with malondialdehyde and trapping of the derivatized Trp-peptides on hydrazide beads via the free aldehyde group of the modified peptides. The peptides are subsequently recovered in their native form by specific cleavage reactions for further (mass spectrometric) analysis. The method was optimized and evaluated using a tryptic digest of a mixture of 10 model proteins, demonstrating a significant reduction in sample complexity while still allowing the identification of all proteins. The applicability of the tryptophan-specific enrichment procedure to complex biological samples is demonstrated for a total yeast cell lysate. Analysis of the processed fraction by 1D-LC-MS/MS confirms the specificity of the enrichment procedure, as more than 85% of the peptides recovered from the enrichment step contained tryptophan. The reduction in sample complexity also resulted in the identification of additional proteins in comparison to the untreated lysate.
Affiliation(s)
- Alexandra Foettinger
- Department of Analytical Chemistry and Food Chemistry, University of Vienna, Waehringer Strasse 38, 1090 Vienna, Austria
57
Lubec G, Afjehi-Sadat L. Limitations and pitfalls in protein identification by mass spectrometry. Chem Rev 2007; 107:3568-84. [PMID: 17645314 DOI: 10.1021/cr068213f] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Indexed: 12/29/2022]
Affiliation(s)
- Gert Lubec
- Medical University of Vienna, Department of Pediatrics, Waehringer Guertel 18, A-1090 Vienna, Austria.
58
Huttlin EL, Hegeman AD, Harms AC, Sussman MR. Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy. J Proteome Res 2007; 6:392-8. [PMID: 17203984 PMCID: PMC2572755 DOI: 10.1021/pr0603194] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]
Abstract
In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.
Affiliation(s)
- Edward L. Huttlin
- University of Wisconsin Department of Biochemistry, 433 Babcock Drive, Madison, WI, 53706
- University of Wisconsin Biotechnology Center, 425 Henry Mall, Madison, WI, 53706
- Adrian D. Hegeman
- University of Wisconsin Biotechnology Center, 425 Henry Mall, Madison, WI, 53706
- Amy C. Harms
- University of Wisconsin Biotechnology Center, 425 Henry Mall, Madison, WI, 53706
- Michael R. Sussman
- University of Wisconsin Department of Biochemistry, 433 Babcock Drive, Madison, WI, 53706
- University of Wisconsin Biotechnology Center, 425 Henry Mall, Madison, WI, 53706
- Correspondence: Michael R. Sussman, Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706, Phone: (608) 262-8608, Fax: (608) 262-6748, E-mail:
59
Brusic V, Marina O, Wu CJ, Reinherz EL. Proteome informatics for cancer research: from molecules to clinic. Proteomics 2007; 7:976-91. [PMID: 17370257 DOI: 10.1002/pmic.200600965] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/11/2022]
Abstract
Proteomics offers the most direct approach to understand disease and its molecular biomarkers. Biomarkers denote the biological states of tissues, cells, or body fluids that are useful for disease detection and classification. Clinical proteomics is used for early disease detection, molecular diagnosis of disease, identification and formulation of therapies, and disease monitoring and prognostics. Bioinformatics tools are essential for converting raw proteomics data into knowledge and subsequently into useful applications. These tools are used for the collection, processing, analysis, and interpretation of the vast amounts of proteomics data. Management, analysis, and interpretation of large quantities of raw and processed data require a combination of various informatics technologies such as databases, sequence comparison, predictive models, and statistical tools. We have demonstrated the utility of bioinformatics in clinical proteomics through the analysis of the cancer antigen survivin and its suitability as a target for cancer immunotherapy.
Affiliation(s)
- Vladimir Brusic
- Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
60
Smith JC, Duchesne MA, Tozzi P, Ethier M, Figeys D. A Differential Phosphoproteomic Analysis of Retinoic Acid-Treated P19 Cells. J Proteome Res 2007; 6:3174-86. [PMID: 17622165 DOI: 10.1021/pr070122r] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 12/20/2022]
Abstract
External stimuli trigger internal signaling events within a cell that may represent either a temporary or permanent shift in the phosphorylation state of its proteome. Numerous reports have elucidated phosphorylation sites from a variety of biological samples and more recent studies have monitored the temporal dynamics of protein phosphorylation as a given system is perturbed. Understanding which proteins are phosphorylated as well as when they are phosphorylated may indicate novel functional roles within a system and allow new therapeutic avenues to be explored. To elucidate the dynamics of protein phosphorylation within differentiating murine P19 embryonal carcinoma cells, we induced P19 cells to differentiate using all-trans-retinoic acid and developed a strategy that combines isotopically labeled methyl esterification, immobilized metal affinity chromatography, mass spectrometric analysis, and a rigorous and unique data evaluation approach. We present the largest differential phosphoproteomic analysis using isotopically labeled methyl esterification to date, identifying a total of 472 phosphorylation sites on 151 proteins; 56 of these proteins had altered abundances following treatment with retinoic acid and approximately one-third of these have been previously associated with cellular differentiation. A series of bioinformatic tools were used to extract information from the data and explore the implications of our findings. This study represents the first global gel-free analysis that elucidates protein phosphorylation dynamics during cellular differentiation.
Affiliation(s)
- Jeffrey C Smith
- Ottawa Institute of Systems Biology and Biochemistry, Microbiology and Immunology Department, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, Ontario K1H 8M5, Canada
61
Starkweather R, Barnes CS, Wyckoff GJ, Keightley JA. Virtual polymorphism: finding divergent peptide matches in mass spectrometry data. Anal Chem 2007; 79:5030-9. [PMID: 17521167 DOI: 10.1021/ac0703496] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
The prevailing method of analyzing tandem-MS data for protein identification involves the comparison of peptide molecular weight and fragmentation data to theoretically predicted values, based on known protein sequences in databases. This is generally effective since proteins from most species under study are in the database or have sufficient homology to allow significant matching. We have encountered difficulties identifying proteins from fungal species Alternaria alternata due to significant interspecies protein sequence differences (divergence) and its absence from the database. This common household mold causes asthma and allergy problems, but the genome has not been sequenced. De novo sequencing and error-tolerant methods can facilitate protein identifications in divergent, unsequenced species. But these standard methods can be laborious and only allow single amino acid substitution, respectively. We have developed an alternative approach focusing on database engineering, predicting biologically rational polymorphism using statistically weighted amino acid substitution information held in BLOSUM62. Like other second pass methods, it is based on the initially identified protein. However, this approach allows more control over sequences to be considered, including multiple changes per peptide. The results show considerable improvement for routine protein identification and the potential for rescuing otherwise unconvincing identifications in unusually divergent species.
Affiliation(s)
- Rebekah Starkweather
- Division of Molecular Biology and Biochemistry, University of Missouri-Kansas City, 5007 Rockhill Road, Kansas City, Missouri 64110, USA
62
Feng J, Naiman DQ, Cooper B. Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum/peptide sequence false match frequencies. Bioinformatics 2007; 23:2210-7. [PMID: 17510167 DOI: 10.1093/bioinformatics/btm267] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION: In proteomics, reverse database searching is used to control the false match frequency for tandem mass spectrum/peptide sequence matches, but reversal creates sequences devoid of patterns that usually challenge database-search software. RESULTS: We designed an unsupervised pattern recognition algorithm for detecting patterns with various lengths from large sequence datasets. The patterns found in a protein sequence database were used to create decoy databases using a Monte Carlo sampling algorithm. Searching these decoy databases led to the prediction of false positive rates for spectrum/peptide sequence matches. We show examples where this method, independent of instrumentation, database-search software and samples, provides better estimation of false positive identification rates than a prevailing reverse database searching method. The pattern detection algorithm can also be used to analyze sequences for other purposes in biology or cryptology. AVAILABILITY: On request from the authors. SUPPLEMENTARY INFORMATION: http://bioinformatics.psb.ugent.be/.
Affiliation(s)
- Jian Feng
- Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland, USA
63
Mitra SK, Gantt JA, Ruby JF, Clouse SD, Goshe MB. Membrane proteomic analysis of Arabidopsis thaliana using alternative solubilization techniques. J Proteome Res 2007; 6:1933-50. [PMID: 17432890 DOI: 10.1021/pr060525b] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Indexed: 11/30/2022]
Abstract
This study presents a comparative proteomic analysis of the membrane subproteome of whole Arabidopsis seedlings using 2% Brij-58 or 60% methanol to enrich and solubilize membrane proteins for strong cation exchange fractionation and reversed-phase liquid chromatography-tandem mass spectrometry (LC-MS/MS). A total of 441 proteins were identified by our Brij-58 method, and 300 proteins were detected by our methanol-based solubilization approach. Although the total number of proteins obtained using the nonionic detergent was higher than the total obtained by organic solvent, the ratio of predicted membrane proteins to total proteins identified indicates up to an 18.6% greater enrichment efficiency using methanol. Using two different bioinformatics approaches, between 31.0 and 40.0% of the total proteins identified by the methanol-based method were classified as containing at least one putative transmembrane domain as compared to 22.0-23.4% for Brij-58. In terms of protein hydrophobicity as determined by the GRAVY index, it was revealed that methanol was more effective than Brij-58 for solubilizing membrane proteins ranging from -0.4 (hydrophilic) to +0.4 (hydrophobic). Methanol was also approximately 3-fold more effective than Brij-58 in identifying leucine-rich repeat receptor-like kinases. The ability of methanol to effectively solubilize and denature both hydrophobic and hydrophilic proteins was demonstrated using bacteriorhodopsin and cytochrome c, respectively, where both proteins were identified with at least 82% sequence coverage from a single reversed-phase LC-MS/MS analysis. Overall, our data show that methanol is a better alternative for identifying a wider range of membrane proteins than the nonionic detergent Brij-58.
Affiliation(s)
- Srijeet K Mitra
- Department of Horticultural Science, North Carolina State University, Raleigh, North Carolina 27695-7609, USA
64
Zybailov BL, Florens L, Washburn MP. Quantitative shotgun proteomics using a protease with broad specificity and normalized spectral abundance factors. Mol Biosyst 2007; 3:354-60. [PMID: 17460794 DOI: 10.1039/b701483j] [Citation(s) in RCA: 120] [Impact Index Per Article: 7.1] [Indexed: 11/21/2022]
Abstract
Non-specific proteases are rarely used in quantitative shotgun proteomics due to potentially high false discovery rates. Yet, there are instances when application of a non-specific protease is desirable to obtain sufficient sequence coverage of otherwise poorly accessible proteins or structural domains. Using the non-specific protease, proteinase K, we analyzed Saccharomyces cerevisiae preparations grown in ¹⁴N rich media and ¹⁵N minimal media and obtained relative quantitation from the dataset using normalized spectral abundance factors (NSAFs). A critical step in using a spectral counting based approach for quantitative proteomics is ensuring the inclusion of high quality spectra in the dataset. One way to do this is to minimize the false discovery rate, which can be accomplished by applying different filters to a searched dataset. Natural log transformation of proteinase K derived NSAF values followed a normal distribution and allowed for statistical analysis by the t-test. Using this approach, we generated a dataset of 719 unique proteins found in each of the three independent biological replicates, of which 84 showed a statistically significant difference in expression levels between the two growth conditions.
Affiliation(s)
- Boris L Zybailov
- Stowers Institute for Medical Research, 1000 E. 50th St., Kansas City, MO 64110, USA
65
Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 2007; 4:207-14. [PMID: 17327847 DOI: 10.1038/nmeth1019] [Citation(s) in RCA: 3046] [Impact Index Per Article: 179.2] [Indexed: 01/18/2023]
Abstract
Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
Affiliation(s)
- Joshua E Elias
- Department of Cell Biology, 240 Longwood Avenue, Harvard Medical School, Boston, Massachusetts 02115, USA
66
Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007; 6:654-61. [PMID: 17269722 PMCID: PMC2525619 DOI: 10.1021/pr0604054] [Citation(s) in RCA: 428] [Impact Index Per Article: 25.2] [Indexed: 11/29/2022]
Abstract
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and source code is available under Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
Affiliation(s)
- David L Tabb
- Mass Spectrometry Research Center / Departments of Biomedical Informatics and Biochemistry, Vanderbilt University Medical Center, Nashville, TN 37232-8575, USA.
67
Abstract
MOTIVATION: Tandem mass spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identification, ≥ 2 unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified only by a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and the use of logistic regression models with cross-validation. This approach is applied to the analysis of three bacterial samples, enabling recovery of 68-98% of the correct single-hit proteins with an error rate of < 2%. This results in a 22-65% increase in the number of identified proteins. Identifying true single-hit proteins will lead to discovering many crucial regulators, biomarkers and other low abundance proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
68
Reidegeld KA, Müller M, Stephan C, Blüggel M, Hamacher M, Martens L, Körting G, Chamrad DC, Parkinson D, Apweiler R, Meyer HE, Marcus K. The power of cooperative investigation: summary and comparison of the HUPO Brain Proteome Project pilot study results. Proteomics 2006; 6:4997-5014. [PMID: 16912976 DOI: 10.1002/pmic.200600305] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
Within the pilot phase of the HUPO Brain Proteome Project, nine participating laboratories analysed human (epilepsy and/or post mortem material) and mouse brain samples (embryonic, juvenile and adult), respectively, using a variety of different state of the art techniques. Thirty-seven different analytical approaches were accomplished. Of these analyses, 17 were done differentially, i.e. the protein expression patterns of the different samples (human or mouse) were compared. A catalogue of all proteins present in the respective sample was built in 20 analyses (mapping). All data were collected in the Data Collection Center in Bochum, Germany, and were reprocessed according to thoroughly defined parameters. In this report, a summary of all results and inter-laboratory comparisons with respect to the number of identified proteins, the analysed organism, and the used techniques is presented.
Affiliation(s)
- Kai A Reidegeld
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany
69
Stephan C, Reidegeld KA, Hamacher M, van Hall A, Marcus K, Taylor C, Jones P, Müller M, Apweiler R, Martens L, Körting G, Chamrad DC, Thiele H, Blüggel M, Parkinson D, Binz PA, Lyall A, Meyer HE. Automated reprocessing pipeline for searching heterogeneous mass spectrometric data of the HUPO Brain Proteome Project pilot phase. Proteomics 2006; 6:5015-29. [PMID: 16927432 DOI: 10.1002/pmic.200600294] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Indexed: 12/21/2022]
Abstract
The newly available techniques for sensitive proteome analysis and the resulting amount of data require a new bioinformatics focus on automatic methods for spectrum reprocessing and peptide/protein validation. Manual validation of results in such studies is neither feasible nor objective enough for quality-relevant interpretation. Tools enabling automatic quality control are, therefore, necessary to produce reliable and comparable data in such big consortia as the Human Proteome Organization Brain Proteome Project. Standards and well-defined processing pipelines are important for these consortia. We show a way of choosing the right database model, collecting data, processing them with a decoy database, and ending up with a quality-controlled protein list merged from several search engines, including a known false-positive rate.
Affiliation(s)
- Christian Stephan
- Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany.
70
Hogan JM, Higdon R, Kolker E. Experimental Standards for High-Throughput Proteomics. OMICS 2006; 10:152-7. [PMID: 16901220 DOI: 10.1089/omi.2006.10.152] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Proteome analysis, utilizing high-throughput proteomics approaches, involves studying proteins that a whole organism (or specific tissue or cellular compartment) expresses under certain conditions. Intrinsic difficulties of these studies, as well as the enormous volumes of data they typically produce, make the proteome analysis and interpretation very difficult. As with any high-throughput approach, proteomics experiments should be carefully designed, analyzed, and verified. In addition to computational standards, experimental standards (simple and complex mixtures of known proteins) for high-throughput proteomics have to be developed and utilized. This article discusses such experimental standards and their implementations.
Affiliation(s)
- Jason M Hogan
- The BIATECH Institute, Bothell, Washington 98011, USA
71
Kolker E, Higdon R, Hogan JM. Protein identification and expression analysis using mass spectrometry. Trends Microbiol 2006; 14:229-35. [PMID: 16603360 DOI: 10.1016/j.tim.2006.03.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 12/19/2005] [Revised: 03/02/2006] [Accepted: 03/22/2006] [Indexed: 11/28/2022]
Abstract
The identification and quantification of the proteins that a whole organism expresses under certain conditions is a main focus of high-throughput proteomics. Advanced proteomics approaches generate new biologically relevant data and potent hypotheses. A practical report of what proteome studies can and cannot accomplish in common laboratory settings is presented here. The review discusses the most popular tandem mass-spectrometry-based methods and focuses on how to produce reliable results. A step-by-step description of proteome experiments is given, including sample preparation, digestion, labeling, liquid chromatography, data processing, database searching and statistical analysis. The difficulties and bottlenecks of proteome analysis are addressed and the requirements for further improvements are discussed. Several diverse high-throughput proteomics-based studies of microorganisms are described.
Affiliation(s)
- Eugene Kolker
- The BIATECH Institute, 19310 North Creek Parkway, Suite 115, Bothell, WA 98011, USA.