Reference Citation Analysis

Searched Name: David L Tabb
Total Articles: 123 (from Reference Citation Analysis)
Article PDFs: 25
Cited by > 0: 114

Results Analysis

Number	Citation Analysis
101	Narasimhan C, Tabb DL, VerBerkmoes NC, Thompson MR, Hettich RL, Uberbacher EC. MASPIC: intensity-based tandem mass spectrometry scoring scheme that improves peptide identification at high confidence. Anal Chem 2005;77:7581-93. [PMID: 16316165 DOI: 10.1021/ac0501745] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract: Algorithmic search engines bridge the gap between large tandem mass spectrometry data sets and the identification of proteins associated with biological samples. Improvements in these tools can greatly enhance biological discovery. We present a new scoring scheme for comparing tandem mass spectra with a protein sequence database. The MASPIC (Multinomial Algorithm for Spectral Profile-based Intensity Comparison) scorer converts an experimental tandem mass spectrum into an m/z profile of probability and then scores peak lists from potential candidate peptides using a multinomial distribution model. The MASPIC scoring scheme incorporates intensity, spectral peak density variations, and m/z error distribution associated with peak matches into a multinomial distribution. The scoring scheme was validated on two standard protein mixtures and an additional set of spectra collected on a complex ribosomal protein mixture from Rhodopseudomonas palustris. The results indicate a 5-15% improvement over Sequest for high-confidence identifications. The performance gap grows as sequence database size increases. Additional tests on spectra from proteinase-K digest data showed similar performance improvements, demonstrating the advantages of using MASPIC for studying proteins digested with less specific proteases. All these investigations show MASPIC to be a versatile and reliable system for peptide tandem mass spectral identification.
MeSH Headings: Algorithms; Amino Acid Sequence; Endopeptidase K/chemistry; Endopeptidase K/metabolism; Molecular Sequence Data; Peptides/analysis; Peptides/chemistry; Rhodopseudomonas/chemistry; Ribosomes/chemistry; Tandem Mass Spectrometry/methods
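A minimal sketch of the multinomial idea in the MASPIC abstract, assuming fixed 1-m/z bins, a small pseudocount for empty bins, and a uniform profile as the null model. It is not the MASPIC implementation: the published scorer also models spectral peak density variation and m/z error, which this sketch omits, and all function names here are hypothetical.

```python
import math
from collections import Counter

def intensity_profile(peaks, bin_width=1.0, max_mz=2000.0, pseudocount=0.01):
    """Turn (m/z, intensity) peaks into a probability for every m/z bin.

    The pseudocount keeps empty bins from having zero probability.
    """
    n_bins = int(max_mz / bin_width) + 1
    weights = [pseudocount] * n_bins
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        if 0 <= b < n_bins:
            weights[b] += intensity
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_log_pmf(counts, probs):
    """log P(counts) under a multinomial with per-bin probabilities probs."""
    n = sum(counts.values())
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts.values())
    return log_coef + sum(k * math.log(probs[b]) for b, k in counts.items())

def score_candidate(profile, predicted_mz, bin_width=1.0):
    """Log-likelihood ratio of a candidate's fragment bins: spectrum profile vs. uniform."""
    counts = Counter(int(mz / bin_width) for mz in predicted_mz
                     if 0 <= int(mz / bin_width) < len(profile))
    uniform = [1.0 / len(profile)] * len(profile)
    return multinomial_log_pmf(counts, profile) - multinomial_log_pmf(counts, uniform)
```

Because both multinomials share the same fragment counts, the multinomial coefficient cancels in the ratio, so the score reduces to a sum of per-fragment log probability ratios. Candidates whose fragments fall on intense peaks score above zero; fragments landing in empty bins pull the score below zero.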
102	Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007;6:654-61. [PMID: 17269722 PMCID: PMC2525619 DOI: 10.1021/pr0604054] [Citation(s) in RCA: 428] [Impact Index Per Article: 25.2] [Indexed: 11/29/2022]
Abstract: Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and source code are available under the Mozilla Public License at http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
MeSH Headings: Databases, Factual; Multivariate Analysis; Peptide Fragments/chemistry; Peptide Fragments/isolation & purification; Peptides/chemistry; Peptides/isolation & purification; Protein Conformation; Proteins/chemistry; Sequence Homology, Amino Acid
Grants: P30 ES000267 NIEHS NIH HHS; P30 ES000267-40 NIEHS NIH HHS; R01 CA126218 NCI NIH HHS; R01 CA126218-01 NCI NIH HHS; R01 HL071002 NHLBI NIH HHS; U24 CA126479-01 NCI NIH HHS; U24 CA126479 NCI NIH HHS; 1U24 CA 126479-01 NCI NIH HHS; HL 071002 NHLBI NIH HHS; P30 ES 000267 NIEHS NIH HHS; 1R01 CA 126218-01 NCI NIH HHS
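The intensity-stratified hypergeometric idea can be illustrated with a point probability. In this sketch, observed peaks are split into intensity classes, the rest of the m/z range counts as empty bins, and a candidate's predicted fragments are treated as a draw without replacement from all bins. The bin count, class sizes, and the choice of -log(point probability) as the score are assumptions for illustration; MyriMatch's exact score definition and peak-classing rules are not reproduced here, and the function names are hypothetical.

```python
import math

def _log_comb(n, k):
    """log C(n, k), or -inf when the binomial coefficient is zero (k > n)."""
    c = math.comb(n, k)
    return math.log(c) if c > 0 else float("-inf")

def mvh_log_probability(class_sizes, matches, total_bins, n_predicted):
    """Log probability of one matched-peak vector under a multivariate
    hypergeometric model.

    class_sizes[i]: observed peaks in intensity class i (class 0 = most intense)
    matches[i]:     predicted fragments that hit a class-i peak
    total_bins:     m/z bins a predicted fragment could land in
    n_predicted:    number of predicted fragment ions for the candidate
    """
    empty_bins = total_bins - sum(class_sizes)
    misses = n_predicted - sum(matches)
    if empty_bins < 0 or misses < 0:
        raise ValueError("inconsistent counts")
    log_p = sum(_log_comb(k, x) for k, x in zip(class_sizes, matches))
    log_p += _log_comb(empty_bins, misses)
    return log_p - _log_comb(total_bins, n_predicted)

def mvh_score(class_sizes, matches, total_bins, n_predicted):
    """-log P: a less probable (more surprising) match vector scores higher."""
    return -mvh_log_probability(class_sizes, matches, total_bins, n_predicted)
```

For example, with classes of 5, 10, and 35 peaks, a candidate matching four intense and two medium peaks scores higher than one matching two medium and four weak peaks, even though both match six peaks. Collapsing everything into a single class would give both candidates the same score, which is the distinction the abstract attributes to intensity stratification.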
103	Tabb DL, Narasimhan C, Strader MB, Hettich RL. DBDigger: reorganized proteomic database identification that improves flexibility and speed. Anal Chem 2005;77:2464-74. [PMID: 15828782 DOI: 10.1021/ac0487000] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract: Database search identification algorithms, such as Sequest and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of "context-dependent" PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
MeSH Headings: Algorithms; Amino Acid Sequence; Chromatography, High Pressure Liquid; Databases, Protein; Mass Spectrometry/methods; Molecular Sequence Data; Protein Processing, Post-Translational; Proteins/analysis; Proteins/chemistry; Proteins/metabolism; Proteomics/methods; Ribosomal Proteins/analysis; Ribosomal Proteins/chemistry; Ribosomal Proteins/metabolism; Software
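DBDigger's reorganization, visiting each candidate sequence once and looking up the spectra it could explain, can be sketched with a sorted precursor-mass index. The sketch assumes candidates arrive as precomputed (sequence, mass) pairs and uses one fixed mass tolerance; no scoring, charge-state handling, or PTM context rules are shown, and every name is hypothetical.

```python
import bisect
from collections import defaultdict

def index_spectra(precursor_masses):
    """Sort (mass, spectrum_id) pairs once per run so candidates can find spectra."""
    return sorted((mass, sid) for sid, mass in enumerate(precursor_masses))

def spectra_for_candidate(index, candidate_mass, tolerance):
    """Binary-search the sorted index for spectra within +/- tolerance."""
    lo = bisect.bisect_left(index, (candidate_mass - tolerance, -1))
    hi = bisect.bisect_right(index, (candidate_mass + tolerance, float("inf")))
    return [sid for _, sid in index[lo:hi]]

def search(candidates, precursor_masses, tolerance=0.5):
    """Candidate-major loop: each (sequence, mass) pair is generated and visited once."""
    index = index_spectra(precursor_masses)
    matches = defaultdict(list)
    for sequence, mass in candidates:
        for sid in spectra_for_candidate(index, mass, tolerance):
            matches[sid].append(sequence)  # a real search would score the pair here
    return dict(matches)
```

The design choice mirrors the abstract: the candidate list is walked a single time per run, and each lookup costs a logarithmic binary search instead of regenerating candidates for every spectrum.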
104	Pan C, Kora G, McDonald WH, Tabb DL, VerBerkmoes NC, Hurst GB, Pelletier DA, Samatova NF, Hettich RL. ProRata: A Quantitative Proteomics Program for Accurate Protein Abundance Ratio Estimation with Confidence Interval Evaluation. Anal Chem 2006;78:7121-31. [PMID: 17037911 DOI: 10.1021/ac060654b] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.6] [Indexed: 11/29/2022]
Abstract: A profile likelihood algorithm is proposed for quantitative shotgun proteomics to infer the abundance ratios of proteins from the abundance ratios of isotopically labeled peptides derived from proteolysis. Previously, we have shown that the estimation variability and bias of peptide abundance ratios can be predicted from their profile signal-to-noise ratios. Given multiple quantified peptides for a protein, the profile likelihood algorithm probabilistically weighs the peptide abundance ratios by their inferred estimation variability, accounts for their expected estimation bias, and suppresses contribution from outliers. This algorithm yields maximum likelihood point estimation and profile likelihood confidence interval estimation of protein abundance ratios. This point estimator is more accurate than an estimator based on the average of peptide abundance ratios. The confidence interval estimation provides an "error bar" for each protein abundance ratio that reflects its estimation precision and statistical uncertainty. The accuracy of the point estimation and the precision and confidence level of the interval estimation were benchmarked with standard mixtures of isotopically labeled proteomes. The profile likelihood algorithm was integrated into a quantitative proteomics program, called ProRata, freely available at www.MSProRata.org.
MeSH Headings: Algorithms; Bacterial Proteins/analysis; Bias; Confidence Intervals; Hot Temperature; Proteome; Proteomics/methods; Rhodopseudomonas; Software; Software Design
105	Tabb DL, Shah MB, Strader MB, Connelly HM, Hettich RL, Hurst GB. Determination of peptide and protein ion charge states by Fourier transformation of isotope-resolved mass spectra. J Am Soc Mass Spectrom 2006;17:903-915. [PMID: 16713712 DOI: 10.1016/j.jasms.2006.02.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 12/02/2005] [Revised: 01/30/2006] [Accepted: 02/01/2006] [Indexed: 05/09/2023]
Abstract: We report an automated method for determining charge states from high-resolution mass spectra. Fourier transforms of isotope packets from high-resolution mass spectra are compared to Fourier transforms of modeled isotopic peak packets for a range of charge states. The charge state for the experimental ion packet is determined by the model isotope packet that yields the best match in the comparison of the Fourier transforms. This strategy is demonstrated for determining peptide ion charge states from "zoom scan" data from a linear quadrupole ion trap mass spectrometer, enabling the subsequent automated identification of singly- through quadruply-charged peptide ions, while reducing the numbers of conflicting identifications from ambiguous charge state assignments. We also apply this technique to determine the charges of intact protein ions from LC-FTICR data, demonstrating that it is more sensitive under these experimental conditions than two existing algorithms. The strategy outlined in this paper should be generally applicable to mass spectra obtained from any instrument capable of isotopic resolution.
MeSH Headings: Algorithms; Chromatography, High Pressure Liquid/methods; Computer Simulation; Fourier Analysis; Ions; Isotope Labeling/methods; Models, Chemical; Peptides/chemistry; Proteins/chemistry; Signal Processing, Computer-Assisted; Spectrometry, Mass, Electrospray Ionization/methods; Static Electricity
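The comparison of Fourier transforms described above can be sketched with NumPy: render the observed isotope packet on a uniform m/z grid, take the magnitude of its FFT, and pick the charge whose modeled packet has the most similar magnitude spectrum. Assumptions, all ours rather than the paper's: Gaussian peak shapes, a fixed hypothetical isotope envelope for every model, an isotope spacing of about 1.00335 Da divided by the charge, and cosine similarity as the match score. Function names are hypothetical.

```python
import numpy as np

ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference, in Da

def sample_packet(peaks, start, width=4.0, step=0.001, sigma=0.005):
    """Render (m/z, intensity) peaks as Gaussians on a uniform m/z grid."""
    grid = start + np.arange(0.0, width, step)
    signal = np.zeros_like(grid)
    for mz, intensity in peaks:
        signal += intensity * np.exp(-0.5 * ((grid - mz) / sigma) ** 2)
    return signal

def fourier_magnitude(signal):
    """Unit-length |FFT| with the DC term dropped (it only reflects total intensity)."""
    mag = np.abs(np.fft.rfft(signal))
    mag[0] = 0.0
    return mag / np.linalg.norm(mag)

def model_packet(charge, first_mz, envelope=(1.0, 0.8, 0.4, 0.15)):
    """Hypothetical isotope packet: fixed envelope, peaks spaced ISOTOPE_SPACING / z."""
    return [(first_mz + k * ISOTOPE_SPACING / charge, h) for k, h in enumerate(envelope)]

def assign_charge(peaks, charges=(1, 2, 3, 4)):
    """Return the charge whose modeled packet has the most similar Fourier magnitude."""
    first_mz = min(mz for mz, _ in peaks)
    start = first_mz - 0.5
    observed = fourier_magnitude(sample_packet(peaks, start))
    scores = {}
    for z in charges:
        model = fourier_magnitude(sample_packet(model_packet(z, first_mz), start))
        scores[z] = float(observed @ model)  # cosine similarity of unit vectors
    return max(scores, key=scores.get), scores
```

Working with FFT magnitudes makes the comparison insensitive to where the packet sits on the m/z axis: a packet with spacing 1.00335/z produces a periodic magnitude pattern at multiples of z/1.00335 cycles per m/z unit, so the model with the matching periodicity aligns best.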
106	VerBerkmoes NC, Shah MB, Lankford PK, Pelletier DA, Strader MB, Tabb DL, McDonald WH, Barton JW, Hurst GB, Hauser L, Davison BH, Beatty JT, Harwood CS, Tabita FR, Hettich RL, Larimer FW. Determination and comparison of the baseline proteomes of the versatile microbe Rhodopseudomonas palustris under its major metabolic states. J Proteome Res 2006;5:287-98. [PMID: 16457594 DOI: 10.1021/pr0503230] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Abstract: Rhodopseudomonas palustris is a purple nonsulfur anoxygenic phototrophic bacterium that is ubiquitous in soil and water. R. palustris is metabolically versatile with respect to energy generation and carbon and nitrogen metabolism. We have characterized and compared the baseline proteome of an R. palustris wild-type strain grown under six metabolic conditions. The methodology for proteome analysis involved protein fractionation by centrifugation, subsequent digestion with trypsin, and analysis of peptides by liquid chromatography coupled with tandem mass spectrometry. Using these methods, we identified 1664 proteins out of 4836 predicted proteins with conservative filtering constraints. A total of 107 novel hypothetical proteins and 218 conserved hypothetical proteins were detected. Qualitative analyses revealed over 311 proteins exhibiting marked differences between conditions, many of these being hypothetical or conserved hypothetical proteins showing strong correlations with different metabolic modes. For example, five proteins encoded by genes from a novel operon appeared only after anaerobic growth with no evidence of these proteins in extracts of aerobically grown cells. Proteins known to be associated with specialized growth states, such as nitrogen fixation, photoautotrophic growth, or growth on benzoate, were observed to be up-regulated under those states.
MeSH Headings: Aerobiosis/physiology; Anaerobiosis/physiology; Bacterial Proteins/metabolism; Chromatography, Liquid; Gene Expression Regulation, Bacterial; Light; Nitrogen Fixation; Proteome; Rhodopseudomonas/metabolism; Spectrometry, Mass, Electrospray Ionization
107	Strader MB, Tabb DL, Hervey WJ, Pan C, Hurst GB. Efficient and specific trypsin digestion of microgram to nanogram quantities of proteins in organic-aqueous solvent systems. Anal Chem 2006;78:125-34. [PMID: 16383319 DOI: 10.1021/ac051348l] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.0] [Indexed: 11/29/2022]
Abstract: Mass spectrometry-based identification of the components of multiprotein complexes often involves solution-phase proteolytic digestion of the complex. The affinity purification of individual protein complexes often yields nanogram to low-microgram amounts of protein, which poses several challenges for enzymatic digestion and protein identification. We tested different solvent systems to optimize trypsin digestions of samples containing limited amounts of protein for subsequent analysis by LC-MS-MS. Data collected from digestion of 10-, 2-, 1-, and 0.2-µg portions of a protein standard mixture indicated that an organic-aqueous solvent system containing 80% acetonitrile consistently provided the most complete digestion, producing more peptide identifications than the other solvent systems tested. For example, a 1-h digestion in 80% acetonitrile yielded over 52% more peptides than the overnight digestion of 1 µg of a protein mixture in purely aqueous buffer. This trend was also observed for peptides from digested ribosomal proteins isolated from Rhodopseudomonas palustris. In addition to improved digestion efficiency, the shorter digestion times possible with the organic solvent also improved trypsin specificity, resulting in smaller numbers of semitryptic peptides than an overnight digestion protocol using an aqueous solvent. The technique was also demonstrated for an affinity-isolated protein complex, GroEL. To our knowledge, this report is the first using mass spectrometry data to show a linkage between digestion solvent and trypsin specificity.
MeSH Headings: Acetonitriles; Chaperonin 60/metabolism; Chromatography, Liquid; Electrophoresis, Gel, Two-Dimensional; Hydrolysis; Mass Spectrometry/methods; Peptide Fragments/analysis; Peptide Mapping; Proteins/analysis; Proteins/metabolism; Proteomics; Ribosomes/metabolism; Rumex/chemistry; Trypsin/pharmacology; Water
108	Tabb DL, Thompson MR, Khalsa-Moyers G, VerBerkmoes NC, McDonald WH. MS2Grouper: group assessment and synthetic replacement of duplicate proteomic tandem mass spectra. J Am Soc Mass Spectrom 2005;16:1250-61. [PMID: 15979332 DOI: 10.1016/j.jasms.2005.04.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 02/02/2005] [Revised: 04/18/2005] [Accepted: 04/19/2005] [Indexed: 05/03/2023]
Abstract: Shotgun proteomics experiments require the collection of thousands of tandem mass spectra; these sets of data will continue to grow as new instruments become available that can scan at even higher rates. Such data contain substantial amounts of redundancy with spectra from a particular peptide being acquired many times during a single LC-MS/MS experiment. In this article, we present MS2Grouper, an algorithm that detects spectral duplication, assesses groups of related spectra, and replaces these groups with synthetic representative spectra. Errors in detecting spectral similarity are corrected using a paraclique criterion: spectra are only assessed as groups if they are part of a clique of at least three completely interrelated spectra or are subsequently added to such cliques by being similar to all but one of the clique members. A greedy algorithm constructs a representative spectrum for each group by iteratively removing the tallest peaks from the spectral collection and matching to peaks in the other spectra. This strategy is shown to be effective in reducing spectral counts by up to 20% in LC-MS/MS datasets from protein standard mixtures and proteomes, reducing database search times without a concomitant reduction in identified peptides.
MeSH Headings: Algorithms; Animals; Databases, Protein; Mass Spectrometry/instrumentation; Mass Spectrometry/methods; Peptides/analysis; Proteome/analysis; Proteomics/instrumentation; Proteomics/methods; Software
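The paraclique grouping rule quoted in the MS2Grouper abstract can be sketched on a precomputed list of similar spectrum pairs (for example, pairs whose dot product exceeds some threshold). Assumptions: the seed clique is grown greedily rather than found exactly, so a qualifying clique can occasionally be missed; grouping stops once no seed of at least three spectra is found; and the synthetic representative spectrum step is not shown. Names are hypothetical.

```python
from collections import defaultdict

def greedy_clique(nodes, adj):
    """Grow a clique greedily, visiting high-degree spectra first.

    This is a heuristic; it does not guarantee a maximum clique.
    """
    clique = []
    for v in sorted(nodes, key=lambda n: (-len(adj[n] & nodes), n)):
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return set(clique)

def paraclique_groups(similar_pairs, min_clique=3):
    """Seed each group with a clique of >= min_clique mutually similar spectra,
    then add any spectrum similar to all but at most one seed member."""
    adj = defaultdict(set)
    for a, b in similar_pairs:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(adj)
    groups = []
    while remaining:
        seed = greedy_clique(remaining, adj)
        if len(seed) < min_clique:
            break  # the greedy seed found no qualifying clique; stop
        group = set(seed)
        for v in sorted(remaining - seed):
            if len(seed - adj[v]) <= 1:  # misses at most one seed member
                group.add(v)
        groups.append(group)
        remaining -= group
    return groups
```

In a small example where spectra 0 to 3 are all mutually similar, spectrum 4 matches 0, 1, and 2 but not 3, spectrum 5 matches only 0, and spectra 6 and 7 match only each other, the sketch returns one group {0, 1, 2, 3, 4}: spectrum 4 is absorbed by the all-but-one rule, while 5, 6, and 7 stay ungrouped.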
109	Tabb DL, Saraf A, Yates JR. GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model. Anal Chem 2003;75:6415-21. [PMID: 14640709 PMCID: PMC2915448 DOI: 10.1021/ac0347462] [Citation(s) in RCA: 177] [Impact Index Per Article: 8.9] [Indexed: 11/29/2022]
Abstract: Shotgun proteomics is a powerful tool for identifying the protein content of complex mixtures via liquid chromatography and tandem mass spectrometry. The most widely used class of algorithms for analyzing mass spectra of peptides has been database search software such as SEQUEST. A new sequence tag database search algorithm, called GutenTag, makes it possible to identify peptides with unknown posttranslational modifications or sequence variations. This software automates the process of inferring partial sequence "tags" directly from the spectrum and efficiently examines a sequence database for peptides that match these tags. When multiple candidate sequences result from the database search, the software evaluates which is the best match by a rapid examination of spectral fragment ions. We compare GutenTag's accuracy to that of SEQUEST on a defined protein mixture, showing that both modified and unmodified peptides can be successfully identified by this approach. GutenTag analyzed 33,000 spectra from a human lens sample, identifying peptides that were missed in prior SEQUEST analysis due to sequence polymorphisms and posttranslational modifications. The software is available under license; visit http://fields.scripps.edu for information.
MeSH Headings: Algorithms; Databases, Genetic/standards; Sequence Analysis, Protein/methods; Sequence Analysis, Protein/standards; Software/standards
Grants: R01 EY013288-03 NEI NIH HHS; R01 EY013288 NEI NIH HHS; RR11823-08 NCRR NIH HHS; P41 RR011823-08 NCRR NIH HHS; P41 RR011823 NCRR NIH HHS; R01 EY13288-03 NEI NIH HHS; R33 CA82665 NCI NIH HHS; R33 CA081665-04 NCI NIH HHS
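The tag-inference step described in the GutenTag abstract can be illustrated as a walk over peaks whose successive m/z differences match amino acid residue masses. This is a simplified sketch, not GutenTag's empirically derived fragmentation model: it assumes singly charged fragments of one ion series, a fixed 0.02 m/z tolerance, and no intensity-based scoring of tags. The residue masses are standard monoisotopic values; the function names are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses (Da); Leu and Ile share a mass, so "L" covers both.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def find_tags(peak_mzs, length=3, tol=0.02):
    """Return (tag, starting m/z) pairs for runs of peaks whose successive
    m/z differences each match a residue mass within tol."""
    mzs = sorted(peak_mzs)
    edges = {i: [] for i in range(len(mzs))}
    for i, j in combinations(range(len(mzs)), 2):
        delta = mzs[j] - mzs[i]
        for residue, mass in RESIDUE_MASSES.items():
            if abs(delta - mass) <= tol:
                edges[i].append((j, residue))
    tags = set()

    def walk(start, node, tag):
        if len(tag) == length:
            tags.add((tag, mzs[start]))
            return
        for nxt, residue in edges[node]:
            walk(start, nxt, tag + residue)

    for i in range(len(mzs)):
        walk(i, i, "")
    return tags
```

For peaks at 200.0, 257.02146, 328.05857, and 415.0906 plus a noise peak at 300.0, the only three-residue tag is "GAS" starting at 200.0. The 200.0 to 328.05857 gap also matches Gln, because Gly plus Ala differs from Gln by only about 0.00001 Da, so a practical tool has to tolerate ambiguities like this one.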
110	Strader MB, VerBerkmoes NC, Tabb DL, Connelly HM, Barton JW, Bruce BD, Pelletier DA, Davison BH, Hettich RL, Larimer FW, Hurst GB. Characterization of the 70S Ribosome from Rhodopseudomonas palustris Using an Integrated “Top-Down” and “Bottom-Up” Mass Spectrometric Approach. J Proteome Res 2004;3:965-78. [PMID: 15473684 DOI: 10.1021/pr049940z] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]
Abstract: We present a comprehensive mass spectrometric approach that integrates intact protein molecular mass measurement ("top-down") and proteolytic fragment identification ("bottom-up") to characterize the 70S ribosome from Rhodopseudomonas palustris. Forty-two intact protein identifications were obtained by the top-down approach and 53 out of the 54 orthologs to Escherichia coli ribosomal proteins were identified from bottom-up analysis. This integrated approach simplified the assignment of post-translational modifications by increasing the confidence of identifications, distinguishing between isoforms, and identifying the amino acid positions at which particular post-translational modifications occurred. Our combined mass spectrometry data also allowed us to check and validate the gene annotations for three ribosomal proteins predicted to possess extended C-termini. In particular, we identified a highly repetitive C-terminal "alanine tail" on L25. This type of low complexity sequence, common to eukaryotic proteins, has previously not been reported in prokaryotic proteins. To our knowledge, this is the most comprehensive protein complex analysis to date that integrates two MS techniques.
MeSH Headings: Acetylation; Amino Acid Sequence; Bacterial Proteins/analysis; Bacterial Proteins/chemistry; Chromatography, High Pressure Liquid/methods; Chromatography, Liquid; Databases, Protein; Escherichia coli Proteins; Fourier Analysis; Mass Spectrometry/methods; Methionine/chemistry; Methylation; Molecular Sequence Data; Protein Processing, Post-Translational; Proteomics/methods; Rhodopseudomonas/metabolism; Ribosomal Protein L3; Ribosomal Proteins/analysis; Ribosomal Proteins/chemistry; Ribosomes/metabolism; Sequence Alignment; Sequence Homology; Sequence Homology, Amino Acid; Spectrometry, Mass, Electrospray Ionization; Trypsin/metabolism
111	Tabb DL, Huang Y, Wysocki VH, Yates JR. Influence of basic residue content on fragment ion peak intensities in low-energy collision-induced dissociation spectra of peptides. Anal Chem 2004;76:1243-8. [PMID: 14987077 PMCID: PMC2813199 DOI: 10.1021/ac0351163] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.0] [Indexed: 11/30/2022]
Abstract: The primary utility of trypsin digestion in proteomics is that it cleaves proteins at predictable locations, but it is also notable for yielding peptides that terminate in basic arginine and lysine residues. Tryptic peptides fragment in ion trap tandem mass spectrometry to produce prominent C-terminal y series ions. Alternative proteolytic digests may produce peptides that do not follow these rules. In this study, we examine 2568 peptides generated through proteinase K digestion, a technique that produces a greater diversity of basic residue content in peptides. We show that the position of basic residues within peptides influences the peak intensities of b and y series ions; a basic residue near the N-terminus of a peptide can lead to prominent b series peaks rather than the intense y series peaks associated with tryptic peptides. The effects of presence and position for arginine, lysine, and histidine are explored separately and in combination. Arg shows the most dominant effects followed by His and then by Lys. Fragment ions containing basic residues produce more intense peaks than those without basic residues. Doubly charged precursor ions have generally been modeled as producing only singly charged fragment ions, but fragment ions that contain two basic residues may accept both protons during fragmentation. By characterizing the influence of basic residues on gas-phase fragmentation of peptides, this research makes possible more accurate fragmentation models for peptide identification algorithms.
MeSH Headings: Algorithms; Animals; Endopeptidase K/metabolism; Hydrogen-Ion Concentration; Ions/chemistry; Peptides/analysis; Peptides/chemistry; Peptides/metabolism; Rats; Rats, Sprague-Dawley; Spectrum Analysis
Grants: R01 MH067880 NIMH NIH HHS; R01 GM051387 NIGMS NIH HHS; RR11823-08 NCRR NIH HHS; P41 RR011823-08 NCRR NIH HHS; P41 RR011823 NCRR NIH HHS; R33 CA81665 NCI NIH HHS; R01 GM51387 NIGMS NIH HHS; R01 MH067880-01 NIMH NIH HHS; R01MH067880 NIMH NIH HHS; R01 GM051387-09A1 NIGMS NIH HHS; R33 CA081665-04 NCI NIH HHS
112	McDonald WH, Tabb DL, Sadygov RG, MacCoss MJ, Venable J, Graumann J, Johnson JR, Cociorva D, Yates JR. MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications. Rapid Commun Mass Spectrom 2004;18:2162-2168. [PMID: 15317041 DOI: 10.1002/rcm.1603] [Citation(s) in RCA: 303] [Impact Index Per Article: 15.2] [Indexed: 05/24/2023]
Abstract: As the speed with which proteomic labs generate data increases along with the scale of projects they are undertaking, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques that can generate tens of thousands of spectra per instrument each day. One design factor that leads to many of these problems is the practice of storing spectra and the database identifications for a given spectrum as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machines and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies.
MeSH Headings: Database Management Systems; Databases, Protein; Documentation; Electronic Data Processing; Information Storage and Retrieval/methods; Mass Spectrometry/methods; Proteins/analysis; Proteins/chemistry; Proteome/analysis; Proteome/chemistry; Sequence Analysis, Protein/methods
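A minimal reader for the MS2 text layout, assuming the line types as we understand them from the format description: H header lines, S lines giving first scan, last scan, and precursor m/z, Z lines giving a candidate charge and the corresponding [M+H]+ mass, I lines carrying extra information, and whitespace-separated m/z and intensity peak lines. The paper remains the authoritative definition; the class and function names are hypothetical, and other line types are skipped rather than interpreted.

```python
from dataclasses import dataclass, field

@dataclass
class Ms2Spectrum:
    first_scan: int
    last_scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)  # (charge, [M+H]+ mass) from Z lines
    info: dict = field(default_factory=dict)     # key/value pairs from I lines
    peaks: list = field(default_factory=list)    # (m/z, intensity) pairs

def parse_ms2(lines):
    """Parse MS2-style text lines into (header dict, list of Ms2Spectrum)."""
    header, spectra, current = {}, [], None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "H":
            header[fields[1]] = " ".join(fields[2:])
        elif tag == "S":
            current = Ms2Spectrum(int(fields[1]), int(fields[2]), float(fields[3]))
            spectra.append(current)
        elif tag == "Z":
            current.charges.append((int(fields[1]), float(fields[2])))
        elif tag == "I":
            current.info[fields[1]] = " ".join(fields[2:])
        elif tag[0].isalpha():
            continue  # other line types are skipped in this sketch
        else:
            current.peaks.append((float(fields[0]), float(fields[1])))
    return header, spectra
```

Because one S block can carry several Z lines, a spectrum with an ambiguous precursor charge keeps every candidate charge and mass, which is what a downstream search would iterate over.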
113	Tabb DL, MacCoss MJ, Wu CC, Anderson SD, Yates JR. Similarity among tandem mass spectra from proteomic experiments: detection, significance, and utility. Anal Chem 2003;75:2470-7. [PMID: 12918992 DOI: 10.1021/ac026424o] [Citation(s) in RCA: 130] [Impact Index Per Article: 6.2] [Indexed: 01/12/2023]
Abstract: Liquid chromatography paired with tandem mass spectrometry is a standard technique for identifying peptides from complex protein mixtures. Most fragment ion spectra acquired by this technique are unique, but some are repeated. Similarities among the spectra from 1D and 2D liquid chromatography experiments were calculated by the dot product algorithm. Similar spectra were grouped, and the degree of duplication was calculated for each sample. In 1D liquid chromatography data from 1D gel bands, 18% of the fragment ion spectra were duplicates. A six-cycle 2D liquid chromatographic separation of more than 200 proteins produced 28% duplicate spectra. A rat hippocampal homogenate analyzed by a 12-cycle 2D liquid chromatographic separation contained 25% duplicate spectra. Removal of these duplicate spectra, however, resulted in fewer peptides being successfully identified by SEQUEST. We propose a modification for peptide identification algorithms that would improve their performance and accuracy by explicitly recognizing and making use of spectral similarity.
MeSH Headings: Algorithms; Animals; Hippocampus/chemistry; Mass Spectrometry/methods; Peptide Fragments/analysis; Proteome/analysis; Proteomics/methods; Rats
Grants: F32 DK59731 NIDDK NIH HHS; R33 CA81665 NCI NIH HHS; RR11 823 NCRR NIH HHS
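Dot-product similarity between spectra, and the duplicate fraction reported per sample, can be sketched as a cosine score on binned intensities. The 1-m/z bins, raw (untransformed) intensities, and the 0.8 duplicate threshold are assumptions for illustration, not the parameters used in the study; the function names are hypothetical.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz / bin_width)] += intensity
    return bins

def normalized_dot_product(a, b, bin_width=1.0):
    """Cosine similarity of two binned spectra: 1.0 = same shape, 0.0 = no shared bins."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(v * vb[k] for k, v in va.items() if k in vb)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def duplicate_fraction(spectra, threshold=0.8):
    """Fraction of spectra that resemble at least one earlier spectrum."""
    duplicates = sum(
        any(normalized_dot_product(s, earlier) >= threshold for earlier in spectra[:i])
        for i, s in enumerate(spectra)
    )
    return duplicates / len(spectra) if spectra else 0.0
```

Two spectra whose peaks fall into the same bins in the same proportions score 1.0 regardless of overall intensity, which is why the normalized dot product suits detecting repeated acquisitions of one peptide.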
114	Breci LA, Tabb DL, Yates JR, Wysocki VH. Cleavage N-terminal to proline: analysis of a database of peptide tandem mass spectra. Anal Chem 2003;75:1963-71. [PMID: 12720328 DOI: 10.1021/ac026359i] [Citation(s) in RCA: 234] [Impact Index Per Article: 11.1] [Indexed: 11/28/2022]
Abstract: Fragmentation at the Xxx-Pro bond was analyzed for a group of peptide mass spectra that were acquired in a Finnigan ion trap mass spectrometer and were generated from proteins digested by enzymes and identified by the Sequest algorithm. Cleavage with formation of a + b + y ions occurred more readily at the Xxx-Pro bond than at other locations in these peptides, and the importance of this cleavage varied by the identity of Xxx. The most abundant Xxx-Pro relative bond cleavage ratios were observed when Xxx was Val, His, Asp, Ile, and Leu, whereas the least abundant cleavage ratios occurred when Xxx was Gly or Pro. Rationalization for these cleavage ratios at Xxx-Pro may include contribution of the Asp or His side chain to enhanced cleavage or the conformation of Pro, Gly, and the aliphatic residues Val, Ile, and Leu at the Xxx location in the Xxx-Pro bond. Although unusual fragmentation behavior has been noted for Pro-containing peptides, this analysis suggests that fragmentation at the Xxx-Pro bond is predictable and that this information may be used to improve the identification of proteins if it is incorporated into peptide sequencing algorithms.
MeSH Headings: Amino Acid Sequence; Gas Chromatography-Mass Spectrometry; Molecular Sequence Data; Peptides/chemistry; Proline/chemistry; Spectrometry, Mass, Electrospray Ionization; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
Grants: GM R0151387 NIGMS NIH HHS
115	Lin D, Tabb DL, Yates JR. Large-scale protein identification using mass spectrometry. Biochim Biophys Acta 2003;1646:1-10. [PMID: 12637006 DOI: 10.1016/s1570-9639(02)00546-0] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.7] [Indexed: 12/17/2022]
Abstract: Recent achievements in genomics have created an infrastructure of biological information. The enormous success of genomics promptly induced a subsequent explosion in proteomics technology, the emerging science for systematic study of proteins in complexes, organelles, and cells. Proteomics is developing powerful technologies to identify proteins, to map proteomes in cells, to quantify the differential expression of proteins under different states, and to study aspects of protein-protein interaction. The dynamic nature of protein expression, protein interactions, and protein modifications requires measurement as a function of time and cellular state. These types of studies require many measurements and thus high throughput protein identification is essential. This review will discuss aspects of mass spectrometry with emphasis on methods and applications for large-scale protein identification, a fundamental tool for proteomics.
MeSH Headings: Chromatography, Liquid; Electrophoresis, Gel, Two-Dimensional; Mass Spectrometry/methods; Proteins/analysis; Proteins/chemistry; Proteomics/methods; Spectrometry, Mass, Electrospray Ionization; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
Grants: R33CA8165-01 NCI NIH HHS; RR11823-05 NCRR NIH HHS
116	Tabb DL, Smith LL, Breci LA, Wysocki VH, Lin D, Yates JR. Statistical characterization of ion trap tandem mass spectra from doubly charged tryptic peptides. Anal Chem 2003;75:1155-63. [PMID: 12641236 PMCID: PMC2819022 DOI: 10.1021/ac026122m] [Citation(s) in RCA: 205] [Impact Index Per Article: 9.8] [Indexed: 11/29/2022] Abstract: Collision-induced dissociation (CID) is a common ion activation technique used to energize mass-selected peptide ions during tandem mass spectrometry. Characteristic fragment ions form from the cleavage of amide bonds within a peptide undergoing CID, allowing the inference of its amino acid sequence. The statistical characterization of these fragment ions is essential for improving peptide identification algorithms and for understanding the complex reactions taking place during CID. An examination of 1465 ion trap spectra from doubly charged tryptic peptides reveals several trends important to understanding this fragmentation process. While less abundant than y ions, b ions are present in sufficient numbers to aid sequencing algorithms. Fragment ions exhibit a characteristic series-specific relationship between their masses and intensities. Each residue influences fragmentation at adjacent amide bonds, with Pro quantifiably enhancing cleavage at its N-terminal amide bond and His increasing the formation of b ions at its C-terminal amide bond. Fragment ions corresponding to a formal loss of ammonia appear preferentially in peptides containing Gln and Asn. These trends are partially responsible for the complexity of peptide tandem mass spectra. MeSH Headings: Mass Spectrometry; Peptides/chemistry; Protein Hydrolysates/chemistry; Saccharomyces cerevisiae/chemistry; Trypsin/chemistry. Grants: R33 CA081665-04 NCI NIH HHS; R33 CA81665-04 NCI NIH HHS.
117	Muhammad WT, Tabb DL, Fox KF, Fox A. Automated discrimination of polymerase chain reaction products with closely related sequences by software-based detection of characteristic peaks in product ion spectra. Rapid Commun Mass Spectrom 2003;17:2755-62. [PMID: 14673823 DOI: 10.1002/rcm.1262] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/24/2023] Abstract: A computer-based method is described for automated detection of peaks in product ion spectra that allows discrimination of structurally related polymerase chain reaction (PCR) products. PCR products of K-ras mutants having single nucleotide substitutions and isomeric sequence changes in positions 1 and 2 of codon 12 (e.g. TGT and GTT) were used as a model system. SpecDiff, a tool for differentiating pairs of mass spectra by identifying peaks that either differ in relative intensity between spectra or only appear in one of a pair of spectra, was created to help automate detection. This program was demonstrated to have great utility in detection of mutations and could also be useful as a general tool for differentiating other molecules of closely related structure. MeSH Headings: Algorithms; Base Sequence; Cluster Analysis; Codon/analysis; Codon/chemistry; Molecular Sequence Data; Oligonucleotides/analysis; Oligonucleotides/chemistry; Pattern Recognition, Automated; Polymerase Chain Reaction/methods; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Spectrometry, Mass, Electrospray Ionization/methods. Grants: R33 CA811665 NCI NIH HHS.
118	Florens L, Washburn MP, Raine JD, Anthony RM, Grainger M, Haynes JD, Moch JK, Muster N, Sacci JB, Tabb DL, Witney AA, Wolters D, Wu Y, Gardner MJ, Holder AA, Sinden RE, Yates JR, Carucci DJ. A proteomic view of the Plasmodium falciparum life cycle. Nature 2002;419:520-6. [PMID: 12368866 DOI: 10.1038/nature01107] [Citation(s) in RCA: 935] [Impact Index Per Article: 42.5] [Received: 07/31/2002] [Accepted: 09/09/2002] [Indexed: 12/31/2022] Abstract: The completion of the Plasmodium falciparum clone 3D7 genome provides a basis on which to conduct comparative proteomics studies of this human pathogen. Here, we applied a high-throughput proteomics approach to identify new potential drug and vaccine targets and to better understand the biology of this complex protozoan parasite. We characterized four stages of the parasite life cycle (sporozoites, merozoites, trophozoites and gametocytes) by multidimensional protein identification technology. Functional profiling of over 2,400 proteins agreed with the physiology of each stage. Unexpectedly, the antigenically variant proteins of var and rif genes, defined as molecules on the surface of infected erythrocytes, were also largely expressed in sporozoites. The detection of chromosomal clusters encoding co-expressed proteins suggested a potential mechanism for controlling gene expression. MeSH Headings: Animals; Antimalarials/pharmacology; Chromosomes; Erythrocytes/parasitology; Female; Genome, Protozoan; Germ Cells; Humans; Life Cycle Stages; Malaria Vaccines; Male; Plasmodium falciparum/genetics; Plasmodium falciparum/growth & development; Plasmodium falciparum/physiology; Proteome; Protozoan Proteins/genetics; Protozoan Proteins/physiology. Grants: MC_U117532067 Medical Research Council.
119	Tabb DL, McDonald WH, Yates JR. DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J Proteome Res 2002;1:21-6. [PMID: 12643522 PMCID: PMC2811961 DOI: 10.1021/pr015504q] [Citation(s) in RCA: 1098] [Impact Index Per Article: 49.9] [Indexed: 12/26/2022] Abstract: The components of complex peptide mixtures can be separated by liquid chromatography, fragmented by tandem mass spectrometry, and identified by the SEQUEST algorithm. Inferring a mixture's source proteins requires that the identified peptides be reassociated. This process becomes more challenging as the number of peptides increases. DTASelect, a new software package, assembles SEQUEST identifications and highlights the most significant matches. The accompanying Contrast tool compares DTASelect results from multiple experiments. The two programs improve the speed and precision of proteomic data analysis. Key Words: proteomics; protein identification; SEQUEST; data analysis; software; differential analysis; subtractive analysis; MudPIT; assembly. MeSH Headings: Algorithms; Amino Acid Sequence; Mass Spectrometry; Molecular Sequence Data; Peptides/analysis; Proteins/analysis; Proteomics; Software. Grants: P41 RR011823-04 NCRR NIH HHS; P41 RR011823 NCRR NIH HHS; R33CA81665 NCI NIH HHS; R33 CA081665-04 NCI NIH HHS; RR11823 NCRR NIH HHS.
120	Tabb DL, Eng JK, Yates JR. Protein identification by SEQUEST. In: Proteome Research: Mass Spectrometry. 2001. [DOI: 10.1007/978-3-642-56895-4_7] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 01/31/2023]
121	Krahmer MT, Walters JJ, Fox KF, Fox A, Creek KE, Pirisi L, Wunschel DS, Smith RD, Tabb DL, Yates JR. MS for identification of single nucleotide polymorphisms and MS/MS for discrimination of isomeric PCR products. Anal Chem 2000;72:4033-40. [PMID: 10994962 DOI: 10.1021/ac000142b] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022] Abstract: ESI (electrospray ionization) MS and tandem mass spectrometry (MS/MS) were used for the analysis of single nucleotide polymorphisms (SNPs) and more complex genetic variations. Double-stranded (ds) PCR products were studied. PCR products of the proline [5'-x(G17)-x(C38)x-3'] and arginine [5'-x(G17)-x(G38)x-3'] variants of the p53 gene are distinguished by an SNP (cytosine or guanine) and were discriminated using both quadrupole and quadrupole ion trap MS analysis. A 69 bp arginine mutant PCR product [5'-x(C17)-x(G38)x-3'] with a negating switch has the same mass as the proline variant but was readily distinguishable on ion trap MS/MS analysis; fragments containing the mutation site, but not the polymorphism, were identified. The 69 bp PCR products were restriction-enzyme-digested to create 43 bp fragments. ESI quadrupole ion trap MS/MS analysis of the 43 bp product-ion spectra readily demonstrated both polymorphism and negating switch sites. MS and MS/MS are powerful and complementary techniques for analysis of DNA. MS can readily distinguish SNPs, but MS/MS is required to differentiate isomeric PCR products (same nucleotide composition but different sequence). MeSH Headings: Mass Spectrometry; Polymerase Chain Reaction; Polymorphism, Genetic. Grants: R21 HG01810-01 NHGRI NIH HHS.
122	McCord TJ, Smith SC, Tabb DL, Davis AL. The synthesis, configuration, and conformation of cis- and trans-3-amino-3,4-dihydro-1-hydroxy-4-methylcarbostyrils and other configurationally related compounds. J Heterocycl Chem 1981. [DOI: 10.1002/jhet.5570180536] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
123	Davis AL, Tabb DL, Swan JK, McCord TJ. Synthesis of the 3-methyl and 4-methyl derivatives of 3-amino-3,4-dihydro-1-hydroxycarbostyril and related compounds. J Heterocycl Chem 1980. [DOI: 10.1002/jhet.5570170711] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]