101
Xu C, Ma B. Software for computational peptide identification from MS-MS data. Drug Discov Today 2007; 11:595-600. [PMID: 16793527 DOI: 10.1016/j.drudis.2006.05.011] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Received: 12/23/2005] [Revised: 04/07/2006] [Accepted: 05/16/2006] [Indexed: 01/22/2023]
Abstract
Protein identification in biological samples is an important task in drug discovery research and is now routinely performed by tandem mass spectrometry (MS-MS). Because intact proteins are difficult to measure by MS-MS, a protein is typically digested enzymatically into peptides and the MS-MS spectrum of each peptide is measured. Computational methods are then used to identify the peptides, which are later combined to identify the protein. The most widely recognized peptide identification software packages fall into four categories: database searching, de novo sequencing, sequence tagging and consensus of multiple engines.
Affiliation(s)
- Changjiang Xu
- Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada
102
Bandeira N. Spectral networks: a new approach to de novo discovery of protein sequences and posttranslational modifications. Biotechniques 2007; 42:687, 689, 691 passim. [PMID: 17612289 DOI: 10.2144/000112487] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/23/2022] Open
Abstract
Significant technological advances have accelerated high-throughput proteomics to the automated generation of millions of tandem mass spectra on a daily basis. In such a setup, the desire for greater sequence coverage combines with standard experimental procedures to commonly yield multiple tandem mass spectra from overlapping peptides: typical observations include peptides differing by one or two terminal amino acids and spectra from modified and unmodified variants of the same peptides. In a departure from the traditional spectrum identification algorithms that analyze each tandem mass spectrum in isolation, spectral networks define a new computational approach that instead finds and simultaneously interprets sets of spectra from overlapping peptides. In shotgun protein sequencing, spectral networks capitalize on the redundant sequence information in the aligned spectra to deliver the longest and most accurate de novo sequences ever reported for ion trap data. Also, by combining spectra from multiple modified and unmodified variants of the same peptides, spectral networks are able to bypass the dominant guess/confirm approach to the identification of posttranslational modifications and instead discover modifications and highly modified peptides directly from experimental data. Open-source implementations of these algorithms may be downloaded from peptide.ucsd.edu.
103
Bandeira N, Clauser KR, Pevzner PA. Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins. Mol Cell Proteomics 2007; 6:1123-34. [PMID: 17446555 DOI: 10.1074/mcp.m700001-mcp200] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Indexed: 01/02/2023] Open
Abstract
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
Affiliation(s)
- Nuno Bandeira
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA.
104
DiMaggio PA, Floudas CA. De novo peptide identification via tandem mass spectrometry and integer linear optimization. Anal Chem 2007; 79:1433-46. [PMID: 17297942 PMCID: PMC2730153 DOI: 10.1021/ac0618425] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
A novel methodology for the automated de novo identification of peptides via integer linear optimization (also referred to as integer linear programming or ILP) and tandem mass spectrometry is presented in this article. The various features of the mathematical model are presented and examples are used to illustrate the key concepts of the proposed approach. A variety of challenging peptide identification problems, accompanied by a comparative study with five state-of-the-art methods, are examined to illustrate the proposed method's ability to address (a) residue-dependent fragmentation properties that result in missing ion peaks and (b) the variability of resolution in different mass analyzers. A preprocessing algorithm is utilized to identify important m/z values in the tandem mass spectrum. Missing peaks, due to residue-dependent fragmentation characteristics, are dealt with using a two-stage algorithmic framework. A cross-correlation approach is used to resolve missing amino acid assignments and to select the most probable peptide by comparing the theoretical spectra of the candidate sequences that were generated from the ILP sequencing stages with the experimental tandem mass spectrum. The novel, proposed de novo method, denoted as PILOT, is compared to existing popular methods such as Lutefisk, PEAKS, PepNovo, EigenMS, and NovoHMM for a set of spectra resulting from QTOF and ion trap instruments.
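The final stage described in this abstract, comparing theoretical spectra of candidate sequences with the experimental spectrum, can be illustrated with a small sketch. This is not the PILOT implementation: the ILP sequencing stages are not shown, a simple matched-intensity score stands in for the cross-correlation used in the paper, only singly charged b and y ions are generated, and the function names and the 0.5 Da tolerance are hypothetical choices made for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def theoretical_ions(peptide):
    """Return sorted m/z values of singly charged b and y ions."""
    masses = [MONO[aa] for aa in peptide]
    total = sum(masses)
    ions = []
    prefix = 0.0
    for m in masses[:-1]:  # one b/y pair per backbone cleavage site
        prefix += m
        ions.append(prefix + PROTON)                  # b ion
        ions.append(total - prefix + WATER + PROTON)  # y ion
    return sorted(ions)


def matched_intensity(peptide, peaks, tol=0.5):
    """Sum the intensity of observed peaks explained by a b or y ion.

    peaks is a list of (m/z, intensity) pairs; tol is in daltons.
    This is a stand-in score, not a cross-correlation.
    """
    ions = theoretical_ions(peptide)
    return sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - ion) <= tol for ion in ions)
    )


def best_candidate(candidates, peaks, tol=0.5):
    """Pick the candidate sequence with the highest matched intensity."""
    return max(candidates, key=lambda p: matched_intensity(p, peaks, tol))
```

For high-resolution instruments a much smaller tolerance would be appropriate, which is one way the variability of mass-analyzer resolution mentioned in the abstract enters a scoring step like this.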
105
Liu J, Carrillo B, Yanofsky C, Beaudrie C, Morales F, Kearney R. A novel approach to speed up peptide sequencing via MS/MS spectra analysis. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2005:4441-4. [PMID: 17281222 DOI: 10.1109/iembs.2005.1615452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
In proteomics, tandem mass spectrometry is the key technology for sequencing peptides from cells. Different methods have been proposed to sequence peptides from tandem mass spectra. Although these methods can provide more robust and accurate results, they are also computationally expensive and create a bottleneck in high-throughput peptide identification. In this work, we introduce a novel approach to speed up peptide sequencing. In contrast to traditional approaches, we perform a coarse comparison of spectral profiles to drastically shrink the set of candidate peptides, and we have developed a fast algorithm for this purpose. Our experiments show that this approach can significantly improve the speed of peptide sequencing.
Affiliation(s)
- J Liu
- Department of Biomedical Engineering, McGill University, Montreal Quebec, Canada.
106
Liu J, Bell AW, Bergeron JJM, Yanofsky CM, Carrillo B, Beaudrie CEH, Kearney RE. Methods for peptide identification by spectral comparison. Proteome Sci 2007; 5:3. [PMID: 17227583 PMCID: PMC1783643 DOI: 10.1186/1477-5956-5-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Received: 06/07/2006] [Accepted: 01/16/2007] [Indexed: 11/15/2022] Open
Abstract
Background: Tandem mass spectrometry followed by database search is currently the predominant technology for peptide sequencing in shotgun proteomics experiments. Most methods compare experimentally observed spectra to the theoretical spectra predicted from the sequences in protein databases. There is a growing interest, however, in comparing unknown experimental spectra to a library of previously identified spectra. This approach has the advantage of taking into account instrument-dependent factors and peptide-specific differences in fragmentation probabilities. It is also computationally more efficient for high-throughput proteomics studies.
Results: This paper investigates computational issues related to this spectral comparison approach. Different methods have been empirically evaluated over several large sets of spectra. First, we illustrate that the peak intensities follow a Poisson distribution. This implies that applying a square root transform will optimally stabilize the peak intensity variance. Our results show that the square root did indeed outperform other transforms, resulting in improved accuracy of spectral matching. Second, different measures of spectral similarity were compared, and the results illustrated that the correlation coefficient was most robust. Finally, we examine how to assemble multiple spectra associated with the same peptide to generate a synthetic reference spectrum. Ensemble averaging is shown to provide the best combination of accuracy and efficiency.
Conclusion: Our results demonstrate that when combined, these methods can boost the sensitivity and specificity of spectral comparison. Therefore they are capable of enhancing and complementing existing tools for consistent and accurate peptide identification.
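The three steps evaluated in this abstract (square-root intensity transform, correlation-based similarity, and ensemble averaging into a reference spectrum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: spectra are assumed to be already binned onto a common m/z grid, replicates are averaged on raw intensities before the transform, and all function names are hypothetical.

```python
import math


def sqrt_transform(intensities):
    """Square-root transform, which stabilizes variance for Poisson-like counts."""
    return [math.sqrt(max(x, 0.0)) for x in intensities]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length binned spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same m/z binning")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no shape information
    return cov / math.sqrt(var_a * var_b)


def ensemble_average(spectra):
    """Bin-wise mean of replicate spectra of the same peptide."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def similarity(query, reference):
    """Correlation of the square-root-transformed spectra."""
    return correlation(sqrt_transform(query), sqrt_transform(reference))


# Usage: build a reference from two replicates, then score a query against it.
replicates = [[0.0, 16.0, 4.0, 0.0], [0.0, 9.0, 1.0, 0.0]]
reference = ensemble_average(replicates)
score = similarity([0.0, 25.0, 4.0, 1.0], reference)
```

A library search would compute `similarity` between the query and each reference spectrum within the precursor mass window and report the best-scoring peptide.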
Affiliation(s)
- Jian Liu
- Center for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada
- John JM Bergeron
- Department of Anatomy and Cell Biology, McGill University, Montreal, Canada
- Corey M Yanofsky
- Department of Biomedical Engineering, McGill University, Montreal, Canada
- Brian Carrillo
- Department of Biomedical Engineering, McGill University, Montreal, Canada
- Robert E Kearney
- Department of Biomedical Engineering, McGill University, Montreal, Canada
107
Dimaggio PA, Floudas CA. A Mixed-Integer Optimization Framework for De Novo Peptide Identification. AIChE J 2007; 53:160-173. [PMID: 19412358 DOI: 10.1002/aic.11061] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
A novel methodology for the de novo identification of peptides by mixed-integer optimization and tandem mass spectrometry is presented in this article. The various features of the mathematical model are presented and examples are used to illustrate the key concepts of the proposed approach. Several problems are examined to illustrate the proposed method's ability to address (1) residue-dependent fragmentation properties and (2) the variability of resolution in different mass analyzers. A preprocessing algorithm is used to identify important m/z values in the tandem mass spectrum. Missing peaks, resulting from residue-dependent fragmentation characteristics, are dealt with using a two-stage algorithmic framework. A cross-correlation approach is used to resolve missing amino acid assignments and to identify the most probable peptide by comparing the theoretical spectra of the candidate sequences that were generated from the MILP sequencing stages with the experimental tandem mass spectrum.
Affiliation(s)
- Peter A Dimaggio
- Dept. of Chemical Engineering, Princeton University, Princeton, NJ 08544
108
Frank AM, Savitski MM, Nielsen MN, Zubarev RA, Pevzner PA. De novo peptide sequencing and identification with precision mass spectrometry. J Proteome Res 2007; 6:114-23. [PMID: 17203955 PMCID: PMC2538556 DOI: 10.1021/pr060271u] [Citation(s) in RCA: 143] [Impact Index Per Article: 8.4] [Indexed: 11/30/2022]
Abstract
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a two-orders-of-magnitude boost in mass resolution compared to low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup, it is not only possible to search a database very efficiently, but also to use the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at http://peptide.ucsd.edu/.
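The idea of a direct sequence lookup can be sketched with a k-mer index over a protein database. This is an illustrative assumption about how such a lookup might be organized, not the authors' software: the function names, the choice of k, and the toy sequences are hypothetical, and isoleucine is folded into leucine only because the two residues have identical mass and cannot be told apart by fragment masses alone.

```python
from collections import defaultdict


def normalize(seq):
    """Upper-case a sequence and fold I into L (isobaric residues)."""
    return seq.upper().replace("I", "L")


def build_index(proteins, k):
    """Map every k-mer to the (protein name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in proteins.items():
        norm = normalize(seq)
        for i in range(len(norm) - k + 1):
            index[norm[i:i + k]].append((name, i))
    return index


def lookup(tag, index, proteins, k):
    """Find database positions matching a de novo sequence of length >= k.

    The first k residues seed the search through the index; each seed hit
    is then verified against the full tag.
    """
    norm = normalize(tag)
    if len(norm) < k:
        return []
    hits = []
    for name, pos in index.get(norm[:k], []):
        if normalize(proteins[name])[pos:pos + len(norm)] == norm:
            hits.append((name, pos))
    return hits


# Usage with toy sequences (hypothetical, for illustration only).
proteins = {"P1": "MKTAYIAKQR", "P2": "GGLAKQW"}
index = build_index(proteins, k=3)
hits = lookup("YLAKQ", index, proteins, k=3)
```

Because the lookup works on sequences rather than spectra, each query costs a dictionary access plus a short string comparison, independent of the number of spectra in the experiment.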
Affiliation(s)
- Ari M. Frank
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California 92093-0404
- Mikhail M. Savitski
- Laboratory for Biological and Medical Mass Spectrometry, Uppsala University, Uppsala, Sweden
- Michael N. Nielsen
- Laboratory for Biological and Medical Mass Spectrometry, Uppsala University, Uppsala, Sweden
- Roman A. Zubarev
- Laboratory for Biological and Medical Mass Spectrometry, Uppsala University, Uppsala, Sweden
- Pavel A. Pevzner
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, California 92093-0404
109
Pevtsov S, Fedulova I, Mirzaei H, Buck C, Zhang X. Performance Evaluation of Existing De Novo Sequencing Algorithms. J Proteome Res 2006; 5:3018-28. [PMID: 17081053 DOI: 10.1021/pr060222h] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Indexed: 11/28/2022]
Abstract
Two methods have been developed for protein identification from tandem mass spectra: database searching and de novo sequencing. De novo sequencing identifies peptides directly from tandem mass spectra. Among the many proposed algorithms, we evaluated the performance of five de novo sequencing algorithms: AUDENS, Lutefisk, NovoHMM, PepNovo, and PEAKS. Our evaluation methods are based on calculation of relative sequence distance (RSD), algorithm sensitivity, and spectrum quality. We found that de novo sequencing algorithms perform differently on QSTAR and LCQ mass spectrometer data but, in general, perform better on QSTAR data than on LCQ data. For the QSTAR data, the performance order of the five algorithms is PEAKS > Lutefisk, PepNovo > AUDENS, NovoHMM. The performance of PEAKS, Lutefisk, and PepNovo strongly depends on the spectrum quality and increases as spectrum quality increases. However, AUDENS and NovoHMM are not sensitive to the spectrum quality. Compared with the other four algorithms, PEAKS has the best sensitivity and also the best performance over the entire range of spectrum quality. For the LCQ data, the performance order is NovoHMM > PepNovo, PEAKS > Lutefisk > AUDENS. NovoHMM has the best sensitivity, and its performance is the best over the entire range of spectrum quality, but its overall performance is not significantly different from that of PEAKS and PepNovo. AUDENS does not perform well on either QSTAR or LCQ data.
Affiliation(s)
- Sergey Pevtsov
- Department of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow 119992
110
Bakhtiar R, Guan Z. Electron Capture Dissociation Mass Spectrometry in Characterization of Peptides and Proteins. Biotechnol Lett 2006; 28:1047-59. [PMID: 16794768 DOI: 10.1007/s10529-006-9065-z] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 03/10/2006] [Accepted: 03/29/2006] [Indexed: 10/24/2022]
Abstract
Electron capture dissociation (ECD) represents one of the most recent and significant advancements in tandem mass spectrometry (MS/MS) for the identification and characterization of polypeptides. In comparison with conventional fragmentation techniques, such as collisionally activated dissociation (CAD), ECD provides more extensive sequence fragments while allowing labile modifications to remain intact during backbone fragmentation, an important attribute for characterizing post-translational modifications. Herein, we present a brief overview of the ECD technique as well as selected applications in the characterization of peptides and proteins. Case studies including characterization and localization of amino acid glycosylation, methionine oxidation, acylation, and "top-down" protein mass spectrometry using ECD will be presented. A more recent technique, electron transfer dissociation (ETD), will also be discussed briefly.
Affiliation(s)
- Ray Bakhtiar
- Merck Research Laboratories, Rahway, NJ 07065, USA.