1
de Graaf SC, Hoek M, Tamara S, Heck AJR. A perspective toward mass spectrometry-based de novo sequencing of endogenous antibodies. MAbs 2022; 14:2079449. [PMID: 35699511 PMCID: PMC9225641 DOI: 10.1080/19420862.2022.2079449] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/25/2022] Open
Abstract
A key step in therapeutic and endogenous humoral antibody characterization is identifying the amino acid sequence. So far, this task has been mainly tackled through sequencing of B-cell receptor (BCR) repertoires at the nucleotide level. Mass spectrometry (MS) has emerged as an alternative tool for obtaining sequence information directly at the – most relevant – protein level. Although several MS methods are now well established, analysis of recombinant and endogenous antibodies comes with a specific set of challenges, requiring approaches beyond the conventional proteomics workflows. Here, we review the challenges in MS-based sequencing of both recombinant as well as endogenous humoral antibodies and outline state-of-the-art methods attempting to overcome these obstacles. We highlight recent examples and discuss remaining challenges. We foresee a great future for these approaches making de novo antibody sequencing and discovery by MS-based techniques feasible, even for complex clinical samples from endogenous sources such as serum and other liquid biopsies.
Affiliation(s)
- Sebastiaan C de Graaf
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands; Netherlands Proteomics Center, Utrecht, Netherlands
- Max Hoek
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands; Netherlands Proteomics Center, Utrecht, Netherlands
- Sem Tamara
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands; Netherlands Proteomics Center, Utrecht, Netherlands
- Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands; Netherlands Proteomics Center, Utrecht, Netherlands
2
Takan S, Allmer J. DNMSO; an ontology for representing de novo sequencing results from Tandem-MS data. PeerJ 2020; 8:e10216. [PMID: 33150092 PMCID: PMC7585381 DOI: 10.7717/peerj.10216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2020] [Accepted: 09/28/2020] [Indexed: 11/20/2022] Open
Abstract
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist: either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, we have implemented an application programming interface (API) that supports any file operation necessary for de novo sequencing, from spectra input to reading, writing, and creating files in the DNMSO format, as well as conversion from many other file formats. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate completely on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.
Affiliation(s)
- Savaş Takan
- Department of Computer Engineering, Faculty of Engineering, Izmir Institute of Technology, Izmir, Turkey
- Jens Allmer
- Hochschule Ruhr West, University of Applied Sciences, Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Mülheim an der Ruhr, Germany
3
Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS. Comprehensive de Novo Peptide Sequencing from MS/MS Pairs Generated through Complementary Collision Induced Dissociation and 351 nm Ultraviolet Photodissociation. Anal Chem 2017; 89:3747-3753. [PMID: 28234449 PMCID: PMC5480239 DOI: 10.1021/acs.analchem.7b00130] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/01/2023]
Abstract
We describe a strategy for de novo peptide sequencing based on matched pairs of tandem mass spectra (MS/MS) obtained by collision induced dissociation (CID) and 351 nm ultraviolet photodissociation (UVPD). Each precursor ion is isolated twice with the mass spectrometer switching between CID and UVPD activation modes to obtain a complementary MS/MS pair. To interpret these paired spectra, we modified the UVnovo de novo sequencing software to automatically learn from and interpret fragmentation spectra, provided a representative set of training data. This machine learning procedure, using random forests, synthesizes information from one or multiple complementary spectra, such as the CID/UVPD pairs, into peptide fragmentation site predictions. In doing so, the burden of fragmentation model definition shifts from programmer to machine and opens up the model parameter space for inclusion of nonobvious features and interactions. This spectral synthesis also serves to transform distinct types of spectra into a common representation for subsequent activation-independent processing steps. Then, independent from precursor activation constraints, UVnovo's de novo sequencing procedure generates and scores sequence candidates for each precursor. We demonstrate the combined experimental and computational approach for de novo sequencing using whole cell E. coli lysate. In benchmarks on the CID/UVPD data, UVnovo assigned correct full-length sequences to 83% of the spectral pairs of doubly charged ions with high-confidence database identifications. Considering only top-ranked de novo predictions, 70% of the pairs were deciphered correctly. This de novo sequencing performance exceeds that of PEAKS and PepNovo on the CID spectra and that of UVnovo on CID or UVPD spectra alone. As presented here, the methods for paired CID/UVPD spectral acquisition and interpretation constitute a powerful workflow for high-throughput and accurate de novo peptide sequencing.
Affiliation(s)
- Andrew P Horton
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States
- Scott A Robotham
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
- Joe R Cannon
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
- Dustin D Holden
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States
- Jennifer S Brodbelt
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
4
Ma B. De novo Peptide Sequencing. PROTEOME INFORMATICS 2016:15-38. [DOI: 10.1039/9781782626732-00015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
De novo peptide sequencing refers to the process of determining a peptide’s amino acid sequence from its MS/MS spectrum alone. The principle of this process is fairly straightforward: a high-quality spectrum may present a ladder of fragment ion peaks. The mass difference between every two adjacent peaks in the ladder is used to determine a residue of the peptide. However, most practical spectra do not have sufficient quality to support this straightforward process. Therefore, research in de novo sequencing has largely been a battle against the errors in the data. This chapter reviews some of the major developments in this field. The chapter starts with a quick review of the history in Section 1. Then manual de novo sequencing is examined in Section 2. Section 3 introduces a few commonly used de novo sequencing algorithms. An important aspect of automated de novo sequencing software is a good scoring function that serves as the optimization goal of the algorithm. Thus, Section 4 is devoted to methods for defining good scoring functions. Section 5 reviews a list of relevant software. The chapter concludes with a discussion of the applications and limitations of de novo sequencing in Section 6.
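The ladder principle described in this abstract can be sketched in a few lines. The following is a minimal illustration, not code from the chapter: the function names `ladder_from_sequence` and `read_ladder`, the tolerance value, and the reduced residue table are assumptions made for the example. It also shows how a single missing peak already breaks the simple read-out.

```python
# Monoisotopic residue masses in Da (standard values, subset only).
# "L" stands for both leucine and isoleucine, which have identical mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
PROTON = 1.00728


def ladder_from_sequence(sequence):
    """Build an idealized, noise-free ladder of cumulative fragment masses."""
    peaks = [PROTON]
    for aa in sequence:
        peaks.append(peaks[-1] + RESIDUE_MASS[aa])
    return peaks


def read_ladder(peaks, tol=0.01):
    """Turn each adjacent mass difference into a residue ('?' if none fits)."""
    residues = []
    for lo, hi in zip(peaks, peaks[1:]):
        delta = hi - lo
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append(matches[0] if matches else "?")
    return "".join(residues)


ladder = ladder_from_sequence("AVSE")
print(read_ladder(ladder))             # AVSE

# Drop one peak: the gap now spans two residues and cannot be read directly.
incomplete = ladder[:2] + ladder[3:]
print(read_ladder(incomplete))         # A?E
```

The second call illustrates why practical algorithms need scoring functions and error handling: real spectra rarely contain a complete, clean ladder.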
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo, Canada
5
Guthals A, Gan Y, Murray L, Chen Y, Stinson J, Nakamura G, Lill JR, Sandoval W, Bandeira N. De Novo MS/MS Sequencing of Native Human Antibodies. J Proteome Res 2016; 16:45-54. [PMID: 27779884 DOI: 10.1021/acs.jproteome.6b00608] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/28/2022]
Abstract
One direct route for the discovery of therapeutic human monoclonal antibodies (mAbs) involves the isolation of peripheral B cells from survivors/sero-positive individuals after exposure to an infectious reagent or disease etiology, followed by single-cell sequencing or hybridoma generation. Peripheral B cells, however, are not always easy to obtain and represent only a small percentage of the total B-cell population across all bodily tissues. Although it has been demonstrated that tandem mass spectrometry (MS/MS) techniques can interrogate the full polyclonal antibody (pAb) response to an antigen in vivo, all current approaches identify MS/MS spectra against databases derived from genetic sequencing of B cells from the same patient. In this proof-of-concept study, we demonstrate the feasibility of a novel MS/MS antibody discovery approach in which only serum antibodies are required without the need for sequencing of genetic material. Peripheral pAbs from a cytomegalovirus-exposed individual were purified by glycoprotein B antigen affinity and de novo sequenced from MS/MS data. Purely MS-derived mAbs were then manufactured in mammalian cells to validate potency via antigen-binding ELISA. Interestingly, we found that these mAbs accounted for 1 to 2% of total donor IgG but were not detected in parallel sequencing of memory B cells from the same patient.
Affiliation(s)
- Adrian Guthals
- Mapp Biopharmaceutical, Inc., 6160 Lusk Boulevard #C105, San Diego, California 92121, United States
- Yutian Gan
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
- Laura Murray
- Department of Protein Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Yongmei Chen
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
- Jeremy Stinson
- Department of Molecular Biology, Genentech, Inc., South San Francisco, California 94080, United States
- Gerald Nakamura
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
- Jennie R Lill
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
- Wendy Sandoval
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
- Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, California 92093, United States; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, Mail Code 0657, La Jolla, California 92093, United States
6
Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS. UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry. Anal Chem 2016; 88:3990-7. [PMID: 26938041 PMCID: PMC4850734 DOI: 10.1021/acs.analchem.6b00261] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
De novo peptide sequencing by mass spectrometry represents an important strategy for characterizing novel peptides and proteins, in which a peptide's amino acid sequence is inferred directly from the precursor peptide mass and tandem mass spectrum (MS/MS or MS(3)) fragment ions, without comparison to a reference proteome. This method is ideal for organisms or samples lacking a complete or well-annotated reference sequence set. One of the major barriers to de novo spectral interpretation arises from confusion of N- and C-terminal ion series due to the symmetry between b and y ion pairs created by collisional activation methods (or c, z ions for electron-based activation methods). This is known as the "antisymmetric path problem" and leads to inverted amino acid subsequences within a de novo reconstruction. Here, we combine several key strategies for de novo peptide sequencing into a single high-throughput pipeline: high-efficiency carbamylation blocks lysine side chains, and subsequent tryptic digestion and N-terminal peptide derivatization with the ultraviolet chromophore AMCA yield peptides susceptible to 351 nm ultraviolet photodissociation (UVPD). UVPD-MS/MS of the AMCA-modified peptides then predominantly produces y ions in the MS/MS spectra, specifically addressing the antisymmetric path problem. Finally, the program UVnovo applies a random forest algorithm to automatically learn from and then interpret UVPD mass spectra, passing results to a hidden Markov model for de novo sequence prediction and scoring. We show this combined strategy provides high-performance de novo peptide sequencing, enabling the de novo sequencing of thousands of peptides from an Escherichia coli lysate at high confidence.
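The b/y symmetry that underlies the "antisymmetric path problem" can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from UVnovo: `fragment_ions` is a hypothetical helper, the residue table is a subset, and only singly charged ions are considered. For such ions, each b ion and its complementary y ion sum to the neutral peptide mass plus two proton masses, so a peak at mass m always has a mirror partner, and from masses alone it is unclear which series a peak belongs to.

```python
# Monoisotopic masses in Da (standard values, subset of residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Return singly charged b and y ion masses, paired by cleavage site.

    b_ions[i] is the N-terminal fragment and y_ions[i] the C-terminal
    fragment produced by cleaving after residue i + 1.
    """
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    prefix = 0.0
    b_ions, y_ions = [], []
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        b_ions.append(prefix + PROTON)
        y_ions.append(total - prefix + WATER + PROTON)
    return b_ions, y_ions


peptide = "PEPTLDEK"
b_ions, y_ions = fragment_ions(peptide)
neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

# Every complementary pair sums to the same constant.
for b, y in zip(b_ions, y_ions):
    assert abs((b + y) - (neutral_mass + 2 * PROTON)) < 1e-6
```

Suppressing one of the two series, as the abstract describes for AMCA-tagged peptides under UVPD, removes this mirror ambiguity because the remaining peaks come predominantly from a single series.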
Affiliation(s)
- Scott A Robotham
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
- Andrew P Horton
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States
- Joe R Cannon
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
- Victoria C Cotham
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States
- Jennifer S Brodbelt
- Department of Chemistry, University of Texas, Austin, Texas 78712, United States
7
Devabhaktuni A, Elias JE. Application of de Novo Sequencing to Large-Scale Complex Proteomics Data Sets. J Proteome Res 2016; 15:732-42. [PMID: 26743026 DOI: 10.1021/acs.jproteome.5b00861] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 11/28/2022]
Abstract
Dependent on concise, predefined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics data sets and present a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to that of other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.
Affiliation(s)
- Arun Devabhaktuni
- Department of Chemical & Systems Biology, Stanford University, Stanford, California 94035, United States
- Joshua E Elias
- Department of Chemical & Systems Biology, Stanford University, Stanford, California 94035, United States
8
Sunagar K, Morgenstern D, Reitzel AM, Moran Y. Ecological venomics: How genomics, transcriptomics and proteomics can shed new light on the ecology and evolution of venom. J Proteomics 2015; 135:62-72. [PMID: 26385003 DOI: 10.1016/j.jprot.2015.09.015] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 07/15/2015] [Revised: 09/02/2015] [Accepted: 09/09/2015] [Indexed: 01/18/2023]
Abstract
Animal venom is a complex cocktail of bioactive chemicals that traditionally drew interest mostly from biochemists and pharmacologists. However, in recent years the evolutionary and ecological importance of venom has been realized, as this trait has a direct and strong influence on interactions between species. Moreover, venom content can be modulated by environmental factors. Like many other fields of biology, venom research has been revolutionized in recent years by the introduction of systems biology approaches, i.e., genomics, transcriptomics and proteomics. The employment of these methods in venom research is known as 'venomics'. In this review we describe the history and recent advancements of venomics and discuss how they are employed in studying venom in general and in particular in the context of evolutionary ecology. We also discuss the pitfalls and challenges of venomics and what the future may hold for this emerging scientific field.
Affiliation(s)
- Kartik Sunagar
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem 91904, Israel
- David Morgenstern
- Proteomics Resource Center, Langone Medical Center, New York University, New York, USA
- Adam M Reitzel
- Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem 91904, Israel
9
Medzihradszky KF, Chalkley RJ. Lessons in de novo peptide sequencing by tandem mass spectrometry. MASS SPECTROMETRY REVIEWS 2015; 34:43-63. [PMID: 25667941 PMCID: PMC4367481 DOI: 10.1002/mas.21406] [Citation(s) in RCA: 137] [Impact Index Per Article: 15.2] [Indexed: 05/03/2023]
Abstract
Mass spectrometry has become the method of choice for the qualitative and quantitative characterization of protein mixtures isolated from all kinds of living organisms. The raw data in these studies are MS/MS spectra, usually of peptides produced by proteolytic digestion of a protein. These spectra are "translated" into peptide sequences, normally with the help of various search engines. Data acquisition and interpretation have both been automated, and most researchers look only at the summary of the identifications without ever viewing the underlying raw data used for assignments. Automated analysis of data is essential due to the volume produced. However, being familiar with the finer intricacies of peptide fragmentation processes, and experiencing the difficulties of manual data interpretation allow a researcher to be able to more critically evaluate key results, particularly because there are many known rules of peptide fragmentation that are not incorporated into search engine scoring. Since the most commonly used MS/MS activation method is collision-induced dissociation (CID), in this article we present a brief review of the history of peptide CID analysis. Next, we provide a detailed tutorial on how to determine peptide sequences from CID data. Although the focus of the tutorial is de novo sequencing, the lessons learned and resources supplied are useful for data interpretation in general.
10
Abstract
Motivation: Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms to analyze tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD), Higher-energy Collisional Dissociation (HCD) spectra or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectra, we develop a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra or even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependences between different ion types, where such dependencies are learned automatically using a modified offset frequency function. Results: The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that the performance of UniNovo is superior to other tools for ETD spectra and superior or comparable with others for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides). Availability: UniNovo is implemented in JAVA and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual. Contact: kwj@ucsd.edu or ppevzner@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kyowon Jeong
- Department of Electrical and Computer Engineering and Department of Computer Science and Engineering, University of California-San Diego, CA 92093, USA.
11
Guthals A, Clauser KR, Frank AM, Bandeira N. Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides. J Proteome Res 2013; 12:2846-57. [PMID: 23679345 DOI: 10.1021/pr400173d] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Indexed: 01/20/2023]
Abstract
Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
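Why overlapping peptides from multiple digests help reach longer sequences can be illustrated with a toy assembly. Note the simplification: the paper assembles MS/MS spectra and derives consensus sequences from them, whereas the sketch below merges already-sequenced peptide strings by greedy suffix-prefix overlap. The function names, the minimum overlap length, and the example peptides are all assumptions made for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(peptides))  # drop duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical peptides, as if produced by digests with different enzymes.
peptides = ["EVQLVESGGG", "SGGGLVQPGG", "QPGGSLR"]
print(greedy_assemble(peptides))  # ['EVQLVESGGGLVQPGGSLR']
```

A greedy string merge like this ignores sequencing errors and repeated motifs; it only shows how overlaps between short reads extend the reconstructed sequence beyond the length of any single peptide.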
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
12
Van Riper SK, de Jong EP, Carlis JV, Griffin TJ. Mass Spectrometry-Based Proteomics: Basic Principles and Emerging Technologies and Directions. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2013; 990:1-35. [DOI: 10.1007/978-94-007-5896-4_1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
13
Chi H, Chen H, He K, Wu L, Yang B, Sun RX, Liu J, Zeng WF, Song CQ, He SM, Dong MQ. pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra. J Proteome Res 2012; 12:615-25. [DOI: 10.1021/pr3006843] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Haifeng Chen
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Kun He
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Long Wu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Jianyun Liu
- Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media, Beihang University, Beijing, 100191, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Chun-Qing Song
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
14
Guthals A, Bandeira N. Peptide identification by tandem mass spectrometry with alternate fragmentation modes. Mol Cell Proteomics 2012; 11:550-7. [PMID: 22595789 PMCID: PMC3434779 DOI: 10.1074/mcp.r112.018556] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 03/05/2012] [Revised: 05/04/2012] [Indexed: 11/06/2022] Open
Abstract
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
15
Bhatia S, Kil YJ, Ueberheide B, Chait BT, Tayo L, Cruz L, Lu B, Yates JR, Bern M. Constrained de novo sequencing of conotoxins. J Proteome Res 2012; 11:4191-200. [PMID: 22709442 DOI: 10.1021/pr300312h] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.
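How prior knowledge about amino acid content shrinks the de novo search space can be shown with a small counting exercise. This is not the constrained graph algorithm from the paper; it is a dynamic-programming sketch under stated assumptions: nominal (integer) residue masses, a hypothetical function `count_sequences`, and a single illustrative constraint, namely a fixed number of cysteines, chosen because the abstract deals with cysteine-rich toxins.

```python
# Nominal (integer) residue masses in Da; "L" covers leucine and isoleucine.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_sequences(target, required_cys=None):
    """Count residue sequences whose nominal masses sum to `target`.

    If `required_cys` is given, count only sequences containing exactly
    that many cysteines.
    """
    max_c = target // NOMINAL["C"]
    # ways[m][c]: number of sequences of total mass m with c cysteines
    ways = [[0] * (max_c + 1) for _ in range(target + 1)]
    ways[0][0] = 1
    for m in range(1, target + 1):
        for aa, w in NOMINAL.items():
            if w > m:
                continue
            dc = 1 if aa == "C" else 0
            for c in range(dc, max_c + 1):
                ways[m][c] += ways[m - w][c - dc]
    if required_cys is None:
        return sum(ways[target])
    if required_cys > max_c:
        return 0
    return ways[target][required_cys]


mass = 1200  # arbitrary example precursor residue mass
unconstrained = count_sequences(mass)
with_four_cys = count_sequences(mass, required_cys=4)
print(unconstrained, with_four_cys)
```

Comparing the two printed counts gives a feel for the reduction obtained from even one simple composition constraint; the paper reports reductions of many orders of magnitude for its own, richer constraints.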
Affiliation(s)
- Swapnil Bhatia
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, United States
|
16
|
Allmer J. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Rev Proteomics 2012; 8:645-57. [PMID: 21999834 DOI: 10.1586/epr.11.54] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 01/27/2023]
Abstract
Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field.
Affiliation(s)
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir 35430, Turkey.
|
17
|
Ma B, Johnson R. De novo sequencing and homology searching. Mol Cell Proteomics 2012; 11:O111.014902. [PMID: 22090170 PMCID: PMC3277775 DOI: 10.1074/mcp.o111.014902] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Received: 10/05/2011] [Revised: 11/08/2011] [Indexed: 11/06/2022] Open
Abstract
In proteomics, de novo sequencing is the process of deriving peptide sequences from tandem mass spectra without the assistance of a sequence database. Such analyses have traditionally been performed manually by human experts, and more recently by computer programs that have been developed because of the need for higher throughput. Although powerful, de novo sequencing often can only determine partially correct sequence tags because of imperfect tandem mass spectra. However, these sequence tags can then be searched in a sequence database to identify the exact or a homologous peptide. Homology searches are particularly useful for the study of organisms whose genomes have not been sequenced. This tutorial will present background important to understanding de novo sequencing, suggestions on how to do this manually, plus descriptions of computer algorithms used to automate this process and to subsequently carry out homology-based database searches. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 1).
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo, 200 University Ave. W, Waterloo, ON, Canada N2L 3G1
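As a rough illustration of the idea summarized in the abstract above (reading sequence information from a tandem mass spectrum without a database), the sketch below recovers a partial sequence tag from the mass differences between consecutive b-ion peaks. The residue table is deliberately truncated to ten amino acids, and the function names `b_ions` and `tag_from_ladder`, the tolerance value, and the noise-free ion ladder are assumptions made for illustration; none of them come from the cited tutorial.

```python
# Monoisotopic residue masses in Da (standard values, truncated table).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276  # mass of a proton in Da


def b_ions(peptide):
    """Singly charged b-ion m/z values for an unmodified peptide."""
    mz_values, total = [], 0.0
    for residue in peptide[:-1]:  # the full-length b ion is omitted
        total += RESIDUE_MASS[residue]
        mz_values.append(total + PROTON)
    return mz_values


def tag_from_ladder(mz_values, tol=0.02):
    """Translate gaps between consecutive peaks into residues ('?' if none fit)."""
    tag = []
    for low, high in zip(mz_values, mz_values[1:]):
        gap = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        tag.append(hits[0] if hits else "?")
    return "".join(tag)


ladder = b_ions("GASPK")
print(tag_from_ladder(ladder))
```

Only the residues between observed peaks are recovered, which mirrors the abstract's point that imperfect spectra often yield partial tags rather than full sequences. Leucine and isoleucine share the same mass, so a real implementation could not tell them apart from such gaps alone.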
|
18
|
Database independent proteomics analysis of the ostrich and human proteome. Proc Natl Acad Sci U S A 2011; 109:407-12. [PMID: 22198768 DOI: 10.1073/pnas.1108399108] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022] Open
Abstract
Mass spectrometry (MS)-based proteome analysis relies heavily on the presence of complete protein databases. Such a strategy is extremely powerful, albeit not adequate in the analysis of unpredicted postgenome events, such as posttranslational modifications, which exponentially increase the search space. Therefore, it is of interest to explore "database-free" approaches. Here, we sampled the ostrich and human proteomes with a method facilitating de novo sequencing, utilizing the protease Lys-N in combination with electron transfer dissociation. By implementing several validation steps, including the combined use of collision-induced dissociation/electron transfer dissociation data and a cross-validation with conventional database search strategies, we identified approximately 2,500 unique de novo peptide sequences from the ostrich sample with over 900 peptides generating full backbone sequence coverage. This dataset allowed the appropriate positioning of ostrich in the evolutionary tree. The described database-free sequencing approach is generically applicable and has great potential in important proteomics applications such as in the analysis of variable parts of endogenous antibodies or proteins modified by a plethora of complex posttranslational modifications.
|
19
|
Ning Z, Zhou H, Wang F, Abu-Farha M, Figeys D. Analytical Aspects of Proteomics: 2009–2010. Anal Chem 2011; 83:4407-26. [DOI: 10.1021/ac200857t] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 12/26/2022]
Affiliation(s)
- Hu Zhou
- Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China 201203
- Fangjun Wang
- Key Lab of Separation Sciences for Analytical Chemistry, National Chromatographic Research and Analysis Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China 116023
|
20
|
Andreotti S, Klau GW, Reinert K. Antilope--a Lagrangian relaxation approach to the de novo peptide sequencing problem. IEEE/ACM Trans Comput Biol Bioinform 2011; 9:385-394. [PMID: 21464512 DOI: 10.1109/tcbb.2011.59] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/30/2023]
Abstract
Peptide sequencing from mass spectrometry data is a key step in proteome research. Especially de novo sequencing, the identification of a peptide from its spectrum alone, is still a challenge even for state-of-the-art algorithmic approaches. In this paper, we present ANTILOPE, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. ANTILOPE combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen’s k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and performs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically for a data set of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that ANTILOPE is competitive with the popular state-of-the-art programs PepNovo and NovoHMM both in terms of runtime and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. ANTILOPE will be freely available as part of the open source proteomics library OpenMS.
Affiliation(s)
- Sandro Andreotti
- Freie Universität Berlin, Germany and the International Max Planck Research School for Computational Biology and Scientific Computing, Berlin
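The spectrum graph model named in the ANTILOPE abstract above can be sketched roughly as follows: candidate prefix masses become nodes, and two nodes are connected when their mass difference matches an amino acid residue. This is a minimal sketch under stated assumptions, not ANTILOPE itself: it enumerates source-to-sink paths exhaustively instead of using Lagrangian relaxation or Yen's k shortest paths, it applies no scoring, and the names `spectrum_graph` and `full_paths`, the truncated residue table, and the tolerance are all hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, truncated table).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}


def spectrum_graph(prefix_masses, tol=0.02):
    """Build edges between masses whose difference equals one residue mass."""
    nodes = sorted(prefix_masses)
    edges = {i: [] for i in range(len(nodes))}
    for i, low in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            gap = nodes[j] - low
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, aa))
    return nodes, edges


def full_paths(nodes, edges):
    """Spell out every path from the lightest node to the heaviest node."""
    sink = len(nodes) - 1
    sequences = []

    def walk(i, prefix):
        if i == sink:
            sequences.append(prefix)
            return
        for j, aa in edges[i]:
            walk(j, prefix + aa)

    walk(0, "")
    return sequences


# Prefix masses of the peptide GASP plus one spurious peak at 100.0 Da.
masses, running = [0.0], 0.0
for residue in "GASP":
    running += RESIDUE_MASS[residue]
    masses.append(running)
masses.append(100.0)

nodes, edges = spectrum_graph(masses)
print(full_paths(nodes, edges))
```

The spurious peak ends up as an isolated node because no residue mass connects it to its neighbours, which is the basic way the graph formulation tolerates noise. Exhaustive enumeration grows quickly with the number of peaks, which is why published tools rely on scoring and optimized path searches.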
|
21
|
Sun HC, Zhang JY, Liu H, Zhang W, Xu CM, Ma HB, Zhu YP, Xie HW. Algorithm Development of de novo Peptide Sequencing Via Tandem Mass Spectrometry. Prog Biochem Biophys 2011. [DOI: 10.3724/sp.j.1206.2010.00226] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
|
22
|
Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJR, Pevzner PA. The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search. Mol Cell Proteomics 2010; 9:2840-52. [PMID: 20829449 DOI: 10.1074/mcp.m110.003731] [Citation(s) in RCA: 189] [Impact Index Per Article: 13.5] [Indexed: 12/28/2022] Open
Abstract
The recent emergence of new mass spectrometry techniques (e.g. electron transfer dissociation, ETD) and improved availability of additional proteases (e.g. Lys-N) for protein digestion in high-throughput experiments raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision induced dissociation (CID) of tryptic peptides and are largely based on expert knowledge about fragmentation of tryptic peptides (rather than machine learning techniques) to design CID-specific scoring functions. As a result, the performance of these algorithms is suboptimal for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.), and present a new database search tool MS-GFDB based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or peptides digested with Lys-N. For example, in the case of ETD spectra, the number of tryptic and Lys-N peptides identified by MS-GFDB increased by factors of 2.7 and 2.6, respectively, as compared with Mascot. Moreover, even following a decade of Mascot developments for analyzing CID spectra of tryptic peptides, MS-GFDB (that is not particularly tailored for CID spectra or tryptic peptides) resulted in a 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.
Affiliation(s)
- Sangtae Kim
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
|
23
|
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
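To make the notion of a global false discovery rate in the abstract above concrete, the sketch below estimates it from peptide-spectrum matches at a given score threshold. It assumes a target-decoy search, a widely used strategy that the abstract itself does not name, and the simple estimate of decoy hits divided by target hits; the function name `fdr_at_threshold` and the toy scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) pairs; higher scores are better.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Toy data: (score, is_decoy). Values are invented for illustration only.
psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(fdr_at_threshold(psms, 6))
print(fdr_at_threshold(psms, 9))
```

Lowering the threshold admits more identifications at the cost of a higher estimated error rate, which is the trade-off that threshold selection has to balance.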
|
24
|
Becker CH, Bern M. Recent developments in quantitative proteomics. Mutat Res 2010; 722:171-82. [PMID: 20620221 DOI: 10.1016/j.mrgentox.2010.06.016] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 06/29/2010] [Accepted: 06/30/2010] [Indexed: 01/01/2023]
Abstract
Proteomics is the study of proteins on a large scale, encompassing the many interests scientists and physicians have in their expression and physical properties. Proteomics continues to be a rapidly expanding field, with a wealth of reports regularly appearing on technology enhancements and scientific studies using these new tools. This review focuses primarily on the quantitative aspect of protein expression and the associated computational machinery for making large-scale identifications of proteins and their post-translational modifications. The primary emphasis is on the combination of liquid chromatography-mass spectrometry (LC-MS) methods and associated tandem mass spectrometry (LC-MS/MS). Tandem mass spectrometry, or MS/MS, involves a second analysis within the instrument after a molecular dissociative event in order to obtain structural information including but not limited to sequence information. This review further focuses primarily on the study of in vitro digested proteins known as bottom-up or shotgun proteomics. A brief discussion of recent instrumental improvements precedes a discussion on affinity enrichment and depletion of proteins, followed by a review of the major approaches (label-free and isotope-labeling) to making protein expression measurements quantitative, especially in the context of profiling large numbers of proteins. Then a discussion follows on the various computational techniques used to identify peptides and proteins from LC-MS/MS data. This review article then includes a short discussion of LC-MS approaches to three-dimensional structure determination and concludes with a section on statistics and data mining for proteomics, including comments on properly powering clinical studies and avoiding over-fitting with large data sets.
|
25
|
Chi H, Sun RX, Yang B, Song CQ, Wang LH, Liu C, Fu Y, Yuan ZF, Wang HP, He SM, Dong MQ. pNovo: de novo peptide sequencing and identification using HCD spectra. J Proteome Res 2010; 9:2713-24. [PMID: 20329752 DOI: 10.1021/pr100182k] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 11/28/2022]
Abstract
De novo peptide sequencing has improved remarkably in the past decade as a result of better instruments and computational algorithms. However, de novo sequencing can correctly interpret only approximately 30% of high- and medium-quality spectra generated by collision-induced dissociation (CID), which is much less than database search. This is mainly due to incomplete fragmentation and overlap of different ion series in CID spectra. In this study, we show that higher-energy collisional dissociation (HCD) is of great help to de novo sequencing because it produces high mass accuracy tandem mass spectrometry (MS/MS) spectra without the low-mass cutoff associated with CID in ion trap instruments. Besides, abundant internal and immonium ions in the HCD spectra can help differentiate similar peptide sequences. Taking advantage of these characteristics, we developed an algorithm called pNovo for efficient de novo sequencing of peptides from HCD spectra. pNovo gave correct identifications to 80% or more of the HCD spectra identified by database search. The number of correct full-length peptides sequenced by pNovo is comparable with that obtained by database search. A distinct advantage of de novo sequencing is that deamidated peptides and peptides with amino acid mutations can be identified efficiently without extra cost in computation. In summary, implementation of the HCD characteristics makes pNovo an excellent tool for de novo peptide sequencing from HCD spectra.
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
|
26
|
Li Y, Chi H, Wang LH, Wang HP, Fu Y, Yuan ZF, Li SJ, Liu YS, Sun RX, Zeng R, He SM. Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing. Rapid Commun Mass Spectrom 2010; 24:807-814. [PMID: 20187083 DOI: 10.1002/rcm.4448] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 05/28/2023]
Abstract
Database searching is the technique of choice for shotgun proteomics, and to date much research effort has been spent on improving its effectiveness. However, database searching faces a serious challenge of efficiency, considering the large numbers of mass spectra and the ever-faster growth of peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications. In this study, we conducted systematic research on speeding up database search engines for protein identification and illustrate the key points with the specific design of the pFind 2.1 search engine as a running example. Firstly, by constructing peptide indexes, pFind achieves a two- to threefold speedup over searching without peptide indexes. Secondly, by constructing indexes for observed precursor and fragment ions, pFind achieves a further twofold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem.
Affiliation(s)
- You Li
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
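The peptide indexing idea described in the abstract above can be illustrated with a mass-sorted index that is queried by binary search, so that each spectrum is compared only against peptides within its precursor mass tolerance. This is a generic sketch, not pFind's actual index design, which the abstract does not detail; the names `build_index` and `candidates`, the truncated residue table, and the tolerance are assumptions.

```python
import bisect

# Monoisotopic residue masses in Da (standard values, truncated table).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.010565  # mass of H2O in Da


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def build_index(peptides):
    """Sort unique peptides by mass once, up front."""
    pairs = sorted((peptide_mass(p), p) for p in set(peptides))
    return [mass for mass, _ in pairs], [p for _, p in pairs]


def candidates(index, precursor_mass, tol=0.01):
    """Return peptides whose mass lies within tol of the precursor mass."""
    masses, peptides = index
    low = bisect.bisect_left(masses, precursor_mass - tol)
    high = bisect.bisect_right(masses, precursor_mass + tol)
    return peptides[low:high]


index = build_index(["GASP", "PSAG", "GAV", "KR", "GAV"])
print(candidates(index, peptide_mass("GASP")))
print(candidates(index, 245.14))
```

Building the index costs one sort, after which each lookup is logarithmic in the number of peptides rather than a full scan. Peptides with the same composition, such as GASP and PSAG here, share a mass and are both returned, so fragment-level scoring is still needed to tell them apart.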
|