51
Xu F, Wang L, Ju X, Zhang J, Yin S, Shi J, He R, Yuan Q. Transepithelial Transport of YWDHNNPQIR and Its Metabolic Fate with Cytoprotection against Oxidative Stress in Human Intestinal Caco-2 Cells. J Agric Food Chem 2017; 65:2056-2065. [PMID: 28218523 DOI: 10.1021/acs.jafc.6b04731] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Indexed: 06/06/2023]
Abstract
Studies on antioxidant peptides from food sources have included experiments not only to elucidate their chemical characteristics but also to investigate their bioavailability and intracellular mechanisms. This study was designed to clarify the absorption and antioxidative activity of YWDHNNPQIR (named RAP), a peptide derived from rapeseed protein, using a Caco-2 cell transwell model. Results showed that 0.8% of RAP (C0 = 0.2 mM, t = 90 min) crossed the Caco-2 cell monolayers with its original structure intact via the intracellular transcytosis pathway, and the apparent permeability coefficient (Papp) was (6.6 ± 1.24) × 10⁻⁷ cm/s. Three main fragments (WDHNNPQIR, DHNNPQIR, and YWDHNNPQ) and five modified peptides derived from RAP were found on both the apical and basolateral sides of the Caco-2 cell transwell model. Among these new metabolites, WDHNNPQIR had the greatest antioxidative activity in Caco-2 cells in all assays except the DPPH assay. At a RAP concentration of 200 μM, there were significant differences in four antioxidative indicators (T-AOC, GSH-Px, SOD, and MDA) compared to the oxidative stress control (P < 0.05). In addition, RAP may also influence the apoptosis of Caco-2 cells caused by AAPH-induced oxidative damage.
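The apparent permeability coefficient quoted above is conventionally computed as Papp = (dQ/dt) / (A × C0), where dQ/dt is the rate at which the compound appears on the basolateral side, A is the monolayer area and C0 is the initial apical concentration. The Python sketch below assumes that convention and a least-squares slope over the sampling window; the function names, units and example numbers (including the 1.12 cm² insert area) are illustrative assumptions, not code or data from the study.

```python
def transport_rate(times_s, amounts_umol):
    """Least-squares slope dQ/dt (umol/s) of cumulative basolateral amount vs. time."""
    n = len(times_s)
    if n < 2 or n != len(amounts_umol):
        raise ValueError("need at least two paired (time, amount) points")
    mean_t = sum(times_s) / n
    mean_q = sum(amounts_umol) / n
    num = sum((t - mean_t) * (q - mean_q) for t, q in zip(times_s, amounts_umol))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


def apparent_permeability(dq_dt_umol_per_s, area_cm2, c0_mM):
    """Papp in cm/s.

    1 mM = 1 umol/mL = 1 umol/cm^3, so (umol/s) / (cm^2 * umol/cm^3) = cm/s.
    """
    if area_cm2 <= 0 or c0_mM <= 0:
        raise ValueError("area and initial concentration must be positive")
    return dq_dt_umol_per_s / (area_cm2 * c0_mM)


if __name__ == "__main__":
    # Hypothetical sampling at 0, 30, 60 and 90 min (times in seconds).
    times = [0, 1800, 3600, 5400]
    amounts = [0.0, 1.2e-4, 2.4e-4, 3.6e-4]  # hypothetical cumulative umol
    rate = transport_rate(times, amounts)
    print(apparent_permeability(rate, area_cm2=1.12, c0_mM=0.2))
```

Fitting a slope rather than using a single end-point amount is a design choice that assumes transport is roughly linear (sink conditions) over the window.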
Affiliation(s)
- Feiran Xu
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Lifeng Wang
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Xingrong Ju
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Jing Zhang
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Shi Yin
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Jiayi Shi
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Rong He
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
- Qiang Yuan
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics, Nanjing 210023, P.R. China
52
Liu Y, Ma B, Zhang K, Lajoie G. An Approach for Peptide Identification by De Novo Sequencing of Mixture Spectra. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:326-336. [PMID: 28368810 DOI: 10.1109/tcbb.2015.2407401] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Mixture spectra, which result from the concurrent fragmentation of multiple precursors, occur quite frequently in a typical wet-lab mass spectrometry experiment. The ability to identify mixture spectra efficiently and confidently is essential to alleviate the existing bottleneck of low mass spectra identification rates. However, most traditional computational methods are not suitable for interpreting mixture spectra, because they assume that each acquired spectrum comes from the fragmentation of a single precursor. In this manuscript, we formulate the mixture spectra de novo sequencing problem mathematically and propose a dynamic programming algorithm for the problem. Additionally, we use both simulated and real mixture spectra data sets to verify the merits of the proposed algorithm.
53
Vyatkina K. De Novo Sequencing of Top-Down Tandem Mass Spectra: A Next Step towards Retrieving a Complete Protein Sequence. Proteomes 2017; 5:E6. [PMID: 28248257 PMCID: PMC5372227 DOI: 10.3390/proteomes5010006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/10/2016] [Revised: 01/30/2017] [Accepted: 02/04/2017] [Indexed: 11/16/2022]
Abstract
De novo sequencing of tandem (MS/MS) mass spectra represents the only way to determine the sequence of proteins from organisms with unknown genomes, or of proteins not directly inscribed in a genome, such as antibodies or novel splice variants. Top-down mass spectrometry provides new opportunities for analyzing such proteins; however, retrieving a complete protein sequence from top-down MS/MS spectra still remains a distant goal. In this paper, we review the state of the art on this subject, and enhance our previously developed Twister algorithm for de novo sequencing of peptides from top-down MS/MS spectra to derive longer sequence fragments of a target protein.
Affiliation(s)
- Kira Vyatkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., St. Petersburg 199034, Russia.
- Department of Mathematical and Information Technologies, Saint Petersburg Academic University, 8/3 Khlopina st., St. Petersburg 194021, Russia.
54
Zhang S, Shan Y, Zhang S, Sui Z, Zhang L, Liang Z, Zhang Y. NIPTL-Novo: Non-isobaric peptide termini labeling assisted peptide de novo sequencing. J Proteomics 2017; 154:40-48. [DOI: 10.1016/j.jprot.2016.12.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/08/2016] [Revised: 12/07/2016] [Accepted: 12/08/2016] [Indexed: 12/28/2022]
55
Yang H, Chi H, Zhou WJ, Zeng WF, He K, Liu C, Sun RX, He SM. Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications. J Proteome Res 2017; 16:645-654. [PMID: 28019094 DOI: 10.1021/acs.jproteome.6b00716] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/18/2023]
Abstract
De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases ∼300-fold, Open-pNovo is about as fast as, or up to ∼10 times faster than, three other proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% of peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any can be recalled by the other algorithms. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in real biological applications.
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Wen-Jing Zhou
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
56
Fomin E. A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n² Pairwise Distances in n² Steps for the Sequencing Problem: II. Algorithm. J Comput Biol 2016; 23:934-942. [DOI: 10.1089/cmb.2016.0046] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Affiliation(s)
- Eduard Fomin
- Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia
57
Ma B. De novo Peptide Sequencing. Proteome Informatics 2016:15-38. [DOI: 10.1039/9781782626732-00015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
De novo peptide sequencing refers to the process of determining a peptide’s amino acid sequence from its MS/MS spectrum alone. The principle of this process is fairly straightforward: a high-quality spectrum may present a ladder of fragment ion peaks, and the mass difference between every two adjacent peaks in the ladder is used to determine a residue of the peptide. However, most practical spectra are not of sufficient quality to support this straightforward process. Research in de novo sequencing has therefore largely been a battle against errors in the data. This chapter reviews some of the major developments in this field. It starts with a quick review of the history in Section 1. Manual de novo sequencing is then examined in Section 2. Section 3 introduces a few commonly used de novo sequencing algorithms. An important aspect of automated de novo sequencing software is a good scoring function that serves as the optimization goal of the algorithm; Section 4 is therefore devoted to methods for defining good scoring functions. Section 5 reviews a list of relevant software. The chapter concludes with a discussion of the applications and limitations of de novo sequencing in Section 6.
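The ladder principle described in this abstract can be illustrated with a minimal Python sketch: sort the peaks of an idealized, noise-free fragment ladder and match each adjacent mass difference to the closest monoisotopic residue mass. This illustrates only the principle, not any algorithm from the chapter; the 0.02 Da tolerance and the name read_ladder are assumptions. Leucine and isoleucine are isobaric, so both are reported as L.

```python
# Monoisotopic residue masses in Da. Leucine and isoleucine are isobaric,
# so a single "L" entry stands for L/I.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.02):
    """Translate adjacent mass differences of a fragment ladder into residues.

    A difference with no residue within `tol` Da is reported as "?".
    A missing peak merges two residues into one gap, which may be misread:
    G+G (114.04292) lies within 0.00001 Da of N (114.04293).
    """
    ladder = sorted(peaks)
    residues = []
    for lower, upper in zip(ladder, ladder[1:]):
        diff = upper - lower
        best = min(RESIDUE_MASSES, key=lambda aa: abs(RESIDUE_MASSES[aa] - diff))
        residues.append(best if abs(RESIDUE_MASSES[best] - diff) <= tol else "?")
    return "".join(residues)


if __name__ == "__main__":
    # Idealized singly charged b-ion ladder anchored at the N-terminal proton.
    proton = 1.007276
    peaks = [proton]
    for aa in "SAMPLER":
        peaks.append(peaks[-1] + RESIDUE_MASSES[aa])
    print(read_ladder(peaks))  # SAMPLER
```

Real spectra add noise peaks, missing peaks and mixed b/y series, which is exactly why the chapter frames de novo sequencing as a battle against errors in the data.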
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo, Canada
58
Xiao K, Yu F, Tian Z. Top-down protein identification using isotopic envelope fingerprinting. J Proteomics 2016; 152:41-47. [PMID: 27989944 DOI: 10.1016/j.jprot.2016.10.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 06/29/2016] [Revised: 10/11/2016] [Accepted: 10/23/2016] [Indexed: 12/14/2022]
Abstract
For top-down protein database search and identification from tandem mass spectra, our isotopic envelope fingerprinting search algorithm and ProteinGoggle search engine have demonstrated their strength in efficiently resolving heavily overlapping data as well as in separating non-ideal data with non-ideal isotopic envelopes from ideal data with ideal isotopic envelopes. Here we report our updated ProteinGoggle 2.0 for full-capacity intact protein database search. The indispensable updates include optional user definition of dynamic post-translational modifications and static chemical labeling during database creation, comprehensive dissociation methods and ion series, and a Proteoform Score for each proteoform. ProteinGoggle has previously been benchmarked with both collision-based dissociation (CID, HCD) and electron-based dissociation (ETD) data of either intact proteins or intact proteomes. Here we report further benchmarking of the new version of ProteinGoggle with publicly available photon-based dissociation (UVPD) data (http://hdl.handle.net/2022/17316) of intact E. coli ribosomal proteins. BIOLOGICAL SIGNIFICANCE Protein species (aka proteoforms) function at the molecular level, and the diverse structures and biological roles of every proteoform arise from often co-occurring proteolysis, amino acid variation and post-translational modifications. Complete and high-throughput capture of this combinatorial information on proteoforms has become possible in evolving top-down proteomics; yet various methods and technologies in the top-down pipeline, especially database search and bioinformatics identification tools, are still in their infancy and demand intensive research and development.
Affiliation(s)
- Kaijie Xiao
- School of Chemical Science and Engineering, Tongji University, Shanghai, China; Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China
- Fan Yu
- School of Chemical Science and Engineering, Tongji University, Shanghai, China; Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China
- Zhixin Tian
- School of Chemical Science and Engineering, Tongji University, Shanghai, China; Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China.
59
Vyatkina K, Wu S, Dekker LJM, VanDuijn MM, Liu X, Tolić N, Luider TM, Paša-Tolić L, Pevzner PA. Top-down analysis of protein samples by de novo sequencing techniques. Bioinformatics 2016; 32:2753-9. [PMID: 27187201 PMCID: PMC6280873 DOI: 10.1093/bioinformatics/btw307] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/09/2015] [Revised: 03/31/2016] [Accepted: 05/09/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, boosting the rapid development of top-down mass spectrometry and creating a need for efficient methods of analyzing this kind of data. RESULTS We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence and also bear information on post-translational modifications and fragmentation patterns. AVAILABILITY AND IMPLEMENTATION Freely available on the web at http://bioinf.spbau.ru/en/twister CONTACT vyatkina@spbau.ru or ppevzner@ucsd.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, St Petersburg, Russia; Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, St Petersburg, Russia
- Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
- Lennard J M Dekker
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Martijn M VanDuijn
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Theo M Luider
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Pavel A Pevzner
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, St Petersburg, Russia; Department of Computer Science and Engineering, University of California, San Diego, CA, USA
60
Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci Rep 2016; 6:31730. [PMID: 27562653 PMCID: PMC4999880 DOI: 10.1038/srep31730] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 04/18/2016] [Accepted: 07/20/2016] [Indexed: 11/25/2022]
Abstract
De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies, for which genome information is often limited or not available. However, due to limitations in peptide fragmentation and coverage, as well as ambiguities in spectrum interpretation, complete de novo assembly of unknown protein sequences remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequenced peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated the performance of ALPS on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216–441 AA, at 100% coverage and 96.64–100% accuracy.
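The idea of merging scored de novo peptides into a weighted de Bruijn graph can be sketched in Python as follows. This is a deliberately simplified illustration under assumed choices: a fixed k-mer length k, each edge weighted by the summed scores of the peptides containing that k-mer, and a greedy heaviest-edge walk. It omits ALPS's error correction and other components and is not its actual implementation; the function names and the (sequence, score) input format are hypothetical.

```python
from collections import defaultdict


def build_weighted_debruijn(peptides, k):
    """Edges are (k-1)-mer -> (k-1)-mer transitions; weight = summed peptide scores.

    `peptides` is a list of (sequence, score) pairs, e.g. de novo results
    with a per-peptide confidence score (a hypothetical input format).
    """
    edges = defaultdict(float)
    for seq, score in peptides:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += score
    return dict(edges)


def greedy_assemble(edges):
    """Walk from the most source-like node, always taking the heaviest unused edge."""
    if not edges:
        return ""
    out_w, in_w = defaultdict(float), defaultdict(float)
    adjacency = defaultdict(list)
    for (u, v), w in edges.items():
        out_w[u] += w
        in_w[v] += w
        adjacency[u].append((w, v))
    nodes = sorted(set(out_w) | set(in_w))
    # Start where outgoing weight most exceeds incoming weight.
    node = max(nodes, key=lambda n: out_w[n] - in_w[n])
    contig, used = node, set()
    while True:
        options = [(w, v) for w, v in adjacency[node] if (node, v) not in used]
        if not options:
            break
        _, nxt = max(options)
        used.add((node, nxt))
        contig += nxt[-1]  # consecutive nodes overlap by k-2 residues
        node = nxt
    return contig


if __name__ == "__main__":
    # Two hypothetical overlapping de novo peptides with confidence scores.
    reads = [("ACDEFG", 1.0), ("DEFGHIK", 0.9)]
    print(greedy_assemble(build_weighted_debruijn(reads, k=3)))  # ACDEFGHIK
```

Summing scores on shared k-mers lets well-supported overlaps dominate the walk; a greedy walk is the simplest traversal and can be misled by repeats, which a full assembler would need to handle.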
61
Yan Y, Kusalik AJ, Wu FX. De novo peptide sequencing using CID and HCD spectra pairs. Proteomics 2016; 16:2615-2624. [DOI: 10.1002/pmic.201500251] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/25/2015] [Revised: 05/31/2016] [Accepted: 07/08/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Yan Yan
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Anthony J. Kusalik
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
62
Gorshkov V, Hotta SYK, Verano-Braga T, Kjeldsen F. Peptide de novo sequencing of mixture tandem mass spectra. Proteomics 2016; 16:2470-9. [PMID: 27329701 PMCID: PMC5297990 DOI: 10.1002/pmic.201500549] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/26/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 02/02/2023]
Abstract
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease identification performance when database search engines are used. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co-isolation, and thus prone to false identifications. The deconvolution approach matched complementary b- and y-ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence-specific fragment ions of each co-isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, they were fewer than the sequences gained through mass spectral deconvolution. Tight candidate peptide score distributions and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
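The complementarity the deconvolution relies on follows from mass conservation: for a peptide of neutral monoisotopic mass M, a singly charged b-ion and its complementary singly charged y-ion satisfy b + y = M + 2 × 1.007276 Da (two protons). A minimal Python sketch of keeping, for each co-isolated precursor, only the peaks that have a complementary partner is shown below. It assumes singly charged fragments and a fixed 0.01 Da tolerance, and illustrates the principle rather than the authors' implementation; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

PROTON = 1.007276  # Da


def complementary_peaks(peaks, precursor_mass, tol=0.01):
    """Return peaks that pair with another peak to sum to M + 2*PROTON.

    `precursor_mass` is the neutral monoisotopic peptide mass M; all
    fragment peaks are assumed to be singly charged b- or y-ions.
    """
    target = precursor_mass + 2 * PROTON
    ordered = sorted(peaks)
    kept = []
    for idx, mz in enumerate(ordered):
        partner = target - mz
        lo = bisect_left(ordered, partner - tol)
        hi = bisect_right(ordered, partner + tol)
        # A peak may not serve as its own partner.
        if any(k != idx for k in range(lo, hi)):
            kept.append(mz)
    return kept


def deconvolve(peaks, precursor_masses, tol=0.01):
    """Build one virtual spectrum per co-isolated precursor mass."""
    return {m: complementary_peaks(peaks, m, tol) for m in precursor_masses}


if __name__ == "__main__":
    WATER = 18.010565
    G, A, S = 57.02146, 71.03711, 87.03203  # residue masses of a toy peptide GAS
    M = G + A + S + WATER                    # neutral peptide mass
    b_ions = [G + PROTON, G + A + PROTON]
    y_ions = [S + WATER + PROTON, A + S + WATER + PROTON]
    spectrum = b_ions + y_ions + [300.0]     # 300.0 is a hypothetical noise peak
    print(deconvolve(spectrum, [M]))
```

Sorting once and using binary search keeps each precursor's pass at O(n log n); peaks matching no precursor are simply dropped from every virtual spectrum.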
Affiliation(s)
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark.
- Thiago Verano-Braga
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark; Department of Physiology and Biophysics, Federal University of Minas Gerais Belo Horizonte - MG, Belo Horizonte, Brazil
- Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
63
Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage. Sci Rep 2016; 6:22286. [PMID: 26924271 PMCID: PMC4770285 DOI: 10.1038/srep22286] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/16/2015] [Accepted: 02/11/2016] [Indexed: 12/03/2022]
Abstract
Structural, localization and functional properties of unknown proteins are often predicted from their primary polypeptide chains using sequence alignment with already characterized proteins and subsequent molecular modeling. Here we suggest an approach to predict various structural and structure-associated properties of proteins directly from the mass distributions of their proteolytic cleavage fragments. For amino-acid-specific cleavages, the distributions of fragment masses are determined by the distributions of inter-amino-acid intervals in the protein, which in turn apparently reflect its structural and structure-related features. Large-scale computer simulations revealed that for transmembrane proteins, either α-helical or β-barrel secondary structure could be predicted with about 90% accuracy after thermolysin cleavage. Moreover, 3/4 of intrinsically disordered proteins could be correctly distinguished from proteins with a fixed three-dimensional structure belonging to all four SCOP structural classes by combining 3–4 different cleavages. Additionally, in some cases the protein's cellular localization (cytosolic or membrane-associated) and its host organism (Firmicutes or Proteobacteria) could be predicted with around 80% accuracy. In contrast to cytosolic proteins, for membrane-associated proteins exhibiting specific structural conformations, their monotopic or transmembrane localization and functional group (ATP-binding, transporters, sensors and so on) could also be predicted with high accuracy and particular robustness against missing cleavages.
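To make the input of this approach concrete, the Python sketch below performs an in silico amino-acid-specific cleavage, computes the neutral fragment masses and bins them into a simple mass distribution. The rule sets are simplifying assumptions: a trypsin-like cut after K or R (ignoring the proline rule and missed cleavages) and a thermolysin-like cut before A, F, I, L, M or V. The function names and the 100 Da bin width are hypothetical, and the statistical prediction step of the study is not reproduced.

```python
WATER = 18.010565  # Da, added once per fragment (free N- and C-termini)

# Monoisotopic residue masses in Da.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

TRYPSIN_LIKE = {"K", "R"}                          # cut C-terminal to these
THERMOLYSIN_LIKE = {"A", "F", "I", "L", "M", "V"}  # cut N-terminal to these


def cleave(seq, sites, before=False):
    """Split `seq` at every residue in `sites` (after it by default, before it if `before`)."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in sites:
            cut = i if before else i + 1
            if start < cut < len(seq):  # skip cuts at the termini
                fragments.append(seq[start:cut])
                start = cut
    fragments.append(seq[start:])
    return fragments


def fragment_masses(fragments):
    """Neutral monoisotopic mass of each fragment."""
    return [sum(RESIDUE_MASSES[aa] for aa in frag) + WATER for frag in fragments]


def mass_histogram(masses, bin_width=100.0):
    """Count fragments per mass bin; keys are lower bin edges in Da."""
    counts = {}
    for m in masses:
        edge = (m // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    toy = "MKVLAAGIRSTFPEKWLD"  # hypothetical toy sequence
    pieces = cleave(toy, THERMOLYSIN_LIKE, before=True)
    print(pieces)
    print(mass_histogram(fragment_masses(pieces)))
```

Because each fragment spans the residues between two consecutive cleavage sites, the fragment mass distribution is a direct, mass-weighted view of the inter-site intervals that the abstract relates to structure.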
64
Samgina TY, Tolpina MD, Trebse P, Torkar G, Artemenko KA, Bergquist J, Lebedev AT. LTQ Orbitrap Velos in routine de novo sequencing of non-tryptic skin peptides from the frog Rana latastei with traditional and reliable manual spectra interpretation. Rapid Commun Mass Spectrom 2016; 30:265-276. [PMID: 27071218 DOI: 10.1002/rcm.7436] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
RATIONALE Mass spectrometry has shown itself to be the most efficient tool for the sequencing of peptides. However, de novo sequencing of novel natural peptides is significantly more challenging than the same procedure applied to tryptic peptides. To reach this goal, it is essential to select the most efficient methods of triggering fragmentation and to combine all possible complementary techniques. METHODS Collision-induced dissociation (CID), high-energy collision dissociation (HCD), and electron-transfer dissociation (ETD) tandem mass spectra recorded with an LTQ Orbitrap Velos instrument were used to elucidate the sequences of the natural non-tryptic peptides from the skin secretion of Rana latastei. Manual interpretation of the spectra was applied. RESULTS The combined approach using CID, HCD, and ETD tandem mass spectra of the multiprotonated peptides in various charge states, as well as of their proteolytic fragments, allowed the sequences of seven novel peptides from the skin secretion of Rana latastei to be established. CONCLUSIONS Manual mass spectrometry sequencing of natural non-tryptic peptides from the skin secretion of Rana latastei made it possible to work successfully with these species and demonstrated once again its advantage over automatic approaches.
65
Devabhaktuni A, Elias JE. Application of de Novo Sequencing to Large-Scale Complex Proteomics Data Sets. J Proteome Res 2016; 15:732-42. [PMID: 26743026 DOI: 10.1021/acs.jproteome.5b00861] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 11/28/2022]
Abstract
Dependent on concise, predefined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics data sets and present a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to that of other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.
Affiliation(s)
- Arun Devabhaktuni
- Department of Chemical & Systems Biology, Stanford University, Stanford, California 94035, United States
- Joshua E Elias
- Department of Chemical & Systems Biology, Stanford University, Stanford, California 94035, United States
66
Liu Y, Sun W, John J, Lajoie G, Ma B, Zhang K. De Novo Sequencing Assisted Approach for Characterizing Mixture MS/MS Spectra. IEEE Trans Nanobioscience 2016; 15:166-76. [PMID: 26800542 DOI: 10.1109/tnb.2016.2519841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Extensive research has been conducted on the computational analysis of mass spectrometry-based proteomics data. However, challenges remain; one in particular is the low identification rate of the collected spectral data. A specific contributing factor is the presence, among the collected MS/MS spectra, of mixture spectra generated by the concurrent fragmentation of multiple precursors in one sequencing attempt. The frequent occurrence of mixture spectra necessitates the development of effective computational approaches to characterize such non-conventional spectral data. In this research, we propose an approach for matching a query mixture spectrum with a pair of peptide sequences acquired from the protein database, incorporating a special de novo assisted filtration strategy. Experimental results on two different datasets of MS/MS spectra containing mixed ion fragments from multiple peptides demonstrated the efficiency of the integrated filtration strategy in reducing the search space and also verified the effectiveness of the proposed matching scheme.
67
Sadygov RG. Using SEQUEST with theoretically complete sequence databases. J Am Soc Mass Spectrom 2015; 26:1858-1864. [PMID: 26238326 PMCID: PMC4607654 DOI: 10.1007/s13361-015-1228-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/02/2015] [Revised: 05/08/2015] [Accepted: 06/17/2015] [Indexed: 06/04/2023]
Abstract
SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven hugely successful for its sensitivity and specificity in identifying peptides/proteins whose sequences are present in the protein sequence databases. In this work, we report a new use for the algorithm: searching a complete list of theoretically possible peptides, a de novo-like sequencing approach. We used freely available mass spectral data and determined a number of unique peptides identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
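The two enumeration steps described above, finding every amino acid composition whose mass falls inside a window and then listing the distinct sequences of each composition in lexicographic order, can be sketched in Python as below. This is a brute-force illustration under stated assumptions (residue masses only, so the target is the neutral peptide mass minus one water; L stands for the isobaric pair L/I; no modifications); it is not the author's composition algorithm, and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da; "L" stands for the isobaric pair L/I.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def compositions(target, tol):
    """All residue multisets (as mass-ordered strings) with total mass within target +/- tol."""
    items = sorted(RESIDUE_MASSES.items(), key=lambda kv: kv[1])
    found = []

    def extend(start, total, chosen):
        if chosen and abs(total - target) <= tol:
            found.append("".join(chosen))
        for j in range(start, len(items)):
            aa, mass = items[j]
            if total + mass > target + tol:
                break  # items are sorted, so every later residue is heavier
            chosen.append(aa)
            extend(j, total + mass, chosen)  # j, not j + 1: residues may repeat
            chosen.pop()

    extend(0, 0.0, [])
    return found


def lexicographic_sequences(composition):
    """Yield each distinct ordering of `composition` in lexicographic order."""
    seq = sorted(composition)
    while True:
        yield "".join(seq)
        # Standard next-permutation step; repeated residues give no duplicates.
        i = len(seq) - 2
        while i >= 0 and seq[i] >= seq[i + 1]:
            i -= 1
        if i < 0:
            return
        j = len(seq) - 1
        while seq[j] <= seq[i]:
            j -= 1
        seq[i], seq[j] = seq[j], seq[i]
        seq[i + 1:] = reversed(seq[i + 1:])


if __name__ == "__main__":
    WATER = 18.010565
    peptide_mass = 57.02146 + 71.03711 + WATER  # hypothetical neutral mass of "GA"
    for comp in compositions(peptide_mass - WATER, tol=0.001):
        print(comp, list(lexicographic_sequences(comp)))
```

Even at 0.001 Da the window admits both G+A and Q here, and the number of orderings grows factorially with peptide length, which illustrates why such a theoretical database is many-fold larger than a protein database.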
Affiliation(s)
- Rovshan G Sadygov
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, 77555, USA.
- Sealy Center for Molecular Medicine, The University of Texas Medical Branch, Galveston, TX, 77555, USA.
68
Lavallée-Adam M, Park SKR, Martínez-Bartolomé S, He L, Yates JR. From raw data to biological discoveries: a computational analysis pipeline for mass spectrometry-based proteomics. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2015; 26:1820-1826. [PMID: 26002791 PMCID: PMC4607643 DOI: 10.1007/s13361-015-1161-7] [Received: 01/30/2015] [Revised: 04/03/2015] [Accepted: 04/05/2015]
Abstract
In the last two decades, computational tools for mass spectrometry-based proteomics data analysis have evolved from a few stand-alone software solutions serving specific goals, such as the identification of amino acid sequences based on mass spectrometry spectra, to large-scale complex pipelines integrating multiple computer programs to solve a collection of problems. This software evolution has been mostly driven by the appearance of novel technologies that allowed the community to tackle complex biological problems, such as the identification of proteins that are differentially expressed in two samples under different conditions. The achievement of such objectives requires a large suite of programs to analyze the intricate mass spectrometry data. Our laboratory addresses complex proteomics questions by producing and using algorithms and software packages. Our current computational pipeline includes, among other things, tools for mass spectrometry raw data processing, peptide and protein identification and quantification, post-translational modification analysis, and protein functional enrichment analysis. In this paper, we describe a suite of software packages we have developed to process mass spectrometry-based proteomics data and we highlight some of the new features of previously published programs as well as tools currently under development.
Affiliation(s)
- Mathieu Lavallée-Adam
- Department of Chemical Physiology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA, 92037, USA
- Sung Kyu Robin Park
- Department of Chemical Physiology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA, 92037, USA
- Salvador Martínez-Bartolomé
- Department of Chemical Physiology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA, 92037, USA
- Lin He
- Department of Chemical Physiology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA, 92037, USA
- John R Yates
- Department of Chemical Physiology and Molecular and Cellular Neurobiology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA, 92037, USA.
69
Ma B. Novor: real-time peptide de novo sequencing software. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2015; 26:1885-1894. [PMID: 26122521 PMCID: PMC4604512 DOI: 10.1007/s13361-015-1204-0] [Received: 02/12/2015] [Revised: 05/12/2015] [Accepted: 05/17/2015]
Abstract
De novo sequencing software has been widely used in proteomics to sequence new peptides from tandem mass spectrometry data. This study presents a new software tool, Novor, that greatly improves both the speed and accuracy of today's peptide de novo sequencing analyses. To improve accuracy, Novor's scoring functions are based on two large decision trees built by machine learning from a peptide spectral library of more than 300,000 spectra. Important knowledge about peptide fragmentation is extracted automatically from the library and incorporated into the scoring functions. The decision tree model also enables efficient score calculation and contributes to the speed improvement. To further improve speed, a two-stage algorithmic approach, namely dynamic programming and refinement, is used. The software program was also carefully optimized. On the testing datasets, Novor sequenced 7%-37% more correct residues than the state-of-the-art de novo sequencing tool, PEAKS, while being an order of magnitude faster. Novor can de novo sequence more than 300 MS/MS spectra per second on a laptop computer. This speed surpasses the acquisition speed of today's mass spectrometers and therefore opens a new possibility: de novo sequencing in real time while the spectrometer is acquiring the spectral data.
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo, 200 University Ave. W., Waterloo, ON, N2L3G1, Canada.
70
da Veiga Leprevost F, Barbosa VC, Carvalho PC. Using PepExplorer to Filter and Organize De Novo Peptide Sequencing Results. Curr Protoc Bioinformatics 2015; 51:13.27.1-13.27.9. [DOI: 10.1002/0471250953.bi1327s51]
Affiliation(s)
- Felipe da Veiga Leprevost
- Computational Mass Spectrometry Group, Carlos Chagas Institute–Fiocruz. Curitiba Paraná Brazil
- Department of Pathology, University of Michigan Ann Arbor Michigan
- Valmir C. Barbosa
- Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro Rio de Janeiro Brazil
- Paulo Costa Carvalho
- Computational Mass Spectrometry Group, Carlos Chagas Institute–Fiocruz. Curitiba Paraná Brazil
71
Melani RD, Araujo GD, Carvalho PC, Goto L, Nogueira FC, Junqueira M, Domont GB. Seeing beyond the tip of the iceberg: A deep analysis of the venome of the Brazilian Rattlesnake, Crotalus durissus terrificus. EUPA OPEN PROTEOMICS 2015. [DOI: 10.1016/j.euprot.2015.05.006]
72
Chi H, He K, Yang B, Chen Z, Sun RX, Fan SB, Zhang K, Liu C, Yuan ZF, Wang QH, Liu SQ, Dong MQ, He SM. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data". J Proteomics 2015; 129:33-41. [PMID: 26232248 DOI: 10.1016/j.jprot.2015.07.019] [Received: 03/02/2015] [Revised: 05/04/2015] [Accepted: 05/10/2015]
Abstract
Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics.
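A generic fragment-ion inverted index makes an unrestricted search cheap, and a short Python sketch shows why. A peptide carrying an unexpected modification still collects votes from its unshifted fragment ions, so no modification needs to be declared in advance. The bin width, the adjacent-bin lookup and all function names are assumptions for illustration. Alioth's actual index layout, its precursor index and its re-ranking step are not reproduced here.

```python
from collections import Counter, defaultdict

# Monoisotopic residue masses (Da); only the residues used in the example.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
           "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111}
PROTON = 1.007276
WATER = 18.010565
BIN_WIDTH = 0.02  # Da; assumed bin size for the fragment index


def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values, interleaved as b1, y(n-1), b2, ..."""
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(RESIDUE[aa] for aa in peptide[:i]) + PROTON)
        ions.append(sum(RESIDUE[aa] for aa in peptide[i:]) + WATER + PROTON)
    return ions


def build_ion_index(peptides):
    """Inverted index: fragment m/z bin -> set of peptide ids producing it."""
    index = defaultdict(set)
    for pid, pep in enumerate(peptides):
        for mz in fragment_mz(pep):
            index[round(mz / BIN_WIDTH)].add(pid)
    return index


def query(index, peaks):
    """Each peak votes once for every peptide with a fragment in its bin or an
    adjacent bin; no precursor-mass filter is applied (unrestricted search)."""
    votes = Counter()
    for p in peaks:
        key = round(p / BIN_WIDTH)
        hits = set()
        for k in (key - 1, key, key + 1):
            hits |= index.get(k, set())
        votes.update(hits)
    return votes.most_common()
```

Suppose SAVLNK carries a phosphate (+79.96633 Da) on its N-terminal serine. Every b-ion then shifts, yet the five unshifted y-ions still rank SAVLNK first.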
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Zhen Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Quan-Hui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Si-Qi Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
73
Xu T, Park SK, Venable JD, Wohlschlegel JA, Diedrich JK, Cociorva D, Lu B, Liao L, Hewel J, Han X, Wong CCL, Fonslow B, Delahunty C, Gao Y, Shah H, Yates JR. ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity. J Proteomics 2015; 129:16-24. [PMID: 26171723 DOI: 10.1016/j.jprot.2015.07.001] [Received: 04/09/2015] [Revised: 06/08/2015] [Accepted: 07/04/2015]
Abstract
ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases, has been developed. This algorithm uses a three-tier scoring scheme. First, a binomial probability is used as a preliminary scoring scheme to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. Second, a modified cross-correlation score is calculated for each candidate peptide identified by the binomial probability. This cross-correlation scoring function models the isotopic distributions of fragment ions of candidate peptides, which ultimately results in higher sensitivity and specificity than those obtained with the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of peptide identifications with SEQUEST, and displays a significant improvement in specificity over ProLuCID XCorr alone. ProLuCID can also take advantage of high-resolution MS/MS spectra, further improving specificity compared with low-resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID at the same false discovery rate, as estimated by a target-decoy database strategy, shows that ProLuCID identified as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster. This article is part of a Special Issue entitled: Computational Proteomics.
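Two small Python helpers can illustrate the first and third tiers of the scoring scheme. The binomial tail gives the probability of matching at least k of n theoretical fragments by chance when each match has probability p. The Z score standardizes the top XCorr against the mean and sample standard deviation of all candidates' XCorr values. The abstract does not state how ProLuCID chooses p, or whether the top hit is excluded from the background distribution. Both are therefore assumptions here, and the function names are hypothetical.

```python
import math
import statistics


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more of n
    theoretical fragments matching at random, each with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def top_hit_z_score(xcorrs):
    """Z score of the highest XCorr relative to all candidate XCorr values
    (top hit included in the background; needs at least two candidates)."""
    mean = statistics.mean(xcorrs)
    sd = statistics.stdev(xcorrs)  # sample standard deviation
    return (max(xcorrs) - mean) / sd
```

A smaller binomial tail means a match count less likely to arise by chance. A larger Z score means the top hit stands further out from the other candidates, which is the role DeltaCN plays in a SEQUEST workflow.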
Affiliation(s)
- T Xu
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA; Dow AgroSciences LLC, Indianapolis, IN 46268, USA
- S K Park
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J D Venable
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J A Wohlschlegel
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J K Diedrich
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- D Cociorva
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- B Lu
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- L Liao
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J Hewel
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- X Han
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- C C L Wong
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- B Fonslow
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- C Delahunty
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- Y Gao
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- H Shah
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J R Yates
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA.
74
Pejchinovski M, Klein J, Ramírez-Torres A, Bitsika V, Mermelekas G, Vlahou A, Mullen W, Mischak H, Jankowski V. Comparison of higher energy collisional dissociation and collision-induced dissociation MS/MS sequencing methods for identification of naturally occurring peptides in human urine. Proteomics Clin Appl 2015; 9:531-542. [DOI: 10.1002/prca.201400163] [Received: 10/21/2014] [Revised: 02/27/2015] [Accepted: 03/23/2015]
Affiliation(s)
- Martin Pejchinovski
- Charite-Universitätsmedizin Berlin; Berlin Germany
- Mosaiques Diagnostics GmbH; Hanover Germany
- Vasiliki Bitsika
- Biotechnology Division; Biomedical Research Foundation; Academy of Athens; Athens Greece
- George Mermelekas
- Biotechnology Division; Biomedical Research Foundation; Academy of Athens; Athens Greece
- Antonia Vlahou
- Biotechnology Division; Biomedical Research Foundation; Academy of Athens; Athens Greece
- William Mullen
- Institute of Cardiovascular and Medical Sciences; University of Glasgow; Glasgow UK
- Harald Mischak
- Mosaiques Diagnostics GmbH; Hanover Germany
- Institute of Cardiovascular and Medical Sciences; University of Glasgow; Glasgow UK
- Vera Jankowski
- Universitätsklinikum RWTH Aachen; Institute of Molecular Cardiovascular Research; Aachen Germany
75
Yan Y, Kusalik AJ, Wu FX. A Framework of De Novo Peptide Sequencing for Multiple Tandem Mass Spectra. IEEE Trans Nanobioscience 2015; 14:478-484. [DOI: 10.1109/tnb.2015.2419194]
76
Chi H, He K, Yang B, Chen Z, Sun RX, Fan SB, Zhang K, Liu C, Yuan ZF, Wang QH, Liu SQ, Dong MQ, He SM. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data. J Proteomics 2015; 125:89-97. [PMID: 25979774 DOI: 10.1016/j.jprot.2015.05.009] [Received: 03/02/2015] [Revised: 05/04/2015] [Accepted: 05/10/2015]
Abstract
Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested.
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Zhen Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Sheng-Bo Fan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Kun Zhang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Zuo-Fei Yuan
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- Quan-Hui Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Si-Qi Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
77
Nam J, Kwon H, Jang I, Jeon A, Moon J, Lee SY, Kang D, Han SY, Moon B, Oh HB. Bromine isotopic signature facilitates de novo sequencing of peptides in free-radical-initiated peptide sequencing (FRIPS) mass spectrometry. JOURNAL OF MASS SPECTROMETRY : JMS 2015; 50:378-387. [PMID: 25800020 DOI: 10.1002/jms.3539] [Received: 05/11/2014] [Revised: 08/04/2014] [Accepted: 11/02/2014]
Abstract
We recently showed that free-radical-initiated peptide sequencing mass spectrometry (FRIPS MS) assisted by the remarkable thermochemical stability of (2,2,6,6-tetramethyl-piperidin-1-yl)oxyl (TEMPO) is another attractive radical-driven peptide fragmentation MS tool. Facile homolytic cleavage of the bond between the benzylic carbon and the oxygen of the TEMPO moiety in o-TEMPO-Bz-C(O)-peptide and the high reactivity of the benzylic radical species generated in •Bz-C(O)-peptide are key elements leading to extensive radical-driven peptide backbone fragmentation. In the present study, we demonstrate that the incorporation of bromine into the benzene ring, i.e. o-TEMPO-Bz(Br)-C(O)-peptide, allows unambiguous distinction of the N-terminal peptide fragments from the C-terminal fragments through the unique bromine doublet isotopic signature. Furthermore, bromine substitution does not alter the overall radical-driven peptide backbone dissociation pathways of o-TEMPO-Bz-C(O)-peptide. From a practical perspective, the presence of the bromine isotopic signature in the N-terminal peptide fragments in TEMPO-assisted FRIPS MS represents a useful and cost-effective opportunity for de novo peptide sequencing.
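A simple pairwise scan over centroided peaks can search for the bromine doublet. The Python sketch below makes several assumptions, none of which come from the paper: (m/z, intensity) input, the natural 79Br/81Br mass difference of about 1.998 Da with near 1:1 abundance, and an intensity-ratio window of 0.7 to 1.4. It also assumes a tolerance tight enough (5 mDa) to separate the doublet from a 13C2 isotope peak at about +2.007 Da. The function name is hypothetical.

```python
# Approximate isotope masses (Da); their difference is about 1.99795 Da.
BR_DELTA = 80.9162897 - 78.9183376


def bromine_doublets(peaks, tol=0.005, ratio_range=(0.7, 1.4), charge=1):
    """Return (light_mz, heavy_mz) pairs spaced by the 79Br/81Br mass difference
    (divided by charge) whose heavy/light intensity ratio is close to 1.
    peaks: iterable of (mz, intensity) tuples."""
    delta = BR_DELTA / charge
    peaks = sorted(peaks)
    pairs = []
    for i, (mz1, int1) in enumerate(peaks):
        if int1 <= 0:
            continue
        for mz2, int2 in peaks[i + 1:]:
            gap = mz2 - mz1
            if gap > delta + tol:
                break  # peaks are sorted, so later ones are even farther away
            if abs(gap - delta) <= tol and ratio_range[0] <= int2 / int1 <= ratio_range[1]:
                pairs.append((mz1, mz2))
    return pairs
```

Under the labeling scheme described above, each returned pair would flag a candidate N-terminal fragment. The intensity-ratio check rejects pairs with the right spacing but a lopsided abundance.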
Affiliation(s)
- Jungjoo Nam
- Department of Chemistry, Sogang University, Seoul, 121-742, Korea
78
Medzihradszky KF, Chalkley RJ. Lessons in de novo peptide sequencing by tandem mass spectrometry. MASS SPECTROMETRY REVIEWS 2015; 34:43-63. [PMID: 25667941 PMCID: PMC4367481 DOI: 10.1002/mas.21406]
Abstract
Mass spectrometry has become the method of choice for the qualitative and quantitative characterization of protein mixtures isolated from all kinds of living organisms. The raw data in these studies are MS/MS spectra, usually of peptides produced by proteolytic digestion of a protein. These spectra are "translated" into peptide sequences, normally with the help of various search engines. Data acquisition and interpretation have both been automated, and most researchers look only at the summary of the identifications without ever viewing the underlying raw data used for assignments. Automated analysis of data is essential due to the volume produced. However, familiarity with the finer intricacies of peptide fragmentation processes, and experience with the difficulties of manual data interpretation, allow a researcher to evaluate key results more critically, particularly because many known rules of peptide fragmentation are not incorporated into search engine scoring. Since the most commonly used MS/MS activation method is collision-induced dissociation (CID), in this article we present a brief review of the history of peptide CID analysis. Next, we provide a detailed tutorial on how to determine peptide sequences from CID data. Although the focus of the tutorial is de novo sequencing, the lessons learned and resources supplied are useful for data interpretation in general.
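The core move in manual de novo interpretation is reading residues from the mass differences between consecutive ions of one series, and a short Python sketch shows it. The residue table uses standard monoisotopic masses. I is omitted because it is isobaric with L, and a ladder built from y-ions reads the sequence from the C-terminus. The function name and tolerance are illustrative assumptions.

```python
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder, tol=0.01):
    """Map each gap between consecutive ion m/z values (same series, charge 1)
    to the residue(s) whose mass lies within tol; '?' marks an unexplained gap."""
    calls = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        hits = sorted(aa for aa, m in RESIDUE.items() if abs(gap - m) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls
```

With `tol=0.05` the Q step of a GASQ b-ion ladder is reported as `K/Q`, because the two residues differ by only about 0.036 Da. Resolving them requires accurate fragment masses.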
79
Samgina TY, Vorontsov EA, Gorshkov VA, Artemenko KA, Zubarev RA, Lebedev AT. Mass spectrometric de novo sequencing of natural non-tryptic peptides: comparing peculiarities of collision-induced dissociation (CID) and high energy collision dissociation (HCD). RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2014; 28:2595-2604. [PMID: 25366406 DOI: 10.1002/rcm.7049] [Received: 07/08/2014] [Revised: 09/09/2014] [Accepted: 09/09/2014]
Abstract
RATIONALE Mass spectrometry has shown itself to be the most efficient tool for the sequencing of peptides. However, de novo sequencing of novel natural peptides is significantly more challenging than the same procedure applied to tryptic peptides. To reach the goal in this case, it is essential to select the most useful methods of triggering fragmentation and to combine complementary techniques. METHODS Comparison of low-energy collision-induced dissociation (CID) and higher energy collision-induced dissociation (HCD) modes for sequencing of the natural non-tryptic peptides with disulfide bonds and/or several proline residues in the backbone was achieved using an LTQ FT Ultra Fourier transform ion cyclotron resonance (FTICR) mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a 7 T magnet and an LTQ Orbitrap Velos ETD (Thermo Fisher Scientific, Bremen, Germany) instrument. Peptide fractions were obtained by high-performance liquid chromatography (HPLC) separation of frog skin secretion samples from ten species of Rana temporaria, caught in the Kolomna district of Moscow region (Russia). RESULTS HCD makes the b/y series longer and more pronounced, thus increasing sequence coverage. Fragment ions due to cleavages at the C-termini of proline residues make the sequencing more reliable and may be used to detect missed cleavages in the case of tryptic peptides. Another HCD peculiarity involves formation of pronounced inner fragment ions (secondary y(n)b(m) ion series formed from the abundant primary y-ions). Differences in de novo sequencing of natural non-tryptic peptides with CID and HCD, involving thorough manual expert interpretation of spectra and two automatic sequencing algorithms, are discussed. CONCLUSIONS Although HCD provides better results, a combination of CID and HCD data may notably increase the reliability of de novo sequencing. Several pairs of b2/a2 ions may be formed in HCD, complicating the spectra. Automatic de novo sequencing with the available programs remains less efficient than manual sequencing, independently of the collision energy.
Affiliation(s)
- Tatyana Yu Samgina
- Department of Chemistry, Moscow State University, Russian Federation, 119991, Leninskie Gory 1/3, Moscow, Russia
80
Wang X, Li Y, Wu Z, Wang H, Tan H, Peng J. JUMP: a tag-based database search tool for peptide identification with high sensitivity and accuracy. Mol Cell Proteomics 2014; 13:3663-3673. [PMID: 25202125 DOI: 10.1074/mcp.o114.039586]
Abstract
Database search programs are essential tools for identifying peptides via mass spectrometry (MS) in shotgun proteomics. Simultaneously achieving high sensitivity and high specificity during a database search is crucial for improving proteome coverage. Here we present JUMP, a new hybrid database search program that generates amino acid tags and ranks peptide spectrum matches (PSMs) by an integrated score from the tags and pattern matching. In a typical run of liquid chromatography coupled with high-resolution tandem MS, more than 95% of MS/MS spectra can generate at least one tag, whereas the remaining spectra are usually too poor to derive genuine PSMs. To enhance search sensitivity, the JUMP program enables the use of tags as short as one amino acid. Using a target-decoy strategy, we compared JUMP with other programs (e.g. SEQUEST, Mascot, PEAKS DB, and InsPecT) in the analysis of multiple datasets and found that JUMP outperformed these preexisting programs. JUMP also permitted the analysis of multiple co-fragmented peptides from "mixture spectra" to further increase PSMs. In addition, JUMP-derived tags allowed partial de novo sequencing and facilitated the unambiguous assignment of modified residues. In summary, JUMP is an effective database search algorithm complementary to current search programs.
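Tag generation of the kind described above can be sketched as a walk over peaks connected by single-residue mass differences. The Python below reports maximal tags (chains that cannot be extended further) that start at peaks with no incoming connection. Tags as short as one residue are allowed. The tolerance, the exhaustive walk and the function name are assumptions. JUMP's tag scoring and its integration with pattern matching are not reproduced.

```python
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def find_tags(peaks, min_len=1, tol=0.01):
    """Return (start_mz, tag) for every maximal chain of peaks whose successive
    differences each match one residue mass (I omitted: isobaric with L)."""
    peaks = sorted(peaks)
    # edges[i] lists (j, residue) where peaks[j] - peaks[i] matches a residue mass
    edges = {i: [] for i in range(len(peaks))}
    for i, lighter in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            for aa, mass in RESIDUE.items():
                if abs(peaks[j] - lighter - mass) <= tol:
                    edges[i].append((j, aa))
    has_incoming = {j for outs in edges.values() for j, _ in outs}
    tags = []

    def walk(i, tag, start):
        if not edges[i]:  # chain cannot be extended: record it if long enough
            if len(tag) >= min_len:
                tags.append((start, tag))
            return
        for j, aa in edges[i]:
            walk(j, tag + aa, start)

    for i in range(len(peaks)):
        if i not in has_incoming:
            walk(i, "", peaks[i])
    return tags
```

Consider the b1 to b4 ions of GASP plus an unrelated noise peak. The sketch yields the single tag `ASP`, and the noise peak, which connects to nothing, produces no tag.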
Affiliation(s)
- Xusheng Wang
- St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Yuxin Li
- Departments of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Zhiping Wu
- Departments of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Hong Wang
- Departments of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105; Integrated Biomedical Sciences Program, The University of Tennessee Health Science Center, Memphis, Tennessee 38163
- Haiyan Tan
- St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Junmin Peng
- St. Jude Proteomics Facility, St. Jude Children's Research Hospital, Memphis, Tennessee 38105; Departments of Structural Biology and Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
81
Statistical characterization of HCD fragmentation patterns of tryptic peptides on an LTQ Orbitrap Velos mass spectrometer. J Proteomics 2014; 109:26-37. [DOI: 10.1016/j.jprot.2014.06.012] [Received: 11/22/2013] [Revised: 06/13/2014] [Accepted: 06/16/2014]
82
Cunsolo V, Muccilli V, Saletti R, Foti S. Mass spectrometry in food proteomics: a tutorial. JOURNAL OF MASS SPECTROMETRY : JMS 2014; 49:768-784. [PMID: 25230173 DOI: 10.1002/jms.3374] [Received: 02/07/2014] [Revised: 04/10/2014] [Accepted: 04/11/2014]
Abstract
In the last decades, the continuous and rapid evolution of proteomic approaches has provided an efficient platform for the characterization of food-derived proteins. In particular, the impressive increase in the performance and versatility of MS instrumentation has contributed to the development of new analytical strategies for proteins, showing that MS is arguably an indispensable tool in food proteomics. Investigation of protein composition in foodstuffs is helpful for understanding the relationship between protein content and the nutritional and technological properties of foods, for developing methods for food traceability, for assessing food quality and safety, including the detection of allergens and microbial contaminants, and even for characterizing genetically modified products. Given the high variety of food-derived proteins and their differences in chemical and physical properties, no single proteomic strategy serves all purposes. Rather, proteomic approaches need to be adapted to each analytical problem, and new strategies must be developed to consistently obtain the best results. In this tutorial, the most relevant aspects of MS-based methodologies in food proteomics will be examined, and their advantages and drawbacks will be discussed.
Affiliation(s)
- Vincenzo Cunsolo
- Department of Chemical Sciences, University of Catania, Viale A. Doria, 6, I-95125, Catania, Italy
83
Zelanis A, Keiji Tashima A. Unraveling snake venom complexity with ‘omics’ approaches: Challenges and perspectives. Toxicon 2014; 87:131-4. [DOI: 10.1016/j.toxicon.2014.05.011] [Received: 02/19/2014] [Revised: 04/14/2014] [Accepted: 05/07/2014]
84
Rapid development of proteomics in China: from the perspective of the Human Liver Proteome Project and technology development. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1162-71. [PMID: 25119674 DOI: 10.1007/s11427-014-4714-2] [Received: 04/14/2014] [Accepted: 07/01/2014]
85
Wilhelm T, Jones AME. Identification of related peptides through the analysis of fragment ion mass shifts. J Proteome Res 2014; 13:4002-11. [PMID: 25058668 DOI: 10.1021/pr500347e]
Abstract
Mass spectrometry (MS) has become the method of choice to identify and quantify proteins, typically by fragmenting peptides and inferring protein identification by reference to sequence databases. Well-established programs have largely solved the problem of identifying peptides in complex mixtures. However, to prevent the search space from becoming prohibitively large, most search engines need a list of expected modifications. Therefore, unexpected modifications limit both the identification of proteins and peptide-based quantification. We developed mass spectrometry-peak shift analysis (MS-PSA) to rapidly identify related spectra in large data sets without reference to databases or specified modifications. Peptide identifications from established tools, such as MASCOT or SEQUEST, may be propagated onto MS-PSA results. Modification of a peptide alters the mass of the precursor ion and some of the fragment ions. MS-PSA identifies characteristic fragmentation masses from MS/MS spectra. Related spectra are identified by pattern matching of unchanged and mass-shifted fragment ions. We illustrate the use of MS-PSA with simple and complex mixtures with both high and low mass accuracy data sets. MS-PSA is not limited to the analysis of peptides but can be used for the identification of related groups of spectra in any set of fragmentation patterns.
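The core idea of matching "unchanged and mass-shifted fragment ions" can be sketched as below. This is a minimal illustration, not the MS-PSA implementation: the function names, the 0.02 tolerance, the 50% decision threshold, and the assumption that all fragments are singly charged are ours.

```python
def shared_peak_counts(spec_a, spec_b, precursor_shift, tol=0.02):
    """Count peaks of spec_a that reappear in spec_b unchanged or mass-shifted.

    spec_a, spec_b: fragment m/z lists (all ions assumed singly charged).
    precursor_shift: precursor mass of b minus precursor mass of a, taken here
    as the mass of the modification relating the two peptides (an assumption).
    """
    unchanged = shifted = 0
    for mz in spec_a:
        if any(abs(mz - other) <= tol for other in spec_b):
            unchanged += 1  # fragment does not contain the modified residue
        elif any(abs(mz + precursor_shift - other) <= tol for other in spec_b):
            shifted += 1  # fragment carries the modification
    return unchanged, shifted


def looks_related(spec_a, spec_b, precursor_shift, min_fraction=0.5, tol=0.02):
    """Hypothetical decision rule: related if enough peaks match either way."""
    unchanged, shifted = shared_peak_counts(spec_a, spec_b, precursor_shift, tol)
    return (unchanged + shifted) / max(len(spec_a), 1) >= min_fraction
```

For example, `shared_peak_counts([100.0, 200.0, 300.0], [100.0, 280.0, 380.0], 80.0)` returns `(1, 2)`: one fragment is unchanged and two carry the +80 Da shift. A real tool would also weight peaks by intensity and index spectra so that not every pair is compared.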
Affiliation(s)
- Thomas Wilhelm
- Institute of Food Research , Norwich Research Park, Norwich NR4 7UA, United Kingdom
86
Su ZD, Sheng QH, Li QR, Chi H, Jiang X, Yan Z, Fu N, He SM, Khaitovich P, Wu JR, Zeng R. De novo identification and quantification of single amino-acid variants in human brain. J Mol Cell Biol 2014; 6:421-33. [PMID: 25007923 DOI: 10.1093/jmcb/mju031]
Abstract
The detection of single amino-acid variants (SAVs) usually depends on single-nucleotide polymorphism (SNP) databases. Here, we describe a novel method that discovers SAVs at the proteome level independently of SNP data. Using a mass spectrometry-based de novo sequencing algorithm, peptide candidates are identified and compared with a theoretical protein database to generate SAVs under a pairing strategy, followed by a database re-search to control the false discovery rate. In human brain tissues, we can confidently identify known and novel protein variants with diverse origins. Combined with DNA/RNA sequencing, we verify SAVs derived from DNA mutations, RNA alternative splicing, and unknown post-transcriptional mechanisms. Furthermore, quantitative analysis in human brain tissues reveals several tissue-specific differential expressions of SAVs. This approach provides a new route to high-throughput detection of protein variants, which may offer potential for clinical biomarker discovery and mechanistic research.
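The abstract does not spell out its pairing strategy, so the sketch below shows only one simple way such a pairing could work, under our own assumptions: a de novo peptide is paired with a database peptide of the same length that differs at exactly one residue, with I and L treated as identical because they are isobaric. The function name and the `S4T`-style variant label are hypothetical; the published method additionally re-searches the database to control the false discovery rate.

```python
def single_substitution_pairs(de_novo_peptides, database_peptides):
    """Pair de novo peptides with database peptides differing at one residue.

    Returns (de_novo, reference, variant) tuples, where variant is written as
    <reference residue><1-based position><observed residue>, e.g. "S4T".
    """
    by_length = {}
    for ref in database_peptides:
        by_length.setdefault(len(ref), []).append(ref)

    pairs = []
    for query in de_novo_peptides:
        q = query.replace("I", "L")  # I/L cannot be distinguished by mass
        for ref in by_length.get(len(query), []):
            r = ref.replace("I", "L")
            diffs = [i for i, (x, y) in enumerate(zip(q, r)) if x != y]
            if len(diffs) == 1:
                i = diffs[0]
                pairs.append((query, ref, f"{ref[i]}{i + 1}{query[i]}"))
    return pairs
```

With `single_substitution_pairs(["PEPTLDEK"], ["PEPTIDEK", "PEPSIDEK"])`, the I/L-only difference is ignored and the single pair `("PEPTLDEK", "PEPSIDEK", "S4T")` is reported.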
Affiliation(s)
- Zhi-Duan Su
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Quan-Hu Sheng
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Qing-Run Li
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Hao Chi
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Xi Jiang
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Zheng Yan
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Ning Fu
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Si-Min He
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Philipp Khaitovich
- Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai 200031, China
- Jia-Rui Wu
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
- Rong Zeng
- Key Laboratory of Systems Biology, Chinese Academy of Sciences, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai 200031, China
87
Wang Y, Li SM, He MW. Fragmentation Characteristics and Utility of Immonium Ions for Peptide Identification by MALDI-TOF/TOF-Mass Spectrometry. CHINESE JOURNAL OF ANALYTICAL CHEMISTRY 2014. [DOI: 10.1016/s1872-2040(14)60752-0]
88
Liu X, Dekker LJM, Wu S, Vanduijn MM, Luider TM, Tolić N, Kou Q, Dvorkin M, Alexandrova S, Vyatkina K, Paša-Tolić L, Pevzner PA. De Novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. J Proteome Res 2014; 13:3241-8. [DOI: 10.1021/pr401300m]
Affiliation(s)
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, Indiana 46202, United States
- Lennard J. M. Dekker
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Si Wu
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Martijn M. Vanduijn
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Theo M. Luider
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
- Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
- Mikhail Dvorkin
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, St. Petersburg 194021, Russia
- Sonya Alexandrova
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, St. Petersburg 194021, Russia
- Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, St. Petersburg 194021, Russia
- Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, 9500 Gilman Drive, San Diego, California 92093, United States
89
Kelstrup CD, Frese C, Heck AJR, Olsen JV, Nielsen ML. Analytical utility of mass spectral binning in proteomic experiments by SPectral Immonium Ion Detection (SPIID). Mol Cell Proteomics 2014; 13:1914-24. [PMID: 24895383 PMCID: PMC4125726 DOI: 10.1074/mcp.o113.035915]
Abstract
Unambiguous identification of tandem mass spectra is a cornerstone of mass-spectrometry-based proteomics. As the study of post-translational modifications (PTMs) by means of shotgun proteomics progresses in depth and coverage, the ability to correctly identify PTM-bearing peptides is essential, increasing the demand for advanced data interpretation. Several PTMs are known to generate unique fragment ions during tandem mass spectrometry, the so-called diagnostic ions, which unequivocally identify a given mass spectrum as related to a specific PTM. Although such ions offer tremendous analytical advantages, algorithms that screen MS/MS spectra for the presence of diagnostic ions in an unbiased manner are currently lacking. Here, we present a systematic spectral-pattern-based approach for the discovery of diagnostic ions and new fragmentation mechanisms in shotgun proteomics datasets. The developed software tool is designed to analyze large sets of high-resolution peptide fragmentation spectra independent of the fragmentation method, instrument type, or protease employed. To benchmark the software tool, we analyzed large higher-energy collisional activation dissociation datasets of samples containing phosphorylation, ubiquitylation, SUMOylation, formylation, and lysine acetylation. Using the developed software tool, we were able to identify known diagnostic ions by comparing histograms of modified and unmodified peptide spectra. Because the investigated tandem mass spectra were acquired with high mass accuracy, unambiguous interpretation and determination of the chemical composition of the majority of detected fragment ions was feasible. Collectively, we present a freely available software tool that allows comprehensive and automatic analysis of analogous product ions in tandem mass spectra and systematic mapping of fragmentation mechanisms related to common amino acids.
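The histogram comparison of modified and unmodified spectra can be illustrated with a small binning sketch. It is not the SPIID software: the function names, the 0.01 m/z bin width, the frequency and enrichment thresholds, and the m/z 126.0913 test peak are all our own illustrative choices.

```python
from collections import Counter


def bin_frequencies(spectra, bin_width=0.01):
    """Fraction of spectra that have at least one peak in each m/z bin."""
    counts = Counter()
    for peaks in spectra:
        # A set, so each spectrum contributes at most once to a given bin.
        counts.update({round(mz / bin_width) for mz in peaks})
    return {b: c / len(spectra) for b, c in counts.items()}


def enriched_bins(modified, unmodified, bin_width=0.01, min_freq=0.1, min_ratio=5.0):
    """m/z bins that are frequent in modified spectra but rare in unmodified ones."""
    mod = bin_frequencies(modified, bin_width)
    background = bin_frequencies(unmodified, bin_width)
    hits = []
    for b, freq in mod.items():
        bg = background.get(b, 0.0)
        if freq >= min_freq and (bg == 0.0 or freq / bg >= min_ratio):
            hits.append(round(b * bin_width, 4))  # report the bin's m/z
    return sorted(hits)
```

Given `modified = [[126.0913, 200.0], [126.0913, 300.0]]` and `unmodified = [[200.0], [300.0]]`, only the bin near m/z 126.09 is reported, because the peaks at 200 and 300 occur equally often in both groups.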
Affiliation(s)
- Christian D Kelstrup
- Department of Proteomics, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Faculty of Health Sciences, DK-2200 Copenhagen, Denmark
- Christian Frese
- Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584CH Utrecht, The Netherlands
- Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics Group, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584CH Utrecht, The Netherlands
- Jesper V Olsen
- Department of Proteomics, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Faculty of Health Sciences, DK-2200 Copenhagen, Denmark
- Michael L Nielsen
- Department of Proteomics, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Faculty of Health Sciences, DK-2200 Copenhagen, Denmark
90
91
Leprevost FV, Valente RH, Lima DB, Perales J, Melani R, Yates JR, Barbosa VC, Junqueira M, Carvalho PC. PepExplorer: a similarity-driven tool for analyzing de novo sequencing results. Mol Cell Proteomics 2014; 13:2480-9. [PMID: 24878498 DOI: 10.1074/mcp.m113.037002]
Abstract
Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, makes it possible to infer a peptide sequence directly from a mass spectrum, but interpreting the long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologous proteins from de novo sequencing data coupled with sequence alignment, allowing biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching Pyrococcus furiosus mass spectra against the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra with PepExplorer against the modified database, we were able to recover most of the identifications at a 1% false-discovery rate.
Finally, we employed PepExplorer to carry out a comprehensive proteomic assessment of Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
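PepExplorer's aligner is described only as a modified Smith-Waterman algorithm, so the sketch below shows the plain, textbook Smith-Waterman local-alignment score with a linear gap penalty instead. The scoring values (match +2, mismatch -1, gap -2) and the choice to treat I and L as identical are assumptions made for illustration, not PepExplorer's parameters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score between two peptide sequences (linear gaps)."""
    # I and L are isobaric, so de novo tools cannot tell them apart; treat them as equal.
    a, b = a.replace("I", "L"), b.replace("I", "L")
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For instance, aligning the de novo tag `"TLDE"` against the database peptide `"PEPTIDEK"` scores 8 (four matching residues), since the I/L ambiguity is neutralized. Keeping only two rows reduces memory to O(len(b)); recovering the alignment itself would require the full matrix or a traceback structure.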
Affiliation(s)
- Felipe V Leprevost
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
- Richard H Valente
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil; Instituto Nacional de Ciência e Tecnologia em Toxinas (INCTTox/CNPq), Brazil
- Diogo B Lima
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
- Jonas Perales
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil; Instituto Nacional de Ciência e Tecnologia em Toxinas (INCTTox/CNPq), Brazil
- Rafael Melani
- Proteomics Unit, Rio de Janeiro Proteomics Network, Department of Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- John R Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California
- Valmir C Barbosa
- Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Magno Junqueira
- Proteomics Unit, Rio de Janeiro Proteomics Network, Department of Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Paulo C Carvalho
- Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
92
Abstract
In recent years, with the emergence of new instruments and advanced computational methods, de novo peptide sequencing from mass spectrometry data has become one of the major peptide identification methods. However, the method still has limitations; for example, the commonly used spectrum graph model cannot represent all the information and relationships inherent in tandem mass spectra (MS/MS spectra). Here, we present a new method named NovoHCD, which applies a spectrum graph model with multiple types of edges (called a multi-edge graph) and integrates into it amino acid combination (AAC) information and peptide tags. In addition, information on immonium ions, observed particularly in higher-energy collisional dissociation (HCD) spectra, is incorporated. NovoHCD was compared with pNovo, another successful de novo peptide sequencing method for HCD spectra, on five HCD spectral datasets. The results show that NovoHCD outperforms pNovo in full-length peptide identification accuracy, with accuracy increasing by 13%-21% across the five datasets.
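For readers unfamiliar with the spectrum graph that NovoHCD extends, the sketch below builds the classical single-edge version: nodes are prefix masses, and an edge joins two nodes whose mass gap equals one amino acid residue. It is not NovoHCD's multi-edge graph. We assume the peaks have already been converted to neutral prefix-residue masses (0.0 marking the N-terminus) and use a 0.02 Da tolerance; both are illustrative choices.

```python
# Monoisotopic residue masses in Da; I is omitted because it has the same mass as L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def spectrum_graph(prefix_masses, tol=0.02):
    """Edges (m1, m2, residue) between nodes whose mass gap equals one residue."""
    nodes = sorted(prefix_masses)
    edges = []
    for i, m1 in enumerate(nodes):
        for m2 in nodes[i + 1:]:
            for aa, mass in RESIDUE_MASS.items():
                if abs((m2 - m1) - mass) <= tol:
                    edges.append((m1, m2, aa))
    return edges
```

With nodes `[0.0, 57.02146, 128.05857, 241.14263]`, the edges are labelled G, Q, A and L in that order. The direct Q edge runs parallel to the G-then-A path because G + A has almost exactly the mass of Q; this kind of ambiguity is what amino acid combination information is meant to address. Sequencing then amounts to finding a high-scoring path from the N-terminal node to the precursor node.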
93
Sun H, Xing X, Li J, Zhou F, Chen Y, He Y, Li W, Wei G, Chang X, Jia J, Li Y, Xie L. Identification of gene fusions from human lung cancer mass spectrometry data. BMC Genomics 2013; 14 Suppl 8:S5. [PMID: 24564548 PMCID: PMC4042237 DOI: 10.1186/1471-2164-14-s8-s5]
Abstract
Background: Tandem mass spectrometry (MS/MS) has been applied to identify proteins as an ultimate means of confirming the original genome annotation. Identifying gene fusion proteins requires a special database containing peptides that span gene fusion breakpoints. Methods: It is impractical to construct a database that includes all possible fusion peptides originating from potential breakpoints. Focusing on 6259 reported and predicted gene fusion pairs from ChimerDB 2.0 and the Cancer Gene Census, we created, for the first time, a database (CanProFu) that comprehensively annotates fusion peptides formed by exon-exon linkage between these paired genes. Results: Applying this database with stringent search criteria to mass spectrometry datasets from 40 human non-small cell lung cancer (NSCLC) samples and 39 normal lung samples, we identified 19 unique fusion peptides characterizing gene fusion events. Of these, 11 gene fusion events were found only in NSCLC samples. In addition, 4 alternative splicing events were characterized in cancerous or normal lung samples. Conclusions: The database and workflow described here can be flexibly applied to other MS/MS-based human cancer experiments to detect gene fusions as potential disease biomarkers or drug targets.
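The key step in such a database is enumerating peptides that cross an exon-exon fusion junction. The sketch below is our own simplified illustration, not the CanProFu pipeline: it assumes the two exons are already given as in-frame protein sequences, applies a simplified trypsin rule (cleave after K or R unless followed by P), and keeps only peptides with residues on both sides of the junction. The function names and example sequences are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    # Splitting on a zero-width pattern requires Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def junction_peptides(upstream, downstream):
    """Tryptic peptides of the fused sequence that span the fusion junction."""
    fused = upstream + downstream
    junction = len(upstream)
    spanning = []
    start = 0
    for peptide in tryptic_peptides(fused):
        end = start + len(peptide)
        if start < junction < end:  # residues on both sides of the junction
            spanning.append(peptide)
        start = end
    return spanning
```

For example, `junction_peptides("MAGKLLED", "ASTVKPEGR")` returns `["LLEDASTVKPEGR"]`; the peptide `MAGK` lies entirely in the upstream exon and is excluded, and the K followed by P is not cleaved.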
94
Abstract
Independent of the approach used, the ability to correctly interpret tandem MS data depends on the quality of the original spectra. Even in the highest-quality spectra, the majority of spectral peaks cannot be reliably interpreted. The accuracy of sequencing algorithms can be improved by filtering out such 'noise' peaks. Preprocessing MS/MS spectra to select informative ion peaks increases accuracy and reduces processing time. Intuitively, the mix of informative versus non-informative peaks has a direct effect on the quality and size of the resulting candidate peptide search space. As the number of selected peaks increases, the corresponding search space increases exponentially. If too few peaks are selected, the ion-ladder interpretation of the spectrum will contain gaps that can only be explained by permutations of combinations of amino acids. This results in a larger candidate peptide search space and poorer-quality candidates. Because peptide sequencing accuracy depends on the initial peak selection regime, this preprocessing step is a crucial facet of any approach to MS/MS spectra interpretation, whether de novo or not. We have developed a novel approach to address this problem. Our approach uses a staged neural network to model ion fragmentation patterns and estimate the posterior probability of each ion type. Our method improves upon other preprocessing techniques and significantly reduces the search space for candidate peptides without sacrificing candidate peptide quality.
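To make the peak-selection trade-off concrete, the sketch below implements a common, simple baseline: keep the k most intense peaks in each fixed-width m/z window. This is explicitly not the staged neural network described above; the function name, window width, and k are illustrative assumptions. Raising k retains more true ions but enlarges the search space, while lowering k opens gaps in the ion ladder.

```python
def top_k_per_window(peaks, window=100.0, k=6):
    """Keep the k most intense peaks within each m/z window of width `window`.

    peaks: iterable of (mz, intensity) tuples. The defaults are illustrative
    only and are not taken from the paper.
    """
    buckets = {}
    for mz, intensity in peaks:
        buckets.setdefault(int(mz // window), []).append((mz, intensity))
    kept = []
    for bucket in buckets.values():
        bucket.sort(key=lambda peak: peak[1], reverse=True)  # most intense first
        kept.extend(bucket[:k])
    return sorted(kept)  # back in m/z order
```

With `k=2`, the spectrum `[(110.0, 5.0), (120.0, 50.0), (130.0, 20.0), (210.0, 1.0)]` keeps the two strongest peaks of the 100-200 window plus the lone peak at 210.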
95
Robotham SA, Kluwe C, Cannon JR, Ellington A, Brodbelt JS. De novo sequencing of peptides using selective 351 nm ultraviolet photodissociation mass spectrometry. Anal Chem 2013; 85:9832-8. [PMID: 24050806 DOI: 10.1021/ac402309h]
Abstract
Although in silico database search methods remain more popular in shotgun proteomics, de novo sequencing offers the ability to identify peptides derived from proteins of organisms lacking sequenced genomes and from proteins with subtle splice variants or truncations. Ultraviolet photodissociation (UVPD) of peptides derivatized by selective attachment of a chromophore at the N-terminus generates a characteristic series of y ions. The UVPD spectra of the chromophore-labeled peptides are simplified and thus amenable to de novo sequencing. This method resulted in observed sequence coverage of 79% for cytochrome c (eight peptides), 47% for β-lactoglobulin (five peptides), 25% for carbonic anhydrase (six peptides), and 51% for bovine serum albumin (33 peptides). This strategy also allowed differentiation of proteins with high sequence homology, as evidenced by de novo sequencing of two variants of green fluorescent protein.
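De novo reading of a y-ion series relies on the standard relation: a singly charged y ion equals the sum of its C-terminal residue masses plus H2O plus a proton. Because y ions contain the C-terminus, an N-terminal chromophore label does not change their masses. The sketch below computes the theoretical y1 to y(n-1) ladder for an unmodified peptide; the constants are standard monoisotopic values, and treating cysteine as unmodified (no alkylation) is our assumption.

```python
PROTON = 1.007276   # proton mass, Da
WATER = 18.010565   # H2O, Da
# Monoisotopic residue masses (Da); cysteine is taken as unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def y_ions(peptide):
    """Singly charged y1..y(n-1) m/z values for an unmodified peptide."""
    ions = []
    mz = WATER + PROTON  # C-terminal water plus the ionizing proton
    for residue in reversed(peptide[1:]):  # y_n would be the intact peptide
        mz += RESIDUE_MASS[residue]
        ions.append(mz)
    return ions
```

For `"AGR"`, this gives y1 ≈ 175.1190 (the familiar arginine y1 ion) and y2 ≈ 232.1404. Consecutive differences in an observed y ladder then read out residues from the C-terminus inward.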
Affiliation(s)
- Scott A Robotham
- Department of Chemistry, University of Texas , Austin, Texas 78712, United States
96
Richards AL, Vincent CE, Guthals A, Rose CM, Westphall MS, Bandeira N, Coon JJ. Neutron-encoded signatures enable product ion annotation from tandem mass spectra. Mol Cell Proteomics 2013; 12:3812-23. [PMID: 24043425 DOI: 10.1074/mcp.m113.028951]
Abstract
We report the use of neutron-encoded (NeuCode) stable isotope labeling of amino acids in cell culture for the purpose of C-terminal product ion annotation. Two NeuCode labeling isotopologues of lysine, ¹³C₆¹⁵N₂ and ²H₈, which differ by 36 mDa, were metabolically embedded in a sample proteome, and the resultant labeled proteins were combined, digested, and analyzed via liquid chromatography and mass spectrometry. With MS/MS scan resolving powers of ~50,000 or higher, product ions containing the C terminus (i.e. lysine) appear as a doublet spaced by exactly 36 mDa, whereas N-terminal fragments appear as a single m/z peak. Through theory and experiment, we demonstrate that over 90% of all y-type product ions have detectable doublets. We report an algorithm that can extract these neutron signatures with high sensitivity and specificity. Of 15,503 y-type product ion peaks, the y-type ion identification algorithm correctly identified 14,552 (93.2%) based on detection of the NeuCode doublet; 6.8% were misclassified (i.e. other ion types that were assigned as y-type products). Searching NeuCode-labeled yeast with PepNovo+ resulted in a 34% increase in correct de novo identifications relative to searching with MS/MS alone. We use this tool to simplify spectra prior to database searching, to sort unmatched tandem mass spectra for spectral richness, to correlate co-fragmented ions with their parent precursor, and for de novo sequence identification.
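The 36 mDa spacing follows directly from isotope mass differences: eight ²H atoms add slightly more mass than six ¹³C plus two ¹⁵N atoms. The sketch below reproduces that arithmetic and then flags peak pairs at that spacing. It is a simplified stand-in for the published algorithm, under our own assumptions: one labeled lysine per fragment, a known charge state, and a 2 mDa matching tolerance.

```python
# Isotope mass differences (Da): heavy minus light isotope.
C13_SHIFT = 1.003355  # 13C - 12C
N15_SHIFT = 0.997035  # 15N - 14N
H2_SHIFT = 1.006277   # 2H - 1H


def neucode_spacing():
    """Mass difference between Lys(2H8) and Lys(13C6 15N2), about 0.036 Da."""
    return 8 * H2_SHIFT - (6 * C13_SHIFT + 2 * N15_SHIFT)


def find_doublets(mzs, charge=1, tol=0.002):
    """Pairs of peaks separated by the NeuCode spacing at the given charge."""
    spacing = neucode_spacing() / charge  # m/z spacing shrinks with charge
    peaks = sorted(mzs)
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            if high - low > spacing + tol:
                break  # peaks are sorted, so no later peak can match
            if abs((high - low) - spacing) <= tol:
                pairs.append((low, high))
    return pairs
```

`neucode_spacing()` evaluates to about 0.0360 Da, and `find_doublets([500.0, 500.036, 700.0])` returns `[(500.0, 500.036)]`, marking the 500 m/z peak as a candidate lysine-containing (y-type) fragment.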
Affiliation(s)
- Alicia L Richards
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
97
Abstract
Motivation: Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms for analyzing tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. collision-induced dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types, such as electron transfer dissociation (ETD) spectra, higher-energy collisional dissociation (HCD) spectra, or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectra, we developed a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra and even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependencies between different ion types, where such dependencies are learned automatically using a modified offset frequency function. Results: The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that UniNovo is superior to the other tools for ETD spectra and superior or comparable to them for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics readily obtained from a small training dataset. We demonstrate that this estimate is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin-, LysC- or AspN-digested peptides). Availability: UniNovo is implemented in Java and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual. Contact: kwj@ucsd.edu or ppevzner@ucsd.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
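UniNovo learns ion-type dependencies with a modified offset frequency function. The sketch below shows only the classic, unmodified idea it builds on: from spectra with known sequences, measure how often an observed peak sits at each mass offset from a theoretical prefix ion. Offsets that recur frequently (for example, water losses) reveal ion types worth scoring. The function name, the ±50 Da range, the 0.5 Da bins, and the input format are illustrative assumptions.

```python
from collections import Counter


def offset_frequencies(training, offset_range=(-50.0, 50.0), bin_width=0.5):
    """Fraction of theoretical ions with an observed peak at each mass offset.

    training: list of (theoretical_prefix_mzs, observed_mzs) pairs, one per
    annotated spectrum.
    """
    lo, hi = offset_range
    hist = Counter()
    total = 0
    for prefixes, observed in training:
        for prefix in prefixes:
            total += 1
            seen = set()  # count each offset bin at most once per theoretical ion
            for mz in observed:
                offset = mz - prefix
                if lo <= offset <= hi:
                    seen.add(round(offset / bin_width) * bin_width)
            hist.update(seen)
    return {off: count / max(total, 1) for off, count in sorted(hist.items())}
```

With one toy spectrum, `offset_frequencies([([100.0, 200.0], [101.0, 201.0, 183.0])])` returns `{-17.0: 0.5, 1.0: 1.0}`: every theoretical ion has a peak at +1 Da, and half have a peak 17 Da lower.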
Affiliation(s)
- Kyowon Jeong
- Department of Electrical and Computer Engineering and Department of Computer Science and Engineering, University of California-San Diego, CA 92093, USA.
98
Guthals A, Clauser KR, Frank AM, Bandeira N. Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides. J Proteome Res 2013; 12:2846-57. [PMID: 23679345 DOI: 10.1021/pr400173d]
Abstract
Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
99
Xiao CL, Chen XZ, Du YL, Li ZF, Wei L, Zhang G, He QY. Dispec: a novel peptide scoring algorithm based on peptide matching discriminability. PLoS One 2013; 8:e62724. [PMID: 23675420 PMCID: PMC3652849 DOI: 10.1371/journal.pone.0062724] [Received: 12/18/2012] [Accepted: 03/25/2013]
Abstract
Identifying peptides from fragmentation spectra is a fundamental step in mass spectrometry (MS) data processing. The significance (discriminability) of each peak varies, providing additional information that could enhance identification sensitivity and the correct match rate. However, this information was not considered by previous algorithms. Here we present a novel method based on Peptide Matching Discriminability (PMD), in which the PMD information of each peak reflects the discriminability of candidate peptides. In addition, we developed a novel peptide scoring algorithm, Dispec, based on PMD, which takes three aspects of discriminability into consideration: PMD, intensity discriminability and m/z error discriminability. Compared with Mascot and Sequest, Dispec identified remarkably more peptides from three experimental datasets at the same confidence (1% PSM-level FDR). Dispec is also robust and versatile across datasets obtained on different instruments. The concept of discriminability enhances peptide identification and thus may contribute substantially to proteome studies. As an open-source program, Dispec is freely available at http://bioinformatics.jnu.edu.cn/software/dispec/.
Affiliation(s)
- Chuan-Le Xiao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Xiao-Zhou Chen
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming, China
- Yang-Li Du
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming, China
- Zhe-Fu Li
- Jinan University Network and Educational Technology Center, Guangzhou, China
- Li Wei
- School of Mathematics and Computer Science, Yunnan University of Nationalities, Kunming, China
- Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
- Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou, China
100
An M, Zou X, Wang Q, Zhao X, Wu J, Xu LM, Shen HY, Xiao X, He D, Ji J. High-confidence de novo peptide sequencing using positive charge derivatization and tandem MS spectra merging. Anal Chem 2013; 85:4530-7. [PMID: 23536960 DOI: 10.1021/ac4001699]
Abstract
De novo peptide sequencing holds great promise for discovering new protein sequences and modifications but has often been hindered by a low success rate of mass spectra interpretation, mainly due to the diversity of fragment ion types and insufficient information for each ion series. Here, we describe a novel methodology that combines highly efficient on-tip charge derivatization and tandem MS spectra merging, which greatly boosts interpretation performance. TMPP-Ac-OSu (succinimidyloxycarbonylmethyl tris(2,4,6-trimethoxyphenyl)phosphonium bromide) was used to derivatize peptides at their N-termini on tips to reduce mass spectra complexity. Then, a novel spectra-merging approach was adopted to combine the benefits of collision-induced dissociation (CID) and electron transfer dissociation (ETD) fragmentation. We applied this methodology to rat C6 glioma cells and Cyprinus carpio and searched the resulting peptide sequences against the protein database, obtaining thousands of high-confidence peptide sequences, a level that conventional de novo sequencing methods could not reach. Next, we identified dozens of novel peptide sequences by homology searching of sequences that were fully backbone covered but unmatched during the database search. Furthermore, we randomly chose 34 sequences discovered in rat C6 cells and verified them. We conclude that this methodology, combining on-tip positive charge derivatization and tandem MS spectra merging, will greatly facilitate the discovery of novel proteins and the proteome analysis of nonmodel organisms.
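The abstract does not describe how CID and ETD spectra are merged, so the following is only a sketch of one plausible ingredient, under stated assumptions. It uses the standard relation that a singly charged c ion is heavier than the corresponding b ion by one NH3 (17.0265 Da). ETD c ions are mapped onto b-ion-equivalent m/z and added to the CID b-ion ladder when CID did not already observe them. The function name and the 0.02 tolerance are hypothetical, and all ions are assumed singly charged.

```python
NH3 = 17.026549  # ammonia, Da; singly charged c ion = b ion + NH3


def merge_prefix_ions(b_ions, c_ions, tol=0.02):
    """Merge CID b ions with ETD c ions converted to b-ion-equivalent m/z."""
    merged = sorted(b_ions)
    for c in c_ions:
        b_equiv = c - NH3
        # Keep the converted ion only if no peak within tol is already present.
        if all(abs(b_equiv - b) > tol for b in merged):
            merged.append(b_equiv)
    return sorted(merged)
```

For example, merging b ions `[72.044, 129.066]` with c ions `[146.092, 243.145]` yields three prefix ions: the first c ion duplicates the b ion at about 129.066, while the second fills a gap at about 226.118 that CID alone missed. This is how complementary fragmentation can close gaps in an ion ladder.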
Affiliation(s)
- Mingrui An
- State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China