1
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as the chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow amino acid sequence information to be obtained accurately from MS/MS spectra in a short time. In this review, algorithms for high-throughput and automated de-novo sequencing, from exhaustive search to state-of-the-art machine learning and neural networks, are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed.
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China.
2
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023]
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavages. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass-spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide ranging applications from clinical to environmental settings and virtually impacts on every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically its performance lagged behind database search methods but with the integration of machine learning, this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state of the art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithms’ correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms, when high intensity peaks are considered. Finally, we provide recommendations regarding further algorithms’ improvements and offer potential avenues to overcome current inherent data limitations.
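The notion of a "missing fragmentation cleavage" in the abstract above can be illustrated with a minimal sketch. It assumes, as a simplification that is not taken from the paper, that a cleavage site counts as supported when either its singly charged b ion or its complementary y ion lies within a mass tolerance of some observed peak. The function name, the tolerance value and the reduced residue table are hypothetical.

```python
# Illustrative sketch only: counts cleavage sites with no supporting b/y peak.
# Assumption: a site is "supported" if a singly charged b or y ion matches a peak.

PROTON = 1.00728   # mass of a proton (Da)
WATER = 18.01056   # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da) for a small illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}


def count_missing_cleavages(peptide, peaks, tol=0.02):
    """Return the number of backbone cleavage sites with no matching b or y peak."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    missing = 0
    prefix = 0.0
    # A peptide of length n has n - 1 cleavage sites between adjacent residues.
    for residue_mass in masses[:-1]:
        prefix += residue_mass
        b_ion = prefix + PROTON                    # N-terminal fragment, charge 1+
        y_ion = (total - prefix) + WATER + PROTON  # C-terminal fragment, charge 1+
        supported = any(
            abs(p - b_ion) <= tol or abs(p - y_ion) <= tol for p in peaks
        )
        if not supported:
            missing += 1
    return missing
```

For the peptide `GASP` with observed peaks at 58.03 and 203.10, the first site is supported by its b ion and the second by its y ion, so the third site is the only missing cleavage under this simplified definition.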
3
Dai J, Yu F, Zhou C, Yu W. Understanding the Limit of Open Search in the Identification of Peptides With Post-translational Modifications - A Simulation-Based Study. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2884-2890. [PMID: 32356758 DOI: 10.1109/tcbb.2020.2991207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Peptide identification from tandem mass spectrometry data is a fundamental task in computational proteomics. Traditional algorithms perform well when facing unmodified peptides. However, when peptides have post-translational modifications (PTMs), these methods cannot provide satisfactory results. Recently, open search methods have been proposed to identify peptides with PTMs. While the performance of these new methods is promising, the identification results vary greatly with respect to the quality of tandem mass spectra and the number of PTMs in peptides. This motivates us to systematically study the relationship between the performance of open search methods and the quality parameters of tandem mass spectrometry data as well as the number of PTMs in peptides. In this paper, we have proposed an analytical model derived from simulated data to describe the relationship between the probability of obtaining correct results and the spectrum quality as well as the number of PTMs. The proposed model is verified using 1,464,146 real experimental spectra. The consistent trend observed in both simulated data and real data reveals the necessary conditions to effectively apply open search methods. Source code of our study is available at http://bioinformatics.ust.hk/PST.html.
4
Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00304-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
5
Fei Z, Wang K, Chi H. GameTag: A New Sequence Tag Generation Algorithm Based on Cooperative Game Theory. Proteomics 2020; 20:e2000021. [PMID: 32927502 DOI: 10.1002/pmic.202000021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2020] [Revised: 08/06/2020] [Indexed: 02/02/2023]
Abstract
Sequence tag-based peptide search is a critical technology in proteomics for the characterization of proteins from tandem mass spectrometry data. However, the main obstacle to the full application of such an approach lies in accurately extracting the sequence tags responsible for each experimental spectrum. Toward that end, GameTag, a novel cooperative game framework for sequence tag generation, is proposed, which includes a tag generator and a tag discriminator that collaboratively generate sequence tags. Specifically, the tag generator works to extract as many correct tag candidates as possible, and the tag discriminator serves to determine the correctness of tag candidates and simultaneously reduce the total number of output tags. Through the dynamic two-player game, the number of extracted tags is decreased while the number of correct tags is boosted. The performance of the proposed method is also investigated under various hyperparameter and structure settings. Extensive experiments on a wide variety of data sets from different species demonstrate that GameTag outperforms previous state-of-the-art methods (InsPecT, PepNovo+, DirecTag, and the existing tag-extraction method in Open-pFind), increasing by at least 10% the number of spectra from which more than one correct tag is extracted.
Affiliation(s)
- Zhengcong Fei
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, No. 6 Zhongguancun South Road, Beijing, 100190, China; University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
- Kaifei Wang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, No. 6 Zhongguancun South Road, Beijing, 100190, China; University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, No. 6 Zhongguancun South Road, Beijing, 100190, China; University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
6
Alber M, Buganza Tepole A, Cannon WR, De S, Dura-Bernal S, Garikipati K, Karniadakis G, Lytton WW, Perdikaris P, Petzold L, Kuhl E. Integrating machine learning and multiscale modeling-perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit Med 2019; 2:115. [PMID: 31799423 PMCID: PMC6877584 DOI: 10.1038/s41746-019-0193-y] [Citation(s) in RCA: 160] [Impact Index Per Article: 32.0] [Received: 09/10/2019] [Accepted: 11/01/2019] [Indexed: 12/12/2022]
Abstract
Fueled by breakthrough technology developments, the biological, biomedical, and behavioral sciences are now collecting more data than ever before. There is a critical need for time- and cost-efficient strategies to analyze and interpret these data to advance human health. The recent rise of machine learning as a powerful technique to integrate multimodality, multifidelity data, and reveal correlations between intertwined phenomena presents a special opportunity in this regard. However, machine learning alone ignores the fundamental laws of physics and can result in ill-posed problems or non-physical solutions. Multiscale modeling is a successful strategy to integrate multiscale, multiphysics data and uncover mechanisms that explain the emergence of function. However, multiscale modeling alone often fails to efficiently combine large datasets from different sources and different levels of resolution. Here we demonstrate that machine learning and multiscale modeling can naturally complement each other to create robust predictive models that integrate the underlying physics to manage ill-posed problems and explore massive design spaces. We review the current literature, highlight applications and opportunities, address open questions, and discuss potential challenges and limitations in four overarching topical areas: ordinary differential equations, partial differential equations, data-driven approaches, and theory-driven approaches. Towards these goals, we leverage expertise in applied mathematics, computer science, computational biology, biophysics, biomechanics, engineering mechanics, experimentation, and medicine. Our multidisciplinary perspective suggests that integrating machine learning and multiscale modeling can provide new insights into disease mechanisms, help identify new targets and treatment strategies, and inform decision making for the benefit of human health.
Affiliation(s)
- Mark Alber
- Department of Mathematics, University of California, Riverside, CA USA
- William R. Cannon
- Computational Biology Group, Pacific Northwest National Laboratory, Richland, WA USA
- Suvranu De
- Department of Mechanical, Aerospace and Nuclear Engineering, Rensselaer Polytechnic Institute, Troy, NY USA
- Krishna Garikipati
- Departments of Mechanical Engineering and Mathematics, University of Michigan, Ann Arbor, MI USA
- William W. Lytton
- SUNY Downstate Medical Center and Kings County Hospital, Brooklyn, NY USA
- Paris Perdikaris
- Department of Mechanical Engineering, University of Pennsylvania, Philadelphia, PA USA
- Linda Petzold
- Department of Computer Science and Mechanical Engineering, University of California, Santa Barbara, CA USA
- Ellen Kuhl
- Departments of Mechanical Engineering and Bioengineering, Stanford University, Stanford, CA USA
7
A classification of liquid chromatography mass spectrometry techniques for evaluation of chemical composition and quality control of traditional medicines. J Chromatogr A 2019; 1609:460501. [PMID: 31515074 DOI: 10.1016/j.chroma.2019.460501] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/25/2019] [Revised: 08/06/2019] [Accepted: 08/29/2019] [Indexed: 12/25/2022]
Abstract
Natural products (NPs) and traditional medicines (TMs) are used for treatment of various diseases and also to develop new drugs. However, identification of drug leads within the immense biodiversity of living organisms is a challenging task that requires considerable time, labor, and computational resources as well as the application of modern analytical instruments. LC-MS platforms are widely used for both drug discovery and quality control of TMs and food supplements. Moreover, a large dataset generated during LC-MS analysis contains valuable information that could be extracted and handled by means of various data mining and statistical tools. Novel sophisticated LC-MS based approaches are being introduced every year. Therefore, this review is prepared for the scientists specialized in pharmacognosy and analytical chemistry of NPs as well as working in related areas, in order to navigate them in the world of diverse LC-MS based techniques and strategies currently employed for NP discovery and dereplication, quality control, pattern recognition and sample comparison, and also in targeted and untargeted metabolomic studies. The suggested classification system includes the following LC-MS based procedures: elemental composition determination, isotopic fine structure analysis, mass defect filtering, de novo identification, clustering of the compounds in Molecular Networking (MN), diagnostic fragment ion (or neutral loss) filtering, manual dereplication using MS/MS data, database-assisted peak annotation, annotation of spectral trees, MS fingerprinting, feature extraction, bucketing of LC-MS data, peak profiling, predicted metabolite screening, targeted quantification of biomarkers, quantitative analysis of multi-component system, construction of chemical fingerprints, multi-targeted and untargeted metabolite profiling.
8
Na S, Kim J, Paek E. MODplus: Robust and Unrestrictive Identification of Post-Translational Modifications Using Mass Spectrometry. Anal Chem 2019; 91:11324-11333. [PMID: 31365238 DOI: 10.1021/acs.analchem.9b02445] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/02/2023]
Abstract
Post-translational modifications regulate various cellular processes and are of great biological interest. Unrestrictive searches of mass spectrometry data enable the detection of any type of modification. Here we propose MODplus, which makes practical unrestrictive searches possible by allowing (1) hundreds of modifications, (2) multiple modifications per peptide, (3) the whole proteome database, and (4) any tolerant values in search parameters. The utility of MODplus was demonstrated in large human data sets of HEK293 cells and TMT-labeled phosphorylation enrichment. Notably, MODplus supports identifying different modification types at multiple sites and reports real chemical and biological modifications, as it has been very labor intensive to link unrestrictive search results to real modifications. We also confirmed the presence of Missing Precursor (MP) spectra that were not identifiable using targeted precursor masses. The MP spectra mostly resulted in identifications of wrong modifications and negatively affected the overall performance, often by as much as 10%. MODplus can rapidly recognize MP spectra and correct their identifications, resulting in increased identification rate up to 70% in the HEK293 data set as well as improved reliability.
Affiliation(s)
- Seungjin Na
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
- Jihyung Kim
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, South Korea
9
De novo glycan structural identification from mass spectra using tree merging strategy. Comput Biol Chem 2019; 80:217-224. [DOI: 10.1016/j.compbiolchem.2019.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/09/2019] [Accepted: 03/23/2019] [Indexed: 11/19/2022]
10
Muth T, Renard BY. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief Bioinform 2019; 19:954-970. [PMID: 28369237 DOI: 10.1093/bib/bbx033] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/08/2016] [Indexed: 01/24/2023]
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments nowadays offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has been rarely evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation on the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets acquired from different instrument types from collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation mode using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated from peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, the same tool outpaced the commercial competitor PEAKS in terms of running time speedup by factors of around 12-17. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, taken as a whole, the evaluated algorithms perform moderately on experimental data but show a significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. 
Finally, we discuss the potential of de novo sequencing for now becoming more widely used in the field.
Affiliation(s)
- Thilo Muth
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
- Bernhard Y Renard
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
11
Fomin E. A Simple Approach to the Reconstruction of a Set of Points from the Multiset of Pairwise Distances in n² Steps for the Sequencing Problem: III. Noise Inputs for the Beltway Case. J Comput Biol 2019; 26:68-75. [DOI: 10.1089/cmb.2018.0078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/18/2023]
Affiliation(s)
- Eduard Fomin
- Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia
12
Frank Y, Hruz T, Tschager T, Venzin V. Improved de novo peptide sequencing using LC retention time information. Algorithms Mol Biol 2018; 13:14. [PMID: 30181767 PMCID: PMC6114869 DOI: 10.1186/s13015-018-0132-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2017] [Accepted: 08/20/2018] [Indexed: 12/03/2022]
Abstract
Background Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequences of a peptide from this measurement data. Past de novo sequencing algorithms solely consider the mass spectrum of the fragments for reconstructing a sequence. Results We propose to additionally exploit the information obtained from liquid chromatography. We study the problem of computing a sequence that is not only in accordance with the experimental mass spectrum, but also with the chromatographic retention time. We consider three models for predicting the retention time and develop algorithms for de novo sequencing for each model. Conclusions Based on an evaluation for two prediction models on experimental data from synthesized peptides we conclude that the identification rates are improved by exploiting the chromatographic information. In our evaluation, we compare our algorithms using the retention time information with algorithms using the same scoring model, but not the retention time.
13
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
14
Affiliation(s)
- Ngoc Hieu Tran
- David R. Cheriton School of Computer Science; University of Waterloo; Waterloo, ON Canada
- Xianglilan Zhang
- David R. Cheriton School of Computer Science; University of Waterloo; Waterloo, ON Canada
- State Key Laboratory of Pathogen and Biosecurity; Beijing Institute of Microbiology and Epidemiology; Beijing P.R. China
- Ming Li
- David R. Cheriton School of Computer Science; University of Waterloo; Waterloo, ON Canada
15
Abstract
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7-22.9% higher accuracy at the amino acid level and 38.1-64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5-100% coverage and 97.2-99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming.
16
Hu H, Khatri K, Zaia J. Algorithms and design strategies towards automated glycoproteomics analysis. Mass Spectrom Rev 2017; 36:475-498. [PMID: 26728195 PMCID: PMC4931994 DOI: 10.1002/mas.21487] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 07/10/2015] [Accepted: 11/30/2015] [Indexed: 05/09/2023]
Abstract
Glycoproteomics involves the study of glycosylation events on protein sequences ranging from purified proteins to whole proteome scales. Understanding these complex post-translational modification (PTM) events requires elucidation of the glycan moieties (monosaccharide sequences and glycosidic linkages between residues), protein sequences, as well as site-specific attachment of glycan moieties onto protein sequences, in a spatial and temporal manner in a variety of biological contexts. Compared with proteomics, bioinformatics for glycoproteomics is immature and many researchers still rely on tedious manual interpretation of glycoproteomics data. As sample preparation protocols and analysis techniques have matured, the number of publications on glycoproteomics and bioinformatics has increased substantially; however, the lack of consensus on tool development and code reuse limits the dissemination of bioinformatics tools because it requires significant effort to migrate a computational tool tailored for one method design to alternative methods. This review discusses algorithms and methods in glycoproteomics, and refers to the general proteomics field for potential solutions. It also introduces general strategies for tool integration and pipeline construction in order to better serve the glycoproteomics community. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:475-498, 2017.
Affiliation(s)
- Han Hu
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
- Kshitij Khatri
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
- Joseph Zaia
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
17
Tschager T, Rösch S, Gillet L, Widmayer P. A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses. Algorithms Mol Biol 2017; 12:12. [PMID: 28603547 PMCID: PMC5464308 DOI: 10.1186/s13015-017-0104-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/22/2016] [Accepted: 04/19/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible. RESULTS Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible. CONCLUSIONS We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016.
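The two scoring ideas contrasted in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: masses are binned to 0.01 Da as a crude stand-in for a mass tolerance, ion-type offsets (proton, water) are ignored, candidates are only scored rather than searched for, and all function names are hypothetical.

```python
# Illustrative sketch only: scores a candidate peptide against measured masses.
# Assumptions: 0.01 Da binning replaces a real tolerance; ion offsets are ignored.

# Monoisotopic residue masses (Da) for a small illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}


def bin_mass(mass):
    """Map a mass in Da to an integer bin of width 0.01 Da."""
    return round(mass * 100)


def prefix_suffix_masses(peptide):
    """Return binned masses of all proper prefixes and suffixes (one linear scan)."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    explained = set()
    running = 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        explained.add(bin_mass(running))          # prefix mass
        explained.add(bin_mass(total - running))  # complementary suffix mass
    return explained


def explained_count_score(peptide, measured):
    """Earlier criterion: number of measured masses the peptide explains (higher is better)."""
    return len(prefix_suffix_masses(peptide) & measured)


def symmetric_difference_score(peptide, measured):
    """Criterion from the abstract: unexplained measured masses plus
    explained masses that were not measured (lower is better)."""
    return len(prefix_suffix_masses(peptide) ^ measured)
```

With measured masses 57.02, 87.03, 158.07 and 100.00, the candidate `GAS` explains three measured masses, leaves 100.00 unexplained, and predicts 128.06, which was not measured, so its symmetric-difference score is 2. The symmetric difference penalises both kinds of mismatch, whereas the count-based score ignores predicted masses that are absent from the spectrum.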
|
18
|
Guan X, Brownstein NC, Young NL, Marshall AG. Ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry and tandem mass spectrometry for peptide de novo amino acid sequencing for a seven-protein mixture by paired single-residue transposed Lys-N and Lys-C digestion. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2017; 31:207-217. [PMID: 27813191 DOI: 10.1002/rcm.7783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/29/2016] [Revised: 10/29/2016] [Accepted: 10/30/2016] [Indexed: 06/06/2023]
Abstract
RATIONALE Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics to identify proteins from a sequence database. De novo sequencing is also available for sequencing peptides with relatively short sequence lengths. We recently showed that paired Lys-C and Lys-N proteases produce peptides of identical mass and similar retention time, but different tandem mass spectra. Such parallel experiments provide complementary information, and allow for up to 100% MS/MS sequence coverage. METHODS Here, we report digestion by paired Lys-C and Lys-N proteases of a seven-protein mixture: human hemoglobin alpha, bovine carbonic anhydrase 2, horse skeletal muscle myoglobin, hen egg white lysozyme, bovine pancreatic ribonuclease, bovine rhodanese, and bovine serum albumin, followed by reversed-phase nanoflow liquid chromatography, collision-induced dissociation, and 14.5 T Fourier transform ion cyclotron resonance mass spectrometry. RESULTS Matched pairs of product peptide ions of equal precursor mass and similar retention times from each digestion are compared, leveraging single-residue transposed information with independent interferences to confidently identify fragment ion types, residues, and peptides. Selected pairs of product ion mass spectra for de novo sequenced protein segments from each member of the mixture are presented. CONCLUSIONS Pairs of the transposed product ions as well as complementary information from the parallel experiments allow for both high MS/MS coverage for long peptide sequences and high confidence in the amino acid identification. Moreover, the parallel experiments in the de novo sequencing reduce false-positive matches of product ions from the single-residue transposed peptides from the same segment, and thereby further improve the confidence in protein identification. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Xiaoyan Guan
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, FL, 32310, USA
- Naomi C Brownstein
- Department of Behavioral Sciences and Social Medicine, College of Medicine, Florida State University, 1115 W. Call St., Tallahassee, FL, 32306, USA
- Department of Statistics, Florida State University, 117 N. Woodward Ave., Tallahassee, FL, 32306, USA
- Nicolas L Young
- Verna & Marrs McLean Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, MS-125, Houston, TX, 77030-3411, USA
- Alan G Marshall
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, FL, 32310, USA
- Department of Chemistry and Biochemistry, Florida State University, 95 Chieftain Way, Tallahassee, FL, 32303, USA
|
19
|
Yang H, Chi H, Zhou WJ, Zeng WF, He K, Liu C, Sun RX, He SM. Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications. J Proteome Res 2017; 16:645-654. [PMID: 28019094 DOI: 10.1021/acs.jproteome.6b00716] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/18/2023]
Abstract
De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo is close to or even ∼10-times faster than the other three proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any results can be recalled by others. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in the real biological applications.
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Wen-Jing Zhou
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Kun He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
|
20
|
Abstract
Protein identification from tandem mass spectra is one of the most versatile and widely used proteomics workflows, able to identify proteins, characterize post-translational modifications, and provide semiquantitative measurements of relative protein abundance. This manuscript describes the concepts, prerequisites, and methods required to analyze a tandem mass spectrometry dataset in order to identify its proteins, by using a tandem mass spectrometry search engine to search protein sequence databases. The discussion includes instructions for extraction, preparation, and formatting of spectral datafiles, selection of appropriate search parameter settings, and basic interpretation of the results.
Affiliation(s)
- Nathan J Edwards
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA.
|
21
|
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016. [PMID: 27975219 DOI: 10.1007/978-3-319-41448-5_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
Abstract
Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.
|
22
|
Fomin E. A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n² Pairwise Distances in n² Steps for the Sequencing Problem: II. Algorithm. J Comput Biol 2016; 23:934-942. [DOI: 10.1089/cmb.2016.0046] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Affiliation(s)
- Eduard Fomin
- Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia
|
23
|
Ma B. De novo Peptide Sequencing. PROTEOME INFORMATICS 2016:15-38. [DOI: 10.1039/9781782626732-00015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
De novo peptide sequencing refers to the process of determining a peptide's amino acid sequence from its MS/MS spectrum alone. The principle of this process is fairly straightforward: a high-quality spectrum may present a ladder of fragment ion peaks. The mass difference between every two adjacent peaks in the ladder is used to determine a residue of the peptide. However, most practical spectra do not have sufficient quality to support this straightforward process. Therefore, research in de novo sequencing has largely been a battle against the errors in the data. This chapter reviews some of the major developments in this field. The chapter starts with a quick review of the history in Section 1. Then manual de novo sequencing is examined in Section 2. Section 3 introduces a few commonly used de novo sequencing algorithms. An important aspect of automated de novo sequencing software is a good scoring function that serves as the optimization goal of the algorithm. Thus, Section 4 is devoted to methods for defining good scoring functions. Section 5 reviews a list of relevant software. The chapter concludes with a discussion of the applications and limitations of de novo sequencing in Section 6.
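The ladder principle described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the chapter's algorithm: `read_ladder`, the 0.02 Da tolerance, and the small residue table are hypothetical, and the peaks are treated as an idealized ladder of a single ion type.

```python
# Monoisotopic residue masses in Da for a few amino acids (illustrative subset).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841}


def read_ladder(peaks, tol=0.02):
    """Map each gap between adjacent ladder peaks to a residue, or '?' if none fits."""
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        # Keep the residues whose mass matches the gap within the tolerance.
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(gap - mass) <= tol]
        residues.append(matches[0] if matches else "?")
    return "".join(residues)
```

A complete ladder such as `[0.0, 57.02146, 128.05857, 215.0906]` reads as `GAS`. Dropping the third peak leaves a gap of about 158.07 Da that matches no single residue in the table, so the result becomes `G?`, which is the kind of data error the abstract refers to.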
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo Canada
|
24
|
Abstract
In computational proteomics, the identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost associated with database search increases exponentially with respect to the number of modified amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible PTM patterns. To address this issue, one group of methods named restricted tools (including Mascot, Comet, and MS-GF+) only allow a small number of PTM types in database search process. Alternatively, the other group of methods named unrestricted tools (including MS-Alignment, ProteinProspector, and MODa) avoids enumerating PTM patterns with an alignment-based approach to localizing and characterizing modified amino acids. However, because of the large search space and PTM localization issue, the sensitivity of these unrestricted tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI belongs to the category of unrestricted tools. It first codes peptide sequences into Boolean vectors and codes experimental spectra into real-valued vectors. For each coded spectrum, it then searches the coded sequence database to find the top scored peptide sequences as candidates. After that, PIPI uses dynamic programming to localize and characterize modified amino acids in each candidate. We used simulation experiments and real data experiments to evaluate the performance in comparison with restricted tools (i.e., Mascot, Comet, and MS-GF+) and unrestricted tools (i.e., Mascot with error tolerant search, MS-Alignment, ProteinProspector, and MODa). Comparison with restricted tools shows that PIPI has a close sensitivity and running speed. Comparison with unrestricted tools shows that PIPI has the highest sensitivity except for Mascot with error tolerant search and ProteinProspector. 
These two tools simplify the task by only considering up to one modified amino acid in each peptide, which results in a higher sensitivity but has difficulty in dealing with multiple modified amino acids. The simulation experiments also show that PIPI has the lowest false discovery proportion, the highest PTM characterization accuracy, and the shortest running time among the unrestricted tools.
Affiliation(s)
- Fengchao Yu
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Ning Li
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China
- Weichuan Yu
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
|
25
|
Gorshkov V, Hotta SYK, Verano-Braga T, Kjeldsen F. Peptide de novo sequencing of mixture tandem mass spectra. Proteomics 2016; 16:2470-9. [PMID: 27329701 PMCID: PMC5297990 DOI: 10.1002/pmic.201500549] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/26/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 02/02/2023]
Abstract
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co‐isolation and thus prone to false identifications. The deconvolution approach matched complementary b‐, y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
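The complementary-ion matching mentioned in this abstract can be illustrated with a short Python sketch. It assumes singly charged b- and y-ions, for which the two m/z values sum to the neutral peptide mass plus two proton masses. The function name, the 0.02 Da tolerance, and the two-pointer search are assumptions made for illustration; the abstract does not describe the authors' implementation at this level of detail.

```python
PROTON = 1.007276  # proton mass in Da


def complementary_pairs(fragments, precursor_mass, tol=0.02):
    """Find fragment m/z pairs that sum to the neutral precursor mass plus two protons.

    Assumes singly charged fragments; `tol` is a hypothetical tolerance in Da.
    """
    target = precursor_mass + 2 * PROTON
    ordered = sorted(fragments)
    pairs = []
    i, j = 0, len(ordered) - 1
    while i < j:  # two-pointer scan over the sorted fragment list
        total = ordered[i] + ordered[j]
        if abs(total - target) <= tol:
            pairs.append((ordered[i], ordered[j]))
            i += 1
            j -= 1
        elif total < target:
            i += 1
        else:
            j -= 1
    return pairs
```

Running this once per co-isolated precursor mass gives one set of paired fragments per peptide, which is the raw material for the per-precursor "virtual spectra" the abstract describes. Fragments that pair with nothing, such as noise peaks, are left out.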
Affiliation(s)
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark.
- Thiago Verano-Braga
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark; Department of Physiology and Biophysics, Federal University of Minas Gerais Belo Horizonte - MG, Belo Horizonte, Brazil
- Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
|
26
|
Computational Methods in Mass Spectrometry-Based Proteomics. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 939:63-89. [PMID: 27807744 DOI: 10.1007/978-981-10-1503-8_4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
This chapter introduces computational methods used in mass spectrometry-based proteomics, including those for addressing the critical problems such as peptide identification and protein inference, peptide and protein quantification, characterization of posttranslational modifications (PTMs), and data-independent acquisitions (DIA). The chapter concludes with emerging applications of proteomic techniques, such as metaproteomics, glycoproteomics, and proteogenomics.
|
27
|
Ma B. Peptide De Novo Sequencing with MS/MS. ENCYCLOPEDIA OF ALGORITHMS 2016:1545-1547. [DOI: 10.1007/978-1-4939-2864-4_286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
|
28
|
Pascal BD, West GM, Scharager-Tapia C, Flefil R, Moroni T, Martinez-Acedo P, Griffin PR, Carvalloza AC. Software Analysis of Uncorrelated MS1 Peaks for Discovery of Post-Translational Modifications. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2015; 26:2133-2140. [PMID: 26265041 DOI: 10.1007/s13361-015-1229-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/16/2015] [Revised: 06/29/2015] [Accepted: 06/30/2015] [Indexed: 06/04/2023]
Abstract
The goal in proteomics of identifying all peptides in a complex mixture has been largely addressed using various LC MS/MS approaches, such as data dependent acquisition, SRM/MRM, and data independent acquisition instrumentation. Despite these developments, many peptides remain unsequenced, often due to low abundance, poor fragmentation patterns, or data analysis difficulties. Many of the unidentified peptides exhibit strong evidence in high resolution MS(1) data and are frequently post-translationally modified, playing a significant role in biological processes. Proteomics Workbench (PWB) software was developed to automate the detection and visualization of all possible peptides in MS(1) data, reveal candidate peptides not initially identified, and build inclusion lists for subsequent MS(2) analysis to uncover new identifications. We used this software on existing data on the autophagy regulating kinase Ulk1 as a proof of concept for this method, as we had already manually identified a number of phosphorylation sites (Dorsey, F. C. et al., J. Proteome Res. 8(11), 5253-5263 (2009)). PWB found all previously identified sites of phosphorylation. The software has been made freely available at http://www.proteomicsworkbench.com.
Affiliation(s)
- Bruce D Pascal
- Informatics Core, The Scripps Research Institute, Jupiter, FL, 33458, USA.
- Graham M West
- Proteomics Core, The Scripps Research Institute, Jupiter, FL, 33458, USA
- Ricardo Flefil
- Proteomics Core, The Scripps Research Institute, Jupiter, FL, 33458, USA
- Tina Moroni
- Proteomics Core, The Scripps Research Institute, Jupiter, FL, 33458, USA
- Patrick R Griffin
- Department of Molecular Therapeutics, The Scripps Research Institute, Jupiter, FL, 33458, USA
|
29
|
Xu T, Park SK, Venable JD, Wohlschlegel JA, Diedrich JK, Cociorva D, Lu B, Liao L, Hewel J, Han X, Wong CCL, Fonslow B, Delahunty C, Gao Y, Shah H, Yates JR. ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity. J Proteomics 2015; 129:16-24. [PMID: 26171723 DOI: 10.1016/j.jprot.2015.07.001] [Citation(s) in RCA: 349] [Impact Index Per Article: 38.8] [Received: 04/09/2015] [Revised: 06/08/2015] [Accepted: 07/04/2015] [Indexed: 12/25/2022]
Abstract
ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases has been developed. This algorithm uses a three tier scoring scheme. First, a binomial probability is used as a preliminary scoring scheme to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. A modified cross-correlation score is calculated for each candidate peptide identified by the binomial probability. This cross-correlation scoring function models the isotopic distributions of fragment ions of candidate peptides which ultimately results in higher sensitivity and specificity than that obtained with the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of the peptide identification using SEQUEST, and displays significant improvement in specificity over ProLuCID XCorr alone. ProLuCID is also able to take advantage of high resolution MS/MS spectra leading to further improvements in specificity when compared to low resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID using the same false discovery rate as estimated by a target-decoy database strategy, shows that ProLuCID was able to identify as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster. This article is part of a Special Issue entitled: Computational Proteomics.
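The third scoring tier in this abstract, a Z score for the top hit computed from the XCorr distribution of all candidates, can be sketched as follows. The abstract does not give the formula, so this uses the textbook standard score with the population standard deviation as a stand-in; `top_hit_z_score` is a hypothetical name, and details such as whether the top hit is excluded from the distribution are assumptions.

```python
from statistics import mean, pstdev


def top_hit_z_score(xcorrs):
    """Standard score of the highest XCorr relative to all candidate XCorr values."""
    top = max(xcorrs)
    sigma = pstdev(xcorrs)  # population standard deviation of the candidates
    if sigma == 0:
        return 0.0  # all candidates scored identically; no hit stands out
    return (top - mean(xcorrs)) / sigma
```

A top hit that is well separated from the other candidates gets a large Z score, while a top hit sitting inside a tight cluster of similar XCorr values gets a small one. This is one way to see why such a score can complement the raw XCorr.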
Affiliation(s)
- T Xu
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA; Dow AgroSciences LLC, Indianapolis, IN 46268, USA
- S K Park
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J D Venable
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J A Wohlschlegel
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J K Diedrich
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- D Cociorva
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- B Lu
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- L Liao
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J Hewel
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- X Han
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- C C L Wong
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- B Fonslow
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- C Delahunty
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- Y Gao
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- H Shah
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA
- J R Yates
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR11, La Jolla, CA 92037, USA.
|
30
|
Muhialdin BJ, Hassan Z, Abu Bakar F, Algboory HL, Saari N. Novel Antifungal Peptides Produced by Leuconostoc mesenteroides DU15 Effectively Inhibit Growth of Aspergillus niger. J Food Sci 2015; 80:M1026-30. [PMID: 25847317 DOI: 10.1111/1750-3841.12844] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 07/19/2014] [Accepted: 01/28/2015] [Indexed: 11/30/2022]
Abstract
The ability of Leuconostoc mesenteroides DU15 to produce antifungal peptides that inhibit growth of Aspergillus niger was evaluated under optimum growth conditions of 30 °C for 48 h. The cell-free supernatant showed inhibitory activity against A. niger. Five novel peptides were isolated with the sequences GPFPL, YVPLF, LLHGVPLP, GPFPLEMTLGPT, and TVYPFPGPL, as identified by de novo sequencing using PEAKS 6 software. Peptide LLHGVPLP was the only positively charged (cationic) peptide and GPFPLEMTLGPT the only negatively charged (anionic) one, whereas the rest were neutral. The identified peptides had a high hydrophobicity ratio and low molecular weights, with amino acid sequences ranging from 5 to 12 residues. The mode of action of these peptides, observed under the scanning electron microscope, was lysis of fungal cells. This work reveals the potential of peptides from L. mesenteroides DU15 as natural antifungal preservatives that inhibit the growth of A. niger, which is implicated in spoilage during storage.
Affiliation(s)
- Belal J Muhialdin
- Faculty of Food Science and Technology, Univ. Putra Malaysia, Serdang, Selangor, 43400, Malaysia
- Zaiton Hassan
- Faculty of Science and Technology, Univ. Sains Islam Malaysia, Bandar Baru Nilai, 71800, Nilai, Negeri Sembilan, Malaysia
- Fatimah Abu Bakar
- Faculty of Food Science and Technology, Univ. Putra Malaysia, Serdang, Selangor, 43400, Malaysia
- Nazamid Saari
- Faculty of Food Science and Technology, Univ. Putra Malaysia, Serdang, Selangor, 43400, Malaysia
|
31
|
Song Y, Chi AY. Peptide sequencing via graph path decomposition. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
32
|
|
33
|
Ma B. Peptide De Novo Sequencing with MS/MS. ENCYCLOPEDIA OF ALGORITHMS 2015:1-4. [DOI: 10.1007/978-3-642-27848-8_286-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/14/2015] [Accepted: 01/14/2015] [Indexed: 09/01/2023]
|
34
|
Biggar KK, Storey KB. New Approaches to Comparative and Animal Stress Biology Research in the Post-genomic Era: A Contextual Overview. Comput Struct Biotechnol J 2014; 11:138-46. [PMID: 25408848 PMCID: PMC4232569 DOI: 10.1016/j.csbj.2014.09.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/11/2014] [Revised: 09/07/2014] [Accepted: 09/11/2014] [Indexed: 02/06/2023] Open
Abstract
Although much is known about the physiological responses of many environmental stresses in tolerant animals, studies evaluating the regulation of stress-induced mechanisms that regulate the transitions to and from this state are beginning to explore new and fascinating areas of molecular research. Current findings have developed a general, but refined, view of the important molecular pathways contributing to stress-survival. However, studies utilizing newly developed technologies that broadly focus on genomic and proteomic screening are beginning to identify many new targets for future study. This minireview will provide a contextual overview on the use of DNA/RNA sequencing, microRNA annotation and prediction software, protein structure and function prediction tools, as well as methods of high-throughput protein expression analysis. We will also use select examples to highlight the existing use of these technologies in stress biology research. Such tools can be used in comparative stress biology in the characterization of animal responses to environmental challenges. Although there are many areas of study left to be explored, research in comparative stress biology will always be continuing as new technologies allow the further analysis of cell function, and new paradigms in gene regulation and regulatory molecules (such as microRNAs) are continuing to be discovered. Building upon the findings of past research, while utilizing new technologies in the appropriate manner, future studies can be carried out in new and exciting areas still unexplored. Proper use of rapidly developing technologies will help to create a complete understanding of the animal stress response and survival mechanisms utilized by many diverse organisms.
Affiliation(s)
- Kenneth B. Storey
- Institute of Biochemistry, Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, ON K1S 5B6, Canada
|
35
|
Yao J, Utsunomiya SI, Kajihara S, Tabata T, Aoshima K, Oda Y, Tanaka K. Peptide Peak Detection for Low Resolution MALDI-TOF Mass Spectrometry. Mass Spectrom (Tokyo) 2014; 3:A0030. [PMID: 26819872 PMCID: PMC4306743 DOI: 10.5702/massspectrometry.a0030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/06/2014] [Accepted: 07/04/2014] [Indexed: 12/28/2022] Open
Abstract
A new peak detection method has been developed for rapid selection of peptide and fragment ion peaks for protein identification using tandem mass spectrometry. The algorithm classifies the peak intensities present in a defined mass range to determine the noise level. A threshold is then set according to the determined noise level in each mass range to select ion peaks. The algorithm was initially designed for peak detection in low-resolution peptide mass spectra, such as matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra, but it can also be applied to other types of mass spectra. The method has been shown to yield a good ratio of real ion peaks to noise peaks even for poorly fragmented peptide spectra. Peak lists generated by this method produce improved protein scores in database search results, and the reliability of protein identifications is increased by finding more peptide identifications. This software tool is freely available at the Mass++ home page (http://www.first-ms3d.jp/english/achievement/software/).
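The per-range thresholding described in this abstract can be illustrated with a simple Python sketch. The abstract does not specify how intensities are classified, so the median intensity of each m/z window is swapped in here as the noise estimate. `pick_peaks`, the 100 m/z window width, and the factor of 3 are hypothetical choices, not the published method.

```python
from statistics import median


def pick_peaks(peaks, window=100.0, factor=3.0):
    """Keep (mz, intensity) peaks whose intensity exceeds factor x the window's noise level.

    The noise level of each m/z window is estimated as the median intensity in it.
    """
    by_window = {}
    for mz, intensity in peaks:
        by_window.setdefault(int(mz // window), []).append(intensity)
    noise = {w: median(values) for w, values in by_window.items()}
    return [(mz, i) for mz, i in peaks if i > factor * noise[int(mz // window)]]
```

Because the threshold is computed separately for each window, a modest peak in a quiet region can be kept while a peak of the same height in a noisy region is rejected. A single global cutoff cannot do this.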
Affiliation(s)
- Jingwen Yao
- Koichi Tanaka Laboratory of Advanced Science and Technology, Shimadzu Corporation
- Shin-ichi Utsunomiya
- Koichi Tanaka Laboratory of Advanced Science and Technology, Shimadzu Corporation
- Shigeki Kajihara
- Koichi Tanaka Laboratory of Advanced Science and Technology, Shimadzu Corporation
- Tsuyoshi Tabata
- Biomarker & Personalized Medicine, Eisai Product Creation Systems
- Ken Aoshima
- Biomarker & Personalized Medicine, Eisai Product Creation Systems
- Yoshiya Oda
- Biomarker & Personalized Medicine, Eisai Product Creation Systems
- Koichi Tanaka
- Koichi Tanaka Laboratory of Advanced Science and Technology, Shimadzu Corporation
|
36
|
Titz B, Elamin A, Martin F, Schneider T, Dijon S, Ivanov NV, Hoeng J, Peitsch MC. Proteomics for systems toxicology. Comput Struct Biotechnol J 2014; 11:73-90. [PMID: 25379146 PMCID: PMC4212285 DOI: 10.1016/j.csbj.2014.08.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/26/2022]
Abstract
Current toxicology studies frequently lack measurements at molecular resolution to enable a more mechanism-based and predictive toxicological assessment. Recently, a systems toxicology assessment framework has been proposed, which combines conventional toxicological assessment strategies with system-wide measurement methods and computational analysis approaches from the field of systems biology. Proteomic measurements are an integral component of this integrative strategy because protein alterations closely mirror biological effects, such as biological stress responses or global tissue alterations. Here, we provide an overview of the technical foundations and highlight select applications of proteomics for systems toxicology studies. With a focus on mass spectrometry-based proteomics, we summarize the experimental methods for quantitative proteomics and describe the computational approaches used to derive biological/mechanistic insights from these datasets. To illustrate how proteomics has been successfully employed to address mechanistic questions in toxicology, we summarized several case studies. Overall, we provide the technical and conceptual foundation for the integration of proteomic measurements in a more comprehensive systems toxicology assessment framework. We conclude that, owing to the critical importance of protein-level measurements and recent technological advances, proteomics will be an integral part of integrative systems toxicology approaches in the future.
|
37
|
Abstract
De novo sequencing is an important computational approach to determining the amino acid sequence of a peptide with tandem mass spectrometry (MS/MS). Most of the existing approaches use a graph model to describe a spectrum and the sequencing is performed by computing the longest antisymmetric path in the graph. The task is often computationally intensive since a given MS/MS spectrum often contains noisy data, missing mass peaks, or post translational modifications/mutations. This paper develops a new parameterized algorithm that can efficiently compute the longest antisymmetric partial path in an extended spectrum graph that is of bounded path width. Our testing results show that this algorithm can efficiently process experimental spectra and provide sequencing results of high accuracy.
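As a rough illustration of the graph model mentioned in this abstract, the Python sketch below builds a spectrum graph whose nodes are candidate prefix masses and whose edges join two nodes when their mass difference matches an amino acid residue, then finds the path from the first to the last node that uses the most edges. It is a simplification under stated assumptions: it omits the antisymmetry constraint and the path-width parameterization that the paper addresses, it assumes peaks have already been converted to prefix masses, and the five-residue table, the tolerance, and the function name are hypothetical choices for the example.

```python
# Monoisotopic residue masses in Da for a small illustrative subset of amino acids.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841, "L": 113.08406}


def sequence_from_prefix_masses(masses, tol=0.01):
    """Return the residue string of the longest source-to-sink path, or None.

    masses : candidate prefix masses; the smallest is treated as the source
             (empty prefix) and the largest as the sink (full peptide).
    tol    : absolute mass tolerance in Da for matching a residue (assumed value).
    """
    nodes = sorted(masses)
    # best[j] = (number of edges, sequence) of the longest path from node 0 to node j.
    best = {0: (0, "")}
    for j in range(1, len(nodes)):
        for i in range(j):
            if i not in best:
                continue  # node i is not reachable from the source
            diff = nodes[j] - nodes[i]
            for residue, mass in RESIDUES.items():
                if abs(diff - mass) <= tol:
                    candidate = (best[i][0] + 1, best[i][1] + residue)
                    if j not in best or candidate[0] > best[j][0]:
                        best[j] = candidate
    sink = len(nodes) - 1
    return best[sink][1] if sink in best else None


# Prefix masses of the peptide GAVL plus one spurious node at 150.0 Da.
candidates = [0.0, 57.02146, 128.05857, 150.0, 227.12698, 340.21104]
sequence = sequence_from_prefix_masses(candidates)
```

Because nodes are processed in order of increasing mass, the graph is acyclic and a single dynamic-programming pass suffices. Real spectra need the antisymmetry constraint because a single peak can be read as either a prefix or a suffix ion but must not be used as both.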
Affiliation(s)
- Yinglei Song
- School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
|
38
|
Cristoni S, Bernardi LR. Bioinformatics in mass spectrometry data analysis for proteomics studies. Expert Rev Proteomics 2014; 1:469-83. [PMID: 15966842 DOI: 10.1586/14789450.1.4.469] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Mass spectrometry is a technique widely employed for the identification and characterization of proteins. The role of bioinformatics is fundamental for the elaboration of mass spectrometry data due to the amount of data that this technique can produce. To process data efficiently, new software packages and algorithms are continuously being developed to improve protein identification and characterization in terms of high-throughput and statistical accuracy. However, many limitations exist concerning bioinformatics spectral data elaboration. This review aims to critically cover the recent and future developments of new bioinformatics approaches in mass spectrometry data analysis for proteomics studies.
Affiliation(s)
- Simone Cristoni
- Università degli Studi di Milano, Via Fratelli Cervi 93, 20090 Segrate, Milan, Italy.
|
39
|
Bruce C, Stone K, Gulcicek E, Williams K. Proteomics and the analysis of proteomic data: 2013 overview of current protein-profiling technologies. Curr Protoc Bioinformatics 2013; Chapter 13:13.21.1-13.21.17. [PMID: 23504934 DOI: 10.1002/0471250953.bi1321s41] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
Mass spectrometry has become a major tool in the study of proteomes. The analysis of proteolytic peptides and their fragment ions by this technique enables the identification and quantitation of the precursor proteins in a mixture. However, deducing chemical structures and then protein sequences from mass-to-charge ratios is a challenging computational task. Software tools incorporating powerful algorithms and statistical methods improved our ability to process the large quantities of proteomics data. Repositories of spectral data make both data analysis and experimental design more efficient. New approaches in quantitative and statistical proteomics make possible a greater coverage of the proteome, the identification of more post-translational modifications, and a greater sensitivity in the quantitation of targeted proteins.
Affiliation(s)
- Can Bruce
- W.M. Keck Foundation Biotechnology Resource Laboratory and Molecular Biochemistry and Biophysics Department, Yale University, New Haven, Connecticut, USA
|
40
|
He L, Han X, Ma B. De novo sequencing with limited number of post-translational modifications per peptide. J Bioinform Comput Biol 2013; 11:1350007. [DOI: 10.1142/s0219720013500078] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
De novo sequencing derives the peptide sequence from a tandem mass spectrum without the assistance of protein databases. This analysis has been indispensable for the identification of novel or modified peptides in a biological sample. Currently, the speed of de novo sequencing algorithms is not heavily affected by the number of post-translational modification (PTM) types under consideration. However, the accuracy of the algorithms can be degraded by the increased search space. Most peptides in proteomics research contain only a small number of PTMs per peptide, yet the types of PTMs can come from a large number of choices. Therefore, it is desirable to include a large number of PTM types in a de novo sequencing algorithm while limiting the number of PTM occurrences in each peptide to increase the accuracy. In this paper, we present an efficient de novo sequencing algorithm, DeNovoPTM, for this purpose. The implemented software is downloadable from http://www.cs.uwaterloo.ca/~l22he/denovo_ptm.
Affiliation(s)
- Lin He
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Xi Han
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Bin Ma
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
|
41
|
Abstract
Motivation: Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms to analyze tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD), Higher-energy Collisional Dissociation (HCD) spectra or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectra, we develop a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra or even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependencies between different ion types, where such dependencies are learned automatically using a modified offset frequency function. Results: The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that the performance of UniNovo is superior to other tools for ETD spectra and superior or comparable with others for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides). Availability: UniNovo is implemented in JAVA and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual. Contact: kwj@ucsd.edu or ppevzner@ucsd.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
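The offset frequency function named in this abstract can be illustrated with a short Python sketch. The version below is the plain form of the idea, not UniNovo's modified one: for spectra with known peptides it histograms the differences between observed peak m/z values and the peptide's prefix masses, so that offsets shared by many peaks (for example, roughly +1.007 Da for singly charged b ions) stand out. The residue subset, bin width, offset range, and function names are assumptions for the example.

```python
from collections import Counter

# Monoisotopic residue masses in Da for a small illustrative subset of amino acids.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841, "L": 113.08406}


def prefix_masses(peptide):
    """Masses of all proper, non-empty prefixes of the peptide (sum of residue masses)."""
    total, result = 0.0, []
    for residue in peptide[:-1]:
        total += RESIDUES[residue]
        result.append(total)
    return result


def offset_frequency(annotated_spectra, max_offset=30.0, bin_width=0.1):
    """Count (peak m/z - prefix mass) offsets over spectra with known peptides.

    annotated_spectra : iterable of (peptide, list of peak m/z values)
    Returns a Counter mapping bin index -> count; the offset of a bin is
    approximately index * bin_width.
    """
    counts = Counter()
    for peptide, mzs in annotated_spectra:
        for prefix in prefix_masses(peptide):
            for mz in mzs:
                offset = mz - prefix
                if abs(offset) <= max_offset:
                    counts[round(offset / bin_width)] += 1
    return counts


# One training spectrum for GAV: two b-ion peaks plus one unrelated peak at 80.0.
training = [("GAV", [58.02874, 129.06585, 80.0])]
counts = offset_frequency(training)
top_bin, top_count = counts.most_common(1)[0]
```

With enough training spectra, each fragmentation method produces its own set of dominant offsets, which is what allows ion types to be learned from data rather than fixed in advance.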
Affiliation(s)
- Kyowon Jeong
- Department of Electrical and Computer Engineering and Department of Computer Science and Engineering, University of California-San Diego, CA 92093, USA.
|
42
|
Scheubert K, Hufsky F, Böcker S. Computational mass spectrometry for small molecules. J Cheminform 2013; 5:12. [PMID: 23453222 PMCID: PMC3648359 DOI: 10.1186/1758-2946-5-12] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 11/23/2012] [Accepted: 02/01/2013] [Indexed: 12/29/2022]
Abstract
The identification of small molecules from mass spectrometry (MS) data remains a major challenge in the interpretation of MS data. This review covers the computational aspects of identifying small molecules, from the identification of a compound by searching a reference spectral library to the structural elucidation of unknowns. In detail, we describe the basic principles and pitfalls of searching mass spectral reference libraries. Determining the molecular formula of the compound can serve as a basis for subsequent structural elucidation; consequently, we cover different methods for molecular formula identification, focussing on isotope pattern analysis. We then discuss automated methods to deal with mass spectra of compounds that are not present in spectral libraries, and provide an insight into de novo analysis of fragmentation spectra using fragmentation trees. In addition, this review briefly covers the reconstruction of metabolic networks using MS data. Finally, we list available software for different steps of the analysis pipeline.
Affiliation(s)
- Kerstin Scheubert
- Chair of Bioinformatics, Friedrich Schiller University, Ernst-Abbe-Platz 2, Jena, Germany.
|
43
|
Van Riper SK, de Jong EP, Carlis JV, Griffin TJ. Mass Spectrometry-Based Proteomics: Basic Principles and Emerging Technologies and Directions. Adv Exp Med Biol 2013; 990:1-35. [DOI: 10.1007/978-94-007-5896-4_1] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
|
44
|
Chi H, Chen H, He K, Wu L, Yang B, Sun RX, Liu J, Zeng WF, Song CQ, He SM, Dong MQ. pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra. J Proteome Res 2012; 12:615-25. [DOI: 10.1021/pr3006843] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Haifeng Chen
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Kun He
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Long Wu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Rui-Xiang Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Jianyun Liu
- Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media, Beihang University, Beijing, 100191, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
- Chun-Qing Song
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
- Si-Min He
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
|
45
|
Chong KF, Leong HW. Tutorial on de novo peptide sequencing using MS/MS mass spectrometry. J Bioinform Comput Biol 2012; 10:1231002. [DOI: 10.1142/s0219720012310026] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/17/2023]
Abstract
This paper is a self-contained introductory tutorial on the problem in proteomics known as peptide sequencing using tandem mass spectrometry. This tutorial deals specifically with de novo sequencing methods (as opposed to database search methods). We first give an introduction to peptide sequencing, its importance and history and some background on proteins. Next we show the relationship between a peptide and the final spectrum produced from a tandem mass spectrometer, together with a description of the various sources of complications that arise during the process of generating the mass spectrum. From there we model the computational problem of de novo peptide sequencing, which is basically the reverse problem of identifying the peptide which produced the spectrum. We then present several major approaches to solve it (including reviewing some of the current algorithms in each approach), and also discuss related problems and post-processing approaches.
Affiliation(s)
- Ket Fah Chong
- Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore
- Hon Wai Leong
- Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore
|
46
|
Bhatia S, Kil YJ, Ueberheide B, Chait BT, Tayo L, Cruz L, Lu B, Yates JR, Bern M. Constrained de novo sequencing of conotoxins. J Proteome Res 2012; 11:4191-200. [PMID: 22709442 DOI: 10.1021/pr300312h] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
De novo peptide sequencing by mass spectrometry (MS) can determine the amino acid sequence of an unknown peptide without reference to a protein database. MS-based de novo sequencing assumes special importance in focused studies of families of biologically active peptides and proteins, such as hormones, toxins, and antibodies, for which amino acid sequences may be difficult to obtain through genomic methods. These protein families often exhibit sequence homology or characteristic amino acid content; yet, current de novo sequencing approaches do not take advantage of this prior knowledge and, hence, search an unnecessarily large space of possible sequences. Here, we describe an algorithm for de novo sequencing that incorporates sequence constraints into the core graph algorithm and thereby reduces the search space by many orders of magnitude. We demonstrate our algorithm in a study of cysteine-rich toxins from two cone snail species (Conus textile and Conus stercusmuscarum) and report 13 de novo and about 60 total toxins.
Affiliation(s)
- Swapnil Bhatia
- Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, United States
|
47
|
Allmer J. Algorithms for the de novo sequencing of peptides from tandem mass spectra. Expert Rev Proteomics 2012; 8:645-57. [PMID: 21999834 DOI: 10.1586/epr.11.54] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 01/27/2023]
Abstract
Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of the de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field.
Affiliation(s)
- Jens Allmer
- Molecular Biology and Genetics, Izmir Institute of Technology, Urla, Izmir 35430, Turkey.
|
48
|
Ma B, Johnson R. De novo sequencing and homology searching. Mol Cell Proteomics 2012; 11:O111.014902. [PMID: 22090170 PMCID: PMC3277775 DOI: 10.1074/mcp.o111.014902] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Received: 10/05/2011] [Revised: 11/08/2011] [Indexed: 11/06/2022]
Abstract
In proteomics, de novo sequencing is the process of deriving peptide sequences from tandem mass spectra without the assistance of a sequence database. Such analyses have traditionally been performed manually by human experts, and more recently by computer programs that have been developed because of the need for higher throughput. Although powerful, de novo sequencing often can only determine partially correct sequence tags because of imperfect tandem mass spectra. However, these sequence tags can then be searched in a sequence database to identify the exact or a homologous peptide. Homology searches are particularly useful for the study of organisms whose genomes have not been sequenced. This tutorial will present background important to understanding de novo sequencing, suggestions on how to do this manually, plus descriptions of computer algorithms used to automate this process and to subsequently carry out homology-based database searches. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 1).
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo, 200 University Ave. W, Waterloo, ON, Canada N2L 3G1
|
49
|
Ranasinghe HS, Scheepens A, Sirimanne E, Mitchell MD, Williams CE, Fraser M. Inhibition of MMP-9 Activity following Hypoxic Ischemia in the Developing Brain Using a Highly Specific Inhibitor. Dev Neurosci 2012; 34:417-27. [DOI: 10.1159/000343257] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 05/25/2012] [Accepted: 09/05/2012] [Indexed: 12/28/2022]
|
50
|
Yu C, Lin Y, Sun S, Cai J, Zhang J, Bu D, Zhang Z, Chen R. An iterative algorithm to quantify factors influencing peptide fragmentation during tandem mass spectrometry. J Bioinform Comput Biol 2011; 5:297-311. [PMID: 17589963 DOI: 10.1142/s0219720007002643] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/02/2006] [Revised: 01/02/2007] [Accepted: 01/22/2007] [Indexed: 11/18/2022]
Abstract
In protein identification by tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. To date, the widely used database searching methods have adopted simple statistical models for this prediction. For some peptides, these models yield a theoretical spectrum that deviates significantly from the experimental one. In this paper, in order to derive an improved prediction model, we utilized a non-linear programming model to quantify the factors impacting peptide fragmentation. Then, an iterative algorithm was proposed to solve this optimization problem. On a training set of 1803 spectra, the experimental result showed good agreement with some known principles of peptide fragmentation, such as the tendency to cleave at the middle of the peptide, and Pro's preference for N-terminal cleavage. Moreover, on a testing set of 941 spectra, comparison of the predicted spectra against the experimental ones showed that this method can generate reasonable predictions. The results in this paper can offer help to both database searching and de novo methods.
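For context, the kind of simple theoretical spectrum that this abstract sets out to improve can be sketched in a few lines of Python. The example below lists the m/z values of singly charged b and y ions for a peptide and implicitly treats every fragment as equally likely; it does not reproduce the paper's non-linear programming model, and the residue subset and function name are assumptions for illustration.

```python
# Monoisotopic masses in Da.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841, "L": 113.08406}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O


def by_ions(peptide):
    """Return (b_ions, y_ions): m/z of singly charged b and y fragments.

    b_i contains the first i residues; the y ion produced by the same cleavage
    contains the remaining residues plus one water.
    """
    masses = [RESIDUES[residue] for residue in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = by_ions("GAV")
```

A fragmentation model of the kind described in the abstract would attach a predicted intensity to each of these m/z values, for instance a higher one for cleavage near the middle of the peptide, instead of weighting them uniformly.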
Affiliation(s)
- Chungong Yu
- Bioinformatics Lab, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China.
|