1
Schulte D, Snijder J. A Handle on Mass Coincidence Errors in De Novo Sequencing of Antibodies by Bottom-up Proteomics. J Proteome Res 2024; 23:3552-3559. [PMID: 38932690 DOI: 10.1021/acs.jproteome.4c00188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Antibody sequences can be determined at 99% accuracy directly from the polypeptide product by using bottom-up proteomics techniques. Sequencing accuracy at the peptide level is limited by the isobaric residues leucine and isoleucine, incomplete fragmentation spectra in which the order of two or more residues remains ambiguous due to lacking fragment ions for the intermediate positions, and isobaric combinations of amino acids, of potentially different lengths, for example, GG = N and GA = Q. Here, we present several updates to Stitch (v1.5), which performs template-based assembly of de novo peptides to reconstruct antibody sequences. This version introduces a mass-based alignment algorithm that explicitly accounts for mass coincidence errors. In addition, it incorporates a postprocessing procedure to assign I/L residues based on secondary fragments (satellite ions, i.e., w-ions). Moreover, evidence for sequence assignments can now be directly evaluated with the addition of an integrated spectrum viewer. Lastly, input data from a wider selection of de novo peptide sequencing algorithms are allowed, now including Casanovo, PEAKS, Novor.Cloud, pNovo, and MaxNovo, in addition to flat text and FASTA. Combined, these changes make Stitch compatible with a larger range of data processing pipelines and improve its tolerance to peptide-level sequencing errors.
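The mass coincidences named in the abstract (GG = N, GA = Q, and the isobaric pair I/L) can be checked directly from standard monoisotopic residue masses. The following is a minimal sketch for illustration only; it is not part of Stitch, and the function name `mass_coincidences` and the 0.001 Da tolerance are assumptions chosen for this example.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses in Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def mass_coincidences(tolerance_da=0.001):
    """Return (pair, single) tuples where two residues weigh the same as one.

    `tolerance_da` is a hypothetical matching tolerance, not a Stitch setting.
    """
    hits = []
    for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2):
        pair_mass = RESIDUE_MASS[a] + RESIDUE_MASS[b]
        for single, mass in RESIDUE_MASS.items():
            if abs(pair_mass - mass) <= tolerance_da:
                hits.append((a + b, single))
    return hits


if __name__ == "__main__":
    for pair, single in mass_coincidences():
        print(f"{pair} is isobaric with {single}")
```

A mass-based aligner can treat such sets as equivalent when scoring a peptide against a template, which is the kind of error the abstract says the new alignment algorithm accounts for.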
Affiliation(s)
- Douwe Schulte
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
- Joost Snijder
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
2
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] [Open access]
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Wenqing Shui
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China
3
Lee S, Kim H. Bidirectional de novo peptide sequencing using a transformer model. PLoS Comput Biol 2024; 20:e1011892. [PMID: 38416757 PMCID: PMC10901305 DOI: 10.1371/journal.pcbi.1011892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 02/02/2024] [Indexed: 03/01/2024] [Open access]
Abstract
In proteomics, a crucial aspect is to identify peptide sequences. De novo sequencing methods have been widely employed to identify peptide sequences, and numerous tools have been proposed over the past two decades. Recently, deep learning approaches have been introduced for de novo sequencing. Previous methods focused on encoding tandem mass spectra and predicting peptide sequences from the first amino acid onwards. However, when predicting peptides using tandem mass spectra, the peptide sequence can be predicted not only from the first amino acid but also from the last amino acid due to the coexistence of b-ion (or a- or c-ion) and y-ion (or x- or z-ion) fragments in the tandem mass spectra. Therefore, it is essential to predict peptide sequences bidirectionally. Our approach, called NovoB, utilizes a Transformer model to predict peptide sequences bidirectionally, starting with both the first and last amino acids. In comparison to Casanovo, our method achieved an improvement of the average peptide-level accuracy rate of approximately 9.8% across all species.
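The reason a peptide can be read from either end is that b-ions accumulate residue masses from the N-terminus while y-ions accumulate them from the C-terminus. The sketch below computes both singly charged series for an example peptide. It is illustrative only and unrelated to the NovoB implementation; the peptide `PEPTIDE` and the function names are assumptions made for this example.

```python
PROTON = 1.007276   # proton mass in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses (Da) for the residues used in the example
RESIDUE_MASS = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}


def b_ions(peptide):
    """Singly charged b-ion m/z values, read from the N-terminus."""
    masses, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y-ion m/z values, read from the C-terminus."""
    masses, total = [], WATER
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses


if __name__ == "__main__":
    for i, (b, y) in enumerate(zip(b_ions("PEPTIDE"), y_ions("PEPTIDE")), start=1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Consecutive differences within the b series spell the sequence forward, and consecutive differences within the y series spell it backward, so a model can decode from both directions.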
Affiliation(s)
- Sangjeong Lee
  - Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
- Hyunwoo Kim
  - Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
4
Yang T, Ling T, Sun B, Liang Z, Xu F, Huang X, Xie L, He Y, Li L, He F, Wang Y, Chang C. Introducing π-HelixNovo for practical large-scale de novo peptide sequencing. Brief Bioinform 2024; 25:bbae021. [PMID: 38340092 PMCID: PMC10858680 DOI: 10.1093/bib/bbae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/10/2024] [Accepted: 01/14/2024] [Indexed: 02/12/2024] [Open access]
Abstract
De novo peptide sequencing is a promising approach for novel peptide discovery, highlighting the performance improvements for the state-of-the-art models. The quality of mass spectra often varies due to the unexpected absence of certain ions, presenting a significant challenge in de novo peptide sequencing. Here, we use a novel concept of complementary spectra to enhance ion information of the experimental spectrum and demonstrate it through conceptual and practical analyses. Afterward, we design suitable encoders to encode the experimental spectrum and the corresponding complementary spectrum and propose a de novo sequencing model π-HelixNovo based on the Transformer architecture. We first demonstrated that π-HelixNovo outperforms other state-of-the-art models using a series of comparative experiments. Then, we utilized π-HelixNovo to de novo sequence gut metaproteome peptides for the first time. The results show π-HelixNovo increases the identification coverage and accuracy of gut metaproteome and enhances the taxonomic resolution of gut metaproteome. We finally trained a powerful π-HelixNovo utilizing a larger training dataset, and as expected, π-HelixNovo achieves unprecedented performance, even for peptide-spectrum matches with never-before-seen peptide sequences. We also use the powerful π-HelixNovo to identify antibody peptides and multi-enzyme cleavage peptides, and π-HelixNovo is highly robust in these applications. Our results demonstrate the effectiveness of the complementary spectrum and take a significant step forward in de novo peptide sequencing.
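The abstract does not spell out how a complementary spectrum is built. One plausible construction, assumed here, relies on the fact that a singly charged b-ion and its matching y-ion sum to the neutral peptide mass plus two protons, so each observed peak can be mirrored to the position of its missing partner. The sketch below is an illustration under that assumption, not the π-HelixNovo code; the function name and the peak representation as `(m/z, intensity)` tuples are hypothetical.

```python
PROTON = 1.007276  # proton mass in Da


def complementary_spectrum(peaks, precursor_mz, charge):
    """Mirror each (m/z, intensity) peak around the precursor mass.

    Assumes singly charged fragments, so that b_i + y_(n-i) equals the
    neutral peptide mass plus two protons.
    """
    neutral_mass = (precursor_mz - PROTON) * charge
    pair_sum = neutral_mass + 2 * PROTON
    mirrored = [
        (pair_sum - mz, intensity)
        for mz, intensity in peaks
        if mz < pair_sum  # drop peaks that would mirror to a negative m/z
    ]
    return sorted(mirrored)


if __name__ == "__main__":
    observed = [(200.0, 1.0), (300.0, 0.5)]
    print(complementary_spectrum(observed, precursor_mz=250.5, charge=2))
```

Under this assumption, a b-ion whose y-ion partner is absent from the experimental spectrum still contributes a peak at the partner's position in the complementary spectrum, which is one way the ion information could be enhanced.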
Affiliation(s)
- Tingpeng Yang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
- Tianze Ling
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
- Boyan Sun
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Zhendong Liang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
- Fan Xu
  - Peng Cheng Laboratory, Shenzhen, 518055, China
- Linhai Xie
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Yonghong He
  - Peng Cheng Laboratory, Shenzhen, 518055, China
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
- Leyuan Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Fuchu He
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - Research Unit of Proteomics Driven Cancer Precision Medicine, Chinese Academy of Medical Sciences, Beijing 102206, China
- Yu Wang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
- Cheng Chang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - Research Unit of Proteomics Driven Cancer Precision Medicine, Chinese Academy of Medical Sciences, Beijing 102206, China
5
Klaproth-Andrade D, Hingerl J, Bruns Y, Smith NH, Träuble J, Wilhelm M, Gagneur J. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing. Nat Commun 2024; 15:151. [PMID: 38167372 PMCID: PMC10762064 DOI: 10.1038/s41467-023-44323-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024] [Open access]
Abstract
Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
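A figure such as "40% sensitivity at 90% precision" can be read as follows: rank all predictions by confidence score, accept them from the top down, and report the largest fraction of spectra with a correct accepted prediction while at least 90% of accepted predictions are correct. The sketch below illustrates that reading. It is not the Spectralis evaluation code, and defining sensitivity relative to all scored spectra is an assumption made for this example.

```python
def recall_at_precision(scores, is_correct, min_precision=0.9):
    """Largest recall reachable while precision stays at or above the target.

    Recall is counted over all scored spectra (an assumption for this sketch).
    """
    ranked = sorted(zip(scores, is_correct), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    n_correct = 0
    for n_accepted, (_, correct) in enumerate(ranked, start=1):
        n_correct += bool(correct)
        if n_correct / n_accepted >= min_precision:
            best_recall = n_correct / len(ranked)
    return best_recall


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    correct = [True, True, True, False, True]
    print(recall_at_precision(scores, correct))
```

In the toy data, accepting the top three predictions keeps precision at 100%, while accepting a fourth drops it to 75%, so the reported value corresponds to the top-three cutoff.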
Affiliation(s)
- Daniela Klaproth-Andrade
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Johannes Hingerl
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Yanik Bruns
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Nicholas H Smith
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Jakob Träuble
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Mathias Wilhelm
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
  - Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany
- Julien Gagneur
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
  - Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
6
Tariq MU, Ebert S, Saeed F. Making MS Omics Data ML-Ready: SpeCollate Protocols. Methods Mol Biol 2024; 2836:135-155. [PMID: 38995540 DOI: 10.1007/978-1-0716-4007-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
The increasing complexity and volume of mass spectrometry (MS) data have presented new challenges and opportunities for proteomics data analysis and interpretation. In this chapter, we provide a comprehensive guide to transforming MS data for machine learning (ML) training, inference, and applications. The chapter is organized into three parts. The first part describes the data analysis needed for MS-based experiments and a general introduction to our deep learning model SpeCollate, which we will use throughout the chapter for illustration. The second part of the chapter explores the transformation of MS data for inference, providing a step-by-step guide for users to deduce peptides from their MS data. This section aims to bridge the gap between data acquisition and practical applications by detailing the necessary steps for data preparation and interpretation. In the final part, we present a demonstrative example of SpeCollate, a deep learning-based peptide database search engine that overcomes the problems of simplistic simulation of theoretical spectra and heuristic scoring functions for peptide-spectrum matches by generating joint embeddings for spectra and peptides. SpeCollate is a user-friendly tool with an intuitive command-line interface to perform the search, showcasing the effectiveness of the techniques and methodologies discussed in the earlier sections and highlighting the potential of machine learning in the context of mass spectrometry data analysis. By offering a comprehensive overview of data transformation, inference, and ML model applications for mass spectrometry, this chapter aims to empower researchers and practitioners in leveraging the power of machine learning to unlock novel insights and drive innovation in the field of mass spectrometry-based omics.
Affiliation(s)
- Muhammad Usman Tariq
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
- Samuel Ebert
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
7
Lei JT, Jaehnig EJ, Smith H, Holt MV, Li X, Anurag M, Ellis MJ, Mills GB, Zhang B, Labrie M. The Breast Cancer Proteome and Precision Oncology. Cold Spring Harb Perspect Med 2023; 13:a041323. [PMID: 37137501 PMCID: PMC10547392 DOI: 10.1101/cshperspect.a041323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
The goal of precision oncology is to translate the molecular features of cancer into predictive and prognostic tests that can be used to individualize treatment leading to improved outcomes and decreased toxicity. Success for this strategy in breast cancer is exemplified by efficacy of trastuzumab in tumors overexpressing ERBB2 and endocrine therapy for tumors that are estrogen receptor positive. However, other effective treatments, including chemotherapy, immune checkpoint inhibitors, and CDK4/6 inhibitors are not associated with strong predictive biomarkers. Proteomics promises another tier of information that, when added to genomic and transcriptomic features (proteogenomics), may create new opportunities to improve both treatment precision and therapeutic hypotheses. Here, we review both mass spectrometry-based and antibody-dependent proteomics as complementary approaches. We highlight how these methods have contributed toward a more complete understanding of breast cancer and describe the potential to guide diagnosis and treatment more accurately.
Affiliation(s)
- Jonathan T Lei
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Eric J Jaehnig
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Hannah Smith
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
- Matthew V Holt
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Xi Li
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
- Meenakshi Anurag
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Matthew J Ellis
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Gordon B Mills
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center and Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Marilyne Labrie
  - Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239, USA
8
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow the amino acid sequence information to be accurately obtained from MS/MS spectra in a short time. In this review, algorithms from exhaustive search to state-of-the-art machine learning and neural networks for high-throughput and automated de-novo sequencing are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed in this review.
Affiliation(s)
- Cheuk Chi A Ng
  - State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Yin Zhou
  - State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Zhong-Ping Yao
  - State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
9
Ahn R, Cui Y, White FM. Antigen discovery for the development of cancer immunotherapy. Semin Immunol 2023; 66:101733. [PMID: 36841147 DOI: 10.1016/j.smim.2023.101733] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/17/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 02/25/2023]
Abstract
Central to successful cancer immunotherapy is effective T cell antitumor immunity. Multiple targeted immunotherapies engineered to invigorate T cell-driven antitumor immunity rely on identifying the repertoire of T cell antigens expressed on the tumor cell surface. Mass spectrometry-based survey of such antigens ("immunopeptidomics") combined with other omics platforms and computational algorithms has been instrumental in identifying and quantifying tumor-derived T cell antigens. In this review, we discuss the types of tumor antigens that have emerged for targeted cancer immunotherapy and the immunopeptidomics methods that are central in MHC peptide identification and quantification. We provide an overview of the strength and limitations of mass spectrometry-driven approaches and how they have been integrated with other technologies to discover targetable T cell antigens for cancer immunotherapy. We highlight some of the emerging cancer immunotherapies that successfully capitalized on immunopeptidomics, their challenges, and mass spectrometry-based strategies that can support their development.
Affiliation(s)
- Ryuhjin Ahn
  - David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yufei Cui
  - David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Forest M White
  - David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
10
Phetsanthad A, Vu NQ, Yu Q, Buchberger AR, Chen Z, Keller C, Li L. Recent advances in mass spectrometry analysis of neuropeptides. Mass Spectrom Rev 2023; 42:706-750. [PMID: 34558119 PMCID: PMC9067165 DOI: 10.1002/mas.21734] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 08/22/2021] [Accepted: 08/28/2021] [Indexed: 05/08/2023]
Abstract
Due to their involvement in numerous biochemical pathways, neuropeptides have been the focus of many recent research studies. Unfortunately, classic analytical methods, such as western blots and enzyme-linked immunosorbent assays, are extremely limited in terms of global investigations, leading researchers to search for more advanced techniques capable of probing the entire neuropeptidome of an organism. With recent technological advances, mass spectrometry (MS) has provided methodology to gain global knowledge of a neuropeptidome on a spatial, temporal, and quantitative level. This review will cover key considerations for the analysis of neuropeptides by MS, including sample preparation strategies, instrumental advances for identification, structural characterization, and imaging; insightful functional studies; and newly developed absolute and relative quantitation strategies. While many discoveries have been made with MS, the methodology is still in its infancy. Many of the current challenges and areas that need development will also be highlighted in this review.
Affiliation(s)
- Ashley Phetsanthad
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Nhu Q. Vu
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Qing Yu
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Amanda R. Buchberger
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Zhengwei Chen
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Caitlin Keller
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Lingjun Li
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
11
Ling XJ, Zhou YJ, Yang YS, Xu ZQ, Wang Y, Sun JL, Zhu Y, Wei JF. A new cysteine protease allergen from Ambrosia trifida pollen: proforms and mature forms. Mol Immunol 2022; 147:170-179. [DOI: 10.1016/j.molimm.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 05/05/2022] [Accepted: 05/09/2022] [Indexed: 10/18/2022]
12
A streamlined platform for analyzing tera-scale DDA and DIA mass spectrometry data enables highly sensitive immunopeptidomics. Nat Commun 2022; 13:3108. [PMID: 35672356 PMCID: PMC9174175 DOI: 10.1038/s41467-022-30867-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/22/2021] [Accepted: 05/20/2022] [Indexed: 12/21/2022] [Open access]
Abstract
Integrating data-dependent acquisition (DDA) and data-independent acquisition (DIA) approaches can enable highly sensitive mass spectrometry, especially for immunopeptidomics applications. Here we report a streamlined platform for both DDA and DIA data analysis. The platform integrates deep learning-based solutions of spectral library search, database search, and de novo sequencing under a unified framework, which not only boosts the sensitivity but also accurately controls the specificity of peptide identification. Our platform identifies 5-30% more peptide precursors than other state-of-the-art systems on multiple benchmark datasets. When evaluated on immunopeptidomics datasets, we identify 1.7-4.1 and 1.4-2.2 times more peptides from DDA and DIA data, respectively, than previously reported results. We also discover six T-cell epitopes from SARS-CoV-2 immunopeptidome that might represent potential targets for COVID-19 vaccine development. The platform supports data formats from all major instruments and is implemented with distributed high-performance computing technology, allowing analysis of tera-scale datasets of thousands of samples for clinical applications. Immunopeptidomics benefits from highly sensitive mass spectrometry (MS). Here, the authors present a computational platform for integrating data-dependent and -independent acquisition MS approaches, and demonstrate its utility for deeper immunopeptidome profiling.
13
Gadush MV, Sautto GA, Chandrasekaran H, Bensussan A, Ross TM, Ippolito GC, Person MD. Template-Assisted De Novo Sequencing of SARS-CoV-2 and Influenza Monoclonal Antibodies by Mass Spectrometry. J Proteome Res 2022; 21:1616-1627. [PMID: 35653804 DOI: 10.1021/acs.jproteome.1c00913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
In this study, we used multiple enzyme digestions, coupled with higher-energy collisional dissociation (HCD) and electron-transfer/higher-energy collision dissociation (EThcD) fragmentation to develop a mass-spectrometric (MS) method for determining the complete protein sequence of monoclonal antibodies (mAbs). The method was refined on an mAb of a known sequence, a SARS-CoV-1 antireceptor binding domain (RBD) spike monoclonal antibody. The data were searched using Supernovo to generate a complete template-assisted de novo sequence for this and two SARS-CoV-2 mAbs of known sequences resulting in correct sequences for the variable regions and correct distinction of Ile and Leu residues. We then used the method on a set of 25 antihemagglutinin (HA) influenza antibodies of unknown sequences and determined high confidence sequences for >99% of the complementarity determining regions (CDRs). The heavy-chain and light-chain genes were cloned and transfected into cells for recombinant expression followed by affinity purification. The recombinant mAbs displayed binding curves matching the original mAbs with specificity to the HA influenza antigen. Our findings indicate that this methodology results in almost complete antibody sequence coverage with high confidence results for CDR regions on diverse mAb sequences.
Affiliation(s)
- Michelle V Gadush
- Center for Biomedical Research Support, Biological Mass Spectrometry Facility, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Giuseppe A Sautto
- Center for Vaccines and Immunology, University of Georgia, Athens, Georgia 30602, United States
| | - Hamssika Chandrasekaran
- Center for Biomedical Research Support, Biological Mass Spectrometry Facility, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Alena Bensussan
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Ted M Ross
- Center for Vaccines and Immunology, University of Georgia, Athens, Georgia 30602, United States.,Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia 30602, United States
| | - Gregory C Ippolito
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, United States
| | - Maria D Person
- Center for Biomedical Research Support, Biological Mass Spectrometry Facility, The University of Texas at Austin, Austin, Texas 78712, United States
|
14
|
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]
|
15
|
Abstract
Accurate full-length sequencing of a purified unknown protein remains challenging because mass-spectrometry (MS)-based methods are error-prone. De novo identified peptide sequences frequently contain errors, undermining the accuracy of assembly. Bias in the detectability of peptides also produces low-coverage regions, resulting in gaps. Although recent advances in multi-enzyme hydrolysis and algorithms have shown complete assembly of full-length protein sequences in a few examples, robustness in practical application still needs to be improved. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy to assemble protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize the bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep, i.e., contig-scaffold manner, with error correction in each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. We demonstrated our strategy on three proteins in three replicates; all reached 100% coverage (except one with 98.85%) and 98.69-100% accuracy. It can also efficiently handle a membrane protein, although the transmembrane region was missing due to the limitations of MS. The three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provide a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
Affiliation(s)
- Zhi-Biao Mai
- Big Data Decision Institute, Jinan University, Guangzhou 510632, China.,Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
| | - Zhong-Hua Zhou
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
| | - Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
| | - Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
|
16
|
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, USA
| | - Ming Li
- University of Waterloo, Canada
|
17
|
Xu ZQ, Zhu LX, Lu C, Jiao YX, Zhu DX, Guo M, Yang YS, Cao MD, Zhang LS, Tian M, Sun JL, Wei JF. Identification of Per a 13 as a novel allergen in American cockroach. Mol Immunol 2022; 143:41-49. [PMID: 35033813 DOI: 10.1016/j.molimm.2022.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 12/29/2021] [Accepted: 01/08/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND Cockroaches are an important source of indoor allergens. Environmental exposure to cockroach allergens is closely associated with the development of immunoglobulin E (IgE)-mediated allergic diseases. However, the allergenic components in the American cockroach have not yet been fully studied. In order to develop novel diagnostic and therapeutic strategies for cockroach allergy, it is necessary to comprehensively investigate this undescribed allergen in the American cockroach. METHODS The full-length cDNA of the potential allergen was isolated from the cDNA library of the American cockroach by PCR cloning. Both the recombinant and natural protein molecules were purified and characterized. The allergenicity was further analyzed by enzyme-linked immunosorbent assay, immunoblot, and basophil activation test using sera from cockroach-allergic patients. RESULTS A novel allergen belonging to the glyceraldehyde 3-phosphate dehydrogenase (GAPDH) family was identified for the first time in the American cockroach and named Per a 13. The cDNA of this allergen is 1255 base pairs in length and contains an open reading frame of 999 base pairs, encoding 332 amino acids. The purified Per a 13 was fully characterized and found to react with IgEs from 49.3% of cockroach-allergic patients, and patients with allergic rhinitis were more sensitized to it. Moreover, the allergenicity was further confirmed by immunoblot and basophil activation test. CONCLUSIONS We identified GAPDH (Per a 13) for the first time in the American cockroach; it is a novel type of inhalant allergen derived from animal species. These findings could be useful in developing novel diagnostic and therapeutic strategies for cockroach allergy.
Affiliation(s)
- Zhi-Qiang Xu
- Research Division of Clinical Pharmacology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Li-Xiang Zhu
- Research Division of Clinical Pharmacology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Chen Lu
- Precision Medicine Center, the First Affiliated Hospital of Gannan Medical University, Ganzhou, China
| | - Yong-Xin Jiao
- Research Division of Clinical Pharmacology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Dan-Xuan Zhu
- Clinical Allergy Center, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Miao Guo
- Research Division of Clinical Pharmacology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Yong-Shi Yang
- Department of Allergy, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
| | - Meng-Da Cao
- Research Division of Clinical Pharmacology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Li-Shan Zhang
- Department of Allergy, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
| | - Man Tian
- Department of Respiratory Medicine, Children's Hospital of Nanjing Medical University, Nanjing, China.
| | - Jin-Lyu Sun
- Department of Allergy, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China.
| | - Ji-Fu Wei
- Research Division of Clinical Pharmacology, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China; Department of Pharmacy, Jiangsu Cancer Hospital, The Affiliated Cancer Hospital of Nanjing Medical University, Jiangsu Institute of Cancer Research, Nanjing, China; Clinical Allergy Center, the First Affiliated Hospital of Nanjing Medical University, Nanjing, China; Department of Clinical Pharmacy, School of Pharmacy, Nanjing Medical University, Nanjing, China.
|
18
|
Zhu S, Yang C, Wu W. MSPoisDM: A Novel Peptide Identification Algorithm Optimized for Tandem Mass Spectra. BIO WEB OF CONFERENCES 2022. [DOI: 10.1051/bioconf/20225501003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
Tandem mass spectrometry (MS/MS) plays an extremely important role in proteomics research. Thousands of spectra can be generated in modern experiments, and interpreting LC-MS/MS data is a challenging problem in tandem mass spectra analysis. Our peptide identification algorithm, MSPoisDM, integrates intensity information produced by target-decoy statistics, although intensity information is often undervalued. Furthermore, to make better use of the intensity information, we propose a novel scoring model based on the Poisson distribution. Compared with the commonly used commercial software Mascot and Sequest at 1% FDR, the results show that MSPoisDM is robust and versatile across various datasets obtained from different instruments. We expect that MSPoisDM will be broadly applied in proteomics studies.
|
19
|
VDACs Post-Translational Modifications Discovery by Mass Spectrometry: Impact on Their Hub Function. Int J Mol Sci 2021; 22:ijms222312833. [PMID: 34884639 PMCID: PMC8657666 DOI: 10.3390/ijms222312833] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2021] [Revised: 11/12/2021] [Accepted: 11/23/2021] [Indexed: 12/23/2022] Open
Abstract
VDAC (voltage-dependent anion selective channel) proteins, also known as mitochondrial porins, are the most abundant proteins of the outer mitochondrial membrane (OMM), where they play a vital role in various cellular processes, in the regulation of metabolism, and in survival pathways. There is increasing consensus about their function as a cellular hub, connecting bioenergetics functions to the rest of the cell. The structural characterization of VDACs presents challenging issues due to their very high hydrophobicity, low solubility, the difficulty of separating them from other mitochondrial proteins of similar hydrophobicity, and the practical impossibility of isolating each single isoform. Consequently, it is necessary to analyze them as components of a relatively complex mixture. Due to the experimental difficulties in their structural characterization, post-translational modifications (PTMs) of VDAC proteins represent a little-explored field. Only in recent years has the increasing number of tools aimed at identifying and quantifying PTMs increased our knowledge of this field and of the mechanisms that regulate functions and interactions of mitochondrial porins. In particular, the development of nano-reversed phase ultra-high performance liquid chromatography (nanoRP-UHPLC) and ultra-sensitive high-resolution mass spectrometry (HRMS) methods has played a key role in this field. The findings obtained on VDAC PTMs using such methodologies, which permitted an in-depth characterization of these very hydrophobic trans-membrane pore proteins, are summarized in this review.
|
20
|
Tariq MU, Saeed F. SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions. PLoS One 2021; 16:e0259349. [PMID: 34714871 PMCID: PMC8555789 DOI: 10.1371/journal.pone.0259349] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Accepted: 10/18/2021] [Indexed: 11/19/2022] Open
Abstract
Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. 
SpeCollate is available at https://deepspecs.github.io/.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
| | - Fahad Saeed
- School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
|
21
|
Dong L, Ariëns RM, America AH, Paul A, Veldkamp T, Mes JJ, Wichers HJ, Govers C. Clostridium perfringens suppressing activity in black soldier fly protein preparations. Lebensm Wiss Technol 2021. [DOI: 10.1016/j.lwt.2021.111806] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
22
|
Empirical Evaluation of the Use of Computational HLA Binding as an Early Filter to the Mass Spectrometry-Based Epitope Discovery Workflow. Cancers (Basel) 2021; 13:cancers13102307. [PMID: 34065814 PMCID: PMC8150281 DOI: 10.3390/cancers13102307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 05/06/2021] [Accepted: 05/06/2021] [Indexed: 12/22/2022] Open
Abstract
Immunopeptidomics is used to identify novel epitopes for (therapeutic) vaccination strategies in cancer and infectious disease. Various false discovery rates (FDRs) are applied in the field when converting liquid chromatography-tandem mass spectrometry (LC-MS/MS) spectra to peptides. Subsequently, large efforts have recently been made to rescue peptides of lower confidence. However, it remains unclear what the overall relation is between the FDR threshold and the percentage of obtained HLA-binders. We here directly evaluated the effect of varying FDR thresholds on the resulting immunopeptidomes of HLA-eluates from human cancer cell lines and primary hepatocyte isolates using HLA-binding algorithms. Additional peptides obtained using less stringent FDR thresholds, although generally derived from poorer spectra, still contained a high amount of HLA-binders and confirmed recently developed tools that tap into this pool of otherwise ignored peptides. Most of these peptides were identified with improved confidence when cell input was increased, supporting the validity and potential of these identifications. Altogether, our data suggest that increasing the FDR threshold for peptide identification, in conjunction with data filtering by HLA-binding prediction, is a valid and highly potent method for more efficiently exhausting immunopeptidome datasets for epitope discovery, and reveal the extent of peptides that can be rescued by recently developed algorithms.
|
23
|
Yang C, Shan YC, Zhang WJ, Dai ZP, Zhang LH, Zhang YK. Full-length Protein Sequencing Based on Continuous Digestion Using Non-specific Proteases. ACTA CHIMICA SINICA 2021. [DOI: 10.6023/a21010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
|
24
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on scalability issues that are inherent in proteogenomic data analysis, where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
|
25
|
Identification and dereplication of endophytic Colletotrichum strains by MALDI TOF mass spectrometry and molecular networking. Sci Rep 2020; 10:19788. [PMID: 33188275 PMCID: PMC7666161 DOI: 10.1038/s41598-020-74852-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2020] [Accepted: 09/29/2020] [Indexed: 01/09/2023] Open
Abstract
The chemical diversity of biologically active fungal strains from 42 Colletotrichum isolates, obtained from leaves of the tropical palm species Astrocaryum sciophilum collected in pristine forests of French Guiana, was investigated. The collection was first classified based on protein fingerprints acquired by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) correlated with cytotoxicity. Liquid chromatography coupled to high-resolution tandem mass spectrometry (LC-HRMS/MS) data from ethyl acetate extracts were acquired and processed to generate a massive molecular network (MN) using the MetGem software. From five Colletotrichum strains producing cytotoxic specialized metabolites, we predicted the occurrence of peptide and cytochalasin analogues in four of them by MN, including similar ion clusters in the MN algorithm provided by the MetGem software. Chemoinformatics predictions were fully confirmed after isolation of three pentacyclopeptides (cyclo(Phe-Leu-Leu-Leu-Val), cyclo(Phe-Leu-Leu-Leu-Leu) and cyclo(Phe-Leu-Leu-Leu-Ile)) and two cytochalasins (cytochalasin C and cytochalasin D) exhibiting cytotoxicity at micromolar concentrations. Finally, the chemical study of the last active cytotoxic strain, BSNB-0583, led to the isolation of four colletamides bearing an identical decadienamide chain.
|
26
|
Fei Z, Wang K, Chi H. GameTag: A New Sequence Tag Generation Algorithm Based on Cooperative Game Theory. Proteomics 2020; 20:e2000021. [PMID: 32927502 DOI: 10.1002/pmic.202000021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2020] [Revised: 08/06/2020] [Indexed: 02/02/2023]
Abstract
Sequence tag-based peptide search is a critical technology in proteomics for the characterization of proteins from tandem mass spectrometry data. However, the main obstacle to the full application of such an approach lies in accurately extracting the sequence tags responsible for each experimental spectrum. To that end, GameTag, a novel cooperative game framework for sequence tag generation, is proposed, which includes a tag generator and a tag discriminator to collaboratively generate sequence tags. Specifically, the tag generator works to extract as many correct tag candidates as possible, and the tag discriminator serves to determine the correctness of tag candidates and reduce the total number of output tags simultaneously. Through the dynamic two-player game, the number of extracted tags is decreased while the number of correct tags is boosted. The performance of the proposed method is also investigated under various hyperparameter and structure settings. Extensive experiments on a wide variety of data sets from different species demonstrate that GameTag outperforms previous state-of-the-art methods, InsPecT, PepNovo+, DirecTag, and the existing tag-extraction method in Open-pFind, increasing by at least 10% the number of spectra from which more than one correct tag is extracted.
Affiliation(s)
- Zhengcong Fei
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, No. 6 Zhongguancun South Road, Beijing, 100190, China.,University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
| | - Kaifei Wang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, No. 6 Zhongguancun South Road, Beijing, 100190, China.,University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
| | - Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, No. 6 Zhongguancun South Road, Beijing, 100190, China.,University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Shijingshan District, Beijing, 100049, China
|
27
|
Kocáb O, Jakšová J, Novák O, Petřík I, Lenobel R, Chamrád I, Pavlovič A. Jasmonate-independent regulation of digestive enzyme activity in the carnivorous butterwort Pinguicula × Tina. JOURNAL OF EXPERIMENTAL BOTANY 2020; 71:3749-3758. [PMID: 32219314 PMCID: PMC7307851 DOI: 10.1093/jxb/eraa159] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Accepted: 03/25/2020] [Indexed: 05/18/2023]
Abstract
Carnivorous plants within the order Caryophyllales use jasmonates, a class of phytohormone, in the regulation of digestive enzyme activities. We used the carnivorous butterwort Pinguicula × Tina from the order Lamiales to investigate whether jasmonate signaling is a universal and ubiquitous signaling pathway that exists outside the order Caryophyllales. We measured the electrical signals, enzyme activities, and phytohormone tissue levels in response to prey capture. Mass spectrometry was used to identify proteins in the digestive secretion. We identified eight enzymes in the digestive secretion, many of which were previously found in other genera of carnivorous plants. Among them, alpha-amylase is unique in carnivorous plants. Enzymatic activities increased in response to prey capture; however, the tissue content of jasmonic acid and its isoleucine conjugate remained rather low in contrast to the jasmonate response to wounding. Enzyme activities did not increase in response to the exogenous application of jasmonic acid or coronatine. Whereas similar digestive enzymes were co-opted from plant defense mechanisms among carnivorous plants, the mode of their regulation differs. The butterwort has not co-opted jasmonate signaling for the induction of enzyme activities in response to prey capture. Moreover, the presence of alpha-amylase in digestive fluid of P. × Tina, which has not been found in other genera of carnivorous plants, might indicate that non-defense-related genes have also been co-opted for carnivory.
Affiliation(s)
- Ondřej Kocáb
- Department of Biophysics, Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
| | - Jana Jakšová
- Department of Biophysics, Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
| | - Ondřej Novák
- Laboratory of Growth Regulators, Institute of Experimental Botany, The Czech Academy of Sciences and Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
| | - Ivan Petřík
- Laboratory of Growth Regulators, Institute of Experimental Botany, The Czech Academy of Sciences and Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
| | - René Lenobel
- Department of Protein Biochemistry and Proteomics, Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
| | - Ivo Chamrád
- Department of Protein Biochemistry and Proteomics, Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
| | - Andrej Pavlovič
- Department of Biophysics, Centre of the Region Haná for Biotechnological and Agricultural Research, Faculty of Science, Palacký University, Šlechtitelů 27, Olomouc, Czech Republic
|
28
|
Yang H, Chi H, Zeng WF, Zhou WJ, He SM. pNovo 3: precise de novo peptide sequencing using a learning-to-rank framework. Bioinformatics 2020; 35:i183-i190. [PMID: 31510687 PMCID: PMC6612832 DOI: 10.1093/bioinformatics/btz366] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in the low precision of de novo sequencing. RESULTS In order to solve this problem, we developed pNovo 3, which used a learning-to-rank framework to distinguish similar peptide candidates for each spectrum. Three metrics for measuring the similarity between each experimental spectrum and its corresponding theoretical spectrum were used as important features, in which the theoretical spectra can be precisely predicted by the pDeep algorithm using deep learning. On seven benchmark datasets from six diverse species, pNovo 3 recalled 29-102% more correct spectra, and the precision was 11-89% higher than three other state-of-the-art de novo sequencing algorithms. Furthermore, compared with the newly developed DeepNovo, which also used the deep learning approach, pNovo 3 still identified 21-50% more spectra on the nine datasets used in the study of DeepNovo. In summary, the deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such machine learning framework is worth extending to other related research fields to distinguish the similar sequences. AVAILABILITY AND IMPLEMENTATION pNovo 3 can be freely downloaded from http://pfind.ict.ac.cn/software/pNovo/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wen-Jing Zhou
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

29
Zhao B, Reilly CP, Davis C, Matouschek A, Reilly JP. Use of Multiple Ion Fragmentation Methods to Identify Protein Cross-Links and Facilitate Comparison of Data Interpretation Algorithms. J Proteome Res 2020; 19:2758-2771. [PMID: 32496805 DOI: 10.1021/acs.jproteome.0c00111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 02/08/2023]
Abstract
Multiple ion fragmentation methods involving collision-induced dissociation (CID), higher-energy collisional dissociation (HCD) with regular and very high energy settings, and electron-transfer dissociation with supplementary HCD (EThcD) are implemented to improve the confidence of cross-link identifications. Three different S. cerevisiae proteasome samples cross-linked by diethyl suberthioimidate (DEST) or bis(sulfosuccinimidyl)suberate (BS3) are analyzed. Two approaches are introduced to combine interpretations from the above four methods. Working with cleavable cross-linkers such as DEST, the first approach searches for cross-link diagnostic ions and consistency among the best interpretations derived from all four MS2 spectra associated with each precursor ion. Better agreement leads to a more definitive identification. Compatible with both cleavable and noncleavable cross-linkers such as BS3, the second approach multiplies scoring metrics from a number of fragmentation experiments to derive an overall best match. This significantly increases the scoring gap between the target and decoy matches. The validity of cross-links fragmented by HCD alone and identified by Kojak, MeroX, pLink, and Xi was evaluated using multiple fragmentation data. Possible ways to improve the identification credibility are discussed. Data are available via ProteomeXchange with identifier PXD018310.
Affiliation(s)
- Bingqing Zhao
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Colin P Reilly
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Caroline Davis
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Andreas Matouschek
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, United States
- James P Reilly
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States

30
Eggers B, Pacharra S, Eisenacher M, Marcus K, Uszkoreit J. Let me infuse this for you - A way to solve the first YPIC challenge. EuPA Open Proteomics 2020; 22-23:19-21. [PMID: 31890549 PMCID: PMC6924283 DOI: 10.1016/j.euprot.2019.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2019] [Accepted: 07/17/2019] [Indexed: 11/30/2022]
Abstract
In a common proteomics analysis today, the origin of the sample in the vial is known, and therefore a database-dependent approach can be used to identify the peptides it contains. The first YPIC challenge, though, provided us with 19 synthetic peptides, which together formed an English sentence. To identify these peptides, a de novo approach was used, which, together with an internet search engine, led us to the hidden sentence. But having only the sentence was not sufficient for us; we also wanted to identify as many of the spectra in our data as possible. Therefore, we created and refined a database approach from the de novo method and could finally identify the peptide sentence with a good overlap.
Affiliation(s)
- Britta Eggers
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Gesundheitscampus 4, D-44801, Bochum, Germany
- Sandra Pacharra
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Gesundheitscampus 4, D-44801, Bochum, Germany
- Martin Eisenacher
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Gesundheitscampus 4, D-44801, Bochum, Germany
- Katrin Marcus
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Gesundheitscampus 4, D-44801, Bochum, Germany
- Julian Uszkoreit
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Gesundheitscampus 4, D-44801, Bochum, Germany

31
Fert-Bober J, Murray CI, Parker SJ, Van Eyk JE. Precision Profiling of the Cardiovascular Post-Translationally Modified Proteome: Where There Is a Will, There Is a Way. Circ Res 2019; 122:1221-1237. [PMID: 29700069 DOI: 10.1161/circresaha.118.310966] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
Abstract
There is an exponential increase in biological complexity as initial gene transcripts are spliced, translated into amino acid sequence, and post-translationally modified. Each protein can exist as multiple chemical or sequence-specific proteoforms, and each has the potential to be a critical mediator of a physiological or pathophysiological signaling cascade. Here, we provide an overview of how different proteoforms come about in biological systems and how they are most commonly measured using mass spectrometry-based proteomics and bioinformatics. Our goal is to present this information at a level accessible to every scientist interested in mass spectrometry and its application to proteome profiling. We will specifically discuss recent data linking various protein post-translational modifications to cardiovascular disease and conclude with a discussion of the enablement and democratization of proteomics across the cardiovascular and scientific community. The aim is to inform and inspire the readership to explore a larger breadth of proteoforms, particularly post-translational modifications, related to their particular areas of expertise in cardiovascular physiology.
Affiliation(s)
- Justyna Fert-Bober
- Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Christopher I Murray
- Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Sarah J Parker
- Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Jennifer E Van Eyk
- Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA

32
Brademan DR, Riley NM, Kwiecien NW, Coon JJ. Interactive Peptide Spectral Annotator: A Versatile Web-based Tool for Proteomic Applications. Mol Cell Proteomics 2019; 18:S193-S201. [PMID: 31088857 PMCID: PMC6692776 DOI: 10.1074/mcp.tir118.001209] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 11/14/2018] [Revised: 03/21/2019] [Indexed: 11/06/2022]
Abstract
Here we present IPSA, an innovative web-based spectrum annotator that visualizes and characterizes peptide tandem mass spectra. A tool for the scientific community, IPSA can visualize peptides collected using a wide variety of experimental and instrumental configurations. Annotated spectra are customizable via a selection of interactive features and can be exported as editable scalable vector graphics to aid in the production of publication-quality figures. Single spectra can be analyzed through provided web forms, whereas data for multiple peptide spectral matches can be uploaded using the Proteomics Standards Initiative file formats mzTab, mzIdentML, and mzML. Alternatively, peptide identifications and spectral data can be provided using generic file formats. IPSA provides support for annotating spectra collected using negative-mode ionization and facilitates the characterization of experimental MS/MS performance through the optional export of fragment ion statistics from one to many peptide spectral matches. This resource is made freely accessible at http://interactivepeptidespectralannotator.com, whereas the source code and user guides are available at https://github.com/coongroup/IPSA for private hosting or custom implementations.
Affiliation(s)
- Dain R Brademan
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706; Genome Center of Wisconsin, Madison, WI 53706
- Nicholas M Riley
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706; Genome Center of Wisconsin, Madison, WI 53706
- Nicholas W Kwiecien
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706; Genome Center of Wisconsin, Madison, WI 53706
- Joshua J Coon
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706; Morgridge Institute for Research, Madison, WI 53715; Genome Center of Wisconsin, Madison, WI 53706; Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI 53706

33
Li C, Li K, Li K, Xie X, Lin F. SWPepNovo: An Efficient De Novo Peptide Sequencing Tool for Large-scale MS/MS Spectra Analysis. Int J Biol Sci 2019; 15:1787-1801. [PMID: 31523183 PMCID: PMC6743289 DOI: 10.7150/ijbs.32142] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/10/2018] [Accepted: 04/09/2019] [Indexed: 12/17/2022]
Abstract
Tandem mass spectrometry (MS/MS)-based de novo peptide sequencing is a powerful method for high-throughput protein analysis. However, the explosively increasing size of MS/MS spectra datasets inevitably and exponentially raises the computational demand of existing de novo peptide sequencing methods, an issue that urgently needs to be solved in computational biology. This paper introduces an efficient tool based on the SW26010 many-core processor, namely SWPepNovo, to process large-scale peptide MS/MS spectra using a parallel peptide spectrum matches (PSMs) algorithm. Our design employs a two-level parallelization mechanism: (1) task-level parallelism between MPEs using MPI, based on a data transformation method and a dynamic feedback task scheduling algorithm, and (2) thread-level parallelism across CPEs using asynchronous task transfer and multithreading. Moreover, three optimization strategies, including vectorization, double buffering and memory access optimizations, have been employed to overcome both the compute-bound and the memory-bound bottlenecks in the parallel PSMs algorithm. The results of experiments conducted on multiple spectra datasets demonstrate the performance of SWPepNovo against three state-of-the-art tools for peptide sequencing, including PepNovo+, PEAKS and DeepNovo-DIA. SWPepNovo also shows high scalability in experiments on extremely large datasets sized up to 11.22 GB. The software and the parameter settings are available at https://github.com/ChuangLi99/SWPepNovo.
Affiliation(s)
- Chuang Li
- College of Information Science and Engineering, Hunan University, Changsha, China
- Kenli Li
- College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha, China
- Keqin Li
- College of Information Science and Engineering, Hunan University, Department of Computer Science, State University of New York, NY, USA
- Xianghui Xie
- State Key Laboratory of Mathematic Engineering and Advance Computing, Wuxi Jiangnan Institute of Computing Technology, Jiangsu, China
- Feng Lin
- School of Computer Science and Engineering, Nanyang Technological University, Singapore

34
Devabhaktuni A, Lin S, Zhang L, Swaminathan K, Gonzalez CG, Olsson N, Pearlman SM, Rawson K, Elias JE. TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat Biotechnol 2019; 37:469-479. [PMID: 30936560 PMCID: PMC6447449 DOI: 10.1038/s41587-019-0067-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 06/13/2016] [Accepted: 02/12/2019] [Indexed: 02/06/2023]
Abstract
Although mass spectrometry is well suited to identifying thousands of potential protein post-translational modifications (PTMs), it has historically been biased towards just a few. To measure the entire set of PTMs across diverse proteomes, software must overcome the dual challenges of covering enormous search spaces and distinguishing correct from incorrect spectrum interpretations. Here, we describe TagGraph, a computational tool that overcomes both challenges with an unrestricted string-based search method that is as much as 350-fold faster than existing approaches, and a probabilistic validation model that we optimized for PTM assignments. We applied TagGraph to a published human proteomic dataset of 25 million mass spectra and tripled confident spectrum identifications compared to its original analysis. We identified thousands of modification types on almost 1 million sites in the proteome. We show alternative contexts for highly abundant yet understudied PTMs such as proline hydroxylation, and its unexpected association with cancer mutations. By enabling broad characterization of PTMs, TagGraph informs as to how their functions and regulation intersect.
Affiliation(s)
- Arun Devabhaktuni
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Sarah Lin
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Lichao Zhang
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Kavya Swaminathan
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Carlos G Gonzalez
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Niclas Olsson
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Samuel M Pearlman
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Keith Rawson
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Joshua E Elias
- Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA

35
Yang H, Li YC, Zhao MZ, Wu FL, Wang X, Xiao WD, Wang YH, Zhang JL, Wang FQ, Xu F, Zeng WF, Overall CM, He SM, Chi H, Xu P. Precision De Novo Peptide Sequencing Using Mirror Proteases of Ac-LysargiNase and Trypsin for Large-scale Proteomics. Mol Cell Proteomics 2019; 18:773-785. [PMID: 30622160 PMCID: PMC6442358 DOI: 10.1074/mcp.tir118.000918] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/16/2018] [Revised: 11/20/2018] [Indexed: 11/06/2022]
Abstract
De novo peptide sequencing for large-scale proteomics remains challenging because of the lack of full coverage of ion series in tandem mass spectra. We developed a mirror protease of trypsin, acetylated LysargiNase (Ac-LysargiNase), with superior activity and stability. The mirror spectrum pairs derived from the Ac-LysargiNase- and trypsin-treated samples can generate full b and y ion series, which complement each other and allowed us to develop a novel algorithm, pNovoM, for de novo sequencing. Using pNovoM to sequence peptides of purified proteins, the accuracy of the sequence was close to 100%. More importantly, from a large-scale yeast proteome sample digested with trypsin and Ac-LysargiNase individually, 48% of all tandem mass spectra formed mirror spectrum pairs, 97% of which contained full coverage of ion series, resulting in precision de novo sequencing of full-length peptides by pNovoM. This enabled pNovoM to successfully sequence 21,249 peptides from 3,753 proteins and to interpret 44-152% more spectra than pNovo+ and PEAKS at a 5% FDR at the spectrum level. Moreover, the mirror protease strategy had an obvious advantage in sequencing long peptides. We believe that the combination of the mirror protease strategy and pNovoM will be an effective approach for precision de novo sequencing on both single proteins and proteome samples.
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS, Beijing 100190, China
- Yan-Chang Li
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Ming-Zhi Zhao
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Fei-Lin Wu
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Xi Wang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS, Beijing 100190, China
- Wei-Di Xiao
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Yi-Hao Wang
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Jun-Ling Zhang
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Fu-Qiang Wang
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Feng Xu
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS, Beijing 100190, China
- Christopher M Overall
- Centre for Blood Research, University of British Columbia, Vancouver, British Columbia, Canada
- Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS, Beijing 100190, China
- Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Institute of Computing Technology, CAS, Beijing 100190, China
- Ping Xu
- State Key Laboratory of Proteomics; Beijing Proteome Research Center; National Center for Protein Sciences Beijing; Beijing Institute of Lifeomics, Beijing 102206, China; Key Laboratory of Combinatorial Biosynthesis and Drug Discovery of Ministry of Education Wuhan University, Wuhan University School of Pharmaceutical Sciences, Wuhan 430071, China; College of Life Sciences, Hebei University, Baoding 071002, China

36
Sheng J, Yang X, Chen J, Peng T, Yin X, Liu W, Liang M, Wan J, Yang X. Antioxidative Effects and Mechanism Study of Bioactive Peptides from Defatted Walnut (Juglans regia L.) Meal Hydrolysate. Journal of Agricultural and Food Chemistry 2019; 67:3305-3312. [PMID: 30817142 DOI: 10.1021/acs.jafc.8b05722] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Indexed: 06/09/2023]
Abstract
The peptide components of defatted walnut (Juglans regia L.) meal hydrolysate (DWMH) remain unclear, hindering the investigation of biological mechanisms and the exploitation of bioactive peptides. The present study aims to identify the peptide composition of DWMH, then to evaluate the in vitro antioxidant effects of selected peptides and investigate the mechanisms of their antioxidative effect. First, more than 1000 peptides were identified by de novo sequencing in DWMH. Subsequently, a scoring method was established to select promising bioactive peptides by structure-based screening. Eight brand new peptides were selected because they had the highest scores in two different batches of DWMH. All of them showed potent in vitro antioxidant effects on H2O2-injured nerve cells. Four of them even possessed significantly stronger effects than DWMH, making the selected bioactive peptides useful for further research as new bioactive entities. Two mechanisms, hydroxyl radical scavenging and ROS reduction, were involved in their antioxidative effects to different degrees. The results showed that peptides possessing a similar capacity for hydroxyl radical scavenging or ROS reduction may have significantly different in vitro antioxidative effects. Therefore, comprehensive consideration of different antioxidative mechanisms is suggested when selecting antioxidative peptides from DWMH.
Affiliation(s)
- Jianyong Sheng
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Xiaoyu Yang
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Jitang Chen
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Tianhao Peng
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Xiquan Yin
- Joint Laboratory for The Research of Modern Preparation Technology-Huazhong University of Science and Technology and Infinitus, Guangzhou, Guangdong 510663, People's Republic of China
- Wei Liu
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Ming Liang
- Joint Laboratory for The Research of Modern Preparation Technology-Huazhong University of Science and Technology and Infinitus, Guangzhou, Guangdong 510663, People's Republic of China
- Jiangling Wan
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Xiangliang Yang
- National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China

37
Wang T, Ma B. Adjacent Y-Ion Ratio Distributions and Its Application in Peptide Sequencing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:43-51. [PMID: 30106691 DOI: 10.1109/tcbb.2018.2864647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
A scoring function plays a critical role in software for peptide identification with mass spectrometry. We present a general scoring feature that can be incorporated in the scoring functions of other peptide identification software. The scoring feature is based on the intensity ratios between two adjacent y-ions in the spectrum. A method is proposed to obtain the probability distributions of such ratios, and to calculate the scoring feature based on the distributions. To demonstrate the performance of the method, the new feature was incorporated into X!Tandem [1], [2] and Novor [3], where it significantly improved database search and de novo sequencing performance, respectively, on the testing data.

38
Fomin E. A Simple Approach to the Reconstruction of a Set of Points from the Multiset of Pairwise Distances in n² Steps for the Sequencing Problem: III. Noise Inputs for the Beltway Case. J Comput Biol 2019; 26:68-75. [DOI: 10.1089/cmb.2018.0078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/18/2023]
Affiliation(s)
- Eduard Fomin
- Institute of Cytology and Genetics, SB RAS, Novosibirsk, Russia

39
Miller SE, Rizzo AI, Waldbauer JR. Postnovo: Postprocessing Enables Accurate and FDR-Controlled de Novo Peptide Sequencing. J Proteome Res 2018; 17:3671-3680. [PMID: 30277077 DOI: 10.1021/acs.jproteome.8b00278] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
De novo sequencing offers an alternative to database search methods for peptide identification from mass spectra. Since it does not rely on a predetermined database of expected or potential sequences in the sample, de novo sequencing is particularly appropriate for samples lacking a well-defined or comprehensive reference database. However, the low accuracy of many de novo sequence predictions has prevented the widespread use of the variety of sequencing tools currently available. Here, we present a new open-source tool, Postnovo, that postprocesses de novo sequence predictions to find high-accuracy results. Postnovo uses a predictive model to rescore and rerank candidate sequences in a manner akin to database search postprocessing tools such as Percolator. Postnovo leverages the output from multiple de novo sequencing tools in its own analyses, producing many times the length of amino acid sequence information (including both full- and partial-length peptide sequences) at an equivalent false discovery rate (FDR) compared to any individual tool. We present a methodology to reliably screen the sequence predictions to a desired FDR given the Postnovo sequence score. We validate Postnovo with multiple data sets and demonstrate its ability to identify proteins that are missed by database search even in samples with paired reference databases.
Affiliation(s)
- Samuel E Miller
- Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
- Adriana I Rizzo
- Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
- Jacob R Waldbauer
- Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States

40
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany

41
Háda V, Bagdi A, Bihari Z, Timári SB, Fizil Á, Szántay C. Recent advancements, challenges, and practical considerations in the mass spectrometry-based analytics of protein biotherapeutics: A viewpoint from the biosimilar industry. J Pharm Biomed Anal 2018; 161:214-238. [PMID: 30205300 DOI: 10.1016/j.jpba.2018.08.024] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 06/07/2018] [Revised: 08/08/2018] [Accepted: 08/10/2018] [Indexed: 01/22/2023]
Abstract
The extensive analytical characterization of protein biotherapeutics, especially of biosimilars, is a critical part of the product development and registration. High-resolution mass spectrometry became the primary analytical tool used for the structural characterization of biotherapeutics. Its high instrumental sensitivity and methodological versatility made it possible to use this technique to characterize both the primary and higher-order structure of these proteins. However, even by using high-end instrumentation, analysts face several challenges with regard to how to cope with industrial and regulatory requirements, that is, how to obtain accurate and reliable analytical data in a time- and cost-efficient way. New sample preparation approaches, measurement techniques and data evaluation strategies are available to meet those requirements. The practical considerations of these methods are discussed in the present review article focusing on hot topics, such as reliable and efficient sequencing strategies, minimization of artefact formation during sample preparation, quantitative peptide mapping, the potential of multi-attribute methodology, the increasing role of mass spectrometry in higher-order structure characterization and the challenges of MS-based identification of host cell proteins. On the basis of the opportunities in new instrumental techniques, methodological advancements and software-driven data evaluation approaches, for the future one can envision an even wider application area for mass spectrometry in the biopharmaceutical industry.
Affiliation(s)
- Viktor Háda
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary.
| | - Attila Bagdi
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary
| | - Zsolt Bihari
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary
| | | | - Ádám Fizil
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary
| | - Csaba Szántay
- Spectroscopic Research Department, Gedeon Richter Plc, Hungary.
| |
|
42
|
Park H, Kim J, Lee YK, Kim W, You SK, Do J, Jang Y, Oh DB, Il Kim J, Kim HH. Four unreported types of glycans containing mannose-6-phosphate are heterogeneously attached at three sites (including newly found Asn 233) to recombinant human acid alpha-glucosidase that is the only approved treatment for Pompe disease. Biochem Biophys Res Commun 2017; 495:2418-2424. [PMID: 29274340 DOI: 10.1016/j.bbrc.2017.12.101] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 12/11/2017] [Accepted: 12/18/2017] [Indexed: 11/25/2022]
Abstract
Myozyme is a recombinant human acid alpha-glucosidase (rhGAA) that is currently the only drug approved for treating Pompe disease, and its low efficacy means that a high dose is required. Mannose-6-phosphate (M6P) glycosylation on rhGAA is a key factor influencing lysosomal enzyme targeting and the efficacy of enzyme replacement therapy (ERT); however, its complex structure and relatively small quantity still remain to be characterized. This study investigated M6P glycosylation on rhGAA using liquid chromatography (LC)-electrospray ionization (ESI)-high-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS). The glycans released from rhGAA were labeled with procainamide to improve mass ionization efficiency and the sensitivity of MS/MS. The relative quantities (%) of 78 glycans were obtained, and 1.0% of them were glycans containing M6P (M6P glycans). These were categorized according to their structure into 4 types: 3 newly found ones, comprising high-mannose-type M6P glycans capped with N-acetylglucosamine (GlcNAc) (2 variants, 17.5%), hybrid-type M6P glycans (2 variants, 11.2%), and hybrid-type M6P glycans capped with GlcNAc (3 variants, 6.9%), as well as high-mannose-type M6P glycans (3 variants, 64.4%). HCD-MS/MS spectra identified six distinctive M6P-derived oxonium ions. The glycopeptides obtained from protease-digested rhGAA were analyzed using nano-LC-ESI-HCD-MS/MS, and the extracted-ion chromatograms of M6P-derived oxonium ions confirmed three M6P glycosylation sites comprising Asn 140, Asn 233 (newly found), and Asn 470 attached heterogeneously to nine M6P glycans (two types), eight M6P glycans (four types), and seven M6P glycans (two types), respectively. This is the first study of rhGAA to differentiate M6P glycans and identify their attachment sites, despite rhGAA already being an approved drug for Pompe disease.
Affiliation(s)
- Heajin Park
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Jihye Kim
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Young Kwang Lee
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Wooseok Kim
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Seung Kwan You
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Jonghye Do
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Yeonjoo Jang
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea
| | - Doo-Byung Oh
- Korea Research Institute of Bioscience & Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon 34141, South Korea
| | - Jae Il Kim
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju 61005, South Korea
| | - Ha Hyung Kim
- Biotherapeutics and Glycomics Laboratory, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06944, South Korea.
| |
|
43
|
Yang H, Chi H, Zhou WJ, Zeng WF, Liu C, Wang RM, Wang ZW, Niu XN, Chen ZL, He SM. pSite: Amino Acid Confidence Evaluation for Quality Control of De Novo Peptide Sequencing and Modification Site Localization. J Proteome Res 2017; 17:119-128. [DOI: 10.1021/acs.jproteome.7b00428] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Affiliation(s)
- Hao Yang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hao Chi
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Wen-Jing Zhou
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chao Liu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Rui-Min Wang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhao-Wei Wang
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiu-Nan Niu
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhen-Lin Chen
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Si-Min He
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
44
|
Vyatkina K, Dekker LJM, Wu S, VanDuijn MM, Liu X, Tolić N, Luider TM, Paša-Tolić L. De Novo Sequencing of Peptides from High-Resolution Bottom-Up Tandem Mass Spectra using Top-Down Intended Methods. Proteomics 2017; 17. [PMID: 29110399 DOI: 10.1002/pmic.201600321] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/31/2016] [Revised: 09/15/2017] [Indexed: 11/10/2022]
Abstract
Although high-resolution mass spectrometers are becoming accessible to more and more laboratories, tandem (MS/MS) mass spectra are still often collected at a low resolution. Even when spectra are acquired at a high resolution, the software tools used to process them tend not to benefit from it fully, and the ability to specify a relative mass tolerance often remains the only feature the respective algorithms take advantage of. We argue that high-resolution MS/MS spectra are analyzed more efficiently with methods that account more explicitly for the precision level, and support this claim by demonstrating that a de novo sequencing framework originally developed for (high-resolution) top-down MS/MS data is perfectly suitable for processing high-resolution bottom-up datasets, even though a top-down-like deconvolution performed as the first step will leave at most a few peaks in many spectra.
Affiliation(s)
- Kira Vyatkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia; Department of Mathematical and Information Technologies, Saint Petersburg Academic University, Russian Academy of Sciences, Saint Petersburg, Russia; Department of Information Technologies and Programming, ITMO University, Saint Petersburg, Russia; Department of Computer Technologies and Informatics, Saint Petersburg Electrotechnical University LETI, Saint Petersburg, Russia
| | - Lennard J M Dekker
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
| | - Martijn M VanDuijn
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Theo M Luider
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
| |
|
45
|
Abstract
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7-22.9% higher accuracy at the amino acid level and 38.1-64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5-100% coverage and 97.2-99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming.
|
46
|
Sen KI, Tang WH, Nayak S, Kil YJ, Bern M, Ozoglu B, Ueberheide B, Davis D, Becker C. Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery. J Am Soc Mass Spectrom 2017; 28:803-810. [PMID: 28105549 PMCID: PMC5392168 DOI: 10.1007/s13361-016-1580-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/31/2016] [Revised: 12/02/2016] [Accepted: 12/04/2016] [Indexed: 05/12/2023]
Abstract
Applications of antibody de novo sequencing in the biopharmaceutical industry range from the discovery of new antibody drug candidates to identifying reagents for research and determining the primary structure of innovator products for biosimilar development. When murine, phage display, or patient-derived monoclonal antibodies against a target of interest are available, but the cDNA or the original cell line is not, de novo protein sequencing is required to humanize and recombinantly express these antibodies, followed by in vitro and in vivo testing for functional validation. Availability of fully automated software tools for monoclonal antibody de novo sequencing enables efficient and routine analysis. Here, we present a novel method to automatically de novo sequence antibodies using mass spectrometry and the Supernovo software. The robustness of the algorithm is demonstrated through a series of stress tests.
Affiliation(s)
- K Ilker Sen
- Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA.
| | - Wilfred H Tang
- Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
| | - Shruti Nayak
- Langone Medical Center, New York University, 430 East 29th street, 8th floor room 860, New York, NY, 10016, USA
| | - Yong J Kil
- Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
| | - Marshall Bern
- Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
| | - Berk Ozoglu
- Janssen Research and Development, LLC, 1400 McKean Road, Spring House, PA, 19477, USA
| | - Beatrix Ueberheide
- Langone Medical Center, New York University, 430 East 29th street, 8th floor room 860, New York, NY, 10016, USA
| | - Darryl Davis
- Janssen Research and Development, LLC, 1400 McKean Road, Spring House, PA, 19477, USA
| | - Christopher Becker
- Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
| |
|
47
|
Xiao K, Yu F, Fang H, Xue B, Liu Y, Li Y, Tian Z. Are neutral loss and internal product ions useful for top-down protein identification? J Proteomics 2017; 160:21-27. [DOI: 10.1016/j.jprot.2017.03.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/26/2016] [Revised: 03/13/2017] [Accepted: 03/15/2017] [Indexed: 10/19/2022]
|
48
|
Xu F, Wang L, Ju X, Zhang J, Yin S, Shi J, He R, Yuan Q. Transepithelial Transport of YWDHNNPQIR and Its Metabolic Fate with Cytoprotection against Oxidative Stress in Human Intestinal Caco-2 Cells. J Agric Food Chem 2017; 65:2056-2065. [PMID: 28218523 DOI: 10.1021/acs.jafc.6b04731] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 06/06/2023]
Abstract
Studies on antioxidant peptides extracted from foodstuff sources have included not only experiments to elucidate their chemical characteristics but also to investigate their bioavailability and intracellular mechanisms. This study was designed to clarify the absorption and antioxidative activity of YWDHNNPQIR (named RAP), which is derived from rapeseed protein using a Caco-2 cell transwell model. Results showed that 0.8% RAP (C0 = 0.2 mM, t = 90 min) could maintain the original structure across the Caco-2 cell monolayers via the intracellular transcytosis pathway, and the apparent drug absorption rate (Papp) was (6.6 ± 1.24) × 10⁻⁷ cm/s. Three main fragments (WDHNNPQIR, DHNNPQIR, and YWDHNNPQ) and five modified peptides derived from RAP were found in both the apical and basolateral side of the Caco-2 cell transwell model. Among these new metabolites, WDHNNPQIR had the greatest antioxidative activity in Caco-2 cells apart from the DPPH assay. With a RAP concentration of 200 μM, there were significant differences in four antioxidative indicators (T-AOC, GSH-Px, SOD, and MDA) compared to the oxidative stress control (P < 0.05). In addition, RAP may also influence apoptosis of the Caco-2 cells, which was caused by AAPH-induced oxidative damage.
Affiliation(s)
- Feiran Xu
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Lifeng Wang
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Xingrong Ju
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Jing Zhang
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Shi Yin
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Jiayi Shi
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Rong He
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| | - Qiang Yuan
- College of Food Science and Engineering/Collaborative Innovation Center for Modern Grain Circulation and Safety/Key Laboratory of Grains and Oils Quality Control and Processing, Nanjing University of Finance and Economics , Nanjing 210023, P.R. China
| |
|
49
|
Liu Y, Ma B, Zhang K, Lajoie G. An Approach for Peptide Identification by De Novo Sequencing of Mixture Spectra. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:326-336. [PMID: 28368810 DOI: 10.1109/tcbb.2015.2407401] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Mixture spectra, which result from the concurrent fragmentation of multiple precursors, occur quite frequently in a typical wet-lab mass spectrometry experiment. The ability to efficiently and confidently identify mixture spectra is essential to alleviate the existing bottleneck of a low mass-spectrum identification rate. However, most traditional computational methods are not suitable for interpreting mixture spectra, because they still assume that the acquired spectra come from the fragmentation of a single precursor. In this manuscript, we formulate the mixture spectra de novo sequencing problem mathematically and propose a dynamic programming algorithm for the problem. Additionally, we use both simulated and real mixture spectra data sets to verify the merits of the proposed algorithm.
|
50
|
Vyatkina K. De Novo Sequencing of Top-Down Tandem Mass Spectra: A Next Step towards Retrieving a Complete Protein Sequence. Proteomes 2017; 5:E6. [PMID: 28248257 PMCID: PMC5372227 DOI: 10.3390/proteomes5010006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/10/2016] [Revised: 01/30/2017] [Accepted: 02/04/2017] [Indexed: 11/16/2022]
Abstract
De novo sequencing of tandem (MS/MS) mass spectra represents the only way to determine the sequence of proteins from organisms with unknown genomes, or the ones not directly inscribed in a genome, such as antibodies or novel splice variants. Top-down mass spectrometry provides new opportunities for analyzing such proteins; however, retrieving a complete protein sequence from top-down MS/MS spectra still remains a distant goal. In this paper, we review the state-of-the-art on this subject, and enhance our previously developed Twister algorithm for de novo sequencing of peptides from top-down MS/MS spectra to derive longer sequence fragments of a target protein.
Affiliation(s)
- Kira Vyatkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., St. Petersburg 199034, Russia.
- Department of Mathematical and Information Technologies, Saint Petersburg Academic University, 8/3 Khlopina st., St. Petersburg 194021, Russia.
| |
|