1
Van Den Bossche T, Beslic D, van Puyenbroeck S, Suomi T, Holstein T, Martens L, Elo LL, Muth T. Metaproteomics Beyond Databases: Addressing the Challenges and Potentials of De Novo Sequencing. Proteomics 2025:e202400321. [PMID: 39888246 DOI: 10.1002/pmic.202400321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 01/09/2025] [Accepted: 01/10/2025] [Indexed: 02/01/2025]
Abstract
Metaproteomics enables the large-scale characterization of microbial community proteins, offering crucial insights into their taxonomic composition, functional activities, and interactions within their environments. By directly analyzing proteins, metaproteomics offers insights into community phenotypes and the roles individual members play in diverse ecosystems. Although database-dependent search engines are commonly used for peptide identification, they rely on pre-existing protein databases, which can be limiting for complex, poorly characterized microbiomes. De novo sequencing presents a promising alternative, which derives peptide sequences directly from mass spectra without requiring a database. Over time, this approach has evolved from manual annotation to advanced graph-based, tag-based, and deep learning-based methods, significantly improving the accuracy of peptide identification. This Viewpoint explores the evolution, advantages, limitations, and future opportunities of de novo sequencing in metaproteomics. We highlight recent technological advancements that have improved its potential for detecting unsequenced species and for providing deeper functional insights into microbial communities.
Affiliation(s)
- Tim Van Den Bossche
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Denis Beslic
  - Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin, Germany
- Sam van Puyenbroeck
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Tomi Suomi
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Tanja Holstein
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
  - Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
- Lennart Martens
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - Institute of Biomedicine, University of Turku, Turku, Finland
- Thilo Muth
  - Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
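To make concrete what "deriving peptide sequences directly from mass spectra" means in the abstract above, the following minimal sketch reads residues off an idealized b-ion ladder: the mass difference between consecutive fragment peaks is matched against known residue masses. The function names (`b_ions`, `read_ladder`), the 0.02 Da tolerance, and the reduced residue table are illustrative assumptions and are not taken from any cited tool. Real spectra contain noise, missing peaks, several ion types, and multiple charge states, which is what the graph-based, tag-based, and deep learning-based methods are designed to handle.

```python
# Monoisotopic residue masses in Da (a reduced, illustrative subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
PROTON = 1.007276  # mass of a proton in Da


def b_ions(peptide):
    """Return singly charged b-ion m/z values b1..b(n-1) for a peptide."""
    masses, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def read_ladder(peaks, tol=0.02):
    """Translate consecutive peak differences into residues ('?' if no match)."""
    peaks = sorted(peaks)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append(hits[0] if hits else "?")
    return "".join(residues)


# Prepending the bare proton mass acts as a "b0" anchor so that the first
# residue can also be read from the ladder.
ladder = [PROTON] + b_ions("SAGTVEK")
print(read_ladder(ladder))
```

A complete b-ion series stops one residue short of the C-terminus, so the final residue has to come from the precursor mass or from the complementary y-ion series. Isobaric residues such as leucine and isoleucine cannot be distinguished by mass difference alone.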
2
Liang J, Liao Y, Tu Z, Liu J. Revamping Hepatocellular Carcinoma Immunotherapy: The Advent of Microbial Neoantigen Vaccines. Vaccines (Basel) 2024; 12:930. [PMID: 39204053 PMCID: PMC11359864 DOI: 10.3390/vaccines12080930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2024] [Revised: 08/14/2024] [Accepted: 08/19/2024] [Indexed: 09/03/2024] Open
Abstract
Immunotherapy has revolutionized the treatment paradigm for hepatocellular carcinoma (HCC). However, its efficacy varies significantly with each patient's genetic composition and the complex interactions with their microbiome, both of which are pivotal in shaping anti-tumor immunity. The emergence of microbial neoantigens, a novel class of tumor vaccines, heralds a transformative shift in HCC therapy. This review explores the untapped potential of microbial neoantigens as innovative tumor vaccines, poised to redefine current HCC treatment modalities. For instance, neoantigens derived from the microbiome have demonstrated the capacity to enhance anti-tumor immunity in colorectal cancer, suggesting similar applications in HCC. By harnessing these unique neoantigens, we propose a framework for a personalized immunotherapeutic response, aiming to deliver a more precise and potent treatment strategy for HCC. Leveraging these neoantigens could significantly advance personalized medicine, potentially revolutionizing patient outcomes in HCC therapy.
Affiliation(s)
- Jinping Liu
  - State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China; (J.L.); (Y.L.); (Z.T.)
3
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrometry Reviews 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role both in its functionality and in these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
  - Health Unit, VITO, Mol, Belgium
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec, Niel, Belgium
4
Yilmaz M, Fondrie WE, Bittremieux W, Melendez CF, Nelson R, Ananth V, Oh S, Noble WS. Sequence-to-sequence translation from mass spectra to peptides with a transformer model. Nat Commun 2024; 15:6427. [PMID: 39080256 PMCID: PMC11289372 DOI: 10.1038/s41467-024-49731-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 06/18/2024] [Indexed: 08/02/2024] Open
Abstract
A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information (de novo peptide sequencing) is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.
Affiliation(s)
- Melih Yilmaz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Carlo F Melendez
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Rowan Nelson
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Varun Ananth
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
5
Petrovskiy DV, Nikolsky KS, Kulikova LI, Rudnev VR, Butkova TV, Malsagova KA, Kopylov AT, Kaysheva AL. PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models. Sci Rep 2024; 14:15000. [PMID: 38951578 PMCID: PMC11217302 DOI: 10.1038/s41598-024-65861-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 06/25/2024] [Indexed: 07/03/2024] Open
Abstract
The primary objective of analyzing the data obtained in a mass spectrometry-based proteomic experiment is peptide and protein identification, that is, the correct assignment of each tandem mass spectrum to one amino acid sequence. Comparing empirical fragment spectra with theoretically predicted ones or matching them against a collected spectral library are commonly accepted strategies for identifying proteins and defining their amino acid sequences. Although these approaches are widely used and are appreciably efficient for well-characterized model organisms or measured proteins, they cannot detect novel peptide sequences that have not been previously annotated or are rare. This study presents PowerNovo, a tool for de novo sequencing of proteins using tandem mass spectra acquired on a variety of mass analyzer types with different fragmentation techniques. PowerNovo involves an ensemble of models for peptide sequencing: a model for detecting regularities in tandem mass spectra, precursors, and fragment ions, and a natural language processing model that assesses peptide sequence quality and helps reconstruct noisy sequences. Testing showed that the performance of PowerNovo is comparable to, or even better than, that of the widely used PointNovo, DeepNovo, Casanovo, and Novor packages. PowerNovo also provides a complete processing pipeline for mass spectrometry data and, along with predicting the peptide sequence, includes peptide assembly and protein inference blocks.
6
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. arXiv 2024:arXiv:2402.11363v3. [PMID: 38659639 PMCID: PMC11042412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Transformer-DIA, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Transformer-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Transformer-DIA model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Transformer-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Transformer-DIA.
Affiliation(s)
- Shiva Ebrahimi
  - Computer Science & Engineering, University of North Texas, Denton, USA
- Xuan Guo
  - Computer Science & Engineering, University of North Texas, Denton, USA
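The abstract above reports gains in amino-acid-level precision and recall and in peptide-level precision. As a rough illustration of how such metrics can be computed, the sketch below compares predicted and reference peptides position by position. This is a simplifying assumption: published de novo benchmarks usually match residues by mass within a tolerance rather than by exact positional identity, and the abstract does not state which variant was used. The function names and the example peptides are hypothetical.

```python
def aa_matches(predicted, reference):
    """Count positions where predicted and reference residues agree."""
    return sum(p == r for p, r in zip(predicted, reference))


def evaluate(pairs):
    """Compute simplified de novo sequencing metrics.

    pairs: list of (predicted, reference) peptide strings; an empty
    predicted string means no prediction was made for that spectrum.
    """
    matched = sum(aa_matches(p, r) for p, r in pairs)
    n_pred_aa = sum(len(p) for p, _ in pairs)
    n_ref_aa = sum(len(r) for _, r in pairs)
    n_pred_pep = sum(1 for p, _ in pairs if p)
    n_correct_pep = sum(1 for p, r in pairs if p and p == r)
    return {
        # fraction of predicted residues that are correct
        "aa_precision": matched / n_pred_aa if n_pred_aa else 0.0,
        # fraction of reference residues that were recovered
        "aa_recall": matched / n_ref_aa if n_ref_aa else 0.0,
        # fraction of predicted peptides that are fully correct
        "peptide_precision": n_correct_pep / n_pred_pep if n_pred_pep else 0.0,
    }


# Hypothetical example: one exact match, one single-residue error,
# and one spectrum without a prediction.
example = [("PEPTIDE", "PEPTIDE"), ("PEPSIDE", "PEPTIDE"), ("", "SAMPLER")]
print(evaluate(example))
```

In this toy example, 13 of 14 predicted residues are correct, 13 of 21 reference residues are recovered, and 1 of 2 predicted peptides is fully correct. Spectra without a prediction lower recall but not precision, which is why the two are reported separately.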
7
Nguyen SN, Le SH, Ivanov DG, Ivetic N, Nazy I, Kaltashov IA. Structural Characterization of a Pathogenic Antibody Underlying Vaccine-Induced Immune Thrombotic Thrombocytopenia (VITT). Anal Chem 2024; 96:6209-6217. [PMID: 38607319 DOI: 10.1021/acs.analchem.3c05253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
Vaccine-induced immune thrombotic thrombocytopenia (VITT) is a rare but dangerous side effect of adenoviral-vectored COVID-19 vaccines. VITT had been linked to production of autoantibodies recognizing platelet factor 4 (PF4). Here, we characterize anti-PF4 antibodies obtained from a VITT patient's blood. Intact mass measurements indicate that a significant fraction of these antibodies represent a limited number of clones. MS analysis of large antibody fragments (the light chain and the Fc/2 and Fd fragments of the heavy chain) confirms the monoclonal nature of this component of the anti-PF4 antibodies repertoire and reveals the presence of a mature complex biantennary N-glycan within the Fd segment. Peptide mapping using two complementary proteases and LC-MS/MS was used to determine the amino acid sequence of the entire light chain and over 98% of the heavy chain (excluding a short N-terminal segment). The sequence analysis allows the monoclonal antibody to be assigned to the IgG2 subclass and verifies that the light chain belongs to the λ-type. Incorporation of enzymatic de-N-glycosylation into the peptide mapping routine allows the N-glycan in the Fab region of the antibody to be localized to the framework 3 region of the VH domain. This novel N-glycosylation site is the result of a single mutation within the germline sequence. Peptide mapping also provides information on lower-abundance (polyclonal) components of the anti-PF4 antibody ensemble, revealing the presence of all four subclasses (IgG1-IgG4) and both types of the light chain (λ and κ). This case study demonstrates the power of combining the intact, middle-down, and bottom-up MS approaches for meaningful characterization of ultralow quantities of pathogenic antibodies extracted directly from patients' blood.
Affiliation(s)
- Son N Nguyen
  - Department of Chemistry, University of Massachusetts-Amherst, Amherst, Massachusetts 01003, United States
- Si-Hung Le
  - Department of Chemistry, University of Massachusetts-Amherst, Amherst, Massachusetts 01003, United States
- Daniil G Ivanov
  - Department of Chemistry, University of Massachusetts-Amherst, Amherst, Massachusetts 01003, United States
- Nikola Ivetic
  - Department of Medicine, Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
- Ishac Nazy
  - Department of Medicine, Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON L8S 4K1, Canada
- Igor A Kaltashov
  - Department of Chemistry, University of Massachusetts-Amherst, Amherst, Massachusetts 01003, United States
8
Yang T, Ling T, Sun B, Liang Z, Xu F, Huang X, Xie L, He Y, Li L, He F, Wang Y, Chang C. Introducing π-HelixNovo for practical large-scale de novo peptide sequencing. Brief Bioinform 2024; 25:bbae021. [PMID: 38340092 PMCID: PMC10858680 DOI: 10.1093/bib/bbae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/10/2024] [Accepted: 01/14/2024] [Indexed: 02/12/2024] Open
Abstract
De novo peptide sequencing is a promising approach for novel peptide discovery, highlighting the value of performance improvements in state-of-the-art models. The quality of mass spectra often varies due to the unexpected absence of certain ions, presenting a significant challenge in de novo peptide sequencing. Here, we use a novel concept of complementary spectra to enhance ion information of the experimental spectrum and demonstrate it through conceptual and practical analyses. Afterward, we design suitable encoders to encode the experimental spectrum and the corresponding complementary spectrum and propose a de novo sequencing model, π-HelixNovo, based on the Transformer architecture. We first demonstrated that π-HelixNovo outperforms other state-of-the-art models using a series of comparative experiments. Then, we utilized π-HelixNovo to de novo sequence gut metaproteome peptides for the first time. The results show that π-HelixNovo increases the identification coverage and accuracy of the gut metaproteome and enhances its taxonomic resolution. We finally trained a powerful π-HelixNovo utilizing a larger training dataset, and as expected, π-HelixNovo achieves unprecedented performance, even for peptide-spectrum matches with never-before-seen peptide sequences. We also use the powerful π-HelixNovo to identify antibody peptides and multi-enzyme cleavage peptides, and π-HelixNovo is highly robust in these applications. Our results demonstrate the effectiveness of the complementary spectrum and take a significant step forward in de novo peptide sequencing.
Affiliation(s)
- Tingpeng Yang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
- Tianze Ling
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
- Boyan Sun
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Zhendong Liang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
- Fan Xu
  - Peng Cheng Laboratory, Shenzhen, 518055, China
- Linhai Xie
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Yonghong He
  - Peng Cheng Laboratory, Shenzhen, 518055, China
  - Tsinghua Shenzhen International Graduate School, Shenzhen, 518055, China
- Leyuan Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Fuchu He
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - Research Unit of Proteomics Driven Cancer Precision Medicine, Chinese Academy of Medical Sciences, Beijing 102206, China
- Yu Wang
  - Peng Cheng Laboratory, Shenzhen, 518055, China
- Cheng Chang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - Research Unit of Proteomics Driven Cancer Precision Medicine, Chinese Academy of Medical Sciences, Beijing 102206, China
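The complementary-spectrum idea in the abstract above can be illustrated with the standard relation between paired fragment ions: for singly charged fragments, a b ion and its partner y ion sum to the neutral peptide mass plus two proton masses. Mirroring each observed peak around that total gives the position where its partner would appear, so a missing ion can be hinted at by its observed complement. The sketch below is a minimal illustration under that assumption; it is not claimed to be the exact construction or encoding used by π-HelixNovo, and the function names and example values are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(precursor_mz, charge):
    """Neutral peptide mass from precursor m/z and charge state."""
    return (precursor_mz - PROTON) * charge


def complementary_spectrum(peaks, precursor_mz, charge):
    """Mirror singly charged fragment peaks onto their partner-ion positions.

    peaks: list of (mz, intensity) tuples.
    Uses b + y = M + 2 * proton for singly charged fragment pairs.
    """
    total = neutral_mass(precursor_mz, charge) + 2 * PROTON
    mirrored = [(total - mz, inten) for mz, inten in peaks if 0 < mz < total]
    return sorted(mirrored)


# Hypothetical doubly charged precursor at m/z 500: the pair total is 1000.
observed = [(300.0, 10.0), (450.0, 5.0), (1200.0, 1.0)]
print(complementary_spectrum(observed, precursor_mz=500.0, charge=2))
```

Peaks above the pair total have no physical complement under this relation and are dropped. Multiply charged fragments would need their own charge-aware version of the formula.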
9
Kleikamp HBC, van der Zwaan R, van Valderen R, van Ede JM, Pronk M, Schaasberg P, Allaart MT, van Loosdrecht MCM, Pabst M. NovoLign: metaproteomics by sequence alignment. ISME Communications 2024; 4:ycae121. [PMID: 39493671 PMCID: PMC11530927 DOI: 10.1093/ismeco/ycae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/03/2024] [Accepted: 10/10/2024] [Indexed: 11/05/2024]
Abstract
Tremendous advances in mass spectrometric and bioinformatic approaches have expanded proteomics into the field of microbial ecology. The commonly used spectral annotation method for metaproteomics data relies on database searching, which requires sample-specific databases obtained from whole metagenome sequencing experiments. However, creating these databases is complex, time-consuming, and prone to errors, potentially biasing experimental outcomes and conclusions. This asks for alternative approaches that can provide rapid and orthogonal insights into metaproteomics data. Here, we present NovoLign, a de novo metaproteomics pipeline that performs sequence alignment of de novo sequences from complete metaproteomics experiments. The pipeline enables rapid taxonomic profiling of complex communities and evaluates the taxonomic coverage of metaproteomics outcomes obtained from database searches. Furthermore, the NovoLign pipeline supports the creation of reference sequence databases for database searching to ensure comprehensive coverage. We assessed the NovoLign pipeline for taxonomic coverage and false positive annotations using a wide range of in silico and experimental data, including pure reference strains, laboratory enrichment cultures, synthetic communities, and environmental microbial communities. In summary, we present NovoLign, a de novo metaproteomics pipeline that employs large-scale sequence alignment to enable rapid taxonomic profiling, evaluation of database searching outcomes, and the creation of reference sequence databases. The NovoLign pipeline is publicly available via: https://github.com/hbckleikamp/NovoLign.
Affiliation(s)
- Hugo B C Kleikamp
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Ramon van der Zwaan
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Ramon van Valderen
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Jitske M van Ede
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Mario Pronk
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Pim Schaasberg
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Maximilienne T Allaart
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Mark C M van Loosdrecht
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Martin Pabst
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
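The abstract above describes taxonomic profiling by aligning de novo peptide sequences against reference sequences. One common way to turn alignment hits into a taxonomic assignment in peptide-centric metaproteomics is the lowest common ancestor (LCA): a peptide that matches several organisms is assigned to the deepest taxon they all share. The sketch below illustrates that step only. Whether NovoLign uses exactly this rule is not stated in the abstract, and the function names, peptide sequences, and abbreviated lineages are hypothetical.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the shared prefix of several root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    for taxa in zip(*lineages):
        if all(t == taxa[0] for t in taxa):
            shared.append(taxa[0])
        else:
            break
    return shared


def profile(peptide_hits):
    """Count peptides per LCA taxon.

    peptide_hits: dict mapping a peptide sequence to the lineages of
    all reference sequences it aligned to.
    """
    counts = Counter()
    for lineages in peptide_hits.values():
        lca = lowest_common_ancestor(lineages)
        counts[lca[-1] if lca else "unassigned"] += 1
    return counts


# Hypothetical alignment results with abbreviated lineages.
hits = {
    "AVLDEGK": [
        ["Bacteria", "Gammaproteobacteria", "Escherichia"],
        ["Bacteria", "Gammaproteobacteria", "Salmonella"],
    ],
    "TFSNELR": [["Bacteria", "Gammaproteobacteria", "Escherichia"]],
    "GGSPLAK": [],
}
print(profile(hits))
```

A peptide shared between two genera is assigned at the class level here, which trades taxonomic resolution for fewer false assignments. Peptides without any alignment are counted as unassigned.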
10
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. Proceedings. IEEE International Symposium on Bioinformatics and Bioengineering 2023; 2023:28-35. [PMID: 38665266 PMCID: PMC11044815 DOI: 10.1109/bibe60311.2023.00013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Casanovo-DIA, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Casanovo-DIA model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Casanovo-DIA.
Affiliation(s)
- Shiva Ebrahimi
  - Computer Science & Engineering, University of North Texas, Denton, USA
- Xuan Guo
  - Computer Science & Engineering, University of North Texas, Denton, USA
11
Chen Z, Lim YW, Neo JY, Ting Chan RS, Koh LQ, Yuen TY, Lim YH, Johannes CW, Gates ZP. De Novo Sequencing of Synthetic Bis-cysteine Peptide Macrocycles Enabled by "Chemical Linearization" of Compound Mixtures. Anal Chem 2023; 95:14870-14878. [PMID: 37724843 PMCID: PMC10569172 DOI: 10.1021/acs.analchem.3c01742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
A "chemical linearization" approach was applied to synthetic peptide macrocycles to enable their de novo sequencing from mixtures using nanoliquid chromatography-tandem mass spectrometry (nLC-MS/MS). This approach (previously applied to individual macrocycles but not to mixtures) involves cleavage of the peptide backbone at a defined position to give a product capable of generating sequence-determining fragment ions. Here, we first established the compatibility of "chemical linearization" by Edman degradation with a prominent macrocycle scaffold based on bis-Cys peptides cross-linked with the m-xylene linker, which are of major significance in therapeutics discovery. Then, using macrocycle libraries of known sequence composition, the ability to recover accurate de novo assignments to linearized products was critically tested using performance metrics unique to mixtures. Significantly, we show that linearized macrocycles can be sequenced with lower recall compared to linear peptides but with similar accuracy, which establishes the potential of using "chemical linearization" with synthetic libraries and selection procedures that yield compound mixtures. Sodiated precursor ions were identified as a significant source of high-scoring but inaccurate assignments, with potential implications for improving automated de novo sequencing more generally.
Collapse
Affiliation(s)
- Zhi’ang Chen
- Institute
of Molecular and Cell Biology (IMCB), Agency
for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Yi Wee Lim
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Jin Yong Neo
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Rachel Shu Ting Chan
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Li Quan Koh
- Institute
of Molecular and Cell Biology (IMCB), Agency
for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Tsz Ying Yuen
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Yee Hwee Lim
- Institute
of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology
and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic
of Singapore
| | - Charles W. Johannes
- Institute
of Molecular and Cell Biology (IMCB), Agency
for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
| | - Zachary P. Gates
- Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
- Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
| |
|
12
|
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as the chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining the amino acid sequences of novel and unknown peptides. Advanced algorithms allow amino acid sequence information to be obtained accurately from MS/MS spectra in a short time. In this review, algorithms for high-throughput and automated de-novo sequencing, from exhaustive search to state-of-the-art machine learning and neural networks, are introduced and compared. The impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed.
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
| | - Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
| | - Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China.
| |
|
13
|
Affiliation(s)
- Bruna Gomes
- From the Departments of Medicine, Genetics, and Biomedical Data Science, Stanford University, Stanford, CA (B.G., E.A.A.); and the Department of Cardiology, Pneumology, and Angiology, Heidelberg University Hospital, Heidelberg, Germany (B.G.)
| | - Euan A Ashley
- From the Departments of Medicine, Genetics, and Biomedical Data Science, Stanford University, Stanford, CA (B.G., E.A.A.); and the Department of Cardiology, Pneumology, and Angiology, Heidelberg University Hospital, Heidelberg, Germany (B.G.)
| |
|
14
|
Nguyen SN, Le SH, Ivanov DG, Ivetic N, Nazy I, Kaltashov IA. Structural characterization of a pathogenic antibody underlying vaccine-induced immune thrombotic thrombocytopenia (VITT). bioRxiv: the preprint server for biology 2023:2023.05.28.542636. [PMID: 37398203 PMCID: PMC10312456 DOI: 10.1101/2023.05.28.542636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Vaccine-induced immune thrombotic thrombocytopenia (VITT) is a rare but extremely dangerous side effect that has been reported for several adenoviral (Ad)-vectored COVID-19 vaccines. VITT pathology has been linked to the production of antibodies that recognize platelet factor 4 (PF4), an endogenous chemokine. In this work we characterize anti-PF4 antibodies obtained from a VITT patient's blood. Intact-mass MS measurements indicate that a significant fraction of this ensemble is comprised of antibodies representing a limited number of clones. MS analysis of large antibody fragments (the light chain, as well as the Fc/2 and Fd fragments of the heavy chain) confirms the monoclonal nature of this component of the anti-PF4 antibody repertoire, and reveals the presence of a fully mature complex biantennary N-glycan within its Fd segment. Peptide mapping using two complementary proteases, followed by LC-MS/MS analysis, was used to determine the amino acid sequence of the entire light chain and over 98% of the heavy chain (excluding a short N-terminal segment). The sequence analysis allows the monoclonal antibody to be assigned to the IgG2 subclass and verifies that the light chain is of the λ type. Incorporation of enzymatic de-N-glycosylation into the peptide mapping routine allows the N-glycan in the Fab region of the antibody to be localized to the framework 3 region of the VH domain. This novel N-glycosylation site (absent in the germline sequence) is the result of a single mutation giving rise to an NDT motif in the antibody sequence. Peptide mapping also provides a wealth of information on lower-abundance proteolytic fragments derived from the polyclonal component of the anti-PF4 antibody ensemble, revealing the presence of all four subclasses (IgG1 through IgG4) and both types of light chain (λ and κ). The structural information reported in this work will be indispensable for understanding the molecular mechanism of VITT pathogenesis.
|