1
Tran NH, Qiao R, Mao Z, Pan S, Zhang Q, Li W, Xin L, Li M, Shan B. NovoBoard: a comprehensive framework for evaluating the false discovery rate and accuracy of de novo peptide sequencing. Mol Cell Proteomics 2024:100849. [PMID: 39321875 DOI: 10.1016/j.mcpro.2024.100849] [Received: 04/27/2024] [Revised: 08/27/2024] [Accepted: 09/18/2024] [Indexed: 09/27/2024] [Open access]
Abstract
De novo peptide sequencing is one of the most fundamental research areas in mass spectrometry (MS) based proteomics. Many methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species), and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate and validate their FDRs. Our FDR estimation provides valuable information to assess the reliability of new peptides identified by de novo sequencing tools, especially when no ground-truth information is available to evaluate their accuracy. The FDR estimation can also be used to evaluate the capability of de novo peptide sequencing tools to distinguish between de novo PSMs and random matches. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide sequencing methods, and how their performances depend on specific applications and the types of data.
Affiliation(s)
- Ngoc Hieu Tran
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Rui Qiao
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Zeping Mao
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada; David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
- Shengying Pan
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Qing Zhang
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Wenting Li
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Lei Xin
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Ming Li
  - David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
- Baozhen Shan
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
2
Dens C, Adams C, Laukens K, Bittremieux W. Machine Learning Strategies to Tackle Data Challenges in Mass Spectrometry-Based Proteomics. Journal of the American Society for Mass Spectrometry 2024; 35:2143-2155. [PMID: 39074335 DOI: 10.1021/jasms.4c00180] [Indexed: 07/31/2024]
Abstract
In computational proteomics, machine learning (ML) has emerged as a vital tool for enhancing data analysis. Despite significant advancements, the diversity of ML model architectures and the complexity of proteomics data present substantial challenges in the effective development and evaluation of these tools. Here, we highlight the necessity for high-quality, comprehensive data sets to train ML models and advocate for the standardization of data to support robust model development. We emphasize the instrumental role of key data sets like ProteomeTools and MassIVE-KB in advancing ML applications in proteomics and discuss the implications of data set size on model performance, highlighting that larger data sets typically yield more accurate models. To address data scarcity, we explore algorithmic strategies such as self-supervised pretraining and multitask learning. Ultimately, we hope that this discussion can serve as a call to action for the proteomics community to collaborate on data standardization and collection efforts, which are crucial for the sustainable advancement and refinement of ML methodologies in the field.
Affiliation(s)
- Ceder Dens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
3
Kiseleva OI, Arzumanian VA, Kurbatov IY, Poverennaya EV. In silico and in cellulo approaches for functional annotation of human protein splice variants. Biomeditsinskaia Khimiia 2024; 70:315-328. [PMID: 39324196 DOI: 10.18097/pbmc20247005315] [Indexed: 09/27/2024]
Abstract
The elegance of pre-mRNA splicing mechanisms continues to interest scientists more than half a century after the discovery that coding regions in genes are interrupted by non-coding sequences. The vast majority of human genes have several mRNA variants, which encode structurally and functionally different protein isoforms in a tissue-specific manner and are linked to specific developmental stages of the organism. Alteration of splicing patterns shifts the balance of functionally distinct proteins in living systems, distorts normal molecular pathways, and may trigger the onset and progression of various pathologies. Over the past two decades, numerous studies have been conducted in various life sciences disciplines to deepen our understanding of splicing mechanisms and the extent of their impact on the functioning of living systems. This review aims to summarize experimental and computational approaches used to elucidate the functions of splice variants of a single gene, based on the experience accumulated in the laboratory of interactomics of proteoforms at the Institute of Biomedical Chemistry (IBMC) and on best global practices.
Affiliation(s)
- O I Kiseleva
  - Institute of Biomedical Chemistry, Moscow, Russia
4
Tariq U, Saeed F. Predicting peptide properties from mass spectrometry data using deep attention-based multitask network and uncertainty quantification. bioRxiv 2024:2024.08.21.609035. [PMID: 39229185 PMCID: PMC11370541 DOI: 10.1101/2024.08.21.609035] [Indexed: 09/05/2024]
Abstract
Database search algorithms reduce the number of potential candidate peptides against which scoring needs to be performed using a single (i.e. mass) property for filtering. While useful, filtering based on one property may lead to exclusion of non-abundant spectra and uncharacterized peptides - potentially exacerbating the streetlight effect. Here we present ProteoRift, a novel attention and multitask deep-network, which can predict multiple peptide properties (length, missed cleavages, and modification status) directly from spectra. We demonstrate that ProteoRift can predict these properties with up to 97% accuracy resulting in search-space reduction by more than 90%. As a result, our end-to-end pipeline is shown to exhibit 8x to 12x speedups with peptide deduction accuracy comparable to algorithmic techniques. We also formulate two uncertainty estimation metrics, which can distinguish between in-distribution and out-of-distribution data (ROC-AUC 0.99) and predict high-scoring mass spectra against correct peptide (ROC-AUC 0.94). These models and metrics are integrated in an end-to-end ML pipeline available at https://github.com/pcdslab/ProteoRift.
Affiliation(s)
- Usman Tariq
  - Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
  - Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
  - Biomolecular Sciences Institute (BSI), Florida International University, Miami, FL, USA
  - Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA
5
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrometry Reviews 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role in its functionality and in these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
  - Health Unit, VITO, Mol, Belgium
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec, Niel, Belgium
6
Yilmaz M, Fondrie WE, Bittremieux W, Melendez CF, Nelson R, Ananth V, Oh S, Noble WS. Sequence-to-sequence translation from mass spectra to peptides with a transformer model. Nat Commun 2024; 15:6427. [PMID: 39080256 PMCID: PMC11289372 DOI: 10.1038/s41467-024-49731-x] [Received: 01/16/2023] [Accepted: 06/18/2024] [Indexed: 08/02/2024] [Open access]
Abstract
A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information (de novo peptide sequencing) is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.
Affiliation(s)
- Melih Yilmaz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Carlo F Melendez
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Rowan Nelson
  - Department of Genome Sciences, University of Washington, Seattle, USA
- Varun Ananth
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
7
Pongcharoen S, Kaewsringam N, Somaparn P, Roytrakul S, Maneerat Y, Pintha K, Topanurak S. Immunopeptidomics in the cancer immunotherapy era. Exploration of Targeted Anti-tumor Therapy 2024; 5:801-817. [PMID: 39280250 PMCID: PMC11390293 DOI: 10.37349/etat.2024.00249] [Received: 11/03/2023] [Accepted: 06/06/2024] [Indexed: 09/18/2024] [Open access]
Abstract
Cancer is the primary cause of death worldwide, and conventional treatments are painful, complicated, and have negative effects on healthy cells. However, cancer immunotherapy has emerged as a promising alternative. The principle of cancer immunotherapy is the re-activation of T cells to combat tumors that present peptide antigens on the major histocompatibility complex (MHC). These peptide antigens are identified with a set of omics technologies (proteomics, genomics, and bioinformatics), an approach referred to as immunopeptidomics. Indeed, immunopeptidomics can identify the neoantigens that are very useful for cancer immunotherapies. This review explores the use of immunopeptidomics for various immunotherapies, i.e., peptide-based vaccines, immune checkpoint inhibitors, oncolytic viruses, and chimeric antigen receptor T cells. We also discuss how the diversity of neoantigens allows for the discovery of novel antigenic peptides, while post-translationally modified peptides diversify the overall set of peptides binding to MHC, the so-called MHC ligandome. The development of immunopeptidomics is very active and up to date, particularly for clinical application. Immunopeptidomics is expected to become fast, accurate, and reliable for application in cancer immunotherapies.
Affiliation(s)
- Sutatip Pongcharoen
  - Division of Immunology, Department of Medicine, Faculty of Medicine, Naresuan University, Phitsanulok 65000, Thailand
- Nongphanga Kaewsringam
  - Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Poorichaya Somaparn
  - Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sittiruk Roytrakul
  - Functional Proteomics Technology Laboratory, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Khlong Nueng, Khlong Luang 12120, Pathum Thani, Thailand
- Yaowapa Maneerat
  - Department of Tropical Pathology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Komsak Pintha
  - Division of Biochemistry, School of Medical Sciences, University of Phayao, Phayao 56000, Thailand
- Supachai Topanurak
  - Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
8
Petrovskiy DV, Nikolsky KS, Kulikova LI, Rudnev VR, Butkova TV, Malsagova KA, Kopylov AT, Kaysheva AL. PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models. Sci Rep 2024; 14:15000. [PMID: 38951578 PMCID: PMC11217302 DOI: 10.1038/s41598-024-65861-0] [Received: 04/05/2024] [Accepted: 06/25/2024] [Indexed: 07/03/2024] [Open access]
Abstract
The primary objective of analyzing the data obtained in a mass spectrometry-based proteomic experiment is peptide and protein identification, that is, the correct assignment of each tandem mass spectrum to one amino acid sequence. Comparing empirical fragment spectra with theoretically predicted ones or matching them against a collected spectral library are commonly accepted strategies for identifying proteins and determining their amino acid sequences. Although these approaches are widely used and are appreciably efficient for well-characterized model organisms or measured proteins, they cannot detect novel peptide sequences that have not been previously annotated or are rare. This study presents PowerNovo, a tool for de novo sequencing of proteins using tandem mass spectra acquired with a variety of mass analyzer types and fragmentation techniques. PowerNovo involves an ensemble of models for peptide sequencing: a model for detecting regularities in tandem mass spectra, precursors, and fragment ions, and a natural language processing model, which assesses peptide sequence quality and helps reconstruct noisy sequences. Testing showed that the performance of PowerNovo is comparable to or even better than that of the widely used PointNovo, DeepNovo, Casanovo, and Novor packages. PowerNovo also provides a complete processing pipeline for mass spectrometry data and, along with predicting the peptide sequence, includes peptide assembly and protein inference blocks.
9
Üresin D, Schulte J, Morgner N, Soppa J. C(P)XCG Proteins of Haloferax volcanii with Predicted Zinc Finger Domains: The Majority Bind Zinc, but Several Do Not. Int J Mol Sci 2024; 25:7166. [PMID: 39000272 PMCID: PMC11241148 DOI: 10.3390/ijms25137166] [Received: 06/01/2024] [Revised: 06/20/2024] [Accepted: 06/24/2024] [Indexed: 07/16/2024] [Open access]
Abstract
In recent years, interest in very small proteins (µ-proteins) has increased significantly, and they were found to fulfill important functions in all prokaryotic and eukaryotic species. The halophilic archaeon Haloferax volcanii encodes about 400 µ-proteins of less than 70 amino acids, 49 of which contain at least two C(P)XCG motifs and are, thus, predicted zinc finger proteins. The determination of the NMR solution structure of HVO_2753 revealed that only one of two predicted zinc fingers actually bound zinc, while a second one was metal-free. Therefore, the aim of the current study was the homologous production of additional C(P)XCG proteins and the quantification of their zinc content. Attempts to produce 31 proteins failed, underscoring the particular difficulties of working with µ-proteins. In total, 14 proteins could be produced and purified, and the zinc content was determined. Only nine proteins complexed zinc, while five proteins were zinc-free. Three of the latter could be analyzed using ESI-MS and were found to contain another metal, most likely cobalt or nickel. Therefore, at least in haloarchaea, the variability of predicted C(P)XCG zinc finger motifs is higher than anticipated, and they can be metal-free, bind zinc, or bind another metal. Notably, AlphaFold2 cannot correctly predict whether or not the four cysteines have the tetrahedral configuration that is a prerequisite for metal binding.
Affiliation(s)
- Deniz Üresin
  - Institute for Molecular Biosciences, Goethe University, 60438 Frankfurt, Germany
- Jonathan Schulte
  - Institute of Physical and Theoretical Chemistry, Goethe University, 60438 Frankfurt, Germany
- Nina Morgner
  - Institute of Physical and Theoretical Chemistry, Goethe University, 60438 Frankfurt, Germany
- Jörg Soppa
  - Institute for Molecular Biosciences, Goethe University, 60438 Frankfurt, Germany
10
Ananth V, Sanders J, Yilmaz M, Wen B, Oh S, Noble WS. A learned score function improves the power of mass spectrometry database search. Bioinformatics 2024; 40:i410-i417. [PMID: 38940129 PMCID: PMC11211853 DOI: 10.1093/bioinformatics/btae218] [Indexed: 06/29/2024] [Open access]
Abstract
MOTIVATION: One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools.
RESULTS: To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.
Affiliation(s)
- Varun Ananth
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Justin Sanders
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Melih Yilmaz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bo Wen
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
11
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. arXiv 2024:arXiv:2402.11363v3. [PMID: 38659639 PMCID: PMC11042412] [Indexed: 04/26/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherently highly multiplexed nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Transformer-DIA, a deep-learning model based on the transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Transformer-DIA enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Transformer-DIA model holds considerable promise for uncovering novel peptides and enabling more comprehensive profiling of biological samples. Transformer-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Transformer-DIA.
Affiliation(s)
- Shiva Ebrahimi
  - Computer Science & Engineering, University of North Texas, Denton, USA
- Xuan Guo
  - Computer Science & Engineering, University of North Texas, Denton, USA
12
Kleikamp HBC, Palacios PA, Kofoed MVW, Papacharalampos G, Bentien A, Nielsen JL. The Selenoproteome as a Dynamic Response Mechanism to Oxidative Stress in Hydrogenotrophic Methanogenic Communities. Environmental Science & Technology 2024; 58:6637-6646. [PMID: 38580315 PMCID: PMC11025550 DOI: 10.1021/acs.est.3c07725] [Received: 09/21/2023] [Revised: 03/08/2024] [Accepted: 03/08/2024] [Indexed: 04/07/2024]
Abstract
Methanogenesis is a critical process in the carbon cycle that is applied industrially in anaerobic digestion and biogas production. While naturally occurring in diverse environments, methanogenesis requires anaerobic and reduced conditions, although varying degrees of oxygen tolerance have been described. Microaeration is suggested as the next step to increase methane production and improve hydrolysis in digestion processes; therefore, a deeper understanding of the methanogenic response to oxygen stress is needed. To explore the drivers of oxygen tolerance in methanogenesis, two parallel enrichments were performed under the addition of H2/CO2 in an environment without reducing agents and in a redox-buffered environment by adding redox mediator 9,10-anthraquinone-2,7-disulfonate disodium. The cellular response to oxidative conditions is mapped using proteomic analysis. The resulting community showed remarkable tolerance to high-redox environments and was unperturbed in its methane production. Next to the expression of pathways to mitigate reactive oxygen species, the higher redox potential environment showed an increased presence of selenocysteine and selenium-associated pathways. By including sulfur-to-selenium mass shifts in a proteomic database search, we provide the first evidence of the dynamic and large-scale incorporation of selenocysteine as a response to oxidative stress in hydrogenotrophic methanogenesis and the presence of a dynamic selenoproteome.
Affiliation(s)
- Hugo B. C. Kleikamp
  - Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220 Aalborg, Denmark
- Paola A. Palacios
  - Department of Biological and Chemical Engineering, Aarhus University, Gustav Wieds Vej 10C, 8000 Aarhus, Denmark
- Michael V. W. Kofoed
  - Department of Biological and Chemical Engineering, Aarhus University, Gustav Wieds Vej 10C, 8000 Aarhus, Denmark
- Georgios Papacharalampos
  - Department of Biological and Chemical Engineering, Aarhus University, Gustav Wieds Vej 10C, 8000 Aarhus, Denmark
- Anders Bentien
  - Department of Biological and Chemical Engineering, Aarhus University, Åbogade 40, 8200 Aarhus, Denmark
- Jeppe L. Nielsen
  - Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220 Aalborg, Denmark
13
Minegishi Y, Haga Y, Ueda K. Emerging potential of immunopeptidomics by mass spectrometry in cancer immunotherapy. Cancer Sci 2024; 115:1048-1059. [PMID: 38382459 PMCID: PMC11007014 DOI: 10.1111/cas.16118] [Received: 12/18/2023] [Revised: 02/02/2024] [Accepted: 02/07/2024] [Indexed: 02/23/2024] [Open access]
Abstract
With significant advances in analytical technologies, research in the field of cancer immunotherapy, such as adoptive T cell therapy, cancer vaccine, and immune checkpoint blockade (ICB), is currently gaining tremendous momentum. Since the efficacy of cancer immunotherapy is recognized only by a minority of patients, more potent tumor-specific antigens (TSAs, also known as neoantigens) and predictive markers for treatment response are of great interest. In cancer immunity, immunopeptides, presented by human leukocyte antigen (HLA) class I, play a role as initiating mediators of immunogenicity. The latest advancement in the interdisciplinary multiomics approach has rapidly enlightened us about the identity of the "dark matter" of cancer and the associated immunopeptides. In this field, mass spectrometry (MS) is a viable option to select because of the naturally processed and actually presented TSA candidates in order to grasp the whole picture of the immunopeptidome. In the past few years the search space has been enlarged by the multiomics approach, the sensitivity of mass spectrometers has been improved, and deep/machine-learning-supported peptide search algorithms have taken immunopeptidomics to the next level. In this review, along with the introduction of key technical advancements in immunopeptidomics, the potential and further directions of immunopeptidomics will be reviewed from the perspective of cancer immunotherapy.
Affiliation(s)
- Yuriko Minegishi
  - Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Yoshimi Haga
  - Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Koji Ueda
  - Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
14
|
Wongklaew P, Sriswasdi S, Chuangsuwanich E. MHCSeqNet2-improved peptide-class I MHC binding prediction for alleles with low data. Bioinformatics 2024; 40:btad780. [PMID: 38152987 PMCID: PMC10783953 DOI: 10.1093/bioinformatics/btad780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 12/14/2023] [Accepted: 12/27/2023] [Indexed: 12/29/2023] Open
Abstract
MOTIVATION The binding of a peptide antigen to a Class I major histocompatibility complex (MHC) protein is part of a key process that lets the immune system recognize an infected cell or a cancer cell. This mechanism has enabled the development of peptide-based vaccines that can activate the patient's immune response to treat cancers. Hence, the ability to accurately predict peptide-MHC binding is essential for prioritizing the best peptides for each patient. However, experimental peptide-MHC binding data for many MHC alleles are still lacking, which limits the accuracy of existing prediction models. RESULTS In this study, we present an improved version of MHCSeqNet that utilizes sub-word-level peptide features, a 3D structure embedding for MHC alleles, and an expanded training dataset to achieve better generalizability on MHC alleles with small amounts of data. Visualization of MHC allele embeddings confirms that the model is able to group alleles with similar binding specificity, including those with no peptide ligand in the training dataset. Furthermore, an external evaluation suggests that MHCSeqNet2 can improve the prioritization of T cell epitopes for MHC alleles with small amounts of training data. AVAILABILITY AND IMPLEMENTATION The source code and installation instructions for MHCSeqNet2 are available at https://github.com/cmb-chula/MHCSeqNet2.
Affiliation(s)
- Patiphan Wongklaew
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Sira Sriswasdi
- Center of Excellence in Computational Molecular Biology, Division of Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Center for Artificial Intelligence in Medicine, Division of Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Ekapol Chuangsuwanich
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Center of Excellence in Computational Molecular Biology, Division of Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
|
15
|
Klaproth-Andrade D, Hingerl J, Bruns Y, Smith NH, Träuble J, Wilhelm M, Gagneur J. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing. Nat Commun 2024; 15:151. [PMID: 38167372 PMCID: PMC10762064 DOI: 10.1038/s41467-023-44323-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024] Open
Abstract
Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
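The "sensitivity at 90% precision" figure quoted in this abstract can be illustrated with a small sketch. It is not Spectralis code: the function name, the input layout (one confidence score and one correct/incorrect flag per spectrum), and the convention that sensitivity is the fraction of all spectra correctly identified above the score cutoff are assumptions made here for illustration.

```python
from typing import Sequence


def sensitivity_at_precision(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    target_precision: float = 0.9,
) -> float:
    """Best sensitivity reachable by a score cutoff that keeps precision >= target.

    Hypothetical helper: spectra are ranked by confidence score, and each prefix
    of the ranking is treated as one candidate cutoff. Tied scores are not
    grouped, which a real evaluation would probably want to handle.
    """
    ranked = sorted(zip(scores, is_correct), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    best = 0.0
    n_correct = 0
    for k, (_, ok) in enumerate(ranked, start=1):
        n_correct += int(ok)
        precision = n_correct / k        # correct fraction among the top-k predictions
        if precision >= target_precision:
            best = max(best, n_correct / total)  # correct fraction among all spectra
    return best


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_correct = [True, True, False, True, False]
result_90 = sensitivity_at_precision(toy_scores, toy_correct, 0.9)
result_75 = sensitivity_at_precision(toy_scores, toy_correct, 0.75)
```

Reporting sensitivity at a fixed precision, rather than overall accuracy, focuses the comparison on the high-confidence predictions that would actually be kept in practice.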
Affiliation(s)
- Daniela Klaproth-Andrade
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Johannes Hingerl
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Yanik Bruns
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Nicholas H Smith
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Jakob Träuble
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Mathias Wilhelm
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany
- Julien Gagneur
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
|
16
|
Schrader M. Origins, Technological Advancement, and Applications of Peptidomics. Methods Mol Biol 2024; 2758:3-47. [PMID: 38549006 DOI: 10.1007/978-1-0716-3646-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Peptidomics is the comprehensive characterization of peptides from biological sources, in contrast to earlier peptide research, which focused on a few individual peptides. Mass spectrometry can detect a multitude of peptides in complex mixtures and thus enables the new strategies that led to peptidomics. The term was established in 2001, and the field has since grown to over 3000 publications. Analytical techniques originally developed for fast and comprehensive analysis of peptides in proteomics were specifically adjusted for peptidomics. Although peptidomics is thus closely linked to proteomics, it differs fundamentally from conventional bottom-up proteomics. Since then, fundamental technological advancements in peptidomics have occurred in mass spectrometry and data processing, including quantification, and to a lesser extent in separation technology. Different strategies and diverse sources of peptidomes are illustrated by numerous applications, such as the discovery of neuropeptides and other bioactive peptides, including the use of biochemical assays. Food and plant peptidomics are introduced in a similar way. Additionally, applications with a clinical focus are included, comprising biomarker discovery as well as immunopeptidomics. This overview extensively reviews recent methods, strategies, and applications, including links to all other chapters of this book.
Affiliation(s)
- Michael Schrader
- Department of Bioengineering Sciences, Weihenstephan-Tr. University of Applied Sciences, Freising, Germany.
|
17
|
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
- Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
- Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
- Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
|
18
|
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. Proc IEEE Int Symp Bioinformatics Bioeng 2023; 2023:28-35. [PMID: 38665266 PMCID: PMC11044815 DOI: 10.1109/bibe60311.2023.00013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherently highly multiplexed nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Casanovo-DIA, a deep-learning model based on the transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Casanovo-DIA model holds considerable promise for uncovering novel peptides and enabling more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Casanovo-DIA.
Affiliation(s)
- Shiva Ebrahimi
- Computer Science & Engineering, University of North Texas, Denton, USA
- Xuan Guo
- Computer Science & Engineering, University of North Texas, Denton, USA
|
19
|
Fan KT, Hsu CW, Chen YR. Mass spectrometry in the discovery of peptides involved in intercellular communication: From targeted to untargeted peptidomics approaches. Mass Spectrom Rev 2023; 42:2404-2425. [PMID: 35765846 DOI: 10.1002/mas.21789] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/23/2021] [Revised: 03/17/2022] [Accepted: 04/08/2022] [Indexed: 06/15/2023]
Abstract
Endogenous peptide hormones represent an essential class of biomolecules, which regulate cell-cell communications in diverse physiological processes of organisms. Mass spectrometry (MS) has been developed to be a powerful technology for identifying and quantifying peptides in a highly efficient manner. However, it is difficult to directly identify these peptide hormones due to their diverse characteristics, dynamic regulations, low abundance, and existence in a complicated biological matrix. Here, we summarize and discuss the roles of targeted and untargeted MS in discovering peptide hormones using bioassay-guided purification, bioinformatics screening, or the peptidomics-based approach. Although the peptidomics approach is expected to discover novel peptide hormones unbiasedly, only a limited number of successful cases have been reported. The critical challenges and corresponding measures for peptidomics from the steps of sample preparation, peptide extraction, and separation to the MS data acquisition and analysis are also discussed. We also identify emerging technologies and methods that can be integrated into the discovery platform toward the comprehensive study of endogenous peptide hormones.
Affiliation(s)
- Kai-Ting Fan
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Wei Hsu
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Yet-Ran Chen
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
|
20
|
Wang X, Zhang Z, Shi C, Wang Y, Zhou T, Lin A. Clinical prospects and research strategies of long non-coding RNA encoding micropeptides. Zhejiang Da Xue Xue Bao Yi Xue Ban 2023; 52:397-405. [PMID: 37643974 PMCID: PMC10495248 DOI: 10.3724/zdxbyxb-2023-0128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 07/20/2023] [Indexed: 08/12/2023]
Abstract
Long non-coding RNAs (lncRNAs), which are usually thought to have no protein-coding ability, are widely involved in cell proliferation, signal transduction and other biological activities. However, recent studies have suggested that short open reading frames (sORFs) of some lncRNAs can encode small functional peptides (micropeptides). These micropeptides appear to play important roles in calcium homeostasis, embryonic development and tumorigenesis, suggesting their potential as therapeutic targets and diagnostic biomarkers. Currently, bioinformatic tools as well as experimental methods such as ribosome mapping and in vitro translation are applied to predict the coding potential of lncRNAs. Furthermore, mass spectrometry, specific antibodies and epitope tags are used to validate the expression of micropeptides. Here, we review the physiological and pathological functions of recently identified micropeptides as well as research strategies for predicting the coding potential of lncRNAs, to facilitate further research on lncRNA-encoded micropeptides.
Affiliation(s)
- Xinyi Wang
- College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Zhejiang University Cancer Center, Hangzhou 310058, China
- Zhen Zhang
- College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Zhejiang University Cancer Center, Hangzhou 310058, China
- Chengyu Shi
- College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Zhejiang University Cancer Center, Hangzhou 310058, China
- Ying Wang
- College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Zhejiang University Cancer Center, Hangzhou 310058, China
- Tianhua Zhou
- Zhejiang University Cancer Center, Hangzhou 310058, China
- The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Center for RNA Medicine, International Institutes of Medicine, Zhejiang University, Jinhua 322000, Zhejiang Province, China
- Department of Cell Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Aifu Lin
- College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Zhejiang University Cancer Center, Hangzhou 310058, China
- The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Center for RNA Medicine, International Institutes of Medicine, Zhejiang University, Jinhua 322000, Zhejiang Province, China
|
21
|
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in the fields such as chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow the amino acid sequence information to be accurately obtained from MS/MS spectra in short time. In this review, algorithms from exhaustive search to the state-of-art machine learning and neural network for high-throughput and automated de-novo sequencing are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising direction of de-novo peptide sequencing are also discussed in this review.
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China.
|
22
|
Admon A. The biogenesis of the immunopeptidome. Semin Immunol 2023; 67:101766. [PMID: 37141766 DOI: 10.1016/j.smim.2023.101766] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/08/2023] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 05/06/2023]
Abstract
The immunopeptidome is the repertoire of peptides bound and presented by the MHC class I, class II, and non-classical molecules. The peptides are produced by the degradation of most cellular proteins, and in some cases, peptides are produced from extracellular proteins taken up by the cells. This review attempts to first describe some of its known and well-accepted concepts, and next, raise some questions about a few of the established dogmas in this field: The production of novel peptides by splicing is questioned, suggesting here that spliced peptides are extremely rare, if existent at all. The degree of the contribution to the immunopeptidome by degradation of cellular protein by the proteasome is doubted, therefore this review attempts to explain why it is likely that this contribution to the immunopeptidome is possibly overstated. The contribution of defective ribosome products (DRiPs) and non-canonical peptides to the immunopeptidome is noted and methods are suggested to quantify them. In addition, the common misconception that the MHC class II peptidome is mostly derived from extracellular proteins is noted, and corrected. It is stressed that the confirmation of sequence assignments of non-canonical and spliced peptides should rely on targeted mass spectrometry using spiking-in of heavy isotope-labeled peptides. Finally, the new methodologies and modern instrumentation currently available for high throughput kinetics and quantitative immunopeptidomics are described. These advanced methods open up new possibilities for utilizing the big data generated and taking a fresh look at the established dogmas and reevaluating them critically.
Affiliation(s)
- Arie Admon
- Faculty of Biology, Technion-Israel Institute of Technology, Israel.
|
23
|
Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2023; 24:bbac542. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022] Open
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, MF1, Nordufer 20, 13353 Berlin
- Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
- Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam
- Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
- Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
|
24
|
The Current State-of-the-Art Identification of Unknown Proteins Using Mass Spectrometry Exemplified on De Novo Sequencing of a Venom Protease from Bothrops moojeni. Molecules 2022; 27:4976. [PMID: 35956926 PMCID: PMC9370501 DOI: 10.3390/molecules27154976] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/29/2022] [Revised: 07/29/2022] [Accepted: 08/03/2022] [Indexed: 11/16/2022]
Abstract
(1) Background: The amino acid sequence elucidation of peptides from the gas phase fragmentation mass spectra, de novo sequencing, is a valuable method for the identification of unknown proteins complementary to Edman sequencing. It is increasingly used in shot-gun mass spectrometry (MS)-based proteomics experiments. We review the current state-of-the-art and use the identification of an unknown snake venom protein targeting the human tissue factor (TF) as an example to describe the analysis process based on manual spectrum interrogation. (2) Methods: The immobilized TF was incubated with a crude B. moojeni venom solution. The potential binding partners were eluted and further purified by gel electrophoresis. Edman degradation was performed to elucidate the N-terminus of the 31 kDa protein of interest. High-resolution MS with collision-induced dissociation was employed to generate peptide fragmentation spectra. Sequence tags were deduced and used for searches in the NCBI and Uniprot databases. Protein matches from the snake species were further validated by target MS/MS. (3) Results: Sequence tag D [K/Q] D [I/L] VDD [K/Q] led to a snake venom serine protease (SVSP) from lancehead B. jararaca (P81824). With target MS/MS, 24% of the SVSP sequence were confirmed; an additional 41% were tentatively assigned by data-independent MS. Edman sequencing provided information for 10 N-terminal amino acid residues, also confirming the match to SVSP. (4) Conclusions: The identification of unknown proteins continues to be a challenge despite major advances in MS instrumentation and bioinformatic tools. The main requirement is the generation of meaningful, high-quality MS peptide fragmentation spectra. These are used to elucidate sufficiently long sequence tags, which can subsequently be submitted to searches in protein databases. This basic method does not require extensive bioinformatics because peptide MS/MS spectra, especially of doubly-charged ions, can be analysed manually. 
We demonstrated the procedure with the elucidation of SVSP. While de novo sequencing quickly indicates the correct protein group, the validation of the entire protein sequence, amino acid by amino acid, will take time. The reasons are the need to properly assign isobaric amino acid residues and modifications. With the ongoing efforts in genomics and transcriptomics and the availability of ever more data in public databases, the need for de novo MS sequencing will decrease. Still, not every animal and plant species will be sequenced, so the combination of MS and Edman sequencing will continue to be of importance for the identification of unknown proteins.
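The sequence tag quoted in this abstract, D [K/Q] D [I/L] VDD [K/Q], carries bracketed alternatives for residues that mass alone cannot separate (I/L are isobaric, K/Q nearly so). A minimal sketch of how such a tag could be matched against a candidate protein sequence follows. The helper names and the example sequences are hypothetical, and this is not the database search procedure used in the study.

```python
import re
from typing import List


def tag_to_regex(tag: str) -> "re.Pattern[str]":
    """Compile a tag such as 'D [K/Q] D [I/L] VDD [K/Q]' into a regular expression.

    Each bracketed group like '[K/Q]' becomes the character class '[KQ]';
    plain letters match themselves.
    """
    compact = tag.replace(" ", "")
    pattern = re.sub(
        r"\[([A-Z](?:/[A-Z])+)\]",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        compact,
    )
    return re.compile(pattern)


def find_tag(tag: str, sequence: str) -> List[int]:
    """Return the 0-based start positions of all non-overlapping tag matches."""
    return [m.start() for m in tag_to_regex(tag).finditer(sequence)]


TAG = "D [K/Q] D [I/L] VDD [K/Q]"

# Invented toy sequences, not real database entries.
hits_a = find_tag(TAG, "MAAADKDIVDDKGG")
hits_b = find_tag(TAG, "DQDLVDDQ")
hits_c = find_tag(TAG, "DKDAVDDK")
```

Treating each ambiguous position as a character class keeps every mass-compatible reading of the tag in play until other evidence resolves it; the abstract reports Edman sequencing and targeted MS/MS in that role.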
|
25
|
Identification of Daboia siamensis venome using integrated multi-omics data. Sci Rep 2022; 12:13140. [PMID: 35907887 PMCID: PMC9338987 DOI: 10.1038/s41598-022-17300-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/22/2022] [Indexed: 11/08/2022] Open
Abstract
Snakebite, classified by World Health Organization as a neglected tropical disease, causes more than 100,000 deaths and 2 million injuries per year. Currently, available antivenoms do not bind with strong specificity to target toxins, which means that severe complications can still occur despite treatment. Moreover, the cost of antivenom is expensive. Knowledge of venom compositions is fundamental for producing a specific antivenom that has high effectiveness, low side effects, and ease of manufacture. With advances in mass spectrometry techniques, venom proteomes can now be analyzed in great depth at high efficiency. However, these techniques require genomic and transcriptomic data for interpreting mass spectrometry data. This study aims to establish and incorporate genomics, transcriptomics, and proteomics data to study venomics of a venomous snake, Daboia siamensis. Multiple proteins that have not been reported as venom components of this snake such as hyaluronidase-1, phospholipase B, and waprin were discovered. Thus, multi-omics data are advantageous for venomics studies. These findings will be valuable not only for antivenom production but also for the development of novel therapeutics.
|
26
|
Reixachs‐Solé M, Eyras E. Uncovering the impacts of alternative splicing on the proteome with current omics techniques. Wiley Interdiscip Rev RNA 2022; 13:e1707. [PMID: 34979593 PMCID: PMC9542554 DOI: 10.1002/wrna.1707] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/26/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/15/2022]
Abstract
The high-throughput sequencing of cellular RNAs has underscored a broad effect of isoform diversification through alternative splicing on the transcriptome. Moreover, the differential production of transcript isoforms from gene loci has been recognized as a critical mechanism in cell differentiation, organismal development, and disease. Yet, the extent of the impact of alternative splicing on protein production and cellular function remains a matter of debate. Multiple experimental and computational approaches have been developed in recent years to address this question. These studies have unveiled how molecular changes at different steps in the RNA processing pathway can lead to differences in protein production and have functional effects. New and emerging experimental technologies open exciting new opportunities to develop new methods to fully establish the connection between messenger RNA expression and protein production and to further investigate how RNA variation impacts the proteome and cell function. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing Translation > Regulation RNA Evolution and Genomics > Computational Analyses of RNA.
Affiliation(s)
- Marina Reixachs‐Solé
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Canberra, Australian Capital Territory, Australia
- Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
|
27
|
Cancer-related micropeptides encoded by ncRNAs: Promising drug targets and prognostic biomarkers. Cancer Lett 2022; 547:215723. [DOI: 10.1016/j.canlet.2022.215723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/17/2022] [Revised: 04/14/2022] [Accepted: 05/01/2022] [Indexed: 02/07/2023]
|
28
|
Sricharoensuk C, Boonchalermvichien T, Muanwien P, Somparn P, Pisitkun T, Sriswasdi S. Unsupervised Mining of HLA-I Peptidomes Reveals New Binding Motifs and Potential False Positives in the Community Database. Front Immunol 2022; 13:847756. [PMID: 35386688 PMCID: PMC8977642 DOI: 10.3389/fimmu.2022.847756] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/03/2022] [Accepted: 02/25/2022] [Indexed: 11/13/2022] Open
Abstract
Modern vaccine designs and studies of human leukocyte antigen (HLA)-mediated immune responses rely heavily on the knowledge of HLA allele-specific binding motifs and computational prediction of HLA-peptide binding affinity. Breakthroughs in HLA peptidomics have considerably expanded the databases of natural HLA ligands and enabled detailed characterizations of HLA-peptide binding specificity. However, caution must be exercised when analyzing HLA peptidomics data because identified peptides may be contaminants in mass spectrometry or may bind only weakly to the HLA molecules. Here, a hybrid de novo peptide sequencing approach was applied to large-scale mono-allelic HLA peptidomics datasets to uncover new ligands and refine current knowledge of HLA binding motifs. Up to 12-40% of the peptidomics data were low-binding-affinity peptides with an arginine or a lysine at the C-terminus and were likely to be tryptic peptide contaminants. Thousands of these peptides have been reported in a community database as legitimate ligands and might be erroneously used for training prediction models. Furthermore, unsupervised clustering of identified ligands revealed additional binding motifs for several HLA class I alleles and effectively isolated outliers that were experimentally confirmed to be false positives. Overall, our findings expand the knowledge of HLA binding specificity and advocate for more rigorous interpretation of HLA peptidomics data to ensure the high validity of community HLA ligandome databases.
Affiliation(s)
- Chatchapon Sricharoensuk
- Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Tanupat Boonchalermvichien
- Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Phijitra Muanwien
- Medical Sciences, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Poorichaya Somparn
- Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Trairak Pisitkun
- Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Sira Sriswasdi
- Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Research Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
|
29
|
Combined proteomic strategies for in-depth venomic analysis of the beaked sea snake (Hydrophis schistosus) from Songkhla Lake, Thailand. J Proteomics 2022; 259:104559. [DOI: 10.1016/j.jprot.2022.104559] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/27/2021] [Revised: 02/22/2022] [Accepted: 03/07/2022] [Indexed: 11/30/2022]
|
30
|
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, USA
- Ming Li
- University of Waterloo, Canada
|
31
|
Chen L, Zhang Y, Yang Y, Yang Y, Li H, Dong X, Wang H, Xie Z, Zhao Q. An Integrated Approach for Discovering Noncanonical MHC-I Peptides Encoded by Small Open Reading Frames. Journal of the American Society for Mass Spectrometry 2021; 32:2346-2357. [PMID: 34260243 DOI: 10.1021/jasms.1c00076] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/13/2023]
Abstract
MHC-I peptides are a group of important immunopeptides presented by major histocompatibility complex (MHC) on the cell surface for immune recognition. The majority of reported MHC-I peptides are derived from protein coding sequences, and noncanonical peptides translated from small open reading frames (sORF) are largely unknown due to the lack of accurate and sensitive detection methods. Herein we report an efficient approach that implements complementary bioinformatic strategies to improve the identification of noncanonical MHC-I peptides. In a database search strategy, noncanonical immunopeptides mapping was optimized by combining three complementary pipelines to construct predicted sORF databases from Ribo-seq data. In a de novo peptide sequencing strategy, MS data search results were filtered against sORF databases to pin down additional noncanonical immunopeptides. In total, 308 noncanonical immunopeptides were identified from two tumor cell lines with selected ones vigorously validated. Our approach is a handy solution to identify noncanonical MHC peptides with Ribo-seq and MS data. Meanwhile, the novel noncanonical immunopeptides identified with this method could shed insights on fundamental immunology as well as cancer immunotherapies.
Affiliation(s)
- Lei Chen
- Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong SAR 999077, China
- Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hong Kong SAR 999077, China
- Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hong Kong SAR 999077, China
- Yang Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hong Kong SAR 999077, China
- Huihui Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510623, China
- Xuan Dong
- BGI-Shenzhen, Shenzhen 518083, China
- Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510623, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 510623, China
- Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hong Kong SAR 999077, China
|
32
|
Perpetuo L, Klein J, Ferreira R, Guedes S, Amado F, Leite-Moreira A, Silva AMS, Thongboonkerd V, Vitorino R. How can artificial intelligence be used for peptidomics? Expert Rev Proteomics 2021; 18:527-556. [PMID: 34343059 DOI: 10.1080/14789450.2021.1962303] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/01/2023]
Abstract
INTRODUCTION: Peptidomics is an emerging field of omics sciences using advanced isolation, analysis, and computational techniques that enable qualitative and quantitative analyses of various peptides in biological samples. Peptides can act as useful biomarkers and as therapeutic molecules for diseases. AREAS COVERED: The use of therapeutic peptides can be predicted quickly and efficiently using data-driven computational methods, particularly artificial intelligence (AI) approaches. Various AI approaches are useful for peptide-based drug discovery, such as support vector machine, random forest, extremely randomized trees, and other more recently developed deep learning methods. AI methods are relatively new to the development of peptide-based therapies, but these techniques have already become essential tools in protein science by dissecting novel therapeutic peptides and their functions (Figure 1). EXPERT OPINION: Researchers have shown that AI models can facilitate the development of peptidomics and selective peptide therapies in the field of peptide science. Biopeptide prediction is important for the discovery and development of successful peptide-based drugs. Due to their ability to predict therapeutic roles based on sequence details, many AI-dependent prediction tools have been developed (Figure 1).
Affiliation(s)
- Luís Perpetuo
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
- Julie Klein
- Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, Université Toulouse III, Toulouse, France
- Rita Ferreira
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Sofia Guedes
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Francisco Amado
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Adelino Leite-Moreira
- UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
- Artur M S Silva
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Visith Thongboonkerd
- Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Rui Vitorino
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
- LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
|
33
|
Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00304-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
|
34
|
Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach to measure and compare peptides and proteins in a large variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, provide large amounts of data output. This is the case for shotgun/bottom-up proteomics, which consists of the enzymatic digestion of proteins into peptides that are then measured by MS instruments in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to handle the increasing size and complexity of raw MS data that need to be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. A general description will be provided of the data preprocessing options and the different search engines available, including practical suggestions on how to optimize the parameters for peptide search, based on hands-on experience.
Affiliation(s)
- Avinash Yadav
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Federica Marini
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Alessandro Cuomo
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Tiziana Bonaldi
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
|
35
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
36
|
Zahn S, Kubatova N, Pyper DJ, Cassidy L, Saxena K, Tholey A, Schwalbe H, Soppa J. Biological functions, genetic and biochemical characterization, and NMR structure determination of the small zinc finger protein HVO_2753 from Haloferax volcanii. FEBS J 2020; 288:2042-2062. [DOI: 10.1111/febs.15559] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/16/2020] [Revised: 06/26/2020] [Accepted: 09/02/2020] [Indexed: 12/26/2022]
Affiliation(s)
- Sebastian Zahn
- Institute for Molecular Biosciences, Goethe‐University, Frankfurt, Germany
- Nina Kubatova
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance, Goethe‐University, Frankfurt/Main, Germany
- Dennis J. Pyper
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance, Goethe‐University, Frankfurt/Main, Germany
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian‐Albrechts‐Universität zu Kiel, Kiel, Germany
- Krishna Saxena
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance, Goethe‐University, Frankfurt/Main, Germany
- Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian‐Albrechts‐Universität zu Kiel, Kiel, Germany
- Harald Schwalbe
- Institute for Organic Chemistry and Chemical Biology, Center for Biomolecular Magnetic Resonance, Goethe‐University, Frankfurt/Main, Germany
- Jörg Soppa
- Institute for Molecular Biosciences, Goethe‐University, Frankfurt, Germany
- Johann Wolfgang Goethe‐Universität, Frankfurt am Main, Germany
|
37
|
O'Bryon I, Jenson SC, Merkley ED. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 2020; 29:1864-1878. [PMID: 32713088 PMCID: PMC7454419 DOI: 10.1002/pro.3919] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/15/2020] [Revised: 07/21/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
Affiliation(s)
- Isabelle O'Bryon
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Sarah C. Jenson
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Eric D. Merkley
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
|
38
|
Abstract
Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.
|
39
|
Kote S, Pirog A, Bedran G, Alfaro J, Dapic I. Mass Spectrometry-Based Identification of MHC-Associated Peptides. Cancers (Basel) 2020; 12:cancers12030535. [PMID: 32110973 PMCID: PMC7139412 DOI: 10.3390/cancers12030535] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 01/30/2020] [Revised: 02/15/2020] [Accepted: 02/20/2020] [Indexed: 02/06/2023] Open
Abstract
Neoantigen-based immunotherapies promise to improve patient outcomes over the current standard of care. However, detecting these cancer-specific antigens is one of the significant challenges in the field of mass spectrometry. Even though the first sequencing of immunopeptides was done decades ago, there is still a diversity of protocols used for neoantigen isolation from the cell surface. This heterogeneity makes it difficult to compare results between laboratories and studies. Isolation of neoantigens from the cell surface is usually done by mild acid elution (MAE) or immunoprecipitation (IP) protocols. However, the limited amounts of neoantigens present on the cell surface impose a challenge and require instrumentation with enough sensitivity and accuracy for their detection. Detecting these neopeptides from small amounts of available patient tissue limits the scope of most studies to cell cultures. Here, we summarize protocols for the extraction and identification of major histocompatibility complex (MHC) class I and II peptides. We aimed to evaluate existing methods in terms of the appropriateness of the isolation procedure, as well as the instrumental parameters used for neoantigen detection. We also focus on the amount of material used in the protocols as the critical factor to consider when analyzing neoantigens. Beyond experimental aspects, there are numerous readily available proteomics suites/tools applicable for neoantigen discovery; however, experimental validation is still necessary for neoantigen characterization.
|