1
Singer F, Kuhring M, Renard BY, Muth T. Moving Toward Metaproteogenomics: A Computational Perspective on Analyzing Microbial Samples via Proteogenomics. Methods Mol Biol 2025; 2859:297-318. [PMID: 39436609 DOI: 10.1007/978-1-0716-4152-1_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Microbial sample analysis has received growing attention within the last decade, driven by important findings in microbiome research and promising applications in the biotechnological field. Modern mass spectrometry-based methodology has been established in this context, providing sufficient sensitivity, resolution, dynamic range, and throughput to analyze the so-called metaproteome of complex microbial mixtures from clinical or environmental samples. While proteomic analyses were previously restricted to common model organisms, next-generation sequencing technologies nowadays allow for the rapid and cost-efficient characterization of whole metagenomes of microbial consortia and specific genomes from non-model organisms to which microbes contribute by significant amounts. This proteogenomic approach, meaning the combined application of genomic and proteomic methods, enables researchers to create a protein database that presents a tailored blueprint of the microbial sample under investigation. This contribution provides an overview of the computational challenges and opportunities in proteogenomics and metaproteomics as of January 2018. For practical application, we first showcase an integrative proteogenomic method that circumvents existing reference databases by creating sample-specific transcripts. The underlying algorithm uses a graph network approach that combines RNA-Seq and peptide information. As a second example, we provide a tutorial for a simulation tool that estimates the computational limits of detecting microbial non-model organisms. This method evaluates the potential influence of error-tolerant searches and proteogenomic approaches on databases of interest. Finally, we discuss recommendations for developing future strategies that may help overcome present limitations by combining the strengths of genome- and proteome-based methods and moving toward an integrated metaproteogenomics approach.
Affiliation(s)
- Franziska Singer
  - NEXUS Personalized Health Technologies, ETH Zürich, Zürich, Switzerland
  - Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Mathias Kuhring
  - Core Unit Bioinformatics, Berlin Institute of Health (BIH) at Charité, Berlin, Germany
- Bernhard Y Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany.
  - Bioinformatics Unit, Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany.
- Thilo Muth
  - Domain Data Competence Center (MF2), Department for Research Infrastructure and Information Technology, Robert Koch Institute, Berlin, Germany
2
Ren Y, Yue Y, Li X, Weng S, Xu H, Liu L, Cheng Q, Luo P, Zhang T, Liu Z, Han X. Proteogenomics offers a novel avenue in neoantigen identification for cancer immunotherapy. Int Immunopharmacol 2024; 142:113147. [PMID: 39270345 DOI: 10.1016/j.intimp.2024.113147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 08/11/2024] [Accepted: 09/08/2024] [Indexed: 09/15/2024]
Abstract
Cancer neoantigens are tumor-specific non-synonymous mutant peptides that activate the immune system to produce an anti-tumor response. Personalized cancer vaccines based on neoantigens are currently one of the most promising therapeutic approaches for cancer treatment. By utilizing the unique mutations within each patient's tumor, these vaccines aim to elicit a strong and specific immune response against cancer cells. However, the identification of neoantigens remains challenging due to the low accuracy of current prediction tools and the high false-positive rate of candidate neoantigens. Since the concept of "proteogenomics" emerged in 2004, it has evolved rapidly with the increased sequencing depth of next-generation sequencing technologies and the maturation of mass spectrometry-based proteomics technologies to become a more comprehensive approach to neoantigen identification, allowing the discovery of high-confidence candidate neoantigens. In this review, we summarize why cancer neoantigens have become attractive targets for immunotherapy, the mechanism of cancer vaccines, and the advances in cancer immunotherapy. We also describe considerations relevant to the application of emerging proteogenomics technologies for neoantigen identification and the challenges in this field.
Affiliation(s)
- Yuqing Ren
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China; Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Yi Yue
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Xinyang Li
  - Department of Oncology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Siyuan Weng
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Hui Xu
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Long Liu
  - Department of Hepatobiliary and Pancreatic Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Quan Cheng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Peng Luo
  - Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tengfei Zhang
  - Department of Oncology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China.
- Zaoqu Liu
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100730, China.
- Xinwei Han
  - Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China; Interventional Institute of Zhengzhou University, Zhengzhou, Henan 450052, China; Interventional Treatment and Clinical Research Center of Henan Province, Zhengzhou, Henan 450052, China.
3
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrometry Reviews 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role both in its functionality and in these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
  - Health Unit, VITO, Mol, Belgium
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec, Niel, Belgium
4
Lin A, Torres CM, Hobbs EC, Bardhan J, Aley SB, Spencer CT, Taylor KL, Chiang T. Computational and Systems Biology Advances to Enable Bioagent Agnostic Signatures. Health Secur 2024; 22:130-139. [PMID: 38483337 PMCID: PMC11044874 DOI: 10.1089/hs.2023.0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024] Open
Affiliation(s)
- Andy Lin
  - Andy Lin, PhD, is a Linus Pauling Distinguished Postdoctoral Fellow in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
- Cameron M. Torres
  - Cameron M. Torres is a Graduate Research Assistant and Wieland Fellow, Department of Biological Sciences, at the University of Texas at El Paso, El Paso, TX
- Errett C. Hobbs
  - Errett C. Hobbs, PhD, is a Data Scientist in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
- Jaydeep Bardhan
  - Jaydeep Bardhan, PhD, is a Research Line Manager, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA
- Stephen B. Aley
  - Stephen B. Aley, PhD, is a Professor, Biological Sciences, and an Associate Vice President for Research, Sponsored Projects, at the University of Texas at El Paso, El Paso, TX
- Charles T. Spencer
  - Charles T. Spencer, PhD, is an Associate Professor, Biological Sciences, and Edward and Barbara Brown Egbert Endowed Chair of the Department of Biological Sciences, at the University of Texas at El Paso, El Paso, TX
- Karen L. Taylor
  - Karen L. Taylor, MS, is a Research Line Manager in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
- Tony Chiang
  - Tony Chiang, PhD, is a Data Scientist in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
5
Liao H, Barra C, Zhou Z, Peng X, Woodhouse I, Tailor A, Parker R, Carré A, Borrow P, Hogan MJ, Paes W, Eisenlohr LC, Mallone R, Nielsen M, Ternette N. MARS an improved de novo peptide candidate selection method for non-canonical antigen target discovery in cancer. Nat Commun 2024; 15:661. [PMID: 38253617 PMCID: PMC10803737 DOI: 10.1038/s41467-023-44460-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 12/14/2023] [Indexed: 01/24/2024] Open
Abstract
Understanding the nature and extent of non-canonical human leukocyte antigen (HLA) presentation in tumour cells is a priority for target antigen discovery for the development of next generation immunotherapies in cancer. We here employ a de novo mass spectrometric sequencing approach with a refined, MHC-centric analysis strategy to detect non-canonical MHC-associated peptides specific to cancer without any prior knowledge of the target sequence from genomic or RNA sequencing data. Our strategy integrates MHC binding rank, Average local confidence scores, and peptide Retention time prediction for improved de novo candidate Selection; culminating in the machine learning model MARS. We benchmark our model on a large synthetic peptide library dataset and reanalysis of a published dataset of high-quality non-canonical MHC-associated peptide identifications in human cancer. We achieve almost 2-fold improvement for high quality spectral assignments in comparison to de novo sequencing alone with an estimated accuracy of above 85.7% when integrated with a stepwise peptide sequence mapping strategy. Finally, we utilize MARS to detect and validate lncRNA-derived peptides in human cervical tumour resections, demonstrating its suitability to discover novel, immunogenic, non-canonical peptide sequences in primary tumour tissue.
Affiliation(s)
- Hanqing Liao
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Zhicheng Zhou
  - Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
- Xu Peng
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Isaac Woodhouse
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Arun Tailor
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Robert Parker
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Alexia Carré
  - Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
- Persephone Borrow
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Michael J Hogan
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Wayne Paes
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- Laurence C Eisenlohr
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Roberto Mallone
  - Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
  - Assistance Publique Hôpitaux de Paris, Service de Diabétologie et Immunologie Clinique, Cochin Hospital, 75014, Paris, France
- Nicola Ternette
  - The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK.
  - Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK.
  - University of Utrecht, Department of Pharmaceutical Sciences, 3584 CH, Utrecht, The Netherlands.
6
Klaproth-Andrade D, Hingerl J, Bruns Y, Smith NH, Träuble J, Wilhelm M, Gagneur J. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing. Nat Commun 2024; 15:151. [PMID: 38167372 PMCID: PMC10762064 DOI: 10.1038/s41467-023-44323-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024] Open
Abstract
Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
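The notion of peaks "spaced by amino acid masses" in the abstract above can be illustrated with a minimal sketch. This is not the Spectralis implementation: the function name `residue_spaced_pairs`, the 0.01 Da tolerance, the reduced residue table, and the assumption of singly charged fragment ions are all illustrative choices made here.

```python
# Monoisotopic residue masses in Da (a small subset, for illustration only).
RESIDUE_MASSES = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L/I": 113.08406,  # leucine and isoleucine are isobaric
    "Q": 128.05858,
    "K": 128.09496,
}


def residue_spaced_pairs(mzs, tol=0.01):
    """Return (low_mz, high_mz, residue) for every pair of peaks whose
    m/z gap matches a residue mass within `tol` Da.

    Assumes singly charged fragments, so an m/z gap equals a mass gap.
    """
    peaks = sorted(mzs)
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            gap = high - low
            for residue, mass in RESIDUE_MASSES.items():
                if abs(gap - mass) <= tol:
                    pairs.append((low, high, residue))
    return pairs


# Hypothetical three-peak spectrum, not taken from the paper.
pairs = residue_spaced_pairs([100.0, 157.02146, 228.05857])
print([residue for _, _, residue in pairs])  # ['G', 'Q', 'A']
```

The 128.05857 Da gap is assigned to Q rather than K because the two residues differ by about 0.036 Da, which exceeds the tolerance assumed here; a looser tolerance would make the two indistinguishable.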
Affiliation(s)
- Daniela Klaproth-Andrade
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Johannes Hingerl
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Yanik Bruns
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Nicholas H Smith
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Jakob Träuble
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Mathias Wilhelm
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany.
  - Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany.
- Julien Gagneur
  - Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany.
  - Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
7
Kleikamp HBC, van der Zwaan R, van Valderen R, van Ede JM, Pronk M, Schaasberg P, Allaart MT, van Loosdrecht MCM, Pabst M. NovoLign: metaproteomics by sequence alignment. ISME Communications 2024; 4:ycae121. [PMID: 39493671 PMCID: PMC11530927 DOI: 10.1093/ismeco/ycae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/03/2024] [Accepted: 10/10/2024] [Indexed: 11/05/2024]
Abstract
Tremendous advances in mass spectrometric and bioinformatic approaches have expanded proteomics into the field of microbial ecology. The commonly used spectral annotation method for metaproteomics data relies on database searching, which requires sample-specific databases obtained from whole metagenome sequencing experiments. However, creating these databases is complex, time-consuming, and prone to errors, potentially biasing experimental outcomes and conclusions. This asks for alternative approaches that can provide rapid and orthogonal insights into metaproteomics data. Here, we present NovoLign, a de novo metaproteomics pipeline that performs sequence alignment of de novo sequences from complete metaproteomics experiments. The pipeline enables rapid taxonomic profiling of complex communities and evaluates the taxonomic coverage of metaproteomics outcomes obtained from database searches. Furthermore, the NovoLign pipeline supports the creation of reference sequence databases for database searching to ensure comprehensive coverage. We assessed the NovoLign pipeline for taxonomic coverage and false positive annotations using a wide range of in silico and experimental data, including pure reference strains, laboratory enrichment cultures, synthetic communities, and environmental microbial communities. In summary, we present NovoLign, a de novo metaproteomics pipeline that employs large-scale sequence alignment to enable rapid taxonomic profiling, evaluation of database searching outcomes, and the creation of reference sequence databases. The NovoLign pipeline is publicly available via: https://github.com/hbckleikamp/NovoLign.
Affiliation(s)
- Hugo B C Kleikamp
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Ramon van der Zwaan
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Ramon van Valderen
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Jitske M van Ede
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Mario Pronk
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Pim Schaasberg
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Maximilienne T Allaart
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Mark C M van Loosdrecht
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
- Martin Pabst
  - Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, Delft 2629HZ, The Netherlands
8
Aita S, Cerrato A, Laganà A, Montone CM, Taglioni E, Capriotti AL. Untargeted Analysis of Short-Chain Peptides in Urine Samples Short Peptides Analysis. Methods Mol Biol 2024; 2745:31-43. [PMID: 38060178 DOI: 10.1007/978-1-0716-3577-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Short-chain peptides have attracted increasing attention in different research fields, including biomarker discovery, but they also pose a well-known analytical challenge in complex matrices due to their low abundance compared to other molecules, which can cause extensive ion suppression during mass spectrometric acquisition. Moreover, there is a lack of analytical workflows for their comprehensive characterization since ordinary peptidomics strategies cannot identify them. In this context, an enrichment strategy was introduced and developed to isolate and clean up short-chain peptides by graphitized carbon black solid phase extraction. For better coverage of peptide polarity, urine samples were analyzed by ultrahigh performance liquid chromatography using both reversed-phase (RP) and hydrophilic interaction liquid chromatography (HILIC). High-resolution mass spectrometry allowed the detection of the eluting peptides in data-dependent mode using a suspect screening strategy with an inclusion list; peptides were identified by a semiautomated workflow implemented on Compound Discoverer. The complementarity of the orthogonal separation strategy was confirmed by peptide identification, resulting in 101 peptides identified from the RP runs and 111 peptides from the HILIC runs, with 60 common identifications.
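The identification counts reported in this abstract imply a total number of distinct peptides by simple inclusion-exclusion. The short sketch below works that out, assuming that "60 common identifications" means peptides found in both the RP and the HILIC runs; the function name `overlap_summary` is hypothetical and not part of the published workflow.

```python
def overlap_summary(n_rp, n_hilic, n_common):
    """Split two identification lists into unique and shared counts.

    Inclusion-exclusion: |RP or HILIC| = |RP| + |HILIC| - |RP and HILIC|.
    """
    if n_common > min(n_rp, n_hilic):
        raise ValueError("shared count cannot exceed either list size")
    return {
        "rp_only": n_rp - n_common,
        "hilic_only": n_hilic - n_common,
        "total_unique": n_rp + n_hilic - n_common,
    }


# Counts taken from the abstract: 101 (RP), 111 (HILIC), 60 in common.
print(overlap_summary(101, 111, 60))
# {'rp_only': 41, 'hilic_only': 51, 'total_unique': 152}
```

Under that reading, 41 peptides were seen only in RP runs and 51 only in HILIC runs, so each separation mode contributes identifications the other misses.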
Affiliation(s)
- SaraElsa Aita
  - Dipartimento di Chimica, Università di Roma La Sapienza, Rome, Italy
- Andrea Cerrato
  - Dipartimento di Chimica, Università di Roma La Sapienza, Rome, Italy
- Aldo Laganà
  - Dipartimento di Chimica, Università di Roma La Sapienza, Rome, Italy
- Enrico Taglioni
  - Dipartimento di Chimica, Università di Roma La Sapienza, Rome, Italy
9
Fan KT, Hsu CW, Chen YR. Mass spectrometry in the discovery of peptides involved in intercellular communication: From targeted to untargeted peptidomics approaches. Mass Spectrometry Reviews 2023; 42:2404-2425. [PMID: 35765846 DOI: 10.1002/mas.21789] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/23/2021] [Revised: 03/17/2022] [Accepted: 04/08/2022] [Indexed: 06/15/2023]
Abstract
Endogenous peptide hormones represent an essential class of biomolecules, which regulate cell-cell communications in diverse physiological processes of organisms. Mass spectrometry (MS) has been developed to be a powerful technology for identifying and quantifying peptides in a highly efficient manner. However, it is difficult to directly identify these peptide hormones due to their diverse characteristics, dynamic regulations, low abundance, and existence in a complicated biological matrix. Here, we summarize and discuss the roles of targeted and untargeted MS in discovering peptide hormones using bioassay-guided purification, bioinformatics screening, or the peptidomics-based approach. Although the peptidomics approach is expected to discover novel peptide hormones unbiasedly, only a limited number of successful cases have been reported. The critical challenges and corresponding measures for peptidomics from the steps of sample preparation, peptide extraction, and separation to the MS data acquisition and analysis are also discussed. We also identify emerging technologies and methods that can be integrated into the discovery platform toward the comprehensive study of endogenous peptide hormones.
Affiliation(s)
- Kai-Ting Fan
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Wei Hsu
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
- Yet-Ran Chen
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
10
Bertile F, Matallana-Surget S, Tholey A, Cristobal S, Armengaud J. Diversifying the concept of model organisms in the age of -omics. Commun Biol 2023; 6:1062. [PMID: 37857885 PMCID: PMC10587087 DOI: 10.1038/s42003-023-05458-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 10/13/2023] [Indexed: 10/21/2023] Open
Abstract
In today's post-genomic era, it is crucial to rethink the concept of model organisms. While a few historically well-established organisms, e.g. laboratory rodents, have enabled significant scientific breakthroughs, there is now a pressing need for broader inclusion. Indeed, new organisms and models, from complex microbial communities to holobionts, are essential to fully grasp the complexity of biological principles across the breadth of biodiversity. By fostering collaboration between biology, advanced molecular science and omics communities, we can collectively adopt new models, unraveling their molecular functioning, and uncovering fundamental mechanisms. This concerted effort will undoubtedly enhance human health, environmental quality, and biodiversity conservation.
Affiliation(s)
- Fabrice Bertile
  - Université de Strasbourg, CNRS, IPHC UMR 7178, 23 rue du Loess, 67037, Strasbourg Cedex 2, France.
- Sabine Matallana-Surget
  - Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, FK9 4LA, UK
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105, Kiel, Germany
- Susana Cristobal
  - Department of Biomedical and Clinical Sciences, Cell Biology, Medical Faculty, Linköping University, Linköping, 581 85, Sweden
  - Ikerbasque, Basque Foundation for Science, Department of Physiology, Faculty of Medicine and Nursing, University of the Basque Country (UPV/EHU), Barrio Sarriena, s/n, Leioa, 48940, Spain
- Jean Armengaud
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, 30200, Bagnols-sur-Cèze, France
11
Chen Z, Lim YW, Neo JY, Ting Chan RS, Koh LQ, Yuen TY, Lim YH, Johannes CW, Gates ZP. De Novo Sequencing of Synthetic Bis-cysteine Peptide Macrocycles Enabled by "Chemical Linearization" of Compound Mixtures. Anal Chem 2023; 95:14870-14878. [PMID: 37724843 PMCID: PMC10569172 DOI: 10.1021/acs.analchem.3c01742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
A "chemical linearization" approach was applied to synthetic peptide macrocycles to enable their de novo sequencing from mixtures using nanoliquid chromatography-tandem mass spectrometry (nLC-MS/MS). This approach, previously applied to individual macrocycles but not to mixtures, involves cleavage of the peptide backbone at a defined position to give a product capable of generating sequence-determining fragment ions. Here, we first established the compatibility of "chemical linearization" by Edman degradation with a prominent macrocycle scaffold based on bis-Cys peptides cross-linked with the m-xylene linker, which are of major significance in therapeutics discovery. Then, using macrocycle libraries of known sequence composition, the ability to recover accurate de novo assignments to linearized products was critically tested using performance metrics unique to mixtures. Significantly, we show that linearized macrocycles can be sequenced with lower recall compared to linear peptides but with similar accuracy, which establishes the potential of using "chemical linearization" with synthetic libraries and selection procedures that yield compound mixtures. Sodiated precursor ions were identified as a significant source of high-scoring but inaccurate assignments, with potential implications for improving automated de novo sequencing more generally.
Affiliation(s)
- Zhi’ang Chen
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Yi Wee Lim
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Jin Yong Neo
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Rachel Shu Ting Chan
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Li Quan Koh
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Tsz Ying Yuen
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Yee Hwee Lim
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
- Charles W. Johannes
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
- Zachary P. Gates
  - Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE), Agency for Science, Technology and Research (A*STAR), 8 Biomedical Grove, #07-01 Neuros, Singapore 138665, Republic of Singapore
12
|
Zhang Y, Liu L, Zhang M, Li S, Wu J, Sun Q, Ma S, Cai W. The Research Progress of Bioactive Peptides Derived from Traditional Natural Products in China. Molecules 2023; 28:6421. [PMID: 37687249 PMCID: PMC10489889 DOI: 10.3390/molecules28176421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 08/20/2023] [Accepted: 08/30/2023] [Indexed: 09/10/2023] Open
Abstract
Traditional natural products in China have a long history and a vast pharmacological repertoire that has garnered significant attention due to their safety and efficacy in disease prevention and treatment. Among the bioactive components of traditional natural products in China, bioactive peptides (BPs) are specific protein fragments that have beneficial effects on human health. Although many ingredients of traditional natural products in China are rich in protein, BPs have not received sufficient attention as a critical factor influencing overall therapeutic efficacy. Therefore, the purpose of this review is to provide a comprehensive summary of the current methodologies for the preparation, isolation, and identification of BPs from traditional natural products in China and to classify the functions of discovered BPs. Insights from this review are expected to facilitate the future development of targeted drugs and functional foods derived from traditional natural products in China.
Affiliation(s)
- Yanyan Zhang
  - College of Food Science and Pharmacy, Xinjiang Agricultural University, Urumqi 830052, China
- Lianghong Liu
  - School of Pharmaceutical Sciences, Hunan University of Medicine, Huaihua 418000, China
- Min Zhang
  - School of Pharmaceutical Sciences, Hunan University of Medicine, Huaihua 418000, China
- Shani Li
  - School of Pharmaceutical Sciences, Hunan University of Medicine, Huaihua 418000, China
- Jini Wu
  - School of Pharmaceutical Sciences, Hunan University of Medicine, Huaihua 418000, China
- Qiuju Sun
  - College of Food Science and Pharmacy, Xinjiang Agricultural University, Urumqi 830052, China
- Shengjun Ma
  - College of Food Science and Pharmacy, Xinjiang Agricultural University, Urumqi 830052, China
- Wei Cai
  - School of Pharmaceutical Sciences, Hunan University of Medicine, Huaihua 418000, China
|
13
|
Lapehn S, Colacino JA, Harris C. Spatiotemporal protein dynamics during early organogenesis in mouse conceptuses treated with valproic acid. Neurotoxicol Teratol 2023; 99:107286. [PMID: 37442398 PMCID: PMC10697214 DOI: 10.1016/j.ntt.2023.107286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Revised: 05/29/2023] [Accepted: 07/05/2023] [Indexed: 07/15/2023]
Abstract
Valproic acid (VPA) is an anti-epileptic medication that increases the risk of neural tube defect (NTD) outcomes in infants exposed during gestation. Previous studies into VPA's mechanism of action have focused on alterations in gene expression and metabolism but have failed to consider how exposure changes the abundance of critical developmental proteins over time. This study evaluates the effects of VPA on protein abundance in the developmentally distinct tissues of the mouse visceral yolk sac (VYS) and embryo proper (EMB) using mouse whole embryo culture. Embryos were exposed to 600 μM VPA at 2 h intervals over 10 h during early organogenesis with the aim of identifying protein pathways relevant to VPA's mechanism of action in failed neural tube closure (NTC). Protein abundance was measured through tandem mass tag (TMT) labeling followed by liquid chromatography and mass spectrometry. Overall, there were over 1500 proteins with altered abundance after VPA exposure in the EMB or VYS, with 428 of these proteins showing previous gene expression associations with VPA exposure. Limited overlap of significant proteins between tissues supported the conclusion of independent roles for the VYS and EMB in response to VPA. Pathway analysis of proteins with increased or decreased abundance identified multiple pathways with mechanistic relevance to NTC and embryonic development, including convergent extension, Wnt signaling/planar cell polarity, cellular migration, cellular proliferation, cell death, and cytoskeletal organization processes, as targets of VPA. Clustering of co-regulated proteins to identify shared patterns of protein abundance over time highlighted 4 h and 6/10 h as periods of divergent protein abundance between control and VPA-treated samples in the VYS and EMB, respectively. Overall, this study demonstrated that VPA temporally alters protein content in critical developmental pathways in the VYS and the EMB during early organogenesis in mice.
Affiliation(s)
- Samantha Lapehn
  - Department of Environmental Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Justin A Colacino
  - Department of Environmental Health Sciences, University of Michigan, Ann Arbor, MI, United States
- Craig Harris
  - Department of Environmental Health Sciences, University of Michigan, Ann Arbor, MI, United States
|
14
|
Kelliher JM, Robinson AJ, Longley R, Johnson LYD, Hanson BT, Morales DP, Cailleau G, Junier P, Bonito G, Chain PSG. The endohyphal microbiome: current progress and challenges for scaling down integrative multi-omic microbiome research. Microbiome 2023; 11:192. [PMID: 37626434 PMCID: PMC10463477 DOI: 10.1186/s40168-023-01634-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 07/29/2023] [Indexed: 08/27/2023]
Abstract
As microbiome research has progressed, it has become clear that most, if not all, eukaryotic organisms are hosts to microbiomes composed of prokaryotes, other eukaryotes, and viruses. Fungi have only recently been considered holobionts with their own microbiomes, as filamentous fungi have been found to harbor bacteria (including cyanobacteria), mycoviruses, other fungi, and whole algal cells within their hyphae. Constituents of this complex endohyphal microbiome have been interrogated using multi-omic approaches. However, a lack of tools, techniques, and standardization for integrative multi-omics for small-scale microbiomes (e.g., intracellular microbiomes) has limited progress towards investigating and understanding the total diversity of the endohyphal microbiome and its functional impacts on fungal hosts. Understanding microbiome impacts on fungal hosts will advance explorations of how "microbiomes within microbiomes" affect broader microbial community dynamics and ecological functions. Progress to date as well as ongoing challenges of performing integrative multi-omics on the endohyphal microbiome is discussed herein. Addressing the challenges associated with the sample extraction, sample preparation, multi-omic data generation, and multi-omic data analysis and integration will help advance current knowledge of the endohyphal microbiome and provide a road map for shrinking microbiome investigations to smaller scales.
Affiliation(s)
- Reid Longley
  - Los Alamos National Laboratory, Los Alamos, NM, USA
|
15
|
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as the chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow amino acid sequence information to be obtained accurately from MS/MS spectra in a short time. In this review, algorithms for high-throughput and automated de-novo sequencing, from exhaustive search to state-of-the-art machine learning and neural networks, are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed.
Affiliation(s)
- Cheuk Chi A Ng
  - State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China
  - Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China
  - State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Yin Zhou
  - State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China
  - Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China
  - State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Zhong-Ping Yao
  - State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China
  - Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China
  - State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
|
16
|
Yang KL, Yu F, Teo GC, Li K, Demichev V, Ralser M, Nesvizhskii AI. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023; 14:4539. [PMID: 37500632 PMCID: PMC10374903 DOI: 10.1038/s41467-023-40129-9] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 11/07/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
Peptide identification in liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments relies on computational algorithms for matching acquired MS/MS spectra against sequences of candidate peptides using database search tools, such as MSFragger. Here, we present a new tool, MSBooster, for rescoring peptide-to-spectrum matches using additional features incorporating deep learning-based predictions of peptide properties, such as LC retention time, ion mobility, and MS/MS spectra. We demonstrate the utility of MSBooster, in tandem with MSFragger and Percolator, in several different workflows, including nonspecific searches (immunopeptidomics), direct identification of peptides from data independent acquisition data, single-cell proteomics, and data generated on an ion mobility separation-enabled timsTOF MS platform. MSBooster is fast, robust, and fully integrated into the widely used FragPipe computational platform.
Affiliation(s)
- Kevin L Yang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Guo Ci Teo
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Kai Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vadim Demichev
  - Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Markus Ralser
  - Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
  - Nuffield Department of Medicine, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
- Alexey I Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
|
17
|
Potgieter MG, Nel AJM, Fortuin S, Garnett S, Wendoh JM, Tabb DL, Mulder NJ, Blackburn JM. MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets. PLoS Comput Biol 2023; 19:e1011163. [PMID: 37327214 PMCID: PMC10310047 DOI: 10.1371/journal.pcbi.1011163] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/01/2022] [Revised: 06/29/2023] [Accepted: 05/08/2023] [Indexed: 06/18/2023] Open
Abstract
BACKGROUND: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level. This enables metaproteomic analyses without prior expectation of sample composition or metagenomic data generation, and is compatible with standard downstream analysis pipelines. RESULTS: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples and obtained comparable numbers of peptide and protein identifications, many shared peptide sequences and a bacterial taxonomic distribution similar to that found using a matched metagenome sequence database, while simultaneously identifying many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation. CONCLUSIONS: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
Affiliation(s)
- Matthys G. Potgieter
  - Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Andrew J. M. Nel
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Suereta Fortuin
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Shaun Garnett
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Jerome M. Wendoh
  - Division of Immunology, Department of Pathology, University of Cape Town, Cape Town, South Africa
- David L. Tabb
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences; African Microbiome Institute; South African Tuberculosis Bioinformatics Initiative; Stellenbosch University, Cape Town, South Africa
- Nicola J. Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jonathan M. Blackburn
  - Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
  - Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
|
18
|
Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981 DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient and their targeting requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and discuss considerations when selecting tumor antigens to target in the clinic.
Affiliation(s)
- Daniel Oreper
  - Genentech, 1 DNA way, South San Francisco, 94080 CA, USA
- Susan Klaeger
  - Genentech, 1 DNA way, South San Francisco, 94080 CA, USA
|
19
|
Osipov AV, Cheremnykh EG, Ziganshin RH, Starkov VG, Nguyen TTT, Nguyen KC, Le DT, Hoang AN, Tsetlin VI, Utkin YN. The Potassium Channel Blocker β-Bungarotoxin from the Krait Bungarus multicinctus Venom Manifests Antiprotozoal Activity. Biomedicines 2023; 11:1115. [PMID: 37189733 DOI: 10.3390/biomedicines11041115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 03/17/2023] [Accepted: 04/04/2023] [Indexed: 05/17/2023] Open
Abstract
Protozoal infections are a worldwide problem. The toxicity and somewhat low effectiveness of the existing drugs require the search for new ways of suppressing protozoa. Snake venom contains structurally diverse components manifesting antiprotozoal activity; for example, those in cobra venom are cytotoxins. In this work, we aimed to characterize novel antiprotozoal component(s) in the venom of the krait Bungarus multicinctus using the ciliate Tetrahymena pyriformis as a model organism. To determine the toxicity of the substances under study, surviving ciliates were registered automatically by an original BioLaT-3.2 instrument. The krait venom was separated by three-step liquid chromatography and the toxicity of the obtained fractions against T. pyriformis was analyzed. As a result, a 21 kDa protein toxic to Tetrahymena was isolated and its amino acid sequence was determined by MALDI TOF MS and high-resolution mass spectrometry. It was found that antiprotozoal activity was manifested by β-bungarotoxin (β-Bgt) differing from the known toxins by two amino acid residues. Inactivation of β-Bgt phospholipolytic activity with p-bromophenacyl bromide did not change its antiprotozoal activity. Thus, this is the first demonstration of the antiprotozoal activity of β-Bgt, which is shown to be independent of its phospholipolytic activity.
Affiliation(s)
- Alexey V Osipov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow 117997, Russia
- Rustam H Ziganshin
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow 117997, Russia
- Vladislav G Starkov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow 117997, Russia
- Khoa Cuu Nguyen
  - Institute of Applied Materials Science, Vietnam Academy of Science and Technology, Ho Chi Minh City 700000, Vietnam
- Dung Tien Le
  - Institute of Applied Materials Science, Vietnam Academy of Science and Technology, Ho Chi Minh City 700000, Vietnam
- Anh Ngoc Hoang
  - Institute of Applied Materials Science, Vietnam Academy of Science and Technology, Ho Chi Minh City 700000, Vietnam
- Victor I Tsetlin
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow 117997, Russia
- Yuri N Utkin
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow 117997, Russia
|
20
|
Ahn R, Cui Y, White FM. Antigen discovery for the development of cancer immunotherapy. Semin Immunol 2023; 66:101733. [PMID: 36841147 DOI: 10.1016/j.smim.2023.101733] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/17/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 02/25/2023]
Abstract
Central to successful cancer immunotherapy is effective T cell antitumor immunity. Multiple targeted immunotherapies engineered to invigorate T cell-driven antitumor immunity rely on identifying the repertoire of T cell antigens expressed on the tumor cell surface. Mass spectrometry-based surveys of such antigens ("immunopeptidomics"), combined with other omics platforms and computational algorithms, have been instrumental in identifying and quantifying tumor-derived T cell antigens. In this review, we discuss the types of tumor antigens that have emerged for targeted cancer immunotherapy and the immunopeptidomics methods that are central in MHC peptide identification and quantification. We provide an overview of the strengths and limitations of mass spectrometry-driven approaches and how they have been integrated with other technologies to discover targetable T cell antigens for cancer immunotherapy. We highlight some of the emerging cancer immunotherapies that successfully capitalized on immunopeptidomics, their challenges, and mass spectrometry-based strategies that can support their development.
Affiliation(s)
- Ryuhjin Ahn
  - David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yufei Cui
  - David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Forest M White
  - David H. Koch Institute for Integrative Cancer Research, Cambridge, MA 02139, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
21
|
Vu NQ, Yen HC, Fields L, Cao W, Li L. HyPep: An Open-Source Software for Identification and Discovery of Neuropeptides Using Sequence Homology Search. J Proteome Res 2023; 22:420-431. [PMID: 36696582 PMCID: PMC10160011 DOI: 10.1021/acs.jproteome.2c00597] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/26/2023]
Abstract
Neuropeptides are a class of endogenous peptides that have key regulatory roles in biochemical, physiological, and behavioral processes. Mass spectrometry analyses of neuropeptides often rely on protein informatics tools for database searching and peptide identification. As neuropeptide databases are typically experimentally built and comprised of short sequences with high sequence similarity to each other, we developed a novel database searching tool, HyPep, which utilizes sequence homology searching for peptide identification. HyPep aligns de novo sequenced peptides, generated through PEAKS software, with neuropeptide database sequences and identifies neuropeptides based on the alignment score. HyPep performance was optimized using LC-MS/MS measurements of peptide extracts from various Callinectes sapidus neuronal tissue types and compared with a commercial database searching software, PEAKS DB. HyPep identified more neuropeptides from each tissue type than PEAKS DB at 1% false discovery rate, and the false match rate from both programs was 2%. In addition to identification, this report describes how HyPep can aid in the discovery of novel neuropeptides.
Affiliation(s)
- Nhu Q Vu
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Hsu-Ching Yen
  - Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, Wisconsin 53706, United States
- Lauren Fields
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Weifeng Cao
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lingjun Li
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
|
22
|
Dorl S, Winkler S, Mechtler K, Dorfer V. MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search. J Proteome Res 2023; 22:462-470. [PMID: 36688604 PMCID: PMC9903325 DOI: 10.1021/acs.jproteome.2c00658] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/24/2023]
Abstract
Spectral library search can enable more sensitive peptide identification in tandem mass spectrometry experiments. However, its drawbacks are the limited availability of high-quality libraries and the added difficulty of creating decoy spectra for result validation. We describe MS Ana, a new spectral library search engine that enables high sensitivity peptide identification using either curated or predicted spectral libraries as well as robust false discovery control through its own decoy library generation algorithm. MS Ana identifies on average 36% more spectrum matches and 4% more proteins than database search in a benchmark test on single-shot human cell-line data. Further, we demonstrate the quality of the result validation with tests on synthetic peptide pools and show the importance of library selection through a comparison of library search performance with different configurations of publicly available human spectral libraries.
Collapse
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- E-mail: . Phone: +43 (0) 50804 27145
| | - Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
| | - Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Dr. Bohr Gasse 3, 1030 Vienna, Austria
| | - Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- E-mail: . Phone: +43 (0) 50804 22740
| |
|
23
|
Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2023; 24:bbac542. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022] Open
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
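The sequence coverage figures quoted in this abstract can be illustrated with a simple calculation: the fraction of reference residues covered by at least one exactly matching peptide or contig. This is a sketch under that assumed definition; the study's own coverage computation may differ (for example in how mismatches or Leu/Ile ambiguity are handled), and the function name and example sequences are hypothetical.

```python
from typing import List


def sequence_coverage(reference: str, peptides: List[str]) -> float:
    """Return the fraction of reference positions covered by exact peptide matches."""
    if not reference:
        return 0.0
    covered = [False] * len(reference)
    for pep in peptides:
        if not pep:
            continue  # ignore empty strings
        start = reference.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence as covered.
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Continue searching so that repeated occurrences are also counted.
            start = reference.find(pep, start + 1)
    return sum(covered) / len(reference)


# Hypothetical 10-residue reference and two matching peptides.
example_reference = "DIQMTQSPSS"
example_peptides = ["DIQM", "QSPS"]
```

Here the two peptides cover 8 of the 10 reference positions, so the function returns 0.8.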
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, MF1, Nordufer 20, 13353 Berlin
| | - Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam
| | - Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| |
|
24
|
McDonnell K, Howley E, Abram F. Critical evaluation of the use of artificial data for machine learning based de novo peptide identification. Comput Struct Biotechnol J 2023; 21:2732-2743. [PMID: 37168871 PMCID: PMC10165132 DOI: 10.1016/j.csbj.2023.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/16/2023] [Indexed: 05/13/2023] Open
Abstract
Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide reaching applications. Peptide identification in proteomics typically relies on matching high resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in depth analysis of de novo algorithms as well as alleviating the reliance on database searches for training data.
Affiliation(s)
- Kevin McDonnell
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- School of Computer Science, University of Galway, Ireland
- Corresponding author at: Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland.
| | - Enda Howley
- School of Computer Science, University of Galway, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- Corresponding author.
| |
|
25
|
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
| | - Eva Borràs
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Federico Valverde
- Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
| | - José Tomás Matus
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
| | - Eduard Sabidó
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - José Luis Riechmann
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
|
26
|
Gueto-Tettay C, Tang D, Happonen L, Heusel M, Khakzad H, Malmström J, Malmström L. Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics. PLoS Comput Biol 2023; 19:e1010457. [PMID: 36668672 PMCID: PMC9891523 DOI: 10.1371/journal.pcbi.1010457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 02/01/2023] [Accepted: 01/04/2023] [Indexed: 01/21/2023] Open
Abstract
Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing from bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% more than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion with five proteases of samples from several species led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' demonstrated improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.
Affiliation(s)
- Carlos Gueto-Tettay
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Di Tang
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Lotta Happonen
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Moritz Heusel
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Hamed Khakzad
- Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
| | - Johan Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Lars Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| |
|
27
|
McDonnell K, Abram F, Howley E. Application of a Novel Hybrid CNN-GNN for Peptide Ion Encoding. J Proteome Res 2022; 22:323-333. [PMID: 36534699 PMCID: PMC9903319 DOI: 10.1021/acs.jproteome.2c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research we extracted and evaluated the encoding modules from 3 state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
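The b-ions and y-ions that these encoding modules try to identify have predictable m/z values. As a minimal sketch, the theoretical singly charged b- and y-ion series of an unmodified peptide can be computed from standard monoisotopic residue masses. The function name and the restriction to charge 1+ without modifications are assumptions made for illustration; this is not the encoding model described in the abstract.

```python
from typing import List, Tuple

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton
WATER = 18.010565   # mass of H2O


def fragment_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return singly charged (b_ions, y_ions) m/z lists, ordered b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)          # N-terminal fragment
    suffix = 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)  # C-terminal fragment keeps the water
    return b_ions, y_ions


# Example: theoretical fragments of the peptide PEPTIDE.
example_b, example_y = fragment_ions("PEPTIDE")
```

Because leucine and isoleucine have identical residue masses, peptides that differ only by a Leu/Ile swap produce the same b- and y-ion series under this calculation.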
Affiliation(s)
- Kevin McDonnell
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
- E-mail:
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Enda Howley
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
| |
|
28
|
Boiko DA, Kozlov KS, Burykina JV, Ilyushenkova VV, Ananikov VP. Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning. J Am Chem Soc 2022; 144:14590-14606. [PMID: 35939718 DOI: 10.1021/jacs.2c03631] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 12/24/2022]
Abstract
Mass spectrometry (MS) is a convenient, highly sensitive, and reliable method for the analysis of complex mixtures, which is vital for materials science, life sciences fields such as metabolomics and proteomics, and mechanistic research in chemistry. Although it is one of the most powerful methods for individual compound detection, complete signal assignment in complex mixtures is still a great challenge. The unconstrained formula-generating algorithm, covering the entire spectra and revealing components, is a "dream tool" for researchers. We present the framework for efficient MS data interpretation, describing a novel approach for detailed analysis based on deisotoping performed by gradient-boosted decision trees and a neural network that generates molecular formulas from the fine isotopic structure, approaching the long-standing inverse spectral problem. The methods were successfully tested on three examples: fragment ion analysis in protein sequencing for proteomics, analysis of the natural samples for life sciences, and study of the cross-coupling catalytic system for chemistry.
Affiliation(s)
- Daniil A Boiko
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Konstantin S Kozlov
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Julia V Burykina
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Valentina V Ilyushenkova
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| | - Valentine P Ananikov
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt 47, Moscow 119991, Russia
| |
|
29
|
Zhang W, Yang C, Liu J, Liang Z, Shan Y, Zhang L, Zhang Y. Accurate discrimination of leucine and isoleucine residues by combining continuous digestion with multiple MS3 spectra integration in protein sequence. Talanta 2022; 249:123666. [PMID: 35717752 DOI: 10.1016/j.talanta.2022.123666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Revised: 06/07/2022] [Accepted: 06/08/2022] [Indexed: 12/26/2022]
Abstract
Protein de novo sequencing based on tandem mass spectrometry is a crucial technology that enables the identification of peptides without searching databases and the assembly of proteins of unknown sequence, especially monoclonal antibodies (mAbs). However, the discrimination of leucine (Leu) and isoleucine (Ile) residues in the target protein sequence is still challenging. Herein, we developed an accurate method that combines continuous digestion with MS3-based fragmentation and multiple spectra integration (evaluated by a combined verification score, CVS) to distinguish Leu and Ile residues. Continuous digestion promotes the diversity of peptides in order to expose more Leu and Ile residues at the N-terminus. CVS integrates multiple MS3 spectra to reduce the interference from noise and co-fragmented ions and improve accuracy. This method successfully resolved all 75 Leu/Ile residues in bovine serum albumin, including 3 consecutive Leu/Ile residues. We further applied the method to analyze trastuzumab, and 67 of the 68 Leu/Ile residues from the light chain and heavy chain were accurately discriminated, demonstrating the great potential of the method in mAb sequencing.
Affiliation(s)
- Weijie Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China; University of Chinese Academy of Sciences, Beijing, 100039, China
| | - Chao Yang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China; University of Chinese Academy of Sciences, Beijing, 100039, China
| | - Jianhui Liu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China
| | - Zhen Liang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China
| | - Yichu Shan
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China.
| | - Lihua Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China.
| | - Yukui Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China
| |
|
30
|
Torres-Sangiao E, Giddey AD, Leal Rodriguez C, Tang Z, Liu X, Soares NC. Proteomic Approaches to Unravel Mechanisms of Antibiotic Resistance and Immune Evasion of Bacterial Pathogens. Front Med (Lausanne) 2022; 9:850374. [PMID: 35586072 PMCID: PMC9108449 DOI: 10.3389/fmed.2022.850374] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/07/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open
Abstract
The profound effects of and distress caused by the global COVID-19 pandemic highlighted what has long been known in the health sciences: that bacteria, fungi, viruses, and parasites continue to present a major threat to human health. Infectious diseases remain the leading cause of death worldwide, with antibiotic resistance increasing exponentially due to a lack of new treatments. In addition to this, many pathogens share the common trait of having the ability to modulate, and escape from, the host immune response. The challenge in medical microbiology is to develop and apply new experimental approaches that allow for the identification of both the microbe and its drug susceptibility profile in a time-sensitive manner, as well as to elucidate their molecular mechanisms of survival and immunomodulation. Over the last three decades, proteomics has contributed to a better understanding of the underlying molecular mechanisms responsible for microbial drug resistance and pathogenicity. Proteomics has gained new momentum as a result of recent advances in mass spectrometry. Indeed, mass spectrometry-based biomedical research has been made possible thanks to technological advances in instrumentation capability and the continuous improvement of sample processing and workflows. For example, high-throughput applications such as SWATH or trapped ion mobility enable the identification of thousands of proteins in a matter of minutes. This type of rapid, in-depth analysis, combined with other advanced, supportive applications such as data processing and artificial intelligence, presents a unique opportunity to translate knowledge-based findings into measurable impacts like new antimicrobial biomarkers and drug targets.
In relation to the Research Topic “Proteomic Approaches to Unravel Mechanisms of Resistance and Immune Evasion of Bacterial Pathogens,” this review specifically seeks to highlight the synergies between the powerful fields of modern proteomics and microbiology, as well as bridging translational opportunities from biomedical research to clinical practice.
Affiliation(s)
- Eva Torres-Sangiao
- Clinical Microbiology Lab, University Hospital Marqués de Valdecilla, Santander, Spain
- Instituto de Investigación Sanitaria Marqués de Valdecilla (IDIVAL), Santander, Spain
- *Correspondence: Eva Torres-Sangiao,
| | - Alexander Dyason Giddey
- Sharjah Institute of Medical Research, University of Sharjah, Sharjah, United Arab Emirates
- Department of Medicinal Chemistry, College of Pharmacy, University of Sharjah, Sharjah, United Arab Emirates
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Cristina Leal Rodriguez
- Copenhagen Prospectives Studies on Asthma in Childhood, COPSAC, Copenhagen University Hospital, Herlev-Gentofte, Denmark
| | - Zhiheng Tang
- Department of Microbiology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
| | - Xiaoyun Liu
- Department of Microbiology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
| | - Nelson C. Soares
- Sharjah Institute of Medical Research, University of Sharjah, Sharjah, United Arab Emirates
- Department of Medicinal Chemistry, College of Pharmacy, University of Sharjah, Sharjah, United Arab Emirates
- Nelson C. Soares,
| |
|
31
|
de Melo-Braga MN, Moreira RDS, Gervásio JHDB, Felicori LF. Overview of protein posttranslational modifications in Arthropoda venoms. J Venom Anim Toxins Incl Trop Dis 2022; 28:e20210047. [PMID: 35519418 PMCID: PMC9036706 DOI: 10.1590/1678-9199-jvatitd-2021-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2021] [Accepted: 08/27/2021] [Indexed: 11/22/2022] Open
Abstract
Accidents with venomous animals are a public health issue worldwide. Among the species involved in these accidents are scorpions, spiders, bees, wasps, and other members of the phylum Arthropoda. Knowledge of the function of the proteins present in these venoms is important to guide diagnosis and therapeutics, and these venoms are also a source of a large variety of biotechnologically active molecules. Although our understanding of the characteristics and function of arthropod venoms has been evolving in the last decades, a major aspect crucial for the function of these proteins remains poorly studied: the posttranslational modifications (PTMs). Comprehension of such modifications can contribute to a better understanding of the basis of envenomation, leading to improvements in the specificities of potential therapeutic toxins. Therefore, in this review, we bring to light protein/toxin PTMs in arthropod venoms by accessing the information present in the UniProtKB/Swiss-Prot database, including experimental and putative inferences. Then, we concentrate our discussion on the current knowledge of protein phosphorylation and glycosylation, highlighting the potential functionality of these modifications in arthropod venom. We also briefly describe general approaches to study "PTM-functional-venomics", herein referring to the integration of PTM-venomics with a functional investigation of PTM impact on venom biology. Furthermore, we discuss the bottlenecks in toxinology studies covering PTM investigation. In conclusion, through the mining of PTMs in arthropod venoms, we observed a large gap in this field that limits our understanding of the biology of these venoms, affecting diagnosis and therapeutics development. Hence, we encourage community efforts to draw attention to a better understanding of PTMs in arthropod venom toxins.
Affiliation(s)
- Marcella Nunes de Melo-Braga
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais (UFMG), Belo Horizonte, MG, Brazil
| | - Raniele da Silva Moreira
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais (UFMG), Belo Horizonte, MG, Brazil
| | - João Henrique Diniz Brandão Gervásio
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais (UFMG), Belo Horizonte, MG, Brazil
| | - Liza Figueiredo Felicori
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais (UFMG), Belo Horizonte, MG, Brazil
| |
|
32
|
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023] Open
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavages. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications from clinical to environmental settings and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically its performance lagged behind database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm's correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations regarding further algorithm improvements and offer potential avenues to overcome current inherent data limitations.
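To make the notion of a missing fragmentation cleavage concrete, the sketch below counts the backbone cleavage sites of a peptide for which neither the singly charged b-ion nor the complementary y-ion appears in an observed peak list. This definition, the function name, the 0.02 Da tolerance, and the example peaks are assumptions made for illustration; the study may define and detect missing cleavages differently (for example by including other ion types or charge states).

```python
from typing import List

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def missing_cleavages(peptide: str, observed_mz: List[float], tol: float = 0.02) -> int:
    """Count cleavage sites with neither a b-ion nor a y-ion within tol of an observed peak."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    missing = 0
    prefix = 0.0
    for m in masses[:-1]:  # a peptide of length n has n-1 cleavage sites
        prefix += m
        b_mz = prefix + PROTON
        y_mz = (total - prefix) + WATER + PROTON
        supported = any(
            abs(mz - b_mz) <= tol or abs(mz - y_mz) <= tol for mz in observed_mz
        )
        if not supported:
            missing += 1
    return missing


# Hypothetical peak list for PEPTIDE containing only b2, y1 and y3.
example_peaks = [227.1026, 148.0604, 376.1714]
example_missing = missing_cleavages("PEPTIDE", example_peaks)
```

In this example three of the six cleavage sites are supported by a peak, so three are counted as missing.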
|
33
|
Brady MM, Meyer AS. Cataloguing the proteome: Current developments in single-molecule protein sequencing. Biophysics Reviews 2022; 3:011304. [PMID: 38505228 PMCID: PMC10903494 DOI: 10.1063/5.0065509] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/02/2021] [Accepted: 01/13/2022] [Indexed: 03/21/2024]
Abstract
The cellular proteome is complex and dynamic, with proteins playing a critical role in cell-level biological processes that contribute to homeostasis, stimuli response, and disease pathology, among others. As such, protein analysis and characterization are of extreme importance in both research and clinical settings. In the last few decades, most proteomics analysis has relied on mass spectrometry, affinity reagents, or some combination thereof. However, these techniques are limited by their requirements for large sample amounts, low resolution, and insufficient dynamic range, making them largely insufficient for the characterization of proteins in low-abundance or single-cell proteomic analysis. Despite unique technical challenges, several single-molecule protein sequencing (SMPS) technologies have been proposed in recent years to address these issues. In this review, we outline several approaches to SMPS technologies and discuss their advantages, limitations, and potential contributions toward an accurate, sensitive, and high-throughput platform.
Affiliation(s)
- Morgan M. Brady
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| | - Anne S. Meyer
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| |
|
34
|
Abstract
Accurate full-length sequencing of a purified unknown protein remains challenging because mass-spectrometry (MS)-based methods are error-prone. De novo identified peptide sequences often contain errors, undermining the accuracy of assembly. Bias in the detectability of peptides also creates low-coverage regions, resulting in gaps. Although recent advances in multi-enzyme hydrolysis and algorithms have shown complete assembly of full-length protein sequences in a few examples, robustness in practical application still needs to be improved. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy to assemble protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize the bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep, i.e., contig-scaffold, manner, with error correction in each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. We demonstrated our strategy on three proteins with three replicates each; all reached 100% coverage (except one with 98.85%) and 98.69-100% accuracy. It can also efficiently deal with a membrane protein, although the transmembrane region was missing due to the limitations of MS. The three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provide a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
Affiliation(s)
- Zhi-Biao Mai
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, China
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Zhong-Hua Zhou
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
- Qing-Yu He
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
- Gong Zhang
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
|
35
|
Affinity Selection from Synthetic Peptide Libraries Enabled by De Novo MS/MS Sequencing. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10370-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
Recently, de novo MS/MS peptide sequencing has enabled the application of affinity selections to synthetic peptide mixtures that approach the diversity of phage libraries (> 10^8 random peptides). In conjunction with ‘split-mix’ solid phase synthesis to access equimolar peptide mixtures, this approach provides a straightforward means to examine synthetic peptide libraries of considerably higher diversity than has been feasible historically. Here, we offer a critical perspective on this work, report emerging data, and highlight opportunities for further methods refinement. With continued development, ‘affinity selection–mass spectrometry’ may become a complementary approach to phage display, in vitro selection, and DNA-encoded libraries for the discovery of synthetic ligands that modulate protein function.
|
36
|
Blakeley-Ruiz JA, Kleiner M. Considerations for Constructing a Protein Sequence Database for Metaproteomics. Comput Struct Biotechnol J 2022; 20:937-952. [PMID: 35242286 PMCID: PMC8861567 DOI: 10.1016/j.csbj.2022.01.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/26/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 12/14/2022] Open
Abstract
Mass spectrometry-based metaproteomics has emerged as a prominent technique for interrogating the functions of specific organisms in microbial communities, in addition to total community function. Identifying proteins by mass spectrometry requires matching mass spectra of fragmented peptide ions to a database of protein sequences corresponding to the proteins in the sample. This sequence database determines which protein sequences can be identified from the measurement, and as such the taxonomic and functional information that can be inferred from a metaproteomics measurement. Thus, the construction of the protein sequence database directly impacts the outcome of any metaproteomics study. Several factors, such as source of sequence information and database curation, need to be considered during database construction to maximize accurate protein identifications traceable to the species of origin. In this review, we provide an overview of existing strategies for database construction and the relevant studies that have sought to test and validate these strategies. Based on this review of the literature and our experience we provide a decision tree and best practices for choosing and implementing database construction strategies.
Affiliation(s)
- J. Alfredo Blakeley-Ruiz
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
  - Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Corresponding authors at: Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
- Manuel Kleiner
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
  - Corresponding authors at: Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
|
37
|
Mühlhausen S, Schmitt HD, Plessmann U, Mienkus P, Sternisek P, Perl T, Weig M, Urlaub H, Bader O, Kollmar M. Proteogenomics analysis of CUG codon translation in the human pathogen Candida albicans. BMC Biol 2021; 19:258. [PMID: 34863173 PMCID: PMC8645108 DOI: 10.1186/s12915-021-01197-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2021] [Accepted: 11/18/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Yeasts of the CTG-clade lineage, which includes the human-infecting Candida albicans, Candida parapsilosis and Candida tropicalis species, are characterized by an altered genetic code. Instead of translating CUG codons as leucine, as happens in most eukaryotes, these yeasts, whose ancestors are thought to have lost the relevant leucine-tRNA gene, translate CUG codons as serine using a serine-tRNA with a mutated anticodon, $\mathrm{tRNA}_{\mathrm{CAG}}^{\mathrm{Ser}}$. Previously reported experiments have suggested that 3–5% of the CTG-clade CUG codons are mistranslated as leucine due to mischarging of the $\mathrm{tRNA}_{\mathrm{CAG}}^{\mathrm{Ser}}$. The mistranslation was suggested to result in variable surface proteins explaining fast host adaptation and pathogenicity.
Results: In this study, we reassess this potential mistranslation by high-resolution mass spectrometry-based proteogenomics of multiple CTG-clade yeasts, including various C. albicans strains, isolated from colonized and from infected human body sites, and C. albicans grown in yeast and hyphal forms. Our data do not support a bias towards CUG codon mistranslation as leucine. Instead, our data suggest that (i) CUG codons are mistranslated at a frequency corresponding to the normal extent of ribosomal mistranslation with no preference for specific amino acids, (ii) CUG codons are as unambiguous (or ambiguous) as the related CUU leucine and UCC serine codons, (iii) tRNA anticodon loop variation across the CTG-clade yeasts does not result in any difference of the mistranslation level, and (iv) CUG codon unambiguity is independent of C. albicans’ strain pathogenicity or growth form.
Conclusions: Our findings imply that C. albicans does not decode CUG ambiguously. This suggests that the proposed misleucylation of the $\mathrm{tRNA}_{\mathrm{CAG}}^{\mathrm{Ser}}$ might be as prevalent as every other misacylation or mistranslation event and, if at all, be just one of many reasons causing phenotypic diversity.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-021-01197-9.
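To make the decoding question concrete, the following Python sketch translates one mRNA twice: once with CUG read as serine (as in NCBI translation table 12, the alternative yeast nuclear code) and once as leucine (the standard code). It only shows how the two candidate protein sequences differ at CUG positions. It is not the authors' proteogenomics pipeline, and the `translate` helper, its `cug` parameter, and the toy mRNA are hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (first base varies slowest, bases U, C, A, G).
BASES = "UCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def translate(mrna: str, cug: str = "S") -> str:
    """Translate an mRNA in frame 1, decoding CUG as `cug` ('S' or 'L')."""
    code = dict(STANDARD_CODE, CUG=cug)  # override only the CUG codon
    codons = (mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))
    return "".join(code[codon] for codon in codons)


toy_mrna = "AUGCUGAAAUAA"                # hypothetical ORF with a single CUG codon
as_serine = translate(toy_mrna, cug="S")   # CTG-clade decoding
as_leucine = translate(toy_mrna, cug="L")  # standard decoding
```

In a proteogenomic comparison, peptides matching the serine variant versus the leucine variant at such positions are the kind of evidence that distinguishes the two decodings.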
Affiliation(s)
- Stefanie Mühlhausen
  - Theoretical Computer Science and Algorithmic Methods Group, Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
- Hans Dieter Schmitt
  - Department of Neurobiology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Uwe Plessmann
  - Bioanalytical Mass Spectrometry, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Peter Mienkus
  - Department of Neurobiology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Pia Sternisek
  - Institute for Medical Microbiology, University Medical Center Göttingen, Kreuzbergring 57, 37075, Göttingen, Germany
- Thorsten Perl
  - Intermediate Care, University Medical Center Göttingen, Robert Koch Strasse 40, 37075, Göttingen, Germany
- Michael Weig
  - Institute for Medical Microbiology, University Medical Center Göttingen, Kreuzbergring 57, 37075, Göttingen, Germany
- Henning Urlaub
  - Bioanalytical Mass Spectrometry, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
  - Bioanalytics Group, Department of Clinical Chemistry, University Medical Center Göttingen, Robert Koch Strasse 40, 37075, Göttingen, Germany
- Oliver Bader
  - Institute for Medical Microbiology, University Medical Center Göttingen, Kreuzbergring 57, 37075, Göttingen, Germany
- Martin Kollmar
  - Theoretical Computer Science and Algorithmic Methods Group, Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
|
38
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with fewer than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence has suggested that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
  - Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
- Ying Yang
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Yuanliang Zhang
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Kecheng Li
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Hongmin Cai
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
- Hongwei Wang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
- Qian Zhao
  - State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
|
39
|
Hruska M, Holub D. Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library. Eur J Mass Spectrom (Chichester) 2021; 27:217-234. [PMID: 34989269 DOI: 10.1177/14690667211066725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection, integrating match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete- and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only remaining such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved the detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. Knowing a correct sequence tag in advance of peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
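The way a prior and several match-based models combine under Bayes' rule can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: it assumes the match models are conditionally independent given the peptide, so their log-likelihoods add. The function name `posterior`, the two candidate sequences, and all numeric values are hypothetical.

```python
import math


def posterior(priors: dict[str, float],
              log_likelihoods: dict[str, list[float]]) -> dict[str, float]:
    """Posterior over candidate peptides: prior times product of model likelihoods.

    `log_likelihoods[p]` holds one log-likelihood per match model
    (e.g. fragments, retention time) for candidate peptide `p`.
    """
    log_scores = {
        pep: math.log(priors[pep]) + sum(log_likelihoods[pep]) for pep in priors
    }
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {pep: math.exp(score - top) for pep, score in log_scores.items()}
    total = sum(weights.values())
    return {pep: weight / total for pep, weight in weights.items()}


# Two hypothetical candidates with made-up fragment and retention-time scores.
evidence = {
    "LVVVGAGGVGK": [math.log(0.6), math.log(0.5)],
    "LVVVGADGVGK": [math.log(0.4), math.log(0.5)],
}
uniform = posterior({"LVVVGAGGVGK": 0.5, "LVVVGADGVGK": 0.5}, evidence)
guided = posterior({"LVVVGAGGVGK": 0.9, "LVVVGADGVGK": 0.1}, evidence)
```

Comparing `uniform` with `guided` shows the effect the abstract describes: with the same spectral evidence, a prior informed by a reference proteome shifts the posterior toward the expected sequence.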
Affiliation(s)
- Miroslav Hruska
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
  - Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Dusan Holub
  - Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
|
40
|
Burgos R, Weber M, Gallo C, Lluch-Senar M, Serrano L. Widespread ribosome stalling in a genome-reduced bacterium and the need for translational quality control. iScience 2021; 24:102985. [PMID: 34485867 PMCID: PMC8403727 DOI: 10.1016/j.isci.2021.102985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2021] [Revised: 07/22/2021] [Accepted: 08/11/2021] [Indexed: 11/21/2022] Open
Abstract
Trans-translation is a ubiquitous bacterial mechanism of ribosome rescue mediated by a transfer-messenger RNA (tmRNA) that adds a degradation tag to the truncated nascent polypeptide. Here, we characterize this quality control system in a genome-reduced bacterium, Mycoplasma pneumoniae (MPN), and perform a comparative analysis of protein quality control components in slow and fast-growing prokaryotes. We show in vivo that in MPN the sole quality control cytoplasmic protease (Lon) degrades efficiently tmRNA-tagged proteins. Analysis of tmRNA-mutants encoding a tag resistant to proteolysis reveals extensive tagging activity under normal growth. Unlike knockout strains, these mutants are viable demonstrating the requirement of tmRNA-mediated ribosome recycling. Chaperone and Lon steady-state levels maintain proteostasis in these mutants suggesting a model in which co-evolution of Lon and their substrates offer simple mechanisms of regulation without specialized degradation machineries. Finally, comparative analysis shows relative increase in Lon/Chaperone levels in slow-growing bacteria suggesting physiological adaptation to growth demand.
Highlights
- Lon degrades efficiently tmRNA-tagged proteins in a genome-reduced bacterium
- tmRNA-tag mutants are viable and reveal extensive tagging activity in M. pneumoniae
- Co-evolution of Lon and their substrates offer simple mechanisms of regulation
- Chaperone and Lon relative levels correlate with bacterial growth rates
Affiliation(s)
- Raul Burgos
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Corresponding author
- Marc Weber
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Carolina Gallo
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Maria Lluch-Senar
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
  - Corresponding author
|
41
|
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022] Open
Abstract
It is becoming evident that holistic perspectives toward cancer are crucial in deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is becoming more apparent than ever that cancer should be viewed not only as a disease of the genome but as a disease of the cellular system. Integrative multilayer approaches are emerging as vigorous assets in our endeavors to achieve systemic views on cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer approaches of omics analyses of cellular systems and move on to multilayer integrative approaches in which in-depth descriptions of proteogenomics and network-based data analysis are provided. Proteogenomics is a remarkable example of how the integration of multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations, while network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
- Mehdi Sadeghi
  - Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
- Ulf Schmitz
  - Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
|
42
|
Diving Deep into the Data: A Review of Deep Learning Approaches and Potential Applications in Foodomics. Foods 2021; 10:1803. [PMID: 34441579 PMCID: PMC8392494 DOI: 10.3390/foods10081803] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/29/2021] [Revised: 07/30/2021] [Accepted: 08/02/2021] [Indexed: 01/18/2023] Open
Abstract
Deep learning is a trending field in bioinformatics. So far it has been known mostly for image processing and speech recognition, but it also shows promise for data processing in food analysis, especially foodomics, and deep learning approaches are increasingly being used. This review presents an introduction to deep learning in the context of metabolomics and proteomics, focusing on the prediction of shelf-life, food authenticity, and food quality. Apart from the direct food-related applications, this review summarizes deep learning for peptide sequencing and its relevance to food analysis. The review focuses further on MS (mass spectrometry)-based approaches. As a result of the constant development and improvement of analytical devices, as well as more complex holistic research questions, especially with a matrix as diverse and complex as food, there is a need for more effective methods for data processing. Deep learning might meet this need and offers the prospect of dealing with the vast amount and complexity of data.
|
43
|
Wang B, Wang Z, Pan N, Huang J, Wan C. Improved Identification of Small Open Reading Frames Encoded Peptides by Top-Down Proteomic Approaches and De Novo Sequencing. Int J Mol Sci 2021; 22:5476. [PMID: 34067398 PMCID: PMC8197016 DOI: 10.3390/ijms22115476] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/13/2021] [Revised: 05/14/2021] [Accepted: 05/18/2021] [Indexed: 12/20/2022] Open
Abstract
Small open reading frames (sORFs) have translational potential to produce peptides that play essential roles in various biological processes. Nevertheless, many sORF-encoded peptides (SEPs) remain only predicted. Here, we construct a strategy to analyze SEPs by combining top-down and de novo sequencing to improve SEP identification and sequence coverage. With de novo sequencing, we identified 1682 peptides mapping to 2544 human sORFs, all of which were first characterized in this work. Two-thirds of these new sORFs have reading frame shifts and use a non-ATG start codon. The top-down approach identified 241 human SEPs, with high sequence coverage. The average length of the peptides from the bottom-up database search was 19 amino acids (AA); from de novo sequencing, it was 9 AA; and from the top-down approach, it was 25 AA. Longer peptides boost sequence coverage, more efficiently distinguishing SEPs from known gene coding sequences. Top-down has the advantage of identifying peptides with sequential K/R or high K/R content, which are unfavorable in the bottom-up approach. Our method can explore new coding sORFs and obtain highly accurate sequences of their SEPs, which can also benefit future functional research.
|
44
|
Yang C, Shan YC, Zhang WJ, Dai ZP, Zhang LH, Zhang YK. Full-length Protein Sequencing Based on Continuous Digestion Using Non-specific Proteases. Acta Chim Sinica 2021. [DOI: 10.6023/a21010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/25/2022]
|
45
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE Access 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues inherent in proteogenomic data analysis, where the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make the case that high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic analysis.
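The six-frame translation mentioned above can be sketched as follows. This is a minimal Python illustration using the standard genetic code, not code from any of the surveyed tools. The helper names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels (`+1` to `-3`) are assumptions, and codons containing characters other than A, C, G, T are emitted as `X`.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each strand contributes three frames of about L/3 residues, so a genome of L bases yields roughly 2L translated residues before any splitting at stop codons or digestion into peptides. That growth is one reason the search databases discussed in the abstract are so much larger than a typical protein database.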
Affiliation(s)
- Muhammad Usman Tariq
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
|
46
|
Buric F, Zrimec J, Zelezniak A. Parallel Factor Analysis Enables Quantification and Identification of Highly Convolved Data-Independent-Acquired Protein Spectra. Patterns (N Y) 2020; 1:100137. [PMID: 33336195 PMCID: PMC7733873 DOI: 10.1016/j.patter.2020.100137] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/25/2020] [Revised: 09/14/2020] [Accepted: 10/12/2020] [Indexed: 11/26/2022]
Abstract
High-throughput data-independent acquisition (DIA) is the method of choice for quantitative proteomics, combining the best practices of targeted and shotgun approaches. The resultant DIA spectra are, however, highly convolved and with no direct precursor-fragment correspondence, complicating biological sample analysis. Here, we present CANDIA (canonical decomposition of data-independent-acquired spectra), a GPU-powered unsupervised multiway factor analysis framework that deconvolves multispectral scans to individual analyte spectra, chromatographic profiles, and sample abundances, using parallel factor analysis. The deconvolved spectra can be annotated with traditional database search engines or used as high-quality input for de novo sequencing methods. We demonstrate that spectral libraries generated with CANDIA substantially reduce the false discovery rate underlying the validation of spectral quantification. CANDIA covers up to 33 times more total ion current than library-based approaches, which typically use less than 5% of total recorded ions, thus allowing quantification and identification of signals from unexplored DIA spectra.
Highlights
- Conventional DIA spectral libraries cover less than 3% of a scan's total ion count
- CANDIA deconvolves peptide signals by leveraging all scan data
- CANDIA uses GPUs to enable tensor algebra on massive DIA mass spectrometry data
- CANDIA output enables high-confidence and precise quantitative proteomics
The latest high-throughput mass spectrometry-based technologies can record virtually all molecules from complex biological samples, providing a holistic picture of proteomes in cells and tissues and enabling an evaluation of the overall status of a person's health. However, current best practices are still only scratching the surface of the wealth of available information obtained from the massive proteome datasets, and efficient novel data-driven strategies are needed. Powered by advances in GPU hardware and open-source machine-learning frameworks, we developed a data-driven approach, CANDIA, which disassembles highly complex proteomics data into the elementary molecular signatures of the proteins in biological samples. Our work provides a performant and adaptable solution that complements existing mass spectrometry techniques. As the central mathematical methods are generic, other scientific fields that are dealing with highly convolved datasets will benefit from this work.
Affiliation(s)
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg 412 96, Sweden
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg 412 96, Sweden
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg 412 96, Sweden
  - Science for Life Laboratory, Tomtebodavägen 23a, Stockholm 171 65, Sweden
|
47
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
48
|
O'Bryon I, Jenson SC, Merkley ED. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 2020; 29:1864-1878. [PMID: 32713088 PMCID: PMC7454419 DOI: 10.1002/pro.3919] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/15/2020] [Revised: 07/21/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
Affiliation(s)
- Isabelle O'Bryon
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Sarah C. Jenson
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Eric D. Merkley
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
|
49
|
Cautereels J, Van Hee N, Chatterjee S, Van Alsenoy C, Lemière F, Blockhuys F. QCMS2 as a new method for providing insight into peptide fragmentation: The influence of the side-chain and inter-side-chain interactions. J Mass Spectrom 2020; 55:e4446. [PMID: 31652378 DOI: 10.1002/jms.4446] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/27/2019] [Revised: 09/12/2019] [Accepted: 09/21/2019] [Indexed: 06/10/2023]
Abstract
The identification of peptides and proteins from tandem mass spectra is a difficult task and multiple tools have been developed to aid this identification. We present a new method called quantum chemical mass spectrometry for materials science (QCMS2), which is based on quantum chemical calculations of bond orders, reaction, and transition-state energies at the DFT/B3LYP/6-311+G* level of theory. The method was used to describe the fragmentation pathways of five X-His-Ser tripeptides with X = Asn, Asp, Glu, Ser, and Trp, thereby focusing on the influence of the side chain and inter-side-chain interactions on the fragmentation. The main features in the mass spectra of the five tripeptides were correctly reproduced, and a number of fragments were assigned to fragmentations involving the side chain and the influence of inter-side-chain interactions. Product ion spectra were recorded to evaluate the capabilities and limitations of QCMS2 and a number of conventional tools.
Affiliation(s)
- Julie Cautereels
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Nils Van Hee
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Sneha Chatterjee
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Filip Lemière
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Frank Blockhuys
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
|
50
|
Cautereels J, Giribaldi J, Enjalbal C, Blockhuys F. Quantum chemical mass spectrometry: Ab initio study of b2-ion formation mechanisms for the singly protonated Gln-His-Ser tripeptide. Rapid Commun Mass Spectrom 2020; 34:e8778. [PMID: 32144813 DOI: 10.1002/rcm.8778] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/08/2019] [Revised: 02/28/2020] [Accepted: 03/05/2020] [Indexed: 06/10/2023]
Abstract
RATIONALE: Both amide bond protonation triggering peptide fragmentations and the controversial b2-ion structures have been subjects of intense research. The involvement of histidine (H), with its imidazole side chain that induces specific dissociation patterns involving inter-side-chain (ISC) interactions, in b2-ion formation was investigated, focusing on the QHS model tripeptide. METHODS: To identify the effect of histidine on fragmentations issued from ISC interactions, QHS was selected for a comprehensive analysis of the pathways leading to the three possible b2-ion structures, using quantum chemical calculations performed at the DFT/B3LYP/6-311+G* level of theory. Electrospray ionization ion trap mass spectrometry allowed the recording of MS2 and MS3 tandem mass spectra, whereas the Quantum Chemical Mass Spectrometry for Materials Science (QCMS2) method was used to predict fragmentation patterns. RESULTS: Whereas it is very difficult to differentiate among protonated oxazolone, diketopiperazine, or lactam b2-ions using MS2 and MS3 mass spectra, the calculations indicated that the QH b2-ion (detected at m/z 266) is probably a mixture of the lactam and oxazolone structures formed after amide nitrogen protonation, making the formation of diketopiperazine less likely as it requires an additional step for its formation. CONCLUSIONS: In contrast to glycine-histidine-containing b2-ions, known to be issued from the backbone-imidazole cyclization, we found that interactions between the side chains were not obvious to perceive, neither from a thermodynamics nor from a fragmentation perspective, emphasizing the importance of the whole sequence on the dissociation behavior usually demonstrated from simple glycine-containing tripeptides.
Affiliation(s)
- Julie Cautereels
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Frank Blockhuys
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
|