1
Roberts DS, Loo JA, Tsybin YO, Liu X, Wu S, Chamot-Rooke J, Agar JN, Paša-Tolić L, Smith LM, Ge Y. Top-down proteomics. Nature Reviews Methods Primers 2024; 4:38. [PMID: 39006170 PMCID: PMC11242913 DOI: 10.1038/s43586-024-00318-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 04/24/2024] [Indexed: 07/16/2024]
Abstract
Proteoforms, which arise from post-translational modifications, genetic polymorphisms and RNA splice variants, play a pivotal role as drivers in biology. Understanding proteoforms is essential to unravel the intricacies of biological systems and bridge the gap between genotypes and phenotypes. By analysing whole proteins without digestion, top-down proteomics (TDP) provides a holistic view of the proteome and can decipher protein function, uncover disease mechanisms and advance precision medicine. This Primer explores TDP, including the underlying principles, recent advances and an outlook on the future. The experimental section discusses instrumentation, sample preparation, intact protein separation, tandem mass spectrometry techniques and data collection. The results section looks at how to decipher raw data, visualize intact protein spectra and unravel data analysis. Additionally, proteoform identification, characterization and quantification are summarized, alongside approaches for statistical analysis. Various applications are described, including the human proteoform project and biomedical, biopharmaceutical and clinical sciences. These are complemented by discussions on measurement reproducibility, limitations and a forward-looking perspective that outlines areas where the field can advance, including potential future applications.
Affiliation(s)
- David S Roberts
- Department of Chemistry, Stanford University, Stanford, CA, USA
- Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Joseph A Loo
- Department of Chemistry and Biochemistry, Department of Biological Chemistry, University of California - Los Angeles, Los Angeles, CA, USA
- Xiaowen Liu
- Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA
- Si Wu
- Department of Chemistry and Biochemistry, The University of Alabama, Tuscaloosa, AL, USA
- Jeffrey N Agar
- Departments of Chemistry and Chemical Biology and Pharmaceutical Sciences, Northeastern University, Boston, MA, USA
- Ljiljana Paša-Tolić
- Environmental and Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, WI, USA
- Ying Ge
- Department of Chemistry, University of Wisconsin, Madison, WI, USA
- Department of Cell and Regenerative Biology, Human Proteomics Program, University of Wisconsin - Madison, Madison, WI, USA
2
Abstract
Artificial intelligence (AI) methods are increasingly integrated into prediction software in bioinformatics and in its glycoscience branch, glycoinformatics. Although AI techniques have evolved over the past decades, their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data, which are notoriously hard to produce and analyze. Nonetheless, over time, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending with a discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Affiliation(s)
- Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
- Frederique Lisacek
- Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
3
Li L, Ning Z, Cheng K, Zhang X, Simopoulos CMA, Figeys D. iMetaLab Suite: A one-stop toolset for metaproteomics. iMeta 2022; 1:e25. [PMID: 38868572 PMCID: PMC10989937 DOI: 10.1002/imt2.25] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/28/2022] [Revised: 04/15/2022] [Accepted: 05/02/2022] [Indexed: 06/14/2024]
Abstract
Metaproteomics is a rapidly growing technique that studies the collection of proteins in complex microbiomes of humans, animals, plants, and the environment. The bioinformatics workflow required for metaproteomics research, from database search and protein quantification to downstream functional and taxonomic analysis, has been challenging, which limits the accessibility of metaproteomics to microbiome researchers. To overcome these challenges, we have developed a set of tools named iMetaLab Suite. iMetaLab Suite includes the following components: (1) MetaLab Desktop, an automated database search software that facilitates protein identification and quantitation from microbiomes; (2) the automated iMetaReport, which allows users to quickly access database search results and data set profiles; and (3) an interactive online toolset, iMetaShiny, covering the most frequently used functional, taxonomic, and statistical analyses in metaproteomics. iMetaLab Suite is a free, easily accessible, and actively updated toolset that helps researchers explore metaproteomic data.
Affiliation(s)
- Leyuan Li
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario, Canada
- Zhibin Ning
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario, Canada
- Kai Cheng
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario, Canada
- Xu Zhang
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario, Canada
- Caitlin M. A. Simopoulos
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario, Canada
- Daniel Figeys
- School of Pharmaceutical Sciences, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, Ontario, Canada
4
Rusconi F. Free Open Source Software for Protein and Peptide Mass Spectrometry-based Science. Curr Protein Pept Sci 2021; 22:134-147. [PMID: 33461461 DOI: 10.2174/1389203722666210118160946] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Revised: 10/12/2020] [Accepted: 01/04/2021] [Indexed: 12/28/2022]
Abstract
In the field of biology, and specifically in protein and peptide science, the power of mass spectrometry is that it is applicable to a vast spectrum of applications. Mass spectrometry can be applied to identify proteins and peptides in complex mixtures, to identify and locate post-translational modifications, to characterize the structure of proteins and peptides to the most detailed level or to detect protein-ligand non-covalent interactions. Thanks to the Free and Open Source Software (FOSS) movement, scientists have limitless opportunities to deepen their skills in software development to code software that solves mass spectrometric data analysis problems. After the conversion of raw data files into open standard format files, the entire spectrum of data analysis tasks can now be performed integrally on FOSS platforms, like GNU/Linux, and only with FOSS solutions. This review presents a brief history of mass spectrometry open file formats and goes on with the description of FOSS projects that are commonly used in protein and peptide mass spectrometry fields of endeavor: identification projects that involve mostly automated pipelines, like proteomics and peptidomics, and bio-structural characterization projects that most often involve manual scrutiny of the mass data. Projects of the last kind usually involve software that allows the user to delve into the mass data in an interactive graphics-oriented manner. Software projects are thus categorized on the basis of these criteria: software libraries for software developers vs desktop-based graphical user interface, software for the end-user and automated pipeline-based data processing vs interactive graphics-based mass data scrutiny.
Affiliation(s)
- Filippo Rusconi
- PAPPSO, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
5
Ricart E, Pupin M, Müller M, Lisacek F. Automatic Annotation and Dereplication of Tandem Mass Spectra of Peptidic Natural Products. Anal Chem 2020; 92:15862-15871. [DOI: 10.1021/acs.analchem.0c03208] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023]
Affiliation(s)
- Emma Ricart
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, Geneva 1211, Switzerland
- Computer Science Department, University of Geneva, Geneva 1227, Switzerland
- Maude Pupin
- University Lille, CNRS, Centrale Lille, UMR 9189−CRIStAL−Centre de Recherche en Informatique Signal et Automatique de Lille, Lille F-59000, France
- Markus Müller
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, Geneva 1211, Switzerland
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Amphipole Building, Quartier Sorge, Lausanne 1015, Switzerland
- Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, Geneva 1211, Switzerland
- Computer Science Department, University of Geneva, Geneva 1227, Switzerland
- Section of Biology, University of Geneva, Geneva 1227, Switzerland
6
Stricker T, Bonner R, Lisacek F, Hopfgartner G. Adduct annotation in liquid chromatography/high-resolution mass spectrometry to enhance compound identification. Anal Bioanal Chem 2020; 413:503-517. [PMID: 33123762 PMCID: PMC7806579 DOI: 10.1007/s00216-020-03019-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/03/2020] [Revised: 09/21/2020] [Accepted: 10/19/2020] [Indexed: 12/31/2022]
Abstract
Annotation and interpretation of full scan electrospray mass spectra of metabolites is complicated by the presence of a wide variety of ions. Not only protonated, deprotonated, and neutral loss ions but also sodium, potassium, and ammonium adducts as well as oligomers are frequently observed. This diversity challenges automatic annotation and is often poorly addressed by current annotation tools. In many cases, annotation is integrated in metabolomics workflows and is based on specific chromatographic peak-picking tools. We introduce mzAdan, a nonchromatography-based multipurpose standalone application that was developed for the annotation and exploration of convolved high-resolution ESI-MS spectra. The tool annotates single or multiple accurate mass spectra using a customizable adduct annotation list and outputs a list of [M+H]+ candidates. MzAdan was first tested with a collection of 408 analytes acquired with flow injection analysis. This resulted in 402 correct [M+H]+ identifications and, with combinations of sodium, ammonium, and potassium adducts and water and ammonia losses within a tolerance of 10 mmu, explained close to 50% of the total ion current. False positives were monitored with mass accuracy and bias as well as chromatographic behavior which led to the identification of adducts with calcium instead of the expected potassium. MzAdan was then integrated in a workflow with XCMS for the untargeted LC-MS data analysis of a 52 metabolite standard mix and a human urine sample. The results were benchmarked against three other annotation tools, CAMERA, findMAIN, and CliqueMS: findMAIN and mzAdan consistently produced higher numbers of [M+H]+ candidates compared with CliqueMS and CAMERA, especially with co-eluting metabolites. Detection of low-intensity ions and correct grouping were found to be essential for annotation performance.
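The idea of annotating a spectrum against an adduct list within a 10 mmu tolerance can be illustrated with a minimal Python sketch. This is not mzAdan's implementation: the function name `annotate_adducts`, the table `SHIFTS_FROM_MH`, and the simple pairwise matching are assumptions made for illustration, and the mass shifts are standard monoisotopic differences relative to [M+H]+.

```python
# Mass shifts (Da) relative to [M+H]+ for singly charged positive ions.
# Illustrative selection only; a real adduct list would be user-configurable.
SHIFTS_FROM_MH = {
    "[M+NH4]+": 17.026549,     # + NH3
    "[M+Na]+": 21.981944,      # Na replaces H
    "[M+K]+": 37.955881,       # K replaces H
    "[M+H-H2O]+": -18.010565,  # water loss
    "[M+H-NH3]+": -17.026549,  # ammonia loss
}


def annotate_adducts(mz_values, tol=0.010):
    """Return {candidate [M+H]+ m/z: [(label, partner m/z), ...]}.

    A peak is kept as an [M+H]+ candidate when at least one other peak
    sits at a known shift from it, within `tol` Da (0.010 Da = 10 mmu).
    """
    annotations = {}
    for base in mz_values:
        for label, shift in SHIFTS_FROM_MH.items():
            for other in mz_values:
                if other != base and abs((other - base) - shift) <= tol:
                    annotations.setdefault(base, []).append((label, other))
    return annotations


if __name__ == "__main__":
    # Hypothetical peak list: glucose-like [M+H]+, [M+Na]+, [M+K]+ plus one unrelated ion.
    peaks = [181.0707, 203.0526, 219.0265, 150.0]
    for candidate, partners in annotate_adducts(peaks).items():
        print(candidate, partners)
```

Because the ammonium adduct and the ammonia loss have mirror-image shifts, a pair of peaks 17.0265 Da apart is ambiguous in this sketch; resolving such cases needs additional evidence such as intensity or chromatographic behavior.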
Affiliation(s)
- Thomas Stricker
- Life Sciences Mass Spectrometry, Department of Inorganic and Analytical Chemistry, University of Geneva, 24 Quai Ernest Ansermet, 1211, Geneva 4, Switzerland
- Proteome Informatics Group (PIG), Swiss Institute of Bioinformatics and University of Geneva, 7, route de Drize, 1211, Geneva 4, Switzerland
- Ron Bonner
- Ron Bonner Consulting, Newmarket, ON, L3Y 3C7, Canada
- Frédérique Lisacek
- Proteome Informatics Group (PIG), Swiss Institute of Bioinformatics and University of Geneva, 7, route de Drize, 1211, Geneva 4, Switzerland
- Gérard Hopfgartner
- Life Sciences Mass Spectrometry, Department of Inorganic and Analytical Chemistry, University of Geneva, 24 Quai Ernest Ansermet, 1211, Geneva 4, Switzerland.
7
Blanco-Míguez A, Fdez-Riverola F, Sánchez B, Lourenço A. Resources and tools for the high-throughput, multi-omic study of intestinal microbiota. Brief Bioinform 2020; 20:1032-1056. [PMID: 29186315 DOI: 10.1093/bib/bbx156] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/26/2017] [Revised: 10/23/2017] [Indexed: 12/18/2022]
Abstract
The human gut microbiome impacts several aspects of human health and disease, including digestion, drug metabolism and the propensity to develop various inflammatory, autoimmune and metabolic diseases. Many of the molecular processes that play a role in the activity and dynamics of the microbiota go beyond species and genic composition and thus, their understanding requires advanced bioinformatics support. This article aims to provide an up-to-date view of the resources and software tools that are being developed and used in human gut microbiome research, in particular data integration and systems-level analysis efforts. These efforts demonstrate the power of standardized and reproducible computational workflows for integrating and analysing varied omics data and gaining deeper insights into microbe community structure and function as well as host-microbe interactions.
Affiliation(s)
- Anália Lourenço
- Dpto. de Informática - Universidade de Vigo, ESEI - Escuela Superior de Ingeniería Informática, Edificio politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain
8
Chong C, Müller M, Pak H, Harnett D, Huber F, Grun D, Leleu M, Auger A, Arnaud M, Stevenson BJ, Michaux J, Bilic I, Hirsekorn A, Calviello L, Simó-Riudalbas L, Planet E, Lubiński J, Bryśkiewicz M, Wiznerowicz M, Xenarios I, Zhang L, Trono D, Harari A, Ohler U, Coukos G, Bassani-Sternberg M. Integrated proteogenomic deep sequencing and analytics accurately identify non-canonical peptides in tumor immunopeptidomes. Nat Commun 2020; 11:1293. [PMID: 32157095 PMCID: PMC7064602 DOI: 10.1038/s41467-020-14968-9] [Citation(s) in RCA: 186] [Impact Index Per Article: 46.5] [Received: 06/17/2019] [Accepted: 02/12/2020] [Indexed: 12/20/2022]
Abstract
Efforts to precisely identify tumor human leukocyte antigen (HLA) bound peptides capable of mediating T cell-based tumor rejection still face important challenges. Recent studies suggest that non-canonical tumor-specific HLA peptides derived from annotated non-coding regions could elicit anti-tumor immune responses. However, sensitive and accurate mass spectrometry (MS)-based proteogenomics approaches are required to robustly identify these non-canonical peptides. We present an MS-based analytical approach that characterizes the non-canonical tumor HLA peptide repertoire, by incorporating whole exome sequencing, bulk and single-cell transcriptomics, ribosome profiling, and two MS/MS search tools in combination. This approach results in the accurate identification of hundreds of shared and tumor-specific non-canonical HLA peptides, including an immunogenic peptide derived from an open reading frame downstream of the melanoma stem cell marker gene ABCB5. These findings hold great promise for the discovery of previously unknown tumor antigens for cancer immunotherapy.
Affiliation(s)
- Chloe Chong
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Markus Müller
- Vital IT, Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015, Lausanne, Switzerland
- HuiSong Pak
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Dermot Harnett
- Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
- Florian Huber
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Delphine Grun
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
- Marion Leleu
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015, Lausanne, Switzerland
- Aymeric Auger
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Marion Arnaud
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Brian J Stevenson
- Vital IT, Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015, Lausanne, Switzerland
- Justine Michaux
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Ilija Bilic
- Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
- Antje Hirsekorn
- Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
- Lorenzo Calviello
- Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
- Laia Simó-Riudalbas
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
- Evarist Planet
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
- Jan Lubiński
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, ul. Rybacka 1, 70-204, Szczecin, Poland
- International Institute for Molecular Oncology, Jakuba Krauthofera 23, 60-203, Poznań, Poland
- Marta Bryśkiewicz
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, ul. Rybacka 1, 70-204, Szczecin, Poland
- International Institute for Molecular Oncology, Jakuba Krauthofera 23, 60-203, Poznań, Poland
- Maciej Wiznerowicz
- International Institute for Molecular Oncology, Jakuba Krauthofera 23, 60-203, Poznań, Poland
- Poznan University of Medical Sciences, Fredry 10, 61-701, Poznań, Poland
- Ioannis Xenarios
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Genome Center Health 2030, Chemin de Mines 9, 1202, Genève, Switzerland
- Department of Training and Research, CHUV/UNIL Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Lin Zhang
- Center for Research on Reproduction and Women's Health, University of Pennsylvania, 421 Curie Boulevard, Philadelphia, PA, 19104, USA
- Department of Obstetrics and Gynecology, University of Pennsylvania, 3400 Civic Center Boulevard, Philadelphia, PA, 19104, USA
- Didier Trono
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015, Lausanne, Switzerland
- Alexandre Harari
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Uwe Ohler
- Max Delbrück Centre for Molecular Medicine in the Helmholtz Association, Institute for Medical Systems Biology, Hannoversche Straße 28, 10115, Berlin, Germany
- Departments of Biology and Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
- George Coukos
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland
- Michal Bassani-Sternberg
- Ludwig Institute for Cancer Research, University of Lausanne, Agora Center, Rue du Bugnon 25A, 1005, Lausanne, Switzerland.
- Department of Oncology, Centre hospitalier universitaire vaudois (CHUV), Rue du Bugnon 46, 1011, Lausanne, Switzerland.
9
Abstract
Glycoinformatics is a critical resource for the study of glycobiology, and glycobiology is a necessary component for understanding the complex interface between intra- and extracellular spaces. Despite this, there is limited software available to scientists studying these topics, requiring each to create fundamental data structures and representations anew for each of their applications. This leads to poor uptake of standardization and loss of focus on the real problems. We present glypy, a library written in Python for reading, writing, manipulating, and transforming glycans at several levels of precision. In addition to understanding several common formats for textual representation of glycans, the library also provides application programming interfaces (APIs) for major community databases, including GlyTouCan and UnicarbKB. The library is freely available under the Apache 2 common license with source code available at https://github.com/mobiusklein/ and documentation at https://glypy.readthedocs.io/.
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Joseph Zaia
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Department of Biochemistry, Boston University, Boston, Massachusetts 02215, United States
10
Alocci D, Suchánková P, Costa R, Hory N, Mariethoz J, Vařeková RS, Toukach P, Lisacek F. SugarSketcher: Quick and Intuitive Online Glycan Drawing. Molecules 2018; 23:E3206. [PMID: 30563078 PMCID: PMC6320881 DOI: 10.3390/molecules23123206] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/14/2018] [Revised: 11/23/2018] [Accepted: 11/29/2018] [Indexed: 01/24/2023]
Abstract
SugarSketcher is an intuitive and fast JavaScript interface module for online drawing of glycan structures in the popular Symbol Nomenclature for Glycans (SNFG) notation and exporting them to various commonly used formats encoding carbohydrate sequences (e.g., GlycoCT) or quality images (e.g., SVG). It does not require a backend server or any specific browser plugins and can be integrated in any web glycoinformatics project. SugarSketcher allows drawing glycans both for glycobiologists and non-expert users. The "quick mode" allows a newcomer to build up a glycan structure with only limited knowledge of carbohydrate chemistry. The "normal mode" integrates advanced options which enable glycobiologists to tailor complex carbohydrate structures. The source code is freely available on GitHub and glycoinformaticians are encouraged to participate in the development process, while users are invited to test a prototype available on the ExPASy web site and send feedback.
Affiliation(s)
- Davide Alocci
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Computer Science Department, University of Geneva, 1211 Geneva, Switzerland
- Pavla Suchánková
- CEITEC–Central European Institute of Technology, Masaryk University Brno, 625 00 Brno-Bohunice, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, 625 00 Brno-Bohunice, Czech Republic
- Renaud Costa
- Polytech Nice Sophia, Campus SophiaTech, 06903 Sophia-Antipolis, France
- Nicolas Hory
- Polytech Nice Sophia, Campus SophiaTech, 06903 Sophia-Antipolis, France
- Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Computer Science Department, University of Geneva, 1211 Geneva, Switzerland
- Radka Svobodová Vařeková
- CEITEC–Central European Institute of Technology, Masaryk University Brno, 625 00 Brno-Bohunice, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, 625 00 Brno-Bohunice, Czech Republic
- Philip Toukach
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Laboratory of Carbohydrate Chemistry, 119991 Moscow, Russia
- Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Computer Science Department, University of Geneva, 1211 Geneva, Switzerland
- Section of Biology, University of Geneva, 1211 Geneva, Switzerland
11
Mariethoz J, Alocci D, Gastaldello A, Horlacher O, Gasteiger E, Rojas-Macias M, Karlsson NG, Packer NH, Lisacek F. Glycomics@ExPASy: Bridging the Gap. Mol Cell Proteomics 2018; 17:2164-2176. [PMID: 30097532 PMCID: PMC6210229 DOI: 10.1074/mcp.ra118.000799] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 04/12/2018] [Revised: 07/15/2018] [Indexed: 12/28/2022]
Abstract
Glycomics@ExPASy (https://www.expasy.org/glycomics) is the glycomics tab of ExPASy, the server of SIB Swiss Institute of Bioinformatics. It was created in 2016 to centralize web-based glycoinformatics resources developed within an international network of glycoscientists. The hosted collection currently includes mainly databases and tools created and maintained at SIB but also links to a range of reference resources popular in the glycomics community. The philosophy of our toolbox is that it should be {glycoscientist AND protein scientist}-friendly with the aim of (1) popularizing the use of bioinformatics in glycobiology and (2) emphasizing the relationship between glycobiology and protein-oriented bioinformatics resources. The scarcity of data bridging these two disciplines led us to design tools as interactive as possible based on database connectivity to facilitate data exploration and support hypothesis building. Glycomics@ExPASy was designed, and is developed, with a long-term vision in close collaboration with glycoscientists to meet as closely as possible the growing needs of the community for glycoinformatics.
Affiliation(s)
- Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Computer Science Department, University of Geneva, Geneva, Switzerland
- Davide Alocci
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Computer Science Department, University of Geneva, Geneva, Switzerland
- Alessandra Gastaldello
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Computer Science Department, University of Geneva, Geneva, Switzerland
- Oliver Horlacher
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Elisabeth Gasteiger
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Miguel Rojas-Macias
- Glyco Inflammatory Group, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Niclas G Karlsson
- Glyco Inflammatory Group, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Nicolle H Packer
- Institute for Glycomics, Gold Coast Campus, Griffith University, Southport, QLD, Australia
- Biomolecular Discovery & Design Research Centre, Macquarie University, North Ryde, NSW, Australia
- Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Computer Science Department, University of Geneva, Geneva, Switzerland
- Section of Biology, University of Geneva, Geneva, Switzerland
12
Robin T, Bairoch A, Müller M, Lisacek F, Lane L. Large-Scale Reanalysis of Publicly Available HeLa Cell Proteomics Data in the Context of the Human Proteome Project. J Proteome Res 2018; 17:4160-4170. [DOI: 10.1021/acs.jproteome.8b00392] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Affiliation(s)
- Thibault Robin
- CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland
- Computer Science Department, University of Geneva, CH-1211 Geneva, Switzerland
- Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva, Switzerland
| | - Amos Bairoch
- CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland
- Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva, Switzerland
| | - Markus Müller
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Genopode Building, Quartier Sorge, CH-1015 Lausanne, Switzerland
| | - Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland
- Computer Science Department, University of Geneva, CH-1211 Geneva, Switzerland
- Section of Biology, University of Geneva, CH-1211 Geneva, Switzerland
| | - Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics, CMU, Rue Michel-Servet 1, CH-1211 Geneva, Switzerland
- Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva, Switzerland
| |
|
13
|
Mylonas R, Beer I, Iseli C, Chong C, Pak HS, Gfeller D, Coukos G, Xenarios I, Müller M, Bassani-Sternberg M. Estimating the Contribution of Proteasomal Spliced Peptides to the HLA-I Ligandome. Mol Cell Proteomics 2018; 17:2347-2357. [PMID: 30171158 PMCID: PMC6283289 DOI: 10.1074/mcp.ra118.000877] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 05/22/2018] [Revised: 08/27/2018] [Indexed: 12/21/2022] Open
Abstract
It has been reported that about 30% of the HLA-I ligands are produced by proteasomal splicing of two noncontiguous fragments of a parental protein. We report that the identification of many of those spliced peptides is ambiguous. With an alternative workflow, based on de novo sequencing and subsequent verification with multiple search tools, we estimate that the upper bound for the proportion of cis-spliced peptides is 2–6%. Nevertheless, the true contribution of spliced peptides to the ligandome may be much smaller. Spliced peptides are short protein fragments spliced together in the proteasome by peptide bond formation. True estimation of the contribution of proteasome-spliced peptides (PSPs) to the global human leukocyte antigen (HLA) ligandome is critical. A recent study suggested that PSPs contribute up to 30% of the HLA ligandome. We performed a thorough reanalysis of the reported results using multiple computational tools and various validation steps and concluded that only a fraction of the proposed PSPs passes the quality filters. To better estimate the actual number of PSPs, we present an alternative workflow. We performed de novo sequencing of the HLA-peptide spectra and discarded all de novo sequences found in the UniProt database. We checked whether the remaining de novo sequences could match spliced peptides from human proteins. The spliced sequences were appended to the UniProt fasta file, which was searched by two search tools at a false discovery rate (FDR) of 1%. We find that 2–6% of the HLA ligandome could be explained as spliced protein fragments. The majority of these potential PSPs have good peptide-spectrum match properties and are predicted to bind the respective HLA molecules. However, it remains to be shown how many of these potential PSPs actually originate from proteasomal splicing events.
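The filtering idea described in this abstract (discard de novo sequences that occur contiguously in a reference protein, then test whether the remainder can be explained as two non-contiguous fragments of one protein) can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function names, the toy protein and the labels are hypothetical, and real analyses would also involve de novo scoring, search engines and FDR control.

```python
def find_all(fragment: str, sequence: str) -> list[int]:
    """Return every start index (overlaps allowed) of fragment in sequence."""
    starts = []
    pos = sequence.find(fragment)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(fragment, pos + 1)
    return starts


def cis_spliced_matches(peptide: str, protein: str) -> list[tuple[int, int, int]]:
    """List (split, left_start, right_start) for each way to build the peptide
    from two non-overlapping fragments of the same protein (cis-splicing).
    Hypothetical helper: fragments may appear in forward or reverse order."""
    matches = []
    for split in range(1, len(peptide)):
        left, right = peptide[:split], peptide[split:]
        for i in find_all(left, protein):
            left_end = i + len(left)
            for j in find_all(right, protein):
                right_end = j + len(right)
                if i < right_end and j < left_end:
                    continue  # fragments overlap in the protein
                if j == left_end:
                    continue  # directly adjacent in order: a contiguous peptide
                matches.append((split, i, j))
    return matches


def classify(peptides: list[str], proteins: dict[str, str]) -> dict[str, str]:
    """Label each peptide as 'linear', 'candidate_spliced' or 'unexplained'."""
    labels = {}
    for pep in peptides:
        if any(pep in seq for seq in proteins.values()):
            labels[pep] = "linear"
        elif any(cis_spliced_matches(pep, seq) for seq in proteins.values()):
            labels[pep] = "candidate_spliced"
        else:
            labels[pep] = "unexplained"
    return labels
```

Because short fragments occur in many places by chance, a match from this kind of search is only a candidate, which is consistent with the abstract's caution that the proportion of spliced peptides it reports is an upper bound.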
Affiliation(s)
- Roman Mylonas
- Vital-IT, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Ilan Beer
- Adicet Bio Israel, Ltd., Technion City, 32000, Haifa, Israel
| | - Christian Iseli
- Vital-IT, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Chloe Chong
- Ludwig Cancer Research Center, University of Lausanne, 1066 Epalinges, Switzerland; Department of Oncology, University Hospital of Lausanne, 1011 Lausanne, Switzerland
| | - Hui-Song Pak
- Ludwig Cancer Research Center, University of Lausanne, 1066 Epalinges, Switzerland; Department of Oncology, University Hospital of Lausanne, 1011 Lausanne, Switzerland
| | - David Gfeller
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Ludwig Cancer Research Center, University of Lausanne, 1066 Epalinges, Switzerland
| | - George Coukos
- Ludwig Cancer Research Center, University of Lausanne, 1066 Epalinges, Switzerland; Department of Oncology, University Hospital of Lausanne, 1011 Lausanne, Switzerland
| | - Ioannis Xenarios
- Vital-IT, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Markus Müller
- Vital-IT, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Michal Bassani-Sternberg
- Vital-IT, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
|
14
|
Cheng K, Ning Z, Zhang X, Li L, Liao B, Mayne J, Stintzi A, Figeys D. MetaLab: an automated pipeline for metaproteomic data analysis. Microbiome 2017; 5:157. [PMID: 29197424 PMCID: PMC5712144 DOI: 10.1186/s40168-017-0375-2] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 08/03/2017] [Accepted: 11/17/2017] [Indexed: 05/19/2023]
Abstract
BACKGROUND Research involving microbial ecosystems has drawn increasing attention in recent years. Studying microbe-microbe, host-microbe, and environment-microbe interactions is essential for the understanding of microbial ecosystems. Currently, metaproteomics provides qualitative and quantitative information on proteins, giving insights into the functional changes of microbial communities. However, computational analysis of large-scale data generated in metaproteomic studies remains a challenge. Conventional proteomic software has difficulty dealing with the extreme complexity and species diversity present in microbiome samples, leading to lower rates of peptide and protein identification. To address this issue, we previously developed the MetaPro-IQ approach for highly efficient microbial protein/peptide identification and quantification. RESULTS Here, we developed an integrated software platform, named MetaLab, providing a complete, automated and user-friendly pipeline for fast microbial protein identification, quantification, and taxonomic profiling, directly from mass spectrometry raw data. Spectral clustering adopted in the pre-processing step dramatically improved the speed of peptide identification from database searches. Quantitative information on identified peptides was used for estimating the relative abundance of taxa at all phylogenetic ranks. Taxonomy result files exported by MetaLab are fully compatible with widely used metagenomics tools. Herein, the potential of MetaLab is evaluated by reanalyzing a metaproteomic dataset from mouse gut microbiome samples. CONCLUSION MetaLab is a fully automatic software platform enabling an integrated data-processing pipeline for metaproteomics. The function of sample-specific database generation can be very advantageous for searching peptides against huge protein databases. It provides a seamless connection between peptide determination and taxonomic profiling; therefore, the peptide abundance is readily used for measuring the microbial variations. MetaLab is designed as a versatile, efficient, and easy-to-use tool that can greatly simplify the procedure of metaproteomic data analysis for researchers in microbiome studies.
Affiliation(s)
- Kai Cheng
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Zhibin Ning
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Xu Zhang
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Leyuan Li
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Bo Liao
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Janice Mayne
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Alain Stintzi
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - Daniel Figeys
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Molecular Architecture of Life Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada
| |
|
15
|
Müller M, Gfeller D, Coukos G, Bassani-Sternberg M. 'Hotspots' of Antigen Presentation Revealed by Human Leukocyte Antigen Ligandomics for Neoantigen Prioritization. Front Immunol 2017; 8:1367. [PMID: 29104575 PMCID: PMC5654951 DOI: 10.3389/fimmu.2017.01367] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 07/31/2017] [Accepted: 10/05/2017] [Indexed: 12/30/2022] Open
Abstract
The remarkable clinical efficacy of the immune checkpoint blockade therapies has motivated researchers to discover immunogenic epitopes and exploit them for personalized vaccines. Human leukocyte antigen (HLA)-binding peptides derived from processing and presentation of mutated proteins are one of the leading targets for T-cell recognition of cancer cells. Currently, most studies attempt to identify neoantigens based on predicted affinity to HLA molecules, but the performance of such prediction algorithms is rather poor for rare HLA class I alleles and for HLA class II. Direct identification of neoantigens by mass spectrometry (MS) is becoming feasible; however, it is not yet applicable to most patients and lacks sensitivity. In an attempt to capitalize on existing immunopeptidomics data and extract information that could complement HLA-binding prediction, we first compiled a large HLA class I and class II immunopeptidomics database across dozens of cell types and HLA allotypes and detected hotspots, that is, subsequences of proteins that are frequently presented. About 3% of the peptidome was detected in both class I and class II. Based on the gene ontology of their source proteins and the peptides' length, we propose that their processing may be carried out by the cellular class II presentation machinery. Our database captures the global nature of the in vivo peptidome averaged over many HLA alleles and therefore reflects the propensity of peptides to be presented on HLA complexes, which is complementary to the existing neoantigen prediction features such as binding affinity and stability or RNA abundance. We further introduce two immunopeptidomics MS-based features to guide prioritization of neoantigens: the number of peptides matching a protein in our database and the overlap of the predicted wild-type peptide with other peptides in our database. We show as a proof of concept that our immunopeptidomics MS-based features improved neoantigen prioritization by up to 50%. Overall, our work shows that, in addition to providing large amounts of training data to improve HLA-binding prediction, immunopeptidomics also captures other aspects of the natural in vivo presentation that significantly improve prediction of clinically relevant neoantigens.
Affiliation(s)
- Markus Müller
- Vital-IT, Swiss Institute of Bioinformatics, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - David Gfeller
- Swiss Institute of Bioinformatics, Lausanne, Switzerland; Ludwig Cancer Research Center, University of Lausanne, Epalinges, Switzerland
| | - George Coukos
- Ludwig Cancer Research Center, University of Lausanne, Epalinges, Switzerland; Department of Oncology, Lausanne University Hospital, Lausanne, Switzerland
| | - Michal Bassani-Sternberg
- Ludwig Cancer Research Center, University of Lausanne, Epalinges, Switzerland; Department of Oncology, Lausanne University Hospital, Lausanne, Switzerland
| |
|
16
|
Horlacher O, Jin C, Alocci D, Mariethoz J, Müller M, Karlsson NG, Lisacek F. Glycoforest 1.0. Anal Chem 2017; 89:10932-10940. [PMID: 28901741 DOI: 10.1021/acs.analchem.7b02754] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 02/06/2023]
Abstract
Tandem mass spectrometry, when combined with liquid chromatography and applied to complex mixtures, produces large amounts of raw data, which need to be analyzed to identify molecular structures. This technique is widely used, particularly in glycomics. Due to a lack of high-throughput glycan sequencing software, glycan spectra are predominantly sequenced manually. A challenge for writing glycan-sequencing software is that there is no direct template that can be used to infer structures detectable in an organism. To help alleviate this bottleneck, we present Glycoforest 1.0, a partial de novo algorithm for sequencing glycan structures based on MS/MS spectra. Glycoforest was tested on two data sets (human gastric and salmon mucosa O-linked glycomes) for which MS/MS spectra were annotated manually. Glycoforest generated the human-validated structure for 92% of test cases. The correct structure was found as the best scoring match for 70% and among the top 3 matches for 83% of test cases. In addition, the Glycoforest algorithm detected glycan structures from MS/MS spectra missing a manual annotation. In total, 1532 previously unannotated MS/MS spectra were annotated by Glycoforest. A portion containing 521 spectra was manually checked, confirming that Glycoforest annotated an additional 50 MS/MS spectra overlooked during manual annotation.
Affiliation(s)
- Oliver Horlacher
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland; University of Geneva, Geneva, 1211, Switzerland
| | - Chunsheng Jin
- Glyco Inflammatory Group, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, SE405 30, Sweden
| | - Davide Alocci
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland; University of Geneva, Geneva, 1211, Switzerland
| | - Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland; University of Geneva, Geneva, 1211, Switzerland
| | - Markus Müller
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland; University of Geneva, Geneva, 1211, Switzerland
| | - Niclas G Karlsson
- Glyco Inflammatory Group, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, SE405 30, Sweden
| | - Frederique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland; University of Geneva, Geneva, 1211, Switzerland
| |
|
17
|
Blanco-Míguez A, Fdez-Riverola F, Lourenço A, Sánchez B. P4P: a peptidome-based strain-level genome comparison web tool. Nucleic Acids Res 2017; 45:W265-W269. [PMID: 28482090 PMCID: PMC5570244 DOI: 10.1093/nar/gkx389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2017] [Accepted: 05/05/2017] [Indexed: 12/02/2022] Open
Abstract
Peptidome similarity analysis enables researchers to gain insights into differential peptide profiles, providing a robust tool to discriminate strain-specific peptides, true intra-species differences among biological replicates or even microorganism-phenotype variations. However, no in silico peptide fingerprinting software existed to facilitate such phylogeny inference. Hence, we developed the Peptidomes for Phylogenies (P4P) web tool, which enables the survey of similarities between microbial proteomes and simplifies the process of obtaining new biological insights into their phylogeny. P4P can be used to analyze different peptide datasets, i.e. bacteria, viruses, eukaryotic species or even metaproteomes. It can also work with whole proteome datasets and experimental mass-to-charge lists originating from mass spectrometers. The ultimate aim is to generate a valid and manageable list of peptides that have phylogenetic signal and are potentially sample-specific. Sample-to-sample comparison is based on a consensus peak set matrix, which can be further submitted to phylogenetic analysis. P4P holds great potential for improving phylogenetic analyses in challenging taxonomic groups, biomarker identification or epidemiologic studies. Notably, P4P can be of interest for applications handling large proteomic datasets, which it is able to reduce to small matrices while maintaining high phylogenetic resolution. The web server is available at http://sing-group.org/p4p.
Affiliation(s)
- Aitor Blanco-Míguez
- ESEI-Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas S/N 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
| | - Florentino Fdez-Riverola
- ESEI-Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas S/N 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
| | - Anália Lourenço
- ESEI-Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas S/N 32004, Ourense, Spain; CINBIO-Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain; Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Paseo Río Linares S/N 33300, Villaviciosa, Asturias, Spain
| | - Borja Sánchez
- CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
| |
|
18
|
Blanco-Míguez A, Gutiérrez-Jácome A, Fdez-Riverola F, Lourenço A, Sánchez B. MAHMI database: a comprehensive MetaHit-based resource for the study of the mechanism of action of the human microbiota. Database: The Journal of Biological Databases and Curation 2017; 2017:baw157. [PMID: 28077565 PMCID: PMC5225402 DOI: 10.1093/database/baw157] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/04/2016] [Revised: 11/04/2016] [Accepted: 11/05/2016] [Indexed: 01/01/2023]
Abstract
The Mechanism of Action of the Human Microbiome (MAHMI) database is a unique resource that provides comprehensive information about the sequence of potential immunomodulatory and antiproliferative peptides encrypted in the proteins produced by the human gut microbiota. Currently, the MAHMI database contains over 300 hundred million peptide entries, with detailed information about peptide sequence, sources and potential bioactivity. The reference peptide data section is curated manually by domain experts. The in silico peptide data section is populated automatically through the systematic processing of publicly available exoproteomes of the human microbiome. Bioactivity prediction is based on the global alignment of the automatically processed peptides with experimentally validated immunomodulatory and antiproliferative peptides in the reference section. MAHMI provides researchers with a comparative tool for inspecting the potential immunomodulatory or antiproliferative bioactivity of new amino acid sequences and identifying promising peptides to be further investigated. Moreover, researchers are welcome to submit new experimental evidence on peptide bioactivity, namely empirical and structural data, as a proactive, expert means to keep the database updated and improve the implemented bioactivity prediction method. Bioactive peptides identified by MAHMI have huge biotechnological potential, including the manipulation of aberrant immune responses and the design of new functional ingredients/foods based on the genetic sequences of the human microbiome. Hopefully, the resources provided by MAHMI will be useful to those researching gastrointestinal disorders of autoimmune and inflammatory nature, such as Inflammatory Bowel Diseases. The MAHMI database is routinely updated and is available free of charge. Database URL: http://mahmi.org/
Affiliation(s)
- Aitor Blanco-Míguez
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004, Ourense, Spain; Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Villaviciosa, Asturias, Spain
| | - Alberto Gutiérrez-Jácome
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004, Ourense, Spain
| | - Florentino Fdez-Riverola
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004, Ourense, Spain
| | - Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n 32004, Ourense, Spain; CEB - Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
| | - Borja Sánchez
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Villaviciosa, Asturias, Spain
| |
|
19
|
Blanco-Míguez A, Meier-Kolthoff JP, Gutiérrez-Jácome A, Göker M, Fdez-Riverola F, Sánchez B, Lourenço A. Improving Phylogeny Reconstruction at the Strain Level Using Peptidome Datasets. PLoS Comput Biol 2016; 12:e1005271. [PMID: 28033346 PMCID: PMC5198984 DOI: 10.1371/journal.pcbi.1005271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/29/2016] [Accepted: 11/28/2016] [Indexed: 11/18/2022] Open
Abstract
Typical bacterial strain differentiation methods are often challenged by high genetic similarity between strains. To address this problem, we introduce a novel in silico peptide fingerprinting method based on conventional wet-lab protocols that enables the identification of potential strain-specific peptides. These can be further investigated using in vitro approaches, laying a foundation for the development of biomarker detection and application-specific methods. This novel method aims at reducing large amounts of comparative peptide data to binary matrices while maintaining a high phylogenetic resolution. The underlying case study concerns the Bacillus cereus group, namely the differentiation of Bacillus thuringiensis, Bacillus anthracis and Bacillus cereus strains. Results show that trees based on cytoplasmic and extracellular peptidomes are only marginally in conflict with those based on whole proteomes, as inferred by the established Genome-BLAST Distance Phylogeny (GBDP) method. Hence, these results indicate that the two approaches can most likely be used complementarily even in other organismal groups. The obtained results confirm previous reports about the misclassification of many strains within the B. cereus group. Moreover, our method was able to separate the B. anthracis strains with high resolution, similarly to the GBDP results as benchmarked via Bayesian inference and both Maximum Likelihood and Maximum Parsimony. In addition to the presented phylogenomic applications, whole-peptide fingerprinting might also become a valuable complementary technique to digital DNA-DNA hybridization, notably for bacterial classification at the species and subspecies level in the future.
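The reduction of comparative peptide data to binary matrices mentioned in this abstract can be pictured with a small Python sketch. It assumes each strain's peptidome is already available as a set of peptide strings; the function names and the use of a Jaccard distance between rows are illustrative assumptions, not the published method.

```python
def binary_matrix(peptidomes: dict[str, set[str]]):
    """Build a strain-by-peptide presence/absence matrix (1 = present).
    Rows follow sorted strain names, columns follow sorted peptides."""
    strains = sorted(peptidomes)
    peptides = sorted(set().union(*peptidomes.values()))
    matrix = [[1 if pep in peptidomes[s] else 0 for pep in peptides] for s in strains]
    return strains, peptides, matrix


def strain_specific(peptidomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Peptides found in exactly one strain, keyed by that strain."""
    specific = {}
    for strain, peps in peptidomes.items():
        others = set().union(*(v for k, v in peptidomes.items() if k != strain))
        specific[strain] = peps - others
    return specific


def jaccard_distance(row_a: list[int], row_b: list[int]) -> float:
    """1 - |shared| / |union| for two binary rows (assumed distance measure)."""
    both = sum(1 for x, y in zip(row_a, row_b) if x and y)
    either = sum(1 for x, y in zip(row_a, row_b) if x or y)
    return 0.0 if either == 0 else 1 - both / either
```

A pairwise distance table computed from such rows is the kind of compact input that tree-building methods can consume, and the strain-specific sets correspond to the potential biomarker peptides the abstract refers to.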
Affiliation(s)
- Aitor Blanco-Míguez
- ESEI–Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense, Spain
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Villaviciosa, Asturias, Spain
| | - Jan P. Meier-Kolthoff
- Leibniz Institute DSMZ–German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, Braunschweig, Germany
| | - Alberto Gutiérrez-Jácome
- ESEI–Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense, Spain
| | - Markus Göker
- Leibniz Institute DSMZ–German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstraße 7B, Braunschweig, Germany
| | - Florentino Fdez-Riverola
- ESEI–Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense, Spain
| | - Borja Sánchez
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Villaviciosa, Asturias, Spain
| | - Anália Lourenço
- ESEI–Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense, Spain
- CEB—Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga, Portugal
| |
|
20
|
Blanco-Míguez A, Gutiérrez-Jácome A, Fdez-Riverola F, Lourenço A, Sánchez B. A peptidome-based phylogeny pipeline reveals differential peptides at the strain level within Bifidobacterium animalis subsp. lactis. Food Microbiol 2016; 60:137-41. [DOI: 10.1016/j.fm.2016.06.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/07/2016] [Revised: 05/23/2016] [Accepted: 06/26/2016] [Indexed: 11/28/2022]
|
21
|
Horlacher O, Lisacek F, Müller M. Mining Large Scale Tandem Mass Spectrometry Data for Protein Modifications Using Spectral Libraries. J Proteome Res 2015; 15:721-31. [PMID: 26653734 DOI: 10.1021/acs.jproteome.5b00877] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Experimental improvements in post-translational modification (PTM) detection by tandem mass spectrometry (MS/MS) have allowed the identification of vast numbers of PTMs. Open modification searches (OMSs) of MS/MS data, which do not require prior knowledge of the modifications present in the sample, further increased the diversity of detected PTMs. Despite much effort, there is still a lack of functional annotation of PTMs. One possibility to narrow the annotation gap is to mine MS/MS data deposited in public repositories and to correlate the PTM presence with biological meta-information attached to the data. Since the data volume can be quite substantial and contain tens of millions of MS/MS spectra, the data mining tools must be able to cope with big data. Here, we present two tools, Liberator and MzMod, which are built using the MzJava class library and the Apache Spark large-scale computing framework. Liberator builds large MS/MS spectrum libraries, and MzMod searches them in an OMS mode. We applied these tools to a recently published set of 25 million spectra from 30 human tissues and present tissue-specific PTMs. We also compared the results to the ones obtained with the OMS tool MODa and the search engine X!Tandem.
Affiliation(s)
- Oliver Horlacher
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva 1211, Switzerland; Centre Universitaire de Bioinformatique, University of Geneva, Geneva 1211, Switzerland
| | - Frederique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva 1211, Switzerland; Centre Universitaire de Bioinformatique, University of Geneva, Geneva 1211, Switzerland
| | - Markus Müller
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva 1211, Switzerland; Centre Universitaire de Bioinformatique, University of Geneva, Geneva 1211, Switzerland
| |
|
22
|
Alocci D, Mariethoz J, Horlacher O, Bolleman JT, Campbell MP, Lisacek F. Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search. PLoS One 2015; 10:e0144578. [PMID: 26656740 PMCID: PMC4684231 DOI: 10.1371/journal.pone.0144578] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/16/2015] [Accepted: 11/22/2015] [Indexed: 11/18/2022] Open
Abstract
Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, Graph databases are possible software solutions for storing glycan structures, and Graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for storage in an RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology adapted to solving glycan substructure search and other comparable issues.
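The graph encoding described in this abstract, with building blocks as nodes and linkages as edges, can be sketched in plain Python. The sketch below swaps the paper's graph databases and query languages (SPARQL, Cypher) for an in-memory tree and a recursive matcher; the `Residue` class, the function names and the linkage labels are hypothetical and serve only to illustrate what a substructure search has to decide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """One building block; children are (linkage label, child residue) pairs."""
    name: str
    children: list[tuple[str, Residue]] = field(default_factory=list)


def matches_at(pattern: Residue, target: Residue) -> bool:
    """True if the pattern, rooted here, embeds in the target rooted here."""
    if pattern.name != target.name:
        return False
    return _assign(pattern.children, target.children, frozenset())


def _assign(p_children, t_children, used: frozenset) -> bool:
    """Backtracking: map each pattern branch to a distinct target branch
    that carries the same linkage label and matches recursively."""
    if not p_children:
        return True
    (linkage, p_child), rest = p_children[0], p_children[1:]
    for idx, (t_linkage, t_child) in enumerate(t_children):
        if idx in used or t_linkage != linkage:
            continue
        if matches_at(p_child, t_child) and _assign(rest, t_children, used | {idx}):
            return True
    return False


def contains(target: Residue, pattern: Residue) -> bool:
    """True if the pattern occurs anywhere in the target structure."""
    if matches_at(pattern, target):
        return True
    return any(contains(child, pattern) for _, child in target.children)


# Toy O-glycan cores: core 1 is Gal(b1-3)GalNAc; core 2 adds GlcNAc(b1-6).
core1 = Residue("GalNAc", [("b1-3", Residue("Gal"))])
core2 = Residue("GalNAc", [("b1-3", Residue("Gal")), ("b1-6", Residue("GlcNAc"))])
```

With these toy structures, `contains(core2, core1)` holds while `contains(core1, core2)` does not. A graph database answers the same question declaratively over many stored structures, which is where the query response times compared in the study come into play.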
Affiliation(s)
- Davide Alocci
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
- Computer Science Department, University of Geneva, Geneva, 1227, Switzerland
| | - Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
| | - Oliver Horlacher
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
- Computer Science Department, University of Geneva, Geneva, 1227, Switzerland
| | - Jerven T. Bolleman
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
| | - Matthew P. Campbell
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia
| | - Frederique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
- Computer Science Department, University of Geneva, Geneva, 1227, Switzerland
| |
|