1
|
Dührkop K, Nothias LF, Fleischauer M, Reher R, Ludwig M, Hoffmann MA, Petras D, Gerwick WH, Rousu J, Dorrestein PC, Böcker S. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat Biotechnol 2021; 39:462-471. [PMID: 33230292 DOI: 10.1038/s41587-020-0740-8] [Citation(s) in RCA: 279] [Impact Index Per Article: 93.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022]
Abstract
Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.
Collapse
Affiliation(s)
- Kai Dührkop
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
| | - Louis-Félix Nothias
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | | | - Raphael Reher
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
| | - Marcus Ludwig
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
| | - Martin A Hoffmann
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- International Max Planck Research School 'Exploration of Ecological Interactions with Molecular and Chemical Techniques', Max Planck Institute for Chemical Ecology, Jena, Germany
| | - Daniel Petras
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
| | - William H Gerwick
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Juho Rousu
- Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Espoo, Finland
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany.
| |
Collapse
|
2
|
Stanstrup J, Broeckling CD, Helmus R, Hoffmann N, Mathé E, Naake T, Nicolotti L, Peters K, Rainer J, Salek RM, Schulze T, Schymanski EL, Stravs MA, Thévenot EA, Treutler H, Weber RJM, Willighagen E, Witting M, Neumann S. The metaRbolomics Toolbox in Bioconductor and beyond. Metabolites 2019; 9:E200. [PMID: 31548506 PMCID: PMC6835268 DOI: 10.3390/metabo9100200] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 11/17/2022] Open
Abstract
Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse Metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics specific packages primarily available on CRAN, Bioconductor and GitHub.
Collapse
Affiliation(s)
- Jan Stanstrup
- Preventive and Clinical Nutrition, University of Copenhagen, Rolighedsvej 30, 1958 Frederiksberg C, Denmark.
| | - Corey D Broeckling
- Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, CO 80523, USA.
| | - Rick Helmus
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands.
| | - Nils Hoffmann
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany.
| | - Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
| | - Thomas Naake
- Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany.
| | - Luca Nicolotti
- The Australian Wine Research Institute, Metabolomics Australia, PO Box 197, Adelaide SA 5064, Australia.
| | - Kristian Peters
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
| | - Johannes Rainer
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, 39100 Bolzano, Italy.
| | - Reza M Salek
- The International Agency for Research on Cancer, 150 cours Albert Thomas, CEDEX 08, 69372 Lyon, France.
| | - Tobias Schulze
- Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research-UFZ, Permoserstraße 15, 04318 Leipzig, Germany.
| | - Emma L Schymanski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
| | - Michael A Stravs
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600 Dubendorf, Switzerland.
| | - Etienne A Thévenot
- CEA, LIST, Laboratory for Data Sciences and Decision, MetaboHUB, Gif-Sur-Yvette F-91191, France.
| | - Hendrik Treutler
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
| | - Ralf J M Weber
- Phenome Centre Birmingham and School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
| | - Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands.
| | - Michael Witting
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, 85764 Neuherberg, Germany.
- Chair of Analytical Food Chemistry, Technische Universität München, 85354 Weihenstephan, Germany.
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig Deutscher, Platz 5e, 04103 Leipzig, Germany.
| |
Collapse
|
3
|
Zhang L, Tang C, Cao D, Zeng Y, Tan B, Zeng M, Fan W, Xiao H, Liang Y. Strategies for structure elucidation of small molecules using gas chromatography-mass spectrometric data. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.02.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
4
|
Askenazi M, Linial M. Mass Informatics: From Mass Spectrometry Peaks to Biological Pathways. Isr J Chem 2013. [DOI: 10.1002/ijch.201200076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
5
|
Bobach C, Böhme T, Laube U, Püschel A, Weber L. Automated compound classification using a chemical ontology. J Cheminform 2012; 4:40. [PMID: 23273256 PMCID: PMC3547776 DOI: 10.1186/1758-2946-4-40] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2012] [Accepted: 12/10/2012] [Indexed: 11/25/2022] Open
Abstract
Background Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error prone and time consuming manual classification of compounds. Results In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. Conclusions A proposal for a rule based definition of chemical classes has been made that allows to define chemical compound classes more precisely than before. The proposed structure-based reasoning logic allows to translate chemistry expert knowledge into a computer interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated.
Collapse
Affiliation(s)
- Claudia Bobach
- OntoChem GmbH, Heinrich-Damerow-Str, 4, Halle/Saale, D-06120, Germany.
| | | | | | | | | |
Collapse
|
6
|
Morrey CP, Perl Y, Halper M, Chen L, Gu H“H. A chemical specialty semantic network for the Unified Medical Language System. J Cheminform 2012; 4:9. [PMID: 22577759 PMCID: PMC3428652 DOI: 10.1186/1758-2946-4-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2012] [Accepted: 05/11/2012] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Terms representing chemical concepts found the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS's coverage of the chemical domain, we find a great deal of concepts being assigned more than one ST. This leads to the situation where the extent of a given ST may contain concepts elaborating variegated semantics.A methodology for expanding the chemical subhierarchy of the SN into a finer-grained categorization of mutually exclusive types with semantically uniform extents is presented. We call this network a Chemical Specialty Semantic Network (CSSN). A CSSN is derived automatically from the existing chemical STs and their assignments. The methodology incorporates a threshold value governing the minimum size of a type's extent needed for inclusion in the CSSN. Thus, different CSSNs can be created by choosing different threshold values based on varying requirements. RESULTS A complete CSSN is derived using a threshold value of 300 and having 68 STs. It is used effectively to provide high-level categorizations for a random sample of compounds from the "Chemical Entities of Biological Interest" (ChEBI) ontology. The effect on the size of the CSSN using various threshold parameter values between one and 500 is shown. CONCLUSIONS The methodology has several potential applications, including its use to derive a pre-coordinated guide for ST assignments to new UMLS chemical concepts, as a tool for auditing existing concepts, inter-terminology mapping, and to serve as an upper-level network for ChEBI.
Collapse
Affiliation(s)
- C Paul Morrey
- Department of Information Systems and Technology, Utah Valley University, 800 West University Parkway, Orem, UT 84058, USA
| | - Yehoshua Perl
- Structural Analysis of Biomedical Ontologies Center, Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
| | - Michael Halper
- Information Technology Program, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
| | - Ling Chen
- Department of Science, Borough of Manhattan Community College, City University of New York, 199 Chambers Street, New York, NY 10007, USA
| | - Huanying “Helen” Gu
- Department of Computer Science, New York Institute of Technology, 1855 Broadway, New York, NY 10023, USA
| |
Collapse
|