1
Lee AS, Elliott S, Harb H, Ward L, Foster I, Curtiss L, Assary RS. Emin: A First-Principles Thermochemical Descriptor for Predicting Molecular Synthesizability. J Chem Inf Model 2024; 64:1277-1289. [PMID: 38359461 DOI: 10.1021/acs.jcim.3c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2024]
Abstract
Predicting the synthesizability of a new molecule remains an unsolved challenge that chemists have long tackled with heuristic approaches. Here, we report a new method for predicting synthesizability using a simple yet accurate thermochemical descriptor. We introduce Emin, the energy difference between a molecule and its lowest energy constitutional isomer, as a synthesizability predictor that is accurate, physically meaningful, and first-principles based. We apply Emin to 134,000 molecules in the QM9 data set and find that Emin is accurate when used alone and reduces incorrect predictions of "synthesizable" by up to 52% when used to augment commonly used prediction methods. Our work illustrates how first-principles thermochemistry and heuristic approximations for molecular stability are complementary, opening a new direction for synthesizability prediction methods.
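The descriptor defined in this abstract can be illustrated with a minimal Python sketch. It assumes that molecules are grouped by molecular formula and that the reference is the lowest-energy constitutional isomer present in the supplied set. The function name `compute_emin`, the tuple layout, and the toy energies are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def compute_emin(molecules):
    """Return {name: energy above the lowest-energy isomer of the same formula}.

    `molecules` is an iterable of (name, formula, energy) tuples. All energies
    are assumed to share one unit and one level of theory.
    """
    molecules = list(molecules)  # allow generators, since we iterate twice
    lowest = defaultdict(lambda: float("inf"))
    for _, formula, energy in molecules:
        lowest[formula] = min(lowest[formula], energy)
    return {name: energy - lowest[formula] for name, formula, energy in molecules}


# Hypothetical relative energies (kcal/mol), invented for illustration only.
toy = [
    ("isomer_a", "C2H6O", 0.0),
    ("isomer_b", "C2H6O", 12.0),
    ("other", "C3H8", 5.0),
]
emin = compute_emin(toy)
```

In this sketch a molecule that is the only representative of its formula gets a value of zero, so the result is only meaningful when the isomer set for each formula is reasonably complete.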
Affiliation(s)
- Andrew S Lee
  - Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Sarah Elliott
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Hassan Harb
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Logan Ward
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Ian Foster
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Larry Curtiss
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Rajeev S Assary
  - Materials Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
2
Chandrasekhar V, Sharma N, Schaub J, Steinbeck C, Rajan K. Cheminformatics Microservice: unifying access to open cheminformatics toolkits. J Cheminform 2023; 15:98. [PMID: 37845745 PMCID: PMC10577930 DOI: 10.1186/s13321-023-00762-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 09/19/2023] [Indexed: 10/18/2023] [Open Access]
Abstract
In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the Cheminformatics Microservice. This open-source solution provides a unified interface for accessing commonly used functionalities of multiple cheminformatics toolkits, namely RDKit, Chemistry Development Kit (CDK), and Open Babel. In addition, more advanced functionalities like structure generation and Optical Chemical Structure Recognition (OCSR) are made available through the Cheminformatics Microservice based on pre-existing tools. The software service also enables developers to extend the functionalities easily and to seamlessly integrate them with existing workflows and applications. It is built on FastAPI and containerized using Docker, making it highly scalable. An instance of the microservice is publicly available at https://api.naturalproducts.net . The source code is publicly accessible on GitHub, accompanied by comprehensive documentation, version control, and continuous integration and deployment workflows. All resources can be found at the following link: https://github.com/Steinbeck-Lab/cheminformatics-microservice .
Affiliation(s)
- Venkata Chandrasekhar
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
- Nisha Sharma
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
- Jonas Schaub
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
- Kohulan Rajan
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
3
Ateia M, Sigmund G, Bentel MJ, Washington JW, Lai A, Merrill NH, Wang Z. Integrated data-driven cross-disciplinary framework to prevent chemical water pollution. ONE EARTH (CAMBRIDGE, MASS.) 2023; 6:10.1016/j.oneear.2023.07.001. [PMID: 38264630 PMCID: PMC10802893 DOI: 10.1016/j.oneear.2023.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Access to a clean and healthy environment is a human right and a prerequisite for maintaining a sustainable ecosystem. Experts across domains along the chemical life cycle have traditionally operated in isolation, leading to limited connectivity between upstream chemical innovation and downstream development of water-treatment technologies. This fragmented and historically reactive approach to managing emerging contaminants has resulted in significant externalized societal costs. Herein, we propose an integrated data-driven framework to foster proactive action across domains to effectively address chemical water pollution. Implementing this integrated framework will not only enhance the capabilities of experts in their respective fields but also create opportunities for novel approaches that yield co-benefits across multiple domains. To successfully operationalize the integrated framework, several concerted efforts are warranted, including adopting open and FAIR (findable, accessible, interoperable, and reusable) data practices, developing common knowledge bases/platforms, and staying vigilant against new substance "properties" of concern.
Affiliation(s)
- Mohamed Ateia
  - United States Environmental Protection Agency, Center for Environmental Solutions & Emergency Response, Cincinnati, OH 45220, USA
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX, USA
- Gabriel Sigmund
  - Environmental Geosciences, Centre for Microbiology and Environmental Systems Science, University of Vienna, Josef-Holaubeck-Platz 2, 1090 Vienna, Austria
  - Environmental Technology, Wageningen University & Research, P.O. Box 17, 6700 AA Wageningen, the Netherlands
- Michael J. Bentel
  - Department of Environmental Engineering and Earth Sciences, Clemson University, Clemson, SC 29634, USA
- John W. Washington
  - United States Environmental Protection Agency, Center for Environmental Measurement and Modeling, Athens, GA 30605, USA
- Adelene Lai
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367 Belvaux, Luxembourg
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University, 07743 Jena, Germany
- Nathaniel H. Merrill
  - United States Environmental Protection Agency, Center for Environmental Measurement and Modeling, Narragansett, RI, USA
- Zhanyun Wang
  - Empa, Swiss Federal Laboratories for Materials Science and Technology, Technology and Society Laboratory, 9014 St. Gallen, Switzerland
4
Devata S, Cleaves HJ, Dimandja J, Heist CA, Meringer M. Comparative Evaluation of Electron Ionization Mass Spectral Prediction Methods. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2023. [PMID: 37390315 DOI: 10.1021/jasms.3c00059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2023]
Abstract
During the past decade, promising methods for computational prediction of electron ionization mass spectra have been developed. The most prominent ones are based on quantum chemistry (QCEIMS) and machine learning (CFM-EI, NEIMS). Here we provide a threefold comparison of these methods with respect to spectral prediction and compound identification. We found that there is no unambiguous way to determine the best of these three methods. Among other factors, we find that the choice of spectral distance function plays an important role in the performance for compound identification.
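To make the notion of a spectral distance function concrete, the following Python sketch computes a cosine distance between two stick spectra. The cosine measure is a commonly used choice and is an assumption here; the abstract does not state which distance functions were compared. The function names and the toy spectra are hypothetical.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Peaks are matched on identical m/z keys (e.g. integer nominal masses).
    """
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def cosine_distance(spec_a, spec_b):
    """Distance in [0, 1] for non-negative intensities."""
    return 1.0 - cosine_similarity(spec_a, spec_b)


# Hypothetical spectra, invented for illustration only.
predicted = {15: 20.0, 29: 100.0, 43: 55.0}
measured = {15: 20.0, 29: 100.0, 43: 55.0}
unrelated = {77: 100.0, 105: 40.0}

d_same = cosine_distance(predicted, measured)
d_diff = cosine_distance(predicted, unrelated)
```

Swapping in a different distance (for example one that weights intensities by m/z) changes how candidate compounds are ranked, which is the sensitivity the abstract points to.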
Affiliation(s)
- Sriram Devata
  - International Institute of Information Technology, Hyderabad 500 032, India
  - Blue Marble Space Institute of Science, 1001 4th Ave, Suite 3201, Seattle, Washington 98154, United States
- Henderson James Cleaves
  - Blue Marble Space Institute of Science, 1001 4th Ave, Suite 3201, Seattle, Washington 98154, United States
  - Earth-Life Science Institute, Tokyo Institute of Technology, 2-12-1-IE-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- John Dimandja
  - Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Christopher A Heist
  - Georgia Tech Research Institute (GTRI), Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Markus Meringer
  - Department of Atmospheric Processors, German Aerospace Center (DLR), Münchner Straße 20, 82234 Oberpfaffenhofen-Wessling, Germany
5
Fujioka K, Kaiser RI, Sun R. Unsupervised Reaction Pathways Search for the Oxidation of Hypergolic Ionic Liquids: 1-Ethyl-3-methylimidazolium Cyanoborohydride (EMIM+/CBH-) as a Case Study. J Phys Chem A 2023; 127:913-923. [PMID: 36574603 DOI: 10.1021/acs.jpca.2c07624] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Hypergolic ionic liquids have come under increased study for having several desirable properties as a fuel source. One particular pairing of ionic liquid, 1-ethyl-3-methylimidazolium/cyanoborohydride (EMIM+/CBH-), and oxidant, nitric acid (HNO3), has been reported to be hypergolic experimentally, but its mechanism is not well understood. In this computational study, the reaction is first probed with ab initio molecular dynamics simulations to confirm that anion-oxidant interactions are likely the first step in the mechanism. Second, the potential energy surface of the anion-oxidant system is studied with an in-depth search over possible isomerizations, and a network of possible intermediates is found. The critical point search is unsupervised and thus has the potential to identify structures that deviate from chemical intuition. Molecular graphs are employed for analyzing the 3000+ intermediates found, and nudged elastic band calculations are employed to identify transition states between them. Finally, the reactivity of the system is discussed through examination of minimal energy paths connecting the reactant to various common products from hypergolic ionic liquid oxidation. Eight products are reported for this system: NO, N2O, NO2, HNO, HONO, HNO2, HCN, and H2O. All reaction paths leading to these exothermic products have overall reaction barriers of 6-7 kcal/mol.
Affiliation(s)
- Kazuumi Fujioka
  - Department of Chemistry, The University of Hawai'i at Manoa, Honolulu, Hawaii 96822, United States
- Ralf I Kaiser
  - Department of Chemistry, The University of Hawai'i at Manoa, Honolulu, Hawaii 96822, United States
- Rui Sun
  - Department of Chemistry, The University of Hawai'i at Manoa, Honolulu, Hawaii 96822, United States
6
Rieder SR, Oliveira MP, Riniker S, Hünenberger PH. Development of an open-source software for isomer enumeration. J Cheminform 2023; 15:10. [PMID: 36683047 PMCID: PMC9867865 DOI: 10.1186/s13321-022-00677-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 12/28/2022] [Indexed: 01/23/2023] [Open Access]
Abstract
This article documents enu, a freely-downloadable, open-source and stand-alone program written in C++ for the enumeration of the constitutional isomers and stereoisomers of a molecular formula. The program relies on graph theory to enumerate all the constitutional isomers of a given formula on the basis of their canonical adjacency matrix. The stereoisomers of a given constitutional isomer are enumerated as well, on the basis of the automorphism group of this matrix. The isomer list is then reported in the form of canonical SMILES strings within files in XML format. The specification of the molecule family of interest is very flexible and the code is optimized for computational efficiency. The algorithms and implementations underlying enu are described, and simple illustrative applications are presented. The enu code is freely available on GitHub at https://github.com/csms-ethz/CombiFF .
Affiliation(s)
- Salomé R. Rieder
  - Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Marina P. Oliveira
  - Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker
  - Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Philippe H. Hünenberger
  - Laboratorium für Physikalische Chemie, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
7
de Jonge NF, Mildau K, Meijer D, Louwen JJR, Bueschl C, Huber F, van der Hooft JJJ. Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools. Metabolomics 2022; 18:103. [PMID: 36469190 PMCID: PMC9722809 DOI: 10.1007/s11306-022-01963-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/16/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Untargeted metabolomics approaches based on mass spectrometry obtain comprehensive profiles of complex biological samples. However, on average only 10% of the molecules can be annotated. This low annotation rate hampers biochemical interpretation and effective comparison of metabolomics studies. Furthermore, de novo structural characterization of mass spectral data remains a complicated and time-intensive process. Recently, the field of computational metabolomics has gained traction and novel methods have started to enable large-scale and reliable metabolite annotation. Molecular networking and machine learning-based in-silico annotation tools have been shown to greatly assist metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.
AIM OF REVIEW: We highlight recent advances in computational metabolite annotation workflows with a special focus on their evaluation and comparison with other tools. Whilst the progress is substantial and promising, we also argue that inconsistencies in benchmarking different tools hamper users from selecting the most appropriate and promising method for their research. We summarize benchmarking strategies of the different tools and outline several recommendations for benchmarking and comparing novel tools.
KEY SCIENTIFIC CONCEPTS OF REVIEW: This review focuses on recent advances in mass spectral library-based and machine learning-supported metabolite annotation workflows. We discuss large-scale library matching and analogue search, the current bloom of mass spectral similarity scores, and how molecular networking has changed the field. In addition, the potentials and challenges of machine learning-supported metabolite annotation workflows are highlighted. Overall, recent developments in computational metabolomics have started to fundamentally change metabolomics workflows, and we expect that as a community we will be able to overcome current method performance ambiguities and annotation bottlenecks.
Affiliation(s)
- Niek F. de Jonge
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Kevin Mildau
  - Department of Analytical Chemistry, Biochemical Network Analysis Lab, University of Vienna, Vienna, Austria
- David Meijer
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Joris J. R. Louwen
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
- Christoph Bueschl
  - Department of Analytical Chemistry, Biochemical Network Analysis Lab, University of Vienna, Vienna, Austria
- Florian Huber
  - Centre for Digitalization and Digitality (ZDD), University of Applied Sciences Düsseldorf, Düsseldorf, Germany
- Justin J. J. van der Hooft
  - Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
  - Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa