1
Abedin MM, Tabata K, Matsumura Y, Komatsuzaki T. Multi-armed bandit algorithm for sequential experiments of molecular properties with dynamic feature selection. J Chem Phys 2024; 161:014115. [PMID: 38958158 DOI: 10.1063/5.0206042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 06/16/2024] [Indexed: 07/04/2024] Open
Abstract
Sequential optimization is one of the promising approaches in identifying the optimal candidate(s) (molecules, reactants, drugs, etc.) with desired properties (reaction yield, selectivity, efficacy, etc.) from a large set of potential candidates, while minimizing the number of experiments required. However, the high dimensionality of the feature space (e.g., molecular descriptors) makes it often difficult to utilize the relevant features during the process of updating the set of candidates to be examined. In this article, we developed a new sequential optimization algorithm for molecular problems based on reinforcement learning, multi-armed linear bandit framework, and online, dynamic feature selections in which relevant molecular descriptors are updated along with the experiments. We also designed a stopping condition aimed to guarantee the reliability of the chosen candidate from the dataset pool. The developed algorithm was examined by comparing with Bayesian optimization (BO), using two synthetic datasets and two real datasets in which one dataset includes hydration free energy of molecules and another one includes a free energy difference between enantiomer products in chemical reaction. We found that the dynamic feature selection in representing the desired properties along the experiments provides a better performance (e.g., time required to find the best candidate and stop the experiment) as the overall trend and that our multi-armed linear bandit approach with a dynamic feature selection scheme outperforms the standard BO with fixed feature variables. The comparison of our algorithm to BO with dynamic feature selection is also addressed.
Affiliation(s)
- Md Menhazul Abedin
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Khulna University, Khulna 9208, Bangladesh
- Koji Tabata
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Department of Mathematics, Hokkaido University, Sapporo 060-0810, Japan
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Yoshihiro Matsumura
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Tamiki Komatsuzaki
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
  - Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Yamadaoka, Suita 565-0871, Osaka, Japan
  - The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki 567-0047, Osaka, Japan
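Illustrative sketch for entry 1: the abstract describes sequential candidate selection with a multi-armed linear bandit but gives no algorithmic detail, so the code below is only a generic LinUCB-style loop under stated assumptions, not the authors' method. It omits the dynamic feature selection and the stopping condition entirely. The function name `lin_ucb_screen`, the parameters `alpha` and `lam`, and the toy pool are all hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in m]


def lin_ucb_screen(features, measure, n_rounds, alpha=1.0, lam=1.0):
    """Pick candidates one at a time by an upper confidence bound.

    Assumption: the measured property is roughly linear in the features.
    Each candidate is measured at most once.
    """
    d = len(features[0])
    # Inverse of the regularised design matrix A = lam * I + sum(x x^T).
    a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d  # running sum of (measured value * feature vector)
    remaining = set(range(len(features)))
    chosen, observed = [], []

    for _ in range(min(n_rounds, len(features))):
        theta = mat_vec(a_inv, b)  # ridge-regression estimate of the weights

        def ucb(i):
            x = features[i]
            width = math.sqrt(dot(x, mat_vec(a_inv, x)))
            return dot(theta, x) + alpha * width

        pick = max(sorted(remaining), key=ucb)
        y = measure(pick)  # stands in for running one experiment
        x = features[pick]

        # Sherman-Morrison rank-one update of a_inv after adding x x^T to A.
        ax = mat_vec(a_inv, x)
        denom = 1.0 + dot(x, ax)
        a_inv = [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(d)]
                 for i in range(d)]
        b = [b_k + y * x_k for b_k, x_k in zip(b, x)]

        remaining.remove(pick)
        chosen.append(pick)
        observed.append(y)

    best = chosen[observed.index(max(observed))]
    return chosen, best


# Toy pool: two made-up descriptors per candidate, noise-free linear property.
pool = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2), (0.2, 0.9)]
true_theta = (1.0, 2.0)
chosen, best = lin_ucb_screen(pool, lambda i: dot(true_theta, pool[i]), n_rounds=3)
print("measured order:", chosen, "best so far:", best)
```

The exploration width `alpha` trades off trying uncertain candidates against exploiting the current estimate; a real application would also need noise handling and a principled stopping rule such as the one the abstract mentions.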
2
Pracht P, Grimme S, Bannwarth C, Bohle F, Ehlert S, Feldmann G, Gorges J, Müller M, Neudecker T, Plett C, Spicher S, Steinbach P, Wesołowski PA, Zeller F. CREST-A program for the exploration of low-energy molecular chemical space. J Chem Phys 2024; 160:114110. [PMID: 38511658 DOI: 10.1063/5.0197592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Accepted: 02/29/2024] [Indexed: 03/22/2024] Open
Abstract
Conformer-rotamer sampling tool (CREST) is an open-source program for the efficient and automated exploration of molecular chemical space. Originally developed in Pracht et al. [Phys. Chem. Chem. Phys. 22, 7169 (2020)] as an automated driver for calculations at the extended tight-binding level (xTB), it offers a variety of molecular- and metadynamics simulations, geometry optimization, and molecular structure analysis capabilities. Implemented algorithms include automated procedures for conformational sampling, explicit solvation studies, the calculation of absolute molecular entropy, and the identification of molecular protonation and deprotonation sites. Calculations are set up to run concurrently, providing efficient single-node parallelization. CREST is designed to require minimal user input and comes with an implementation of the GFNn-xTB Hamiltonians and the GFN-FF force-field. Furthermore, interfaces to any quantum chemistry and force-field software can easily be created. In this article, we present recent developments in the CREST code and show a selection of applications for the most important features of the program. An important novelty is the refactored calculation backend, which provides significant speed-up for sampling of small or medium-sized drug molecules and allows for more sophisticated setups, for example, quantum mechanics/molecular mechanics and minimum energy crossing point calculations.
Affiliation(s)
- Philipp Pracht
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Stefan Grimme
  - Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Christoph Bannwarth
  - Institute for Physical Chemistry, RWTH Aachen University, Melatener Str. 20, 52056 Aachen, Germany
- Fabian Bohle
  - Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Sebastian Ehlert
  - AI4Science, Microsoft Research, Evert van de Beekstraat 354, 1118 CZ Schiphol, The Netherlands
- Gereon Feldmann
  - Institute for Physical Chemistry, RWTH Aachen University, Melatener Str. 20, 52056 Aachen, Germany
- Johannes Gorges
  - Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Marcel Müller
  - Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Tim Neudecker
  - Institute for Physical and Theoretical Chemistry, University of Bremen, 28359 Bremen, Germany
- Christoph Plett
  - Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn, Beringstr. 4, 53115 Bonn, Germany
- Pit Steinbach
  - Institute for Physical Chemistry, RWTH Aachen University, Melatener Str. 20, 52056 Aachen, Germany
- Patryk A Wesołowski
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Felix Zeller
  - Institute for Physical and Theoretical Chemistry, University of Bremen, 28359 Bremen, Germany
3
Han W, Xu X, Fan Q, Yan Y, Zhang Y, Chen Y, Liu H. In silico construction of a focused fragment library facilitating exploration of chemical space. Mol Inform 2024; 43:e202300256. [PMID: 38193642 DOI: 10.1002/minf.202300256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Revised: 12/11/2023] [Accepted: 01/06/2024] [Indexed: 01/10/2024]
Abstract
Fragment-based drug design (FBDD) has emerged as a captivating subject in the realm of computer-aided drug design, enabling the generation of novel molecules through the rearrangement of ring systems within known compounds. The construction of a focused fragment library plays a pivotal role in FBDD, necessitating the compilation of all potential bioactive ring systems capable of interacting with a specific target. In our study, we propose a workflow for the development of a focused fragment library and a combinatorial compound library. The fragment library comprises seed fragments and collected fragments. The extraction of seed fragments is guided by receptor information, serving as a prerequisite for establishing focused libraries. Conversely, collected fragments are obtained using the feature graph method, which offers a simplified representation of fragments and strikes a balance between diversity and similarity when categorizing different fragments. The utilization of the feature graph facilitates the rational partitioning of chemical space at the fragment level, enabling the exploration of desired chemical space and enhancing the efficiency of screening compound libraries. Analysis demonstrates that our workflow enables the enumeration of a greater number of entirely new potential compounds, thereby aiding in the rational design of drugs.
Affiliation(s)
- Weijie Han
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Xiaohe Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Qing Fan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yingchao Yan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- YanMin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
4
Leonov AI, Hammer AJS, Lach S, Mehr SHM, Caramelli D, Angelone D, Khan A, O'Sullivan S, Craven M, Wilbraham L, Cronin L. An integrated self-optimizing programmable chemical synthesis and reaction engine. Nat Commun 2024; 15:1240. [PMID: 38336880 PMCID: PMC10858227 DOI: 10.1038/s41467-024-45444-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 01/22/2024] [Indexed: 02/12/2024] Open
Abstract
Robotic platforms for chemistry are developing rapidly, but most systems are not currently able to adapt to changing circumstances in real time. We present a dynamically programmable system capable of making, optimizing, and discovering new molecules, which utilizes seven sensors that continuously monitor the reaction. By developing a dynamic programming language, we demonstrate the 10-fold scale-up of a highly exothermic oxidation reaction, end point detection, as well as the detection of critical hardware failures. We also show how in-line spectroscopy such as HPLC, Raman, and NMR can be used for closed-loop optimization of reactions, exemplified using Van Leusen oxazole synthesis, a four-component Ugi condensation and manganese-catalysed epoxidation reactions, as well as two previously unreported reactions, discovered from a selected chemical space, providing up to 50% yield improvement over 25-50 iterations. Finally, we demonstrate an experimental pipeline to explore a trifluoromethylation reaction space that discovers new molecules.
Affiliation(s)
- Artem I Leonov
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Alexander J S Hammer
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Slawomir Lach
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- S Hessam M Mehr
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Dario Caramelli
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Davide Angelone
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Aamir Khan
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Steven O'Sullivan
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Matthew Craven
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Liam Wilbraham
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Leroy Cronin
  - School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK.
5
Escayola S, Bahri-Laleh N, Poater A. %VBur index and steric maps: from predictive catalysis to machine learning. Chem Soc Rev 2024; 53:853-882. [PMID: 38113051 DOI: 10.1039/d3cs00725a] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 12/21/2023]
Abstract
Steric indices are parameters used in chemistry to describe the spatial arrangement of atoms or groups of atoms in molecules. They are important in determining the reactivity, stability, and physical properties of chemical compounds. One commonly used steric index is the steric hindrance, which refers to the obstruction or hindrance of movement in a molecule caused by bulky substituents or functional groups. Steric hindrance can affect the reactivity of a molecule by altering the accessibility of its reactive sites and influencing the geometry of its transition states. Notably, the Tolman cone angle and %VBur are prominent among these indices. Actually, steric effects can also be described using the concept of steric bulk, which refers to the space occupied by a molecule or functional group. Steric bulk can affect the solubility, melting point, boiling point, and viscosity of a substance. Even though electronic indices are more widely used, they have certain drawbacks that might shift preferences towards others. They present a higher computational cost, and often, the weight of electronics in correlation with chemical properties, e.g. binding energies, falls short in comparison to %VBur. However, it is worth noting that this may be because the steric index inherently captures part of the electronic content. Overall, steric indices play an important role in understanding the behaviour of chemical compounds and can be used to predict their reactivity, stability, and physical properties. Predictive chemistry is an approach to chemical research that uses computational methods to anticipate the properties and behaviour of these compounds and reactions, facilitating the design of new compounds and reactivities. Within this domain, predictive catalysis specifically targets the prediction of the performance and behaviour of catalysts. 
Ultimately, the goal is to identify new catalysts with optimal properties, leading to chemical processes that are both more efficient and sustainable. In this framework, %VBur can be a key metric for deepening our understanding of catalysis, emphasizing predictive catalysis and sustainability. Those latter concepts are needed to direct our efforts toward identifying the optimal catalyst for any reaction, minimizing waste, and reducing experimental efforts while maximizing the efficacy of the computational methods.
Affiliation(s)
- Sílvia Escayola
  - Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain.
  - Donostia International Physics Center (DIPC), 20018 Donostia, Euskadi, Spain
- Naeimeh Bahri-Laleh
  - Iran Polymer and Petrochemical Institute (IPPI), P.O. Box 14965/115, Tehran, Iran
  - Institute for Sustainability with Knotted Chiral Meta Matter (WPI-SKCM), Hiroshima University, Hiroshima, 739-8526, Japan
- Albert Poater
  - Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, c/Mª Aurèlia Capmany 69, 17003 Girona, Catalonia, Spain.
6
Amoroso N, Gambacorta N, Mastrolorito F, Togo MV, Trisciuzzi D, Monaco A, Pantaleo E, Altomare CD, Ciriaco F, Nicolotti O. Making sense of chemical space network shows signs of criticality. Sci Rep 2023; 13:21335. [PMID: 38049451 PMCID: PMC10696027 DOI: 10.1038/s41598-023-48107-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/13/2023] [Accepted: 11/22/2023] [Indexed: 12/06/2023] Open
Abstract
Chemical space modelling has great importance in unveiling and visualising latent information, which is critical in predictive toxicology related to drug discovery process. While the use of traditional molecular descriptors and fingerprints may suffer from the so-called curse of dimensionality, complex networks are devoid of the typical drawbacks of coordinate-based representations. Herein, we use chemical space networks (CSNs) to analyse the case of the developmental toxicity (Dev Tox), which remains a challenging endpoint for the difficulty of gathering enough reliable data despite very important for the protection of the maternal and child health. Our study proved that the Dev Tox CSN has a complex non-random organisation and can thus provide a wealth of meaningful information also for predictive purposes. At a phase transition, chemical similarities highlight well-established toxicophores, such as aryl derivatives, mostly neurotoxic hydantoins, barbiturates and amino alcohols, steroids, and volatile organic compounds ether-like chemicals, which are strongly suspected of the Dev Tox onset and can thus be employed as effective alerts for prioritising chemicals before testing.
Affiliation(s)
- Nicola Amoroso
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy.
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy.
- Nicola Gambacorta
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
  - Division of Medical Genetics, Fondazione IRCCS-Casa Sollievo della Sofferenza, San Giovanni Rotondo (Foggia), Italy
- Fabrizio Mastrolorito
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Maria Vittoria Togo
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Daniela Trisciuzzi
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Alfonso Monaco
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy
  - Dipartimento Interateneo di Fisica "M. Merlin", Università degli studi di Bari Aldo Moro, Via Giovanni Amendola, 173, 70125, Bari, Italy
- Ester Pantaleo
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, via E. Orabona, 4, 70125, Bari, Italy
  - Dipartimento Interateneo di Fisica "M. Merlin", Università degli studi di Bari Aldo Moro, Via Giovanni Amendola, 173, 70125, Bari, Italy
- Cosimo Damiano Altomare
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
- Fulvio Ciriaco
  - Dipartimento di Chimica, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy.
- Orazio Nicolotti
  - Dipartimento di Farmacia - Scienze del Farmaco, Università degli studi di Bari Aldo Moro, via E. Orabona, 4, 70125, Bari, Italy
7
Bustillo L, Laino T, Rodrigues T. The rise of automated curiosity-driven discoveries in chemistry. Chem Sci 2023; 14:10378-10384. [PMID: 37799997 PMCID: PMC10548516 DOI: 10.1039/d3sc03367h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 09/07/2023] [Indexed: 10/07/2023] Open
Abstract
The quest for generating novel chemistry knowledge is critical in scientific advancement, and machine learning (ML) has emerged as an asset in this pursuit. Through interpolation among learned patterns, ML can tackle tasks that were previously deemed demanding to machines. This distinctive capacity of ML provides invaluable aid to bench chemists in their daily work. However, current ML tools are typically designed to prioritize experiments with the highest likelihood of success, i.e., higher predictive confidence. In this perspective, we build on current trends that suggest a future in which ML could be just as beneficial in exploring uncharted search spaces through simulated curiosity. We discuss how low and 'negative' data can catalyse one-/few-shot learning, and how the broader use of curious ML and novelty detection algorithms can propel the next wave of chemical discoveries. We anticipate that ML for curiosity-driven research will help the community overcome potentially biased assumptions and uncover unexpected findings in the chemical sciences at an accelerated pace.
Affiliation(s)
- Latimah Bustillo
  - Research Institute for Medicines (iMed), Faculdade de Farmácia, Universidade de Lisboa Lisbon Portugal
- Teodoro Laino
  - IBM Research Europe Säumerstrasse 4 8803 Rüschlikon Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis) Zurich Switzerland
- Tiago Rodrigues
  - Research Institute for Medicines (iMed), Faculdade de Farmácia, Universidade de Lisboa Lisbon Portugal
8
Yi J, Lee S, Lim S, Cho C, Piao Y, Yeo M, Kim D, Kim S, Lee S. Exploring chemical space for lead identification by propagating on chemical similarity network. Comput Struct Biotechnol J 2023; 21:4187-4195. [PMID: 37680266 PMCID: PMC10480321 DOI: 10.1016/j.csbj.2023.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 08/08/2023] [Accepted: 08/20/2023] [Indexed: 09/09/2023] Open
Abstract
Motivation: Lead identification is a fundamental step to prioritize candidate compounds for downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical property and experimental information. However, ML or DL methods rarely consider compound similarity information directly since ML and DL models use abstract representation of molecules for model construction. Alternatively, data mining approaches are also used to explore chemical space with drug candidates by screening undesirable compounds. A major challenge for data mining approaches is to develop efficient data mining methods that search large chemical space for desirable lead compounds with low false positive rate.
Results: In this work, we developed a network propagation (NP) based data mining method for lead identification that performs search on an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug target interaction model to narrow down compound candidates and then we use network propagation to prioritize drug candidates that are highly correlated with drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered intentionally unlabeled compounds for given targets. To further demonstrate the prediction power of our approach, we identified 24 candidate leads for CLK1. Two out of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
Affiliation(s)
- Jungseob Yi
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sangseon Lee
  - Institute of Computer Technology, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sangsoo Lim
  - School of AI Software Convergence, Dongguk University, Pildong-ro 1-gil, Jung-gu, Seoul, South Korea
- Changyun Cho
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Yinhua Piao
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Marie Yeo
  - PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
- Dongkyu Kim
  - PHARMGENSCIENCE CO., LTD., 216, Dongjak-daero, Seocho-gu, Seoul, 06554, South Korea
- Sun Kim
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
  - Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
  - AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
- Sunho Lee
  - AIGENDRUG CO., LTD., Gwanak-ro 1, Gwanak-gu, Seoul, 08826, South Korea
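Illustrative sketch for entry 8: the abstract says candidates are prioritized by network propagation on chemical similarity networks but does not state the propagation rule. The code below assumes random walk with restart, a common choice, on a single toy graph rather than the ensemble of 14 fingerprint-based networks. The function name `propagate`, the `restart` parameter, the compound labels, and the edge weights are all hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, n_iter=100, tol=1e-9):
    """Random walk with restart on a weighted, undirected similarity graph.

    adjacency: {node: {neighbour: similarity}}, stored symmetrically.
    seeds: set of nodes with known activity; the walk restarts from them.
    Returns a dict mapping each node to its propagated score.
    """
    nodes = sorted(adjacency)
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)

    for _ in range(n_iter):
        # Restart mass goes back to the seeds on every step.
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: nothing to spread
            for v, w in adjacency[u].items():
                # Node u passes its score to neighbours in proportion to similarity.
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy similarity network: A is a known active, the rest are unlabeled.
network = {
    "A": {"B": 0.9, "D": 0.2},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"B": 0.8},
    "D": {"A": 0.2, "E": 0.3},
    "E": {"D": 0.3},
}
seeds = {"A"}
scores = propagate(network, seeds)
ranked = sorted((u for u in scores if u not in seeds), key=scores.get, reverse=True)
print("priority order of unlabeled compounds:", ranked)
```

A higher `restart` keeps scores concentrated near the known actives, while a lower value lets them diffuse further through the network; combining several similarity networks, as the abstract describes, would need an additional aggregation step that is not shown here.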
9
Eyke NS, Schneider TN, Jin B, Hart T, Monfette S, Hawkins JM, Morse PD, Howard RM, Pfisterer DM, Nandiwale KY, Jensen KF. Parallel multi-droplet platform for reaction kinetics and optimization. Chem Sci 2023; 14:8798-8809. [PMID: 37621435 PMCID: PMC10445457 DOI: 10.1039/d3sc02082g] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/22/2023] [Accepted: 08/01/2023] [Indexed: 08/26/2023] Open
Abstract
We present an automated droplet reactor platform possessing parallel reactor channels and a scheduling algorithm that orchestrates all of the parallel hardware operations and ensures droplet integrity as well as overall efficiency. We design and incorporate all of the necessary hardware and software to enable the platform to be used to study both thermal and photochemical reactions. We incorporate a Bayesian optimization algorithm into the control software to enable reaction optimization over both categorical and continuous variables. We demonstrate the capabilities of both the preliminary single-channel and parallelized versions of the platform using a series of model thermal and photochemical reactions. We conduct a series of reaction optimization campaigns and demonstrate rapid acquisition of the data necessary to determine reaction kinetics. The platform is flexible in terms of use case: it can be used either to investigate reaction kinetics or to perform reaction optimization over a wide range of chemical domains.
Affiliation(s)
- Natalie S Eyke
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Timo N Schneider
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Brooke Jin
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Travis Hart
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
- Sebastien Monfette
  - Pfizer Worldwide Research and Development 445 Eastern Point Rd Groton CT 06340 USA
- Joel M Hawkins
  - Pfizer Worldwide Research and Development 445 Eastern Point Rd Groton CT 06340 USA
- Peter D Morse
  - Pfizer Worldwide Research and Development 445 Eastern Point Rd Groton CT 06340 USA
- Roger M Howard
  - Pfizer Worldwide Research and Development 445 Eastern Point Rd Groton CT 06340 USA
- David M Pfisterer
  - Pfizer Worldwide Research and Development 445 Eastern Point Rd Groton CT 06340 USA
- Klavs F Jensen
  - Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA 02139 USA
10
Góger S, Sandonas LM, Müller C, Tkatchenko A. Data-driven tailoring of molecular dipole polarizability and frontier orbital energies in chemical compound space. Phys Chem Chem Phys 2023; 25:22211-22222. [PMID: 37566426 PMCID: PMC10445328 DOI: 10.1039/d3cp02256k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 07/27/2023] [Indexed: 08/12/2023]
Abstract
Understanding correlations - or lack thereof - between molecular properties is crucial for enabling fast and accurate molecular design strategies. In this contribution, we explore the relation between two key quantities describing the electronic structure and chemical properties of molecular systems: the energy gap between the frontier orbitals and the dipole polarizability. Based on the recently introduced QM7-X dataset, augmented with accurate molecular polarizability calculations as well as analysis of functional group compositions, we show that polarizability and HOMO-LUMO gap are uncorrelated when considering sufficiently extended subsets of the chemical compound space. The relation between these two properties is further analyzed on specific examples of molecules with similar composition as well as homooligomers. Remarkably, the freedom brought by the lack of correlation between molecular polarizability and HOMO-LUMO gap enables the design of novel materials, as we demonstrate on the example of organic photodetector candidates.
Affiliation(s)
- Szabolcs Góger
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
- Leonardo Medrano Sandonas
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
- Carolin Müller
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg.
11
Shim E, Tewari A, Cernak T, Zimmerman PM. Machine Learning Strategies for Reaction Development: Toward the Low-Data Limit. J Chem Inf Model 2023; 63:3659-3668. [PMID: 37312524 PMCID: PMC11163943 DOI: 10.1021/acs.jcim.3c00577] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Machine learning models are increasingly being utilized to predict outcomes of organic chemical reactions. A large amount of reaction data is used to train these models, which is in stark contrast to how expert chemists discover and develop new reactions by leveraging information from a small number of relevant transformations. Transfer learning and active learning are two strategies that can operate in low-data situations, which may help fill this gap and promote the use of machine learning for tackling real-world challenges in organic synthesis. This Perspective introduces active and transfer learning and connects these to potential opportunities and directions for further research, especially in the area of prospective development of chemical transformations.
Affiliation(s)
- Eunjae Shim
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Tim Cernak
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Paul M Zimmerman
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
|
12
|
Li SW, Xu LC, Zhang C, Zhang SQ, Hong X. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat Commun 2023; 14:3569. [PMID: 37322041 DOI: 10.1038/s41467-023-39283-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023] Open
Abstract
Accurate prediction of reactivity and selectivity provides the desired guideline for synthetic development. Due to the high-dimensional relationship between molecular structure and synthetic function, it is challenging to achieve the predictive modelling of synthetic transformation with the required extrapolative ability and chemical interpretability. To meet the gap between the rich domain knowledge of chemistry and the advanced molecular graph model, herein we report a knowledge-based graph model that embeds the digitalized steric and electronic information. In addition, a molecular interaction module is developed to enable the learning of the synergistic influence of reaction components. In this study, we demonstrate that this knowledge-based graph model achieves excellent predictions of reaction yield and stereoselectivity, whose extrapolative ability is corroborated by additional scaffold-based data splittings and experimental verifications with new catalysts. Because of the embedding of local environment, the model allows the atomic level of interpretation of the steric and electronic influence on the overall synthetic performance, which serves as a useful guide for the molecular engineering towards the target synthetic function. This model offers an extrapolative and interpretable approach for reaction performance prediction, pointing out the importance of chemical knowledge-constrained reaction modelling for synthetic purpose.
Affiliation(s)
- Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
| | - Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China
| | - Cheng Zhang
- Department of Chemistry, University of Science and Technology of China, Hefei, China
| | - Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China.
| | - Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, China.
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China.
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China.
|
13
|
Zhang ZJ, Li SW, Oliveira JCA, Li Y, Chen X, Zhang SQ, Xu LC, Rogge T, Hong X, Ackermann L. Data-driven design of new chiral carboxylic acid for construction of indoles with C-central and C-N axial chirality via cobalt catalysis. Nat Commun 2023; 14:3149. [PMID: 37258542 DOI: 10.1038/s41467-023-38872-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/09/2022] [Accepted: 05/16/2023] [Indexed: 06/02/2023] Open
Abstract
Challenging enantio- and diastereoselective cobalt-catalyzed C-H alkylation has been realized by an innovative data-driven knowledge transfer strategy. Harnessing the statistics of a related transformation as the knowledge source, the designed machine learning (ML) model took advantage of delta learning and enabled accurate and extrapolative enantioselectivity predictions. Powered by the knowledge transfer model, the virtual screening of a broad scope of 360 chiral carboxylic acids led to the discovery of a new catalyst featuring an intriguing furyl moiety. Further experiments verified that the predicted chiral carboxylic acid can achieve excellent stereochemical control for the target C-H alkylation, which supported the expedient synthesis for a large library of substituted indoles with C-central and C-N axial chirality. The reported machine learning approach provides a powerful data engine to accelerate the discovery of molecular catalysis by harnessing the hidden value of the available structure-performance statistics.
Affiliation(s)
- Zi-Jing Zhang
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Shu-Wen Li
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - João C A Oliveira
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Yanjun Li
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Xinran Chen
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - Shuo-Qing Zhang
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - Li-Cheng Xu
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China
| | - Torben Rogge
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany
| | - Xin Hong
- Center of Chemistry for Frontier Technologies, Department of Chemistry, State Key Laboratory of Clean Energy Utilization, Zhejiang University, Hangzhou, 310027, PR China.
- Beijing National Laboratory for Molecular Sciences, Zhongguancun North First Street No. 2, Beijing, 100190, PR China.
- Key Laboratory of Precise Synthesis of Functional Molecules of Zhejiang Province, School of Science, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, PR China.
| | - Lutz Ackermann
- Institut für Organische und Biomolekulare Chemie, Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany.
- Wöhler Research Institute for Sustainable Chemistry (WISCh), Georg-August-Universität Göttingen, Tammannstraße 2, 37077, Göttingen, Germany.
|
14
|
Shields MD, Gurley K, Catarelli R, Chauhan M, Ojeda-Tuz M, Masters FJ. Active learning applied to automated physical systems increases the rate of discovery. Sci Rep 2023; 13:8402. [PMID: 37225752 DOI: 10.1038/s41598-023-35257-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 05/15/2023] [Indexed: 05/26/2023] Open
Abstract
Active machine learning is widely used in computational studies where repeated numerical simulations can be conducted on high performance computers without human intervention. But translation of these active learning methods to physical systems has proven more difficult and the accelerated pace of discoveries aided by these methods remains as yet unrealized. Through the presentation of a general active learning framework and its application to large-scale boundary layer wind tunnel experiments, we demonstrate that the active learning framework used so successfully in computational studies is directly applicable to the investigation of physical experimental systems and the corresponding improvements in the rate of discovery can be transformative. We specifically show that, for our wind tunnel experiments, we are able to achieve in approximately 300 experiments a learning objective that would be impossible using traditional methods.
Affiliation(s)
- Michael D Shields
- Department of Civil and Systems Engineering, Johns Hopkins University, Baltimore, MD, 21212, USA.
| | - Kurtis Gurley
- Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
| | - Ryan Catarelli
- Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
| | - Mohit Chauhan
- Department of Civil and Systems Engineering, Johns Hopkins University, Baltimore, MD, 21212, USA
| | - Mariel Ojeda-Tuz
- Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
| | - Forrest J Masters
- Department of Civil and Coastal Engineering, University of Florida, Gainesville, FL, 32611, USA
|
15
|
Ge L, Ke Y, Li X. Machine learning integrated photocatalysis: progress and challenges. Chem Commun (Camb) 2023; 59:5795-5806. [PMID: 37093605 DOI: 10.1039/d3cc00989k] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/25/2023]
Abstract
Discovering efficient photocatalysts has long been the goal of photocatalysis, which has traditionally been driven by serendipitous or trial-and-error strategies. Recent developments in photocatalysis integrated with machine learning techniques promise to accelerate the discovery of photocatalysts, but also face significant challenges. In this review, advances in machine learning integrated photocatalysis are first presented from the perspective of three main photocatalytic processes: light harvesting, charge generation and separation, and surface redox reactions. Next, progress in using machine learning to understand complex photoactivity-structure relationships and to identify the factors governing activity is discussed. A future photocatalysis paradigm is then provided with the integration of artificial intelligence, robots and automation. Lastly, we discuss the current challenges in machine learning integrated photocatalysis. This review aims to provide a systematic overview and guidelines to the broad scientific community interested in photocatalysis and artificial intelligence for solar fuel synthesis.
Affiliation(s)
- Luyao Ge
- Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Zhejiang Key Laboratory for Reactive Chemistry on Solid Surfaces, Zhejiang Normal University, Jinhua 321004, China.
| | - Yuanzhen Ke
- Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Zhejiang Key Laboratory for Reactive Chemistry on Solid Surfaces, Zhejiang Normal University, Jinhua 321004, China.
| | - Xiaobo Li
- Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Zhejiang Key Laboratory for Reactive Chemistry on Solid Surfaces, Zhejiang Normal University, Jinhua 321004, China.
|
16
|
Gattu R, Ramesh SS, Nadigar S, D CG, Ramesh S. Conjugation as a Tool in Therapeutics: Role of Amino Acids/Peptides-Bioactive (Including Heterocycles) Hybrid Molecules in Treating Infectious Diseases. Antibiotics (Basel) 2023; 12:532. [PMID: 36978399 PMCID: PMC10044335 DOI: 10.3390/antibiotics12030532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 02/28/2023] [Accepted: 03/03/2023] [Indexed: 03/30/2023] Open
Abstract
Peptide-based drugs are gaining significant momentum in modern drug discovery, as witnessed by the approval of new drugs by the FDA in recent years. On the other hand, small-molecule drugs have been an integral part of drug development for the past several decades. Peptide-containing drugs are placed between small molecules and the biologics. Both the peptides and the small molecules (mainly heterocycles) pose several drawbacks as therapeutics despite their success in curing many diseases. This gap may be bridged by utilising so-called 'conjugation chemistry', in which both partners are linked to one another through a stable chemical bond, and the resulting conjugates are found to possess attractive benefits, thus eliminating the stigma associated with the individual partners. Over the past decades, the field of molecular hybridisation has emerged to afford new and efficient molecular architectures that have shown high promise in medicinal chemistry. Taking advantage of this and also considering our experience in this field, we present herein a review concerning the molecules obtained by the conjugation of peptides (amino acids) to small molecules (heterocycles as well as bioactive compounds). More than 125 examples of conjugates with therapeutic applications in curing infectious diseases, drawn from nearly 100 references published during the period 2000 to 2022, have been covered.
Affiliation(s)
- Rohith Gattu
- Postgraduate Department of Chemistry, JSS College of Arts, Commerce and Science, Ooty Road, Mysuru 570025, Karnataka, India
| | - Sanjay S Ramesh
- Postgraduate Department of Chemistry, JSS College of Arts, Commerce and Science, Ooty Road, Mysuru 570025, Karnataka, India
| | - Siddaram Nadigar
- Postgraduate Department of Chemistry, JSS College of Arts, Commerce and Science, Ooty Road, Mysuru 570025, Karnataka, India
| | - Channe Gowda D
- Department of Studies in Chemistry, Manasagangotri, University of Mysore, Mysuru 570005, Karnataka, India
| | - Suhas Ramesh
- Postgraduate Department of Chemistry, JSS College of Arts, Commerce and Science, Ooty Road, Mysuru 570025, Karnataka, India
|
17
|
Chen Y, Ou Y, Zheng P, Huang Y, Ge F, Dral PO. Benchmark of general-purpose machine learning-based quantum mechanical method AIQM1 on reaction barrier heights. J Chem Phys 2023; 158:074103. [PMID: 36813722 DOI: 10.1063/5.0137101] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023] Open
Abstract
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to its baseline semiempirical quantum mechanical (SQM) method ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1 without any refitting for reaction barrier heights on eight datasets, including a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1 accuracy largely remains similar to SQM methods (and B3LYP/6-31G* for most reaction types) suggesting that it is desirable to focus on improving AIQM1 performance for barrier heights in the future. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions is approaching the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
Affiliation(s)
- Yuxinxin Chen
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Yanchi Ou
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Peikun Zheng
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Yaohuang Huang
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Fuchun Ge
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
| | - Pavlo O Dral
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
|
18
|
Avathan Veettil A, Kirchhoff JL, Brieger L, Strohmann C, Wu P. Petasis Sequence Reactions for the Scaffold-Diverse Synthesis of Bioactive Polycyclic Small Molecules. ACS Omega 2023; 8:1168-1181. [PMID: 36643548 PMCID: PMC9835185 DOI: 10.1021/acsomega.2c06585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
The multicomponent Petasis reaction is a versatile method to access functionalized amines. The combination of Petasis reaction with subsequent ring-closing reactions is a powerful strategy to build novel polycyclic scaffolds. In this study, we report the generation of a diverse set of small molecules with polycyclic scaffolds featuring a high content of sp3-hybridized carbon atoms and multiple stereogenic centers by employing three-component Petasis reaction (3C-PR)-Intramolecular Diels-Alder (IMDA) and 3C-PR-ring-closing metathesis (RCM)-IMDA sequence reactions. This work demonstrates the wide substrate tolerance and broad applicability to access unexplored polycyclic scaffolds of biological interest using Petasis sequence reactions.
Affiliation(s)
- Amrutha K. Avathan Veettil
- Chemical Genomics Centre, Max Planck Institute of Molecular Physiology, Dortmund 44227, Germany
- Department of Chemical Biology, Max Planck Institute of Molecular Physiology, Dortmund 44227, Germany
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
| | - Jan-Lukas Kirchhoff
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
| | - Lukas Brieger
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
| | - Carsten Strohmann
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund 44227, Germany
| | - Peng Wu
- Chemical Genomics Centre, Max Planck Institute of Molecular Physiology, Dortmund 44227, Germany
- Department of Chemical Biology, Max Planck Institute of Molecular Physiology, Dortmund 44227, Germany
|
19
|
Tu Z, Stuyver T, Coley CW. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem Sci 2023; 14:226-244. [PMID: 36743887 PMCID: PMC9811563 DOI: 10.1039/d2sc05089g] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 09/12/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022] Open
Abstract
The field of predictive chemistry relates to the development of models able to describe how molecules interact and react. It encompasses the long-standing task of computer-aided retrosynthesis, but is far more reaching and ambitious in its goals. In this review, we summarize several areas where predictive chemistry models hold the potential to accelerate the deployment, development, and discovery of organic reactions and advance synthetic chemistry.
Affiliation(s)
- Zhengkai Tu
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Thijs Stuyver
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
| | - Connor W Coley
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
- Department of Chemical Engineering, Massachusetts Institute of Technology 77 Massachusetts Avenue Cambridge MA 02139 USA
|
20
|
An Integrated Method of Bayesian Optimization and D-Optimal Design for Chemical Experiment Optimization. Processes (Basel) 2022. [DOI: 10.3390/pr11010087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022] Open
Abstract
The smart chemical laboratory has recently emerged as a promising trend for future chemical research, where experiment optimization is of vital importance. The traditional Bayesian optimization (BO) algorithm focuses on exploring the dependent variable space while overlooking the independent variable space. Consequently, the BO algorithm suffers from becoming stuck at local optima, which severely deteriorates the optimization performance, especially with bad-quality initial points. Herein, we propose a novel stochastic framework of Bayesian optimization with D-optimal design (BODO) by integrating BO with D-optimal design. BODO can balance the exploitation in the dependent variable space and the exploration in the independent variable space. We highlight the excellent performance of BODO even with poor initial points on the benchmark alpine2 function. Meanwhile, BODO demonstrates a better average objective function value than BO on the benchmark Summit SnAr chemical process, showing its advantage in chemical experiment optimization and potential application in future chemical experiments.
|
21
|
Ketkaew R, Luber S. DeepCV: A Deep Learning Framework for Blind Search of Collective Variables in Expanded Configurational Space. J Chem Inf Model 2022; 62:6352-6364. [PMID: 36445176 DOI: 10.1021/acs.jcim.2c00883] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/02/2022]
Abstract
We present Deep learning for Collective Variables (DeepCV), a computer code that provides an efficient and customizable implementation of the deep autoencoder neural network (DAENN) algorithm that has been developed in our group for computing collective variables (CVs) and can be used with enhanced sampling methods to reconstruct free energy surfaces of chemical reactions. DeepCV can be used to conveniently calculate molecular features, train models, generate CVs, validate rare events from sampling, and analyze a trajectory for chemical reactions of interest. We use DeepCV in an example study of the conformational transition of cyclohexene, where metadynamics simulations are performed using DAENN-generated CVs. The results show that the adopted CVs give free energies in line with those obtained by previously developed CVs and experimental results. DeepCV is open-source software written in Python/C++ object-oriented languages, based on the TensorFlow framework and distributed free of charge for noncommercial purposes, which can be incorporated into general molecular dynamics software. DeepCV also comes with several additional tools, i.e., an application program interface (API), documentation, and tutorials.
Affiliation(s)
- Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
|
22
|
Yarish D, Garkot S, Grygorenko OO, Radchenko DS, Moroz YS, Gurbych O. Advancing molecular graphs with descriptors for the prediction of chemical reaction yields. J Comput Chem 2022; 44:76-92. [PMID: 36264601 DOI: 10.1002/jcc.27016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/05/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/08/2022]
Abstract
Chemical yield is the percentage of the reactants converted to the desired products. Chemists use predictive algorithms to select high-yielding reactions and score synthesis routes, saving time and reagents. This study suggests a novel graph neural network architecture for chemical yield prediction. The network combines structural information about participants of the transformation as well as molecular and reaction-level descriptors. It works with incomplete chemical reactions and generates reactants-product atom mapping. We show that the network benefits from advanced information by comparing it with several machine learning models and molecular representations. Models included logistic regression, support vector machine, CatBoost, and Bidirectional Encoder Representations from Transformers. Molecular representations included extended-connectivity fingerprints, Morgan fingerprints, SMILESVec embeddings, and textual. Classification and regression objectives were assessed for each model and feature set. The goal of each classification model was to separate zero- and non-zero-yielding reactions. The models were trained and evaluated on a proprietary dataset of 10 reaction types. Also, the models were benchmarked on two public single reaction type datasets. The study was supplemented with analysis of data, results, and errors, as well as the impact of steric factors, side reactions, isolation, and purification efficiency. The supplementary code is available at https://github.com/SoftServeInc/yield-paper.
Affiliation(s)
| | - Sofiya Garkot
- SoftServe, Inc., Lviv, Ukraine
- Ukrainian Catholic University, Lviv, Ukraine
| | - Oleksandr O Grygorenko
- Enamine Ltd., Kyiv, Ukraine
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Dmytro S Radchenko
- Enamine Ltd., Kyiv, Ukraine
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
| | - Yurii S Moroz
- Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Chemspace LLC, Kyiv, Ukraine
| | - Oleksandr Gurbych
- Lviv Polytechnic National University, Lviv, Ukraine
- Blackthorn AI, Ltd., London, UK
|
23
|
Wang J, Shen Z, Liao Y, Yuan Z, Li S, He G, Lan M, Qian X, Zhang K, Li H. Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space. Brief Bioinform 2022; 23:6761958. [PMID: 36252922 PMCID: PMC9677486 DOI: 10.1093/bib/bbac461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 09/21/2022] [Accepted: 09/26/2022] [Indexed: 12/14/2022] Open
Abstract
Identification of new chemical compounds with desired structural diversity and biological properties plays an essential role in drug discovery, yet the construction of such a potential space with elements of 'near-drug' properties is still a challenging task. In this work, we proposed a multimodal chemical information reconstruction system to automatically process, extract and align heterogeneous information from the text descriptions and structural images of chemical patents. Our key innovation lies in a heterogeneous data generator that produces cross-modality training data in the form of text descriptions and Markush structure images, from which a two-branch model with image- and text-processing units can then learn to both recognize heterogeneous chemical entities and simultaneously capture their correspondence. In particular, we have collected chemical structures from ChEMBL database and chemical patents from the European Patent Office and the US Patent and Trademark Office using keywords 'A61P, compound, structure' in the years from 2010 to 2020, and generated heterogeneous chemical information datasets with 210K structural images and 7818 annotated text snippets. Based on the reconstructed results and substituent replacement rules, structural libraries of a huge number of near-drug compounds can be generated automatically. In quantitative evaluations, our model can correctly reconstruct 97% of the molecular images into structured format and achieve an F1-score around 97-98% in the recognition of chemical entities, which demonstrated the effectiveness of our model in automatic information extraction from chemical patents, and hopefully transforming them to a user-friendly, structured molecular database enriching the near-drug space to realize the intelligent retrieval technology of chemical knowledge.
Affiliation(s)
| | | | - Yichen Liao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, China
| | - Zhen Yuan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, China
| | - Shiliang Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, China
| | - Gaoqi He
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
| | - Man Lan
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
| | - Xuhong Qian
- Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
| | - Kai Zhang
- Corresponding author. School of Computer Science and Technology, Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
| | - Honglin Li
- Corresponding authors: Kai Zhang, School of Computer Science and Technology, Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China. E-mail: ; Honglin Li, Shanghai Key Laboratory of New Drug Design, East China University of Science & Technology, Shanghai 200237, China. Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China. E-mail:
| |
Collapse
|
24
|
Spenke F, Hartke B. Graph-based Automated Macro-Molecule Assembly. J Chem Inf Model 2022; 62:3714-3723. [PMID: 35938711 DOI: 10.1021/acs.jcim.2c00609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We present a general molecular framework assembly algorithm that takes a largely arbitrary molecular fragment database and a user-supplied target template graph as input. Automatic assembly of molecular fragments from the database, following a prescribed, user-supplied set of connection rules, then turns the template graph into an actual, chemically reasonable molecular framework. Assembly capabilities of our algorithm are tested by producing several abstract, closed-loop shapes. To indicate a few of many possible application areas we demonstrate a host-guest complex and a road toward catalysis. Postassembly substituent exchange can be used to produce electric fields of desired values at desired points inside the framework or at its surface as a stepping stone toward rationally designed, artificial heterogeneous catalysts.
Affiliation(s)
- Florian Spenke
- Institute for Physical Chemistry, Christian-Albrechts-University, Olshausenstrasse 40, Kiel 24098, Germany
- Bernd Hartke
- Institute for Physical Chemistry, Christian-Albrechts-University, Olshausenstrasse 40, Kiel 24098, Germany
25
|
Thiede LA, Krenn M, Nigam A, Aspuru-Guzik A. Curiosity in exploring chemical spaces: Intrinsic rewards for molecular reinforcement learning. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac7ddc] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022] Open
Abstract
Computer-aided design of molecules has the potential to disrupt the field of drug and material discovery. Machine learning, and deep learning in particular, has made big strides in recent years and promises to greatly benefit computer-aided methods. Reinforcement learning is a particularly promising approach since it enables de novo molecule design, that is, molecular design without any prior knowledge. However, the search space is vast, and therefore any reinforcement learning agent needs to perform efficient exploration. In this study, we examine three versions of intrinsic motivation to aid efficient exploration. The algorithms are adapted from intrinsic-motivation methods in the literature that were developed in other settings, predominantly video games. We show that the curious agents find better-performing molecules on two of three benchmarks. This indicates an exciting new research direction for reinforcement learning agents that can explore the chemical space out of their own motivation. This has the potential to eventually lead to unexpected new molecular designs no human has thought about so far.
26
|
Otović E, Njirjak M, Kalafatovic D, Mauša G. Sequential Properties Representation Scheme for Recurrent Neural Network-Based Prediction of Therapeutic Peptides. J Chem Inf Model 2022; 62:2961-2972. [PMID: 35704881 DOI: 10.1021/acs.jcim.2c00526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The discovery of therapeutic peptides is often accelerated by means of virtual screening supported by machine learning-based predictive models. The predictive performance of such models is sensitive to the choice of data and its representation scheme. While the peptide physicochemical and compositional representations fail to distinguish sequence permutations, the amino acid arrangement within the sequence lacks the important information contained in physicochemical, conformational, topological, and geometrical properties. In this paper, we propose a solution to the identified information gap by implementing a hybrid scheme that complements the best traits from both approaches with the aim of predicting antimicrobial and antiviral activities based on experimental data from DRAMP 2.0, AVPdb, and Uniprot data repositories. Using the Friedman test of statistical significance, we compared our hybrid, sequential properties approach to peptide properties, one-hot vector encoding, and word embedding schemes in the 10-fold cross-validation setting, with respect to the F1 score, Matthews correlation coefficient, geometric mean, recall, and precision evaluation metrics. Moreover, the sequence modeling neural network was employed to gain insight into the synergic effect of both properties- and amino acid order-based predictions. The results suggest that sequential properties significantly (P < 0.01) surpasses the aforementioned state-of-the-art representation schemes. This makes it a strong candidate for increasing the predictive power of screening methods based on machine learning, applicable to any category of peptides.
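The evaluation metrics named in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch for orientation only, not the authors' code: the function name `binary_metrics` is hypothetical, and "geometric mean" is assumed here to mean the square root of sensitivity times specificity, which is the usual definition for imbalanced classification but is not confirmed by the abstract.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion counts.

    Every ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Geometric mean of sensitivity and specificity (assumed definition)
    gmean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "gmean": gmean,
    }
```

In a 10-fold cross-validation setting such as the one the abstract describes, a function like this would be applied once per fold and the per-fold values then compared across representation schemes.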
Affiliation(s)
- Erik Otović
- University of Rijeka, Faculty of Engineering, 51000 Rijeka, Croatia
- Marko Njirjak
- University of Rijeka, Faculty of Engineering, 51000 Rijeka, Croatia
- Daniela Kalafatovic
- University of Rijeka, Department of Biotechnology, 51000 Rijeka, Croatia; University of Rijeka, Center for Artificial Intelligence and Cybersecurity, 51000 Rijeka, Croatia
- Goran Mauša
- University of Rijeka, Faculty of Engineering, 51000 Rijeka, Croatia; University of Rijeka, Center for Artificial Intelligence and Cybersecurity, 51000 Rijeka, Croatia
27
|
Gensch T, Smith SR, Colacot TJ, Timsina YN, Xu G, Glasspoole BW, Sigman MS. Design and Application of a Screening Set for Monophosphine Ligands in Cross-Coupling. ACS Catal 2022. [DOI: 10.1021/acscatal.2c01970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Affiliation(s)
- Tobias Gensch
- Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
- Sleight R. Smith
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Thomas J. Colacot
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Yam N. Timsina
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Guolin Xu
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Ben W. Glasspoole
- MilliporeSigma, 6000 N. Teutonia Ave, Milwaukee, Wisconsin 53209, United States
- Matthew S. Sigman
- Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
28
|
Bender A, Schneider N, Segler M, Walters WP, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
29
|
Luo Y, Bag S, Zaremba O, Cierpka A, Andreo J, Wuttke S, Friederich P, Tsotsalas M. MOF Synthesis Prediction Enabled by Automatic Data Mining and Machine Learning. Angew Chem Int Ed Engl 2022; 61:e202200242. [PMID: 35104033 PMCID: PMC9310626 DOI: 10.1002/anie.202200242] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/06/2022] [Indexed: 11/24/2022]
Abstract
Despite rapid progress in the field of metal–organic frameworks (MOFs), the potential of using machine learning (ML) methods to predict MOF synthesis parameters is still untapped. Here, we show how ML can be used for rationalization and acceleration of the MOF discovery process by directly predicting the synthesis conditions of a MOF based on its crystal structure. Our approach is based on: i) establishing the first MOF synthesis database via automatic extraction of synthesis parameters from the literature, ii) training and optimizing ML models by employing the MOF database, and iii) predicting the synthesis conditions for new MOF structures. The ML models, even at an initial stage, exhibit a good prediction performance, outperforming human expert predictions, obtained through a synthesis survey. The automated synthesis prediction is available via a web-tool on https://mof-synthesis.aimat.science.
Affiliation(s)
- Yi Luo
- Institute of Functional Interfaces, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Saientan Bag
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Orysia Zaremba
- Basque Center for Materials, Applications & Nanostructures, Edif. Martina Casiano, Pl. 3 Parque Científico UPV/EHU Barrio Sarriena, 48940, Leioa, Bizkaia, Spain
- Adrian Cierpka
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131, Karlsruhe, Germany
- Jacopo Andreo
- Basque Center for Materials, Applications & Nanostructures, Edif. Martina Casiano, Pl. 3 Parque Científico UPV/EHU Barrio Sarriena, 48940, Leioa, Bizkaia, Spain
- Stefan Wuttke
- Basque Center for Materials, Applications & Nanostructures, Edif. Martina Casiano, Pl. 3 Parque Científico UPV/EHU Barrio Sarriena, 48940, Leioa, Bizkaia, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, 48013, Spain
- Pascal Friederich
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany; Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131, Karlsruhe, Germany
- Manuel Tsotsalas
- Institute of Functional Interfaces, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany; Institute of Organic Chemistry, Karlsruhe Institute of Technology, Kaiserstrasse 12, 76131, Karlsruhe, Germany
30
|
Grantham K, Mukaidaisi M, Ooi HK, Ghaemi MS, Tchagang A, Li Y. Deep Evolutionary Learning for Molecular Design. IEEE COMPUT INTELL M 2022. [DOI: 10.1109/mci.2022.3155308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
31
|
Moshawih S, Goh HP, Kifli N, Idris AC, Yassin H, Kotra V, Goh KW, Liew KB, Ming LC. Synergy between machine learning and natural products cheminformatics: Application to the lead discovery of anthraquinone derivatives. Chem Biol Drug Des 2022; 100:185-217. [PMID: 35490393 DOI: 10.1111/cbdd.14062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 04/15/2022] [Accepted: 04/23/2022] [Indexed: 11/28/2022]
Abstract
Cheminformatics utilizing machine learning (ML) techniques has opened up a new horizon in drug discovery. This is owing to vast chemical space expansion with rocketing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular from natural compounds. Due to the structural complexity, uniqueness, and diversity of natural products (NPs), they could occupy a bigger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanics and dynamics simulations, induced docking, and free energy perturbations are essential in predicting best binding poses, binding free energy values, and molecular mechanics force fields. These applications have benefited from artificial intelligence (AI), which decreases the computational cost of such simulations. This review aimed to describe chemical space and compound libraries related to NPs. High-throughput screening used for fractionating NPs, high-throughput virtual screening, and their strategies and significance are reviewed. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially in drug discovery of macrocyclic compounds and approaches in computer-aided and ML-based drug discovery. Anthraquinone derivatives were discussed as a source of new lead compounds that can be developed using ML tools for diverse medicinal uses such as cancer, infectious diseases, and metabolic disorders. Furthermore, the power of principal component analysis in understanding relevant protein conformations and molecular modeling of protein-ligand interactions were also presented.
Apart from being a concise reference for cheminformatics, this review is a useful text to understand the application of ML-based algorithms to molecular dynamics simulation and in silico absorption, distribution, metabolism, excretion, and toxicity prediction.
Affiliation(s)
- Said Moshawih
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hui Poh Goh
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Nurolaini Kifli
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Azam Che Idris
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hayati Yassin
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Vijay Kotra
- Faculty of Pharmacy, Quest International University, Perak, Malaysia
- Khang Wen Goh
- Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
- Kai Bin Liew
- Faculty of Pharmacy, University of Cyberjaya, Cyberjaya, Malaysia
- Long Chiau Ming
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
32
|
Luo Y, Bag S, Zaremba O, Cierpka A, Andreo J, Wuttke S, Friederich P, Tsotsalas M. Vorhersage der MOF-Synthese durch automatisches Data-Mining und maschinelles Lernen. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.202200242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Affiliation(s)
- Yi Luo
- Institute of Functional Interfaces, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Saientan Bag
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Orysia Zaremba
- Basque Center for Materials, Applications & Nanostructures, Edif. Martina Casiano, Pl. 3 Parque Científico UPV/EHU Barrio Sarriena, 48940 Leioa, Bizkaia, Spain
- Adrian Cierpka
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Jacopo Andreo
- Basque Center for Materials, Applications & Nanostructures, Edif. Martina Casiano, Pl. 3 Parque Científico UPV/EHU Barrio Sarriena, 48940 Leioa, Bizkaia, Spain
- Stefan Wuttke
- Basque Center for Materials, Applications & Nanostructures, Edif. Martina Casiano, Pl. 3 Parque Científico UPV/EHU Barrio Sarriena, 48940 Leioa, Bizkaia, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao 48013, Spain
- Pascal Friederich
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Manuel Tsotsalas
- Institute of Functional Interfaces, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Institute of Organic Chemistry, Karlsruhe Institute of Technology, Kaiserstrasse 12, 76131 Karlsruhe, Germany
33
|
Wang Y, Chen D. Application of Advanced Vibrational Spectroscopy in Revealing Critical Chemical Processes and Phenomena of Electrochemical Energy Storage and Conversion. ACS APPLIED MATERIALS & INTERFACES 2022; 14:23033-23055. [PMID: 35130433 DOI: 10.1021/acsami.1c20893] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/14/2023]
Abstract
The future of the energy industry and green transportation critically relies on exploration of high-performance, reliable, low-cost, and environmentally friendly energy storage and conversion materials. Understanding the chemical processes and phenomena involved in electrochemical energy storage and conversion is the premise of a revolutionary materials discovery. In this article, we review the recent advancements of application of state-of-the-art vibrational spectroscopic techniques in unraveling the nature of electrochemical energy, including bulk energy storage, dynamics of liquid electrolytes, interfacial processes, etc. Technique-wise, the review covers a wide range of spectroscopic methods, including classic vibrational spectroscopy (direct infrared absorption and Raman scattering), external field enhanced spectroscopy (surface enhanced Raman and IR, tip enhanced Raman, and near-field IR), and two-photon techniques (2D infrared absorption, stimulated Raman, and vibrational sum frequency generation). Finally, we provide perspectives on future directions in refining vibrational spectroscopy to contribute to the research frontier of electrochemical energy storage and conversion.
Affiliation(s)
- You Wang
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Dongchang Chen
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, New Mexico 87131, United States
34
|
Mine S, Jing Y, Mukaiyama T, Takao M, Maeno Z, Shimizu KI, Takigawa I, Toyao T. Machine Learning Analysis of Literature Data on the Water Gas Shift Reaction Toward Extrapolative Prediction of Novel Catalysts. CHEM LETT 2022. [DOI: 10.1246/cl.210645] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Affiliation(s)
- Shinya Mine
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Yuan Jing
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Takumi Mukaiyama
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Motoshi Takao
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Zen Maeno
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Ken-ichi Shimizu
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Elements Strategy Initiative for Catalysts and Batteries, Kyoto University, Katsura, Kyoto 615-8520, Japan
- Ichigaku Takigawa
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, N-21, W-10, Sapporo 001-0021, Japan
- Takashi Toyao
- Institute for Catalysis, Hokkaido University, N-21, W-10, 1-5, Sapporo 001-0021, Japan
- Elements Strategy Initiative for Catalysts and Batteries, Kyoto University, Katsura, Kyoto 615-8520, Japan
35
|
Ke J, Gao C, Folgueiras-Amador AA, Jolley KE, de Frutos O, Mateos C, Rincón JA, Brown RCD, Poliakoff M, George MW. Self-Optimization of Continuous Flow Electrochemical Synthesis Using Fourier Transform Infrared Spectroscopy and Gas Chromatography. APPLIED SPECTROSCOPY 2022; 76:38-50. [PMID: 34911387 DOI: 10.1177/00037028211059848] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/14/2023]
Abstract
A continuous-flow electrochemical synthesis platform has been developed to enable self-optimization of reaction conditions of organic electrochemical reactions using attenuated total reflection Fourier transform infrared spectroscopy (ATR FT-IR) and gas chromatography (GC) as online real-time monitoring techniques. We have overcome the challenges of using ATR FT-IR as the downstream analytical method when a large amount of hydrogen gas is produced at the counter electrode by designing two types of gas-liquid separators (GLS) for analysis of the product mixture flowing from the electrochemical reactor. In particular, we report an integrated GLS with an ATR FT-IR probe at the reactor outlet to give a facile and low-cost solution for determining the concentrations of products in gas-liquid two-phase flow. This approach provides a reliable method for quantifying low-volatility analytes, which can be problematic to monitor by GC. Two electrochemical reactions, the methoxylation of 1-formylpyrrolidine and the oxidation of 3-bromobenzyl alcohol, were investigated to demonstrate that the optimal conditions can be located within the pre-defined multi-dimensional reaction parameter spaces without operator intervention by using the stable noisy optimization by branch and fit (SNOBFIT) algorithm.
Affiliation(s)
- Jie Ke
- School of Chemistry, University of Nottingham, Nottingham, UK
- Chuang Gao
- School of Chemistry, University of Nottingham, Nottingham, UK
- Department of Chemical and Environmental Engineering, The University of Nottingham Ningbo China, Ningbo, China
- Katherine E Jolley
- School of Chemistry, University of Nottingham, Nottingham, UK
- School of Chemistry, University of Southampton, Southampton, UK
- Oscar de Frutos
- Centro de Investigación Lilly S.A., Alcobendas-Madrid, Spain
- Carlos Mateos
- Centro de Investigación Lilly S.A., Alcobendas-Madrid, Spain
- Juan A Rincón
- Centro de Investigación Lilly S.A., Alcobendas-Madrid, Spain
- Martyn Poliakoff
- School of Chemistry, University of Nottingham, Nottingham, UK
- Michael W George
- School of Chemistry, University of Nottingham, Nottingham, UK
- Department of Chemical and Environmental Engineering, The University of Nottingham Ningbo China, Ningbo, China
36
|
Trunschke A. Prospects and challenges for autonomous catalyst discovery viewed from an experimental perspective. Catal Sci Technol 2022. [DOI: 10.1039/d2cy00275b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Autonomous catalysis research requires elaborate integration of operando experiments into automated workflows. Suitable experimental data for analysis by artificial intelligence can be measured more readily according to standard operating procedures.
Affiliation(s)
- Annette Trunschke
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Department of Inorganic Chemistry, Faradayweg 4-6, 14195 Berlin, Germany
37
|
Zhao ZW, del Cueto M, Troisi A. Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors. DIGITAL DISCOVERY 2022; 1:266-276. [PMID: 35769202 PMCID: PMC9189862 DOI: 10.1039/d2dd00004k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Accepted: 03/23/2022] [Indexed: 11/21/2022]
Abstract
We try to determine if machine learning (ML) methods, applied to the discovery of new materials on the basis of existing data sets, have the power to predict completely new classes of compounds (extrapolating) or perform well only when interpolating between known materials. We introduce the leave-one-group-out cross-validation, in which the ML model is trained to explicitly perform extrapolations of unseen chemical families. This approach can be used across materials science and chemistry problems to improve the added value of ML predictions, instead of using extrapolative ML models that were trained with a regular cross-validation. We consider as a case study the problem of the discovery of non-fullerene acceptors because novel classes of acceptors are naturally classified into distinct chemical families. We show that conventional ML methods are not useful in practice when attempting to predict the efficiency of a completely novel class of materials. The approach proposed in this work increases the accuracy of the predictions to enable at least the categorization of materials with a performance above and below the median value.
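The leave-one-group-out cross-validation named in this abstract can be illustrated with a short, library-free sketch. It is an assumption-laden illustration rather than the authors' implementation: the function name `leave_one_group_out` and the family labels in the usage lines are hypothetical. Each split holds out every sample of one chemical family as the test set and trains on all remaining families, so the test score measures extrapolation to an unseen family.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) per group.

    groups[i] is the group label (e.g. chemical family) of sample i.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    # Groups are visited in order of first appearance.
    for held_out, test_idx in by_group.items():
        train_idx = [i for i, label in enumerate(groups) if label != held_out]
        yield held_out, train_idx, list(test_idx)


# Hypothetical family labels for five samples
families = ["A", "A", "B", "C", "C"]
splits = list(leave_one_group_out(families))
```

With regular k-fold cross-validation, members of the same family usually appear in both the training and test folds, which is why it tends to measure interpolation instead.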
Affiliation(s)
- Zhi-Wen Zhao
- Department of Chemistry, University of Liverpool, Liverpool, L69 3BX, UK
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun, 130024, Jilin, P. R. China
- Marcos del Cueto
- Department of Chemistry, University of Liverpool, Liverpool, L69 3BX, UK
- Alessandro Troisi
- Department of Chemistry, University of Liverpool, Liverpool, L69 3BX, UK
38
|
Ertl P, Gerebtzoff G, Lewis RA, Muenkler H, Schneider N, Sirockin F, Stiefl N, Tosco P. Chemical reactivity prediction: current methods and different application areas. Mol Inform 2021; 41:e2100277. [PMID: 34964302 DOI: 10.1002/minf.202100277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/14/2021] [Accepted: 12/28/2021] [Indexed: 11/10/2022]
Abstract
The ability to predict chemical reactivity of a molecule is highly desirable in drug discovery, both ex vivo (synthetic route planning, formulation, stability) and in vivo: metabolic reactions determine pharmacodynamics, pharmacokinetics and potential toxic effects, and early assessment of liabilities is vital to reduce attrition rates in later stages of development. Quantum mechanics offer a precise description of the interactions between electrons and orbitals in the breaking and forming of new bonds. Modern algorithms and faster computers have allowed the study of more complex systems in a punctual and accurate fashion, and answers for chemical questions around stability and reactivity can now be provided. Through machine learning, predictive models can be built out of descriptors derived from quantum mechanics and cheminformatics, even in the absence of experimental data to train on. In this article, current progress on computational reactivity prediction is reviewed: applications to problems in drug design, such as modelling of metabolism and covalent inhibition, are highlighted and unmet challenges are posed.
Affiliation(s)
- Richard A Lewis
- Computer-Aided Drug Design, Eli Lilly and Company Limited, Windlesham, Switzerland
- Hagen Muenkler
- Novartis Institutes for BioMedical Research Inc, Switzerland
- Paolo Tosco
- Novartis Institutes for BioMedical Research Inc, Switzerland
39
|
Pazniak H, Varezhnikov AS, Kolosov DA, Plugin IA, Di Vito A, Glukhova OE, Sheverdyaeva PM, Spasova M, Kaikov I, Kolesnikov EA, Moras P, Bainyashev AM, Solomatin MA, Kiselev I, Wiedwald U, Sysoev VV. 2D Molybdenum Carbide MXenes for Enhanced Selective Detection of Humidity in Air. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2021; 33:e2104878. [PMID: 34601739 DOI: 10.1002/adma.202104878] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/25/2021] [Revised: 09/07/2021] [Indexed: 05/27/2023]
Abstract
2D transition metal carbides and nitrides (MXenes) open up novel opportunities in gas sensing with high sensitivity at room temperature. Herein, 2D Mo₂CTₓ flakes with high aspect ratio are successfully synthesized. The chemiresistive effect in a sub-µm MXene multilayer for different organic vapors and humidity at 10¹–10⁴ ppm in dry air is studied. Reasonably, the low-noise resistance signal allows the detection of H₂O down to 10 ppm. Moreover, humidity suppresses the response of Mo₂CTₓ to organic analytes due to the blocking of adsorption active sites. By measuring the impedance of MXene layers as a function of ac frequency in the 10⁻²–10⁶ Hz range, it is shown that operation principle of the sensor is dominated by resistance change rather than capacitance variations. The sensor transfer function allows to conclude that the Mo₂CTₓ chemiresistance is mainly originating from electron transport through interflake potential barriers with heights up to 0.2 eV. Density functional theory calculations, elucidating the Mo₂C surface interaction with organic analytes and H₂O, explain the experimental data as an energy shift of the density of states under the analyte's adsorption which induces increasing electrical resistance.
Affiliation(s)
- Hanna Pazniak
- Faculty of Physics and Center for Nanointegration Duisburg-Essen, University of Duisburg-Essen, Lotharstr. 1, 47057, Duisburg, Germany
| | - Alexey S Varezhnikov
- Yuri Gagarin State Technical University of Saratov, Politekhnicheskaya str. 77, Saratov, 410054, Russia
| | - Dmitry A Kolosov
- Department of Physics, Saratov State University, Astrakhanskaya str. 83, Saratov, 410012, Russia
| | - Ilya A Plugin
- Yuri Gagarin State Technical University of Saratov, Politekhnicheskaya str. 77, Saratov, 410054, Russia
| | - Alessia Di Vito
- Department of Electronic Engineering, University of Rome Tor Vergata, Via Cracovia, 50, Roma, 00133, Italy
| | - Olga E Glukhova
- Department of Physics, Saratov State University, Astrakhanskaya str. 83, Saratov, 410012, Russia
- Laboratory of Biomedical Nanotechnology, I. M. Sechenov First Moscow State Medical University, Trubetskaya str. 8-2, Moscow, 119991, Russia
| | | | - Marina Spasova
- Faculty of Physics and Center for Nanointegration Duisburg-Essen, University of Duisburg-Essen, Lotharstr. 1, 47057, Duisburg, Germany
| | - Igor Kaikov
- Breitmeier Messtechnik GmbH, Englerstr. 27, 76275, Ettlingen, Germany
| | - Evgeny A Kolesnikov
- National University of Science & Technology (NUST) MISIS, Leninskiy Prospekt 4, Moscow, 119049, Russia
| | - Paolo Moras
- Institute of Structure of Matter (ISM-CNR), SS 14 Km, Trieste, 34149, Italy
| | - Alexey M Bainyashev
- Yuri Gagarin State Technical University of Saratov, Politekhnicheskaya str. 77, Saratov, 410054, Russia
| | - Maksim A Solomatin
- Yuri Gagarin State Technical University of Saratov, Politekhnicheskaya str. 77, Saratov, 410054, Russia
| | - Ilia Kiselev
- Breitmeier Messtechnik GmbH, Englerstr. 27, 76275, Ettlingen, Germany
| | - Ulf Wiedwald
- Faculty of Physics and Center for Nanointegration Duisburg-Essen, University of Duisburg-Essen, Lotharstr. 1, 47057, Duisburg, Germany
| | - Victor V Sysoev
- Yuri Gagarin State Technical University of Saratov, Politekhnicheskaya str. 77, Saratov, 410054, Russia
| |
|
40
|
Caramelli D, Granda J, Mehr SHM, Cambié D, Henson AB, Cronin L. Discovering New Chemistry with an Autonomous Robotic Platform Driven by a Reactivity-Seeking Neural Network. ACS CENTRAL SCIENCE 2021; 7:1821-1830. [PMID: 34849401 PMCID: PMC8620554 DOI: 10.1021/acscentsci.1c00435] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/07/2021] [Indexed: 05/04/2023]
Abstract
We present a robotic chemical discovery system capable of navigating a chemical space based on a learned general association between molecular structures and reactivity, while incorporating a neural network model that can process data from online analytics and assess reactivity without knowing the identity of the reagents. Working in conjunction with this learned knowledge, our robotic platform is able to autonomously explore a large number of potential reactions and assess the reactivity of mixtures, including unknown chemical spaces, regardless of the identity of the starting materials. Through the system, we identified a range of chemical reactions and products, some of which were well-known, some new but predictable from known pathways, and some unpredictable reactions that yielded new molecules. The validation of the system was done within a budget of 15 inputs combined in 1018 reactions, further analysis of which allowed us to discover not only a new photochemical reaction but also a new reactivity mode for a well-known reagent (p-toluenesulfonylmethyl isocyanide, TosMIC). This involved the reaction of 6 equiv of TosMIC in a "multistep, single-substrate" cascade reaction yielding a trimeric product in high yield (47% unoptimized) with the formation of five new C-C bonds involving sp-sp² and sp-sp³ carbon centers. An analysis reveals that this transformation is intrinsically unpredictable, demonstrating the possibility of a reactivity-first robotic discovery of unknown reaction methodologies without requiring human input.
|
41
|
Aldeghi M, Häse F, Hickman RJ, Tamblyn I, Aspuru-Guzik A. Golem: an algorithm for robust experiment and process optimization. Chem Sci 2021; 12:14792-14807. [PMID: 34820095 PMCID: PMC8597856 DOI: 10.1039/d1sc01545a] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 10/11/2021] [Indexed: 01/02/2023] Open
Abstract
Numerous challenges in science and engineering can be framed as optimization tasks, including the maximization of reaction yields, the optimization of molecular and materials properties, and the fine-tuning of automated hardware protocols. Design of experiment and optimization algorithms are often adopted to solve these tasks efficiently. Increasingly, these experiment planning strategies are coupled with automated hardware to enable autonomous experimental platforms. The vast majority of the strategies used, however, do not consider robustness against the variability of experiment and process conditions. In fact, it is generally assumed that these parameters are exact and reproducible. Yet some experiments may have considerable noise associated with some of their conditions, and process parameters optimized under precise control may be applied in the future under variable operating conditions. In either scenario, the optimal solutions found might not be robust against input variability, affecting the reproducibility of results and returning suboptimal performance in practice. Here, we introduce Golem, an algorithm that is agnostic to the choice of experiment planning strategy and that enables robust experiment and process optimization. Golem identifies optimal solutions that are robust to input uncertainty, thus ensuring the reproducible performance of optimized experimental protocols and processes. It can be used to analyze the robustness of past experiments, or to guide experiment planning algorithms toward robust solutions on the fly. We assess the performance and domain of applicability of Golem through extensive benchmark studies and demonstrate its practical relevance by optimizing an analytical chemistry protocol under the presence of significant noise in its experimental conditions.
Affiliation(s)
- Matteo Aldeghi
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Florian Häse
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Riley J Hickman
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Isaac Tamblyn
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- National Research Council of Canada, Ottawa, ON, Canada
| | - Alán Aspuru-Guzik
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, ON, Canada
| |
|
42
|
|
43
|
Reis M, Gusev F, Taylor NG, Chung SH, Verber MD, Lee YZ, Isayev O, Leibfarth FA. Machine-Learning-Guided Discovery of 19F MRI Agents Enabled by Automated Copolymer Synthesis. J Am Chem Soc 2021; 143:17677-17689. [PMID: 34637304 PMCID: PMC10833148 DOI: 10.1021/jacs.1c08181] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Modern polymer science suffers from the curse of multidimensionality. The large chemical space imposed by including combinations of monomers into a statistical copolymer overwhelms polymer synthesis and characterization technology and limits the ability to systematically study structure-property relationships. To tackle this challenge in the context of 19F magnetic resonance imaging (MRI) agents, we pursued a computer-guided materials discovery approach that combines synergistic innovations in automated flow synthesis and machine learning (ML) method development. A software-controlled, continuous polymer synthesis platform was developed to enable iterative experimental-computational cycles that resulted in the synthesis of 397 unique copolymer compositions within a six-variable compositional space. The nonintuitive design criteria identified by ML, which were accomplished by exploring <0.9% of the overall compositional space, led to the identification of >10 copolymer compositions that outperformed state-of-the-art materials.
Affiliation(s)
- Marcus Reis
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Filipp Gusev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Nicholas G Taylor
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Sang Hun Chung
- Department of Biomedical Engineering, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Matthew D Verber
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Yueh Z Lee
- Department of Radiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Frank A Leibfarth
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
|
44
|
Sharma S, Arya A, Cruz R, Cleaves II HJ. Automated Exploration of Prebiotic Chemical Reaction Space: Progress and Perspectives. Life (Basel) 2021; 11:1140. [PMID: 34833016 PMCID: PMC8624352 DOI: 10.3390/life11111140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2021] [Revised: 10/15/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] Open
Abstract
Prebiotic chemistry often involves the study of complex systems of chemical reactions that form large networks with a large number of diverse species. Such complex systems may have given rise to emergent phenomena that ultimately led to the origin of life on Earth. The environmental conditions and processes involved in this emergence may not be fully recapitulable, making it difficult for experimentalists to study prebiotic systems in laboratory simulations. Computational chemistry offers efficient ways to study such chemical systems and identify the ones most likely to display complex properties associated with life. Here, we review tools and techniques for modelling prebiotic chemical reaction networks and outline possible ways to identify self-replicating features that are central to many origin-of-life models.
Affiliation(s)
- Siddhant Sharma
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Biochemistry, Deshbandhu College, University of Delhi, New Delhi 110019, India
- Department of Chemistry and Chemical Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
| | - Aayush Arya
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Department of Physics, Lovely Professional University, Jalandhar-Delhi GT Road, Phagwara 144001, India
| | - Romulo Cruz
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Big Data Laboratory, Information and Communications Technology Center (CTIC), National University of Engineering, Amaru 210, Lima 15333, Peru
| | - Henderson James Cleaves II
- Blue Marble Space Institute of Science, Seattle, WA 98154, USA
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo 152-8550, Japan
| |
|
45
|
Hammer AS, Leonov AI, Bell NL, Cronin L. Chemputation and the Standardization of Chemical Informatics. JACS AU 2021; 1:1572-1587. [PMID: 34723260 PMCID: PMC8549037 DOI: 10.1021/jacsau.1c00303] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/06/2021] [Indexed: 05/11/2023]
Abstract
The explosion in the use of machine learning for automated chemical reaction optimization is gathering pace. However, the lack of a standard architecture that connects the concept of chemical transformations universally to software and hardware provides a barrier to using the results of these optimizations and could cause the loss of relevant data and prevent reactions from being reproducible or unexpected findings verifiable or explainable. In this Perspective, we describe how the development of the field of digital chemistry or chemputation, that is, the universal code-enabled control of chemical reactions using a standard language and ontology, will remove these barriers, allowing users to focus on the chemistry and plug in algorithms according to the problem space to be explored or unit function to be optimized. We describe a standard hardware (the chemical processing programming architecture, the ChemPU) to encompass all chemical synthesis, an approach which unifies all chemistry automation strategies, from solid-phase peptide synthesis to HTE flow chemistry platforms, while at the same time establishing a publication standard so that researchers can exchange chemical code (χDL) to ensure reproducibility and interoperability. Not only can a vast range of different chemistries be plugged into the hardware, but the ever-expanding developments in software and algorithms can also be accommodated. These technologies, when combined, will allow chemistry, or chemputation, to follow computation, that is, the running of code across many different types of capable hardware to get the same result every time with a low error rate.
|
46
|
Kahana A, Lancet D. Self-reproducing catalytic micelles as nanoscopic protocell precursors. Nat Rev Chem 2021; 5:870-878. [PMID: 37117387 DOI: 10.1038/s41570-021-00329-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/03/2021] [Indexed: 12/31/2022]
Abstract
Protocells at life's origin are often conceived as bilayer-enclosed precursors of life, whose self-reproduction rests on the early advent of replicating catalytic biopolymers. This Perspective describes an alternative scenario, wherein reproducing nanoscopic lipid micelles with catalytic capabilities were forerunners of biopolymer-containing protocells. This postulate gains considerable support from experiments describing micellar catalysis and autocatalytic proliferation, and, more recently, from reports on cross-catalysis in mixed micelles that lead to life-like steady-state dynamics. Such results, along with evidence for micellar prebiotic compatibility, synergize with predictions of our chemically stringent computer-simulated model, illustrating how mutually catalytic lipid networks may enable micellar compositional reproduction that could underlie primal selection and evolution. Finally, we highlight studies on how endogenously catalysed lipid modifications could guide further protocellular complexification, including micelle to vesicle transition and monomer to biopolymer progression. These portrayals substantiate the possibility that protocellular evolution could have been seeded by pre-RNA lipid assemblies.
|
47
|
Towards Data‐Driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202106880] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
|
48
|
Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science have enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook of areas of opportunity for the accelerated discovery of transition-metal complexes.
Affiliation(s)
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Michael G Taylor
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Adam H Steeves
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| |
|
49
|
Xu LC, Zhang SQ, Li X, Tang MJ, Xie PP, Hong X. Towards Data-driven Design of Asymmetric Hydrogenation of Olefins: Database and Hierarchical Learning. Angew Chem Int Ed Engl 2021; 60:22804-22811. [PMID: 34370892 DOI: 10.1002/anie.202106880] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2021] [Revised: 07/14/2021] [Indexed: 11/09/2022]
Abstract
Asymmetric hydrogenation of olefins is one of the most powerful asymmetric transformations in molecular synthesis. Although several privileged catalyst scaffolds are available, catalyst development for asymmetric hydrogenation is still a time- and resource-consuming process due to the lack of a predictive catalyst design strategy. Targeting the data-driven design of asymmetric catalysis, we herein report the development of a standardized database that contains detailed information on over 12000 literature asymmetric hydrogenations of olefins. This database provides a valuable platform for machine learning applications in asymmetric catalysis. Based on this database, we developed a hierarchical learning approach to achieve a predictive machine learning model using only dozens of enantioselectivity data points for the target olefin, which offers a useful solution for the few-shot learning problem and will facilitate reaction optimization with new olefin substrates in catalysis screening.
Affiliation(s)
- Li-Cheng Xu
- Zhejiang University, Department of Chemistry, CHINA
| | | | - Xin Li
- Zhejiang University, Department of Chemistry, CHINA
| | | | - Pei-Pei Xie
- Zhejiang University, Department of Chemistry, CHINA
| | - Xin Hong
- Zhejiang University, Department of Chemistry, 38 Zheda Road, 310028, Hangzhou, CHINA
| |
|
50
|
Kotliar-Shapirov A, Fedorov FS, Ouerdane H, Evlashin S, Nasibulin AG, Stevenson KJ. Chemical space mapping for multicomponent gas mixtures. J Electroanal Chem (Lausanne) 2021. [DOI: 10.1016/j.jelechem.2021.115472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|