1
Chen S, Noh J, Jang J, Kim S, Gu GH, Jung Y. Reaction Templates: Bridging Synthesis Knowledge and Artificial Intelligence. Acc Chem Res 2024; 57:1964-1972. [PMID: 38924502 DOI: 10.1021/acs.accounts.4c00261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Conspectus: The field of chemical research boasts a long history of developing software to automate synthesis planning and reaction prediction. Early software relied heavily on expert systems, requiring significant effort to encode vast amounts of synthesis knowledge into a computer-readable format. However, recent advancements in deep learning have shifted the focus toward AI models, offering improved prediction capabilities. Despite these advancements, current AI models often lack the integration of known synthesis rules and intuitions, creating a gap that hinders interpretability and future development of the models. To bridge them, our research group has been actively working on incorporating reaction templates into deep learning models, achieving promising results across various applications.
In this Account, we present our latest works to incorporate the known synthesis knowledge into the deep learning models through the utilization of reaction templates. We begin by highlighting the limitations of early computer programs heavily reliant on hand-coded rules. These programs, while providing a foundation for the field, presented limitations in scalability and adaptability. We then introduce SMARTS (SMILES arbitrary target specification), a popular Python-readable format for representing chemical reactions. This format of reaction encoding facilitates the quick integration of synthesis knowledge into AI models built using the Python language. With the SMARTS-based reaction templates, we introduce our recent efforts of developing an AI model for reaction-based molecule optimization. Subsequently, we discuss the recent efforts to automate the extraction of reaction templates from vast chemical reaction databases. This approach eliminates the previously required manual effort of encoding knowledge, a process that could be time-consuming and prone to error when dealing with large data sets. By customizing the automated extraction algorithm, we have developed powerful AI models for specific tasks such as retrosynthesis (LocalRetro), reaction outcome prediction (LocalTransform), and atom-to-atom mapping (LocalMapper). These models, aligned with the intuition of chemists, demonstrate the effectiveness of incorporating reaction templates into deep learning frameworks.
Looking toward the future, we believe that utilizing reaction templates to connect known chemical knowledge and AI models holds immense potential for various applications. Not only can this approach significantly benefit future AI models focused on challenging tasks like reaction mechanism labeling and prediction, but we anticipate it can also extend its reach to the realm of inorganic synthesis. By integrating synthesis knowledge, we can not only achieve improved performance but also enhance the interpretability of AI models, paving the way for further advancements in AI-powered chemical synthesis.
Affiliation(s)
- Shuan Chen
- Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Juhwan Noh
- Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
- Jidon Jang
- Data Convergence Drug Research Center, Korea Research Institute of Chemical Technology (KRICT), 141 Gajeong-ro, Yuseong-gu, Daejeon 34114, South Korea
- Seongmin Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291, Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea
- Geun Ho Gu
- Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH), 21 Kentech-gil, Naju, Jeonnam 58330, South Korea
- Yousung Jung
- Department of Chemical and Biological Engineering, and Institute of Chemical Process, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
2
Stevens JN, Prockter AK, Fisher HA, Tran H, Evans MV. A database of chemical absorption in human skin with mechanistic modeling applications. Sci Data 2024; 11:755. [PMID: 38987285 PMCID: PMC11237069 DOI: 10.1038/s41597-024-03588-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 07/01/2024] [Indexed: 07/12/2024] Open
Abstract
Whether from environmental and occupational hazards or from topical pharmaceuticals, the human skin comes into contact with various chemicals every day. In vivo experiments not only require large investments of both time and money, but in vivo experiments can also be unethical due to the need to intentionally or incidentally expose humans or animals to toxic chemicals. Comparatively, in vitro experiments offer ethical and financial advantages when combined with the opportunity to selectively choose chemicals for experimentation. With in vivo experimentation being so infeasible, many scientists have chosen to make their in vitro data available publicly. Using these data, a detailed database containing 73 chemicals was created with a robust set of descriptors to be used in connection with mathematical modeling to predict diffusion, permeability, and partition coefficients. This resulting database is tailored to be easily used in various coding languages.
Affiliation(s)
- Jessica N Stevens
- Department of Mathematics, North Carolina State University, Raleigh, NC, USA.
- Alyson K Prockter
- Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Hunter A Fisher
- Oak Ridge Associated Universities (ORAU) assigned to United States Environmental Protection Agency (USEPA), Office of Research and Development (ORD), Research Triangle Park, NC, USA
- Hien Tran
- Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Marina V Evans
- United States Environmental Protection Agency (USEPA), Center for Computational Toxicity and Exposure, Office of Research and Development (ORD), Research Triangle Park, NC, USA
3
Srinivasan K, Puliyanda A, Prasad V. Identification of Reaction Network Hypotheses for Complex Feedstocks from Spectroscopic Measurements with Minimal Human Intervention. J Phys Chem A 2024; 128:4714-4729. [PMID: 38836378 DOI: 10.1021/acs.jpca.4c01592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
In this work, we detail an automated reaction network hypothesis generation protocol for processes involving complex feedstocks where information about the species and reactions involved is unknown. Our methodology is process agnostic and can be utilized in any reactive process with spectroscopic measurements that provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1-D convolutional neural network to automatically identify functional groups indicated by them. We employ atom-atom mapping to automatically recover reaction rules that are applied on candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexities. Our methodology is able to identify reaction network hypotheses containing reaction networks close to the ground truth in the case of synthetic data, and we are also able to recover candidate molecules and reaction networks close to the ones reported in the previous literature studies for biomass pyrolysis.
Affiliation(s)
- Karthik Srinivasan
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Anjana Puliyanda
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
- Vinay Prasad
- Department of Chemical and Materials Engineering, Donadeo Innovation Centre for Engineering, 9211, 116st NW, Edmonton T6G 1H9, AB, Canada
4
Zhang R, Nolte D, Sanchez-Villalobos C, Ghosh S, Pal R. Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling. Nat Commun 2024; 15:5072. [PMID: 38871711 DOI: 10.1038/s41467-024-49372-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against the predictive performance of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve equal, if not better, performance than the deep learning-based QSAR models and provide better intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.
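As a rough illustration of what "similarity-based" prediction means in general, the sketch below predicts an activity as a Tanimoto-weighted average over the most similar training compounds. It is not the topological regression method of the paper: fingerprints are modeled as plain Python sets of hypothetical feature IDs, and the function names and data are invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_activity(query, training, k=2):
    """Similarity-weighted mean activity of the k most similar training compounds."""
    # Score every training compound, then keep the k nearest neighbours.
    scored = sorted(
        ((tanimoto(query, fp), y) for fp, y in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        # No overlap with any neighbour: fall back to an unweighted mean.
        return sum(y for _, y in scored) / len(scored)
    return sum(sim * y for sim, y in scored) / total


# Hypothetical training set: (fingerprint bits, measured activity).
TRAINING = [
    ({1, 2, 3, 4}, 6.0),
    ({1, 2, 3, 9}, 5.0),
    ({7, 8, 9}, 2.0),
]

print(predict_activity({1, 2, 3, 4}, TRAINING, k=2))
```

Because each prediction is an explicit weighted sum over named neighbours, one can inspect which training compounds drove it, which is the kind of interpretability that similarity-based approaches aim for.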
Affiliation(s)
- Ruibo Zhang
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Daniel Nolte
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Cesar Sanchez-Villalobos
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Souparno Ghosh
- Department of Statistics, University of Nebraska - Lincoln, Lincoln, NB, 68588, USA.
- Ranadip Pal
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA.
5
Das M, Ghosh A, Sunoj RB. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J Comput Chem 2024; 45:1160-1176. [PMID: 38299229 DOI: 10.1002/jcc.27315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/06/2024] [Accepted: 01/09/2024] [Indexed: 02/02/2024]
Abstract
Molecular properties and reactions form the foundation of chemical space. Over the years, innumerable molecules have been synthesized, a smaller fraction of them found immediate applications, while a larger proportion served as a testimony to creative and empirical nature of the domain of chemical science. With increasing emphasis on sustainable practices, it is desirable that a target set of molecules are synthesized preferably through a fewer empirical attempts instead of a larger library, to realize an active candidate. In this front, predictive endeavors using machine learning (ML) models built on available data acquire high timely significance. Prediction of molecular property and reaction outcome remain one of the burgeoning applications of ML in chemical science. Among several methods of encoding molecular samples for ML models, the ones that employ language like representations are gaining steady popularity. Such representations would additionally help adopt well-developed natural language processing (NLP) models for chemical applications. Given this advantageous background, herein we describe several successful chemical applications of NLP focusing on molecular property and reaction outcome predictions. From relatively simpler recurrent neural networks (RNNs) to complex models like transformers, different network architecture have been leveraged for tasks such as de novo drug design, catalyst generation, forward and retro-synthesis predictions. The chemical language model (CLM) provides promising avenues toward a broad range of applications in a time and cost-effective manner. While we showcase an optimistic outlook of CLMs, attention is also placed on the persisting challenges in reaction domain, which would optimistically be addressed by advanced algorithms tailored to chemical language and with increased availability of high-quality datasets.
Affiliation(s)
- Manajit Das
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Ankit Ghosh
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Raghavan B Sunoj
- Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
- Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai, India
6
Saigiridharan L, Hassen AK, Lai H, Torren-Peraire P, Engkvist O, Genheden S. AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application. J Cheminform 2024; 16:57. [PMID: 38778382 PMCID: PMC11112899 DOI: 10.1186/s13321-024-00860-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical use-cases of the software and highlight some learnings, we perform a large-scale analysis on several hundred thousand target molecules from diverse sources. This analysis looks at for instance route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open-source for educational purposes as well as to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open-source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is a fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder.
Affiliation(s)
- Alan Kai Hassen
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Helen Lai
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Paula Torren-Peraire
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Zentrum München, Neuherberg, Germany
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Samuel Genheden
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
7
King-Smith E. Transfer learning for a foundational chemistry model. Chem Sci 2024; 15:5143-5151. [PMID: 38577363 PMCID: PMC10988575 DOI: 10.1039/d3sc04928k] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/19/2023] [Accepted: 11/15/2023] [Indexed: 04/06/2024] Open
Abstract
Data-driven chemistry has garnered much interest concurrent with improvements in hardware and the development of new machine learning models. However, obtaining sufficiently large, accurate datasets of a desired chemical outcome for data-driven chemistry remains a challenge. The community has made significant efforts to democratize and curate available information for more facile machine learning applications, but the limiting factor is usually the laborious nature of generating large-scale data. Transfer learning has been noted in certain applications to alleviate some of the data burden, but this protocol is typically carried out on a case-by-case basis, with the transfer learning task expertly chosen to fit the finetuning. Herein, I develop a machine learning framework capable of accurate chemistry-relevant prediction amid general sources of low data. First, a chemical "foundational model" is trained using a dataset of ∼1 million experimental organic crystal structures. A task specific module is then stacked atop this foundational model and subjected to finetuning. This approach achieves state-of-the-art performance on a diverse set of tasks: toxicity prediction, yield prediction, and odor prediction.
8
Ghiandoni GM, Evertsson E, Riley DJ, Tyrchan C, Rathi PC. Augmenting DMTA using predictive AI modelling at AstraZeneca. Drug Discov Today 2024; 29:103945. [PMID: 38460568 DOI: 10.1016/j.drudis.2024.103945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 03/11/2024]
Abstract
Design-Make-Test-Analyse (DMTA) is the discovery cycle through which molecules are designed, synthesised, and assayed to produce data that in turn are analysed to inform the next iteration. The process is repeated until viable drug candidates are identified, often requiring many cycles before reaching a sweet spot. The advent of artificial intelligence (AI) and cloud computing presents an opportunity to innovate drug discovery to reduce the number of cycles needed to yield a candidate. Here, we present the Predictive Insight Platform (PIP), a cloud-native modelling platform developed at AstraZeneca. The impact of PIP in each step of DMTA, as well as its architecture, integration, and usage, are discussed and used to provide insights into the future of drug discovery.
Affiliation(s)
- Gian Marco Ghiandoni
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK.
- Emma Evertsson
- Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- David J Riley
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
- Christian Tyrchan
- Research and Early Development, Respiratory and Immunology (R&I), Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden, Mölndal, SE 43183, Sweden
- Prakash Chandra Rathi
- Augmented DMTA Platform, R&D IT, AstraZeneca, The Discovery Centre (DISC), Francis Crick Avenue, Cambridge CB2 0AA, UK
9
Mansouri M, Fussenegger M. Small-Molecule Regulators for Gene Switches to Program Mammalian Cell Behaviour. Chembiochem 2024; 25:e202300717. [PMID: 38081780 DOI: 10.1002/cbic.202300717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 12/11/2023] [Indexed: 01/13/2024]
Abstract
Synthetic or natural small molecules have been extensively employed as trigger signals or inducers to regulate engineered gene circuits introduced into living cells in order to obtain desired outputs in a controlled and predictable manner. Here, we provide an overview of small molecules used to drive synthetic-biology-based gene circuits in mammalian cells, together with examples of applications at different levels of control, including regulation of DNA manipulation, RNA synthesis and editing, and protein synthesis, maturation, and trafficking. We also discuss the therapeutic potential of these small-molecule-responsive gene circuits, focusing on the advantages and disadvantages of using small molecules as triggers, the mechanisms involved, and the requirements for selecting suitable molecules, including efficiency, specificity, orthogonality, and safety. Finally, we explore potential future directions for translation of these devices to clinical medicine.
Affiliation(s)
- Maysam Mansouri
- ETH Zurich, Department of Biosystems Science and Engineering, Klingelbergstrasse 48, CH-4056, Basel, Switzerland
- Martin Fussenegger
- ETH Zurich, Department of Biosystems Science and Engineering, Klingelbergstrasse 48, CH-4056, Basel, Switzerland
- University of Basel, Faculty of Science, Klingelbergstrasse 48, CH-4056, Basel, Switzerland
10
Chew AK, Sender M, Kaplan Z, Chandrasekaran A, Chief Elk J, Browning AR, Kwak HS, Halls MD, Afzal MAF. Advancing material property prediction: using physics-informed machine learning models for viscosity. J Cheminform 2024; 16:31. [PMID: 38486289 PMCID: PMC10938832 DOI: 10.1186/s13321-024-00820-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 02/27/2024] [Indexed: 03/18/2024] Open
Abstract
In materials science, accurately computing properties like viscosity, melting point, and glass transition temperatures solely through physics-based models is challenging. Data-driven machine learning (ML) also poses challenges in constructing ML models, especially in the material science domain where data is limited. To address this, we integrate physics-informed descriptors from molecular dynamics (MD) simulations to enhance the accuracy and interpretability of ML models. Our current study focuses on accurately predicting viscosity in liquid systems using MD descriptors. In this work, we curated a comprehensive dataset of over 4000 small organic molecules' viscosities from scientific literature, publications, and online databases. This dataset enabled us to develop quantitative structure-property relationships (QSPR) consisting of descriptor-based and graph neural network models to predict temperature-dependent viscosities for a wide range of viscosities. The QSPR models reveal that including MD descriptors improves the prediction of experimental viscosities, particularly at the small data set scale of fewer than a thousand data points. Furthermore, feature importance tools reveal that intermolecular interactions captured by MD descriptors are most important for viscosity predictions. Finally, the QSPR models can accurately capture the inverse relationship between viscosity and temperature for six battery-relevant solvents, some of which were not included in the original data set. Our research highlights the effectiveness of incorporating MD descriptors into QSPR models, which leads to improved accuracy for properties that are difficult to predict when using physics-based models alone or when limited data is available.
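The inverse relationship between liquid viscosity and temperature mentioned in this abstract is often approximated by an Arrhenius-type (Andrade) expression, eta = A * exp(Ea / (R * T)). The sketch below only illustrates that textbook form; it is not the QSPR model of the paper, and the prefactor and activation energy are hypothetical values chosen for the example.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def arrhenius_viscosity(temperature_k, prefactor, activation_energy):
    """Viscosity from eta = A * exp(Ea / (R * T)); units follow the prefactor."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return prefactor * math.exp(activation_energy / (R * temperature_k))


# Hypothetical parameters, not fitted to any real solvent.
A = 1.0e-3    # prefactor, mPa*s
EA = 15000.0  # activation energy for viscous flow, J/mol

for t in (280.0, 300.0, 320.0):
    print(t, arrhenius_viscosity(t, A, EA))
```

With a positive activation energy the computed viscosity falls monotonically as temperature rises, which is the qualitative trend a temperature-dependent viscosity model is expected to reproduce.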
11
Mora JR, Marquez EA, Pérez-Pérez N, Contreras-Torres E, Perez-Castillo Y, Agüero-Chapin G, Martinez-Rios F, Marrero-Ponce Y, Barigye SJ. Rethinking the applicability domain analysis in QSAR models. J Comput Aided Mol Des 2024; 38:9. [PMID: 38351144 DOI: 10.1007/s10822-024-00550-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 02/05/2024] [Indexed: 02/16/2024]
Abstract
Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.
Affiliation(s)
- Jose R Mora
- Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC- USFQ), Diego de Robles y Vía Interoceánica, Quito, 170901, Ecuador
- Edgar A Marquez
- Grupo de Investigaciones en Química Y Biología, Departamento de Química Y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla, 081007, Colombia
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Cátedras Conacyt, Ensenada, Baja California, México
- Noel Pérez-Pérez
- Colegio de Ciencias e Ingenierías "El Politécnico", Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- Ernesto Contreras-Torres
- Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
- Yunierkis Perez-Castillo
- Bio-Chemoinformatics Research Group, Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, 170504, Ecuador
- Guillermin Agüero-Chapin
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, Porto, 4450-208, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, Porto, 4169- 007, Portugal
- Felix Martinez-Rios
- Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
- Yovani Marrero-Ponce
- Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador
- Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador
- Stephen J Barigye
- Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), Madrid, 28049, Spain.
12
Blanchard AE, Bhowmik D, Fox Z, Gounley J, Glaser J, Akpa BS, Irle S. Adaptive language model training for molecular design. J Cheminform 2023; 15:59. [PMID: 37291633 DOI: 10.1186/s13321-023-00719-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2022] [Accepted: 04/03/2023] [Indexed: 06/10/2023] Open
Abstract
The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, masked language models have been applied to automate the mutation process by leveraging large compound libraries to learn commonly occurring chemical sequences (i.e., using tokenization) and predict rearrangements (i.e., using mask prediction). Here, we consider how language models can be adapted to improve molecule generation for different optimization tasks. We use two different generation strategies for comparison, fixed and adaptive. The fixed strategy uses a pre-trained model to generate mutations; the adaptive strategy trains the language model on each new generation of molecules selected for target properties during optimization. Our results show that the adaptive strategy allows the language model to more closely fit the distribution of molecules in the population. Therefore, for enhanced fitness optimization, we suggest the use of the fixed strategy during an initial phase followed by the use of the adaptive strategy. We demonstrate the impact of adaptive training by searching for molecules that optimize both heuristic metrics, drug-likeness and synthesizability, as well as predicted protein binding affinity from a surrogate model. Our results show that the adaptive strategy provides a significant improvement in fitness optimization compared to the fixed pre-trained model, empowering the application of language models to molecular design tasks.
Affiliation(s)
- Andrew E Blanchard
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Debsindhu Bhowmik
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA.
- Zachary Fox
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- John Gounley
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Jens Glaser
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Belinda S Akpa
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Chemical & Biomolecular Engineering, University of Tennessee, Knoxville, TN, 37996, USA
- Stephan Irle
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
13
Durojaye OA, Okoro NO, Odiba AS, Nwanguma BC. MasitinibL shows promise as a drug-like analog of masitinib that elicits comparable SARS-Cov-2 3CLpro inhibition with low kinase preference. Sci Rep 2023; 13:6972. [PMID: 37117213 PMCID: PMC10141821 DOI: 10.1038/s41598-023-33024-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 04/06/2023] [Indexed: 04/30/2023] Open
Abstract
SARS-CoV-2 infection has led to several million deaths worldwide and ravaged the economies of many countries. Hence, developing therapeutics against SARS-CoV-2 remains a core priority in the fight against COVID-19. Most of the drugs that have received emergency use authorization for treating SARS-CoV-2 infection exhibit a number of limitations, including side effects and questionable efficacy. This challenge is further compounded by reinfection after vaccination and the high likelihood of mutations, as well as the emergence of viral escape mutants that render SARS-CoV-2 spike glycoprotein-targeting vaccines ineffective. Employing de novo drug synthesis or repurposing to discover broad-spectrum antivirals that target highly conserved pathways within the viral machinery is a focus of current research. In a recent drug repurposing study, masitinib, a clinically safe drug against the human coronavirus OC43 (HCoV-OC43), was identified as an antiviral agent with effective inhibitory activity against the SARS-CoV-2 3CLpro. Masitinib is currently under clinical trial in combination with isoquercetin in hospitalized patients (NCT04622865). Nevertheless, masitinib has kinase-related side effects; hence, the development of masitinib analogs with lower anti-tyrosine kinase activity becomes necessary. In this study, in an attempt to address this limitation, we executed a comprehensive virtual workflow in silico to discover drug-like compounds matching selected pharmacophore features in the SARS-CoV-2 3CLpro-bound state of masitinib. We identified a novel lead compound, "masitinibL", a drug-like analog of masitinib that demonstrated strong inhibitory properties against the SARS-CoV-2 3CLpro. In addition, masitinibL further displayed low selectivity for tyrosine kinases, which strongly suggests that masitinibL is a highly promising therapeutic that is preferable to masitinib.
Affiliation(s)
- Olanrewaju Ayodeji Durojaye
  - MOE Key Laboratory of Membraneless Organelle and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, 230027, Anhui, China
  - School of Life Sciences, University of Science and Technology of China, Hefei, 230027, Anhui, China
  - Department of Chemical Sciences, Coal City University, Emene, Enugu State, Nigeria
- Nkwachukwu Oziamara Okoro
  - Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Nigeria
- Arome Solomon Odiba
  - Department of Molecular Genetics and Biotechnology, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Biochemistry, Faculty of Biological Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Bennett Chima Nwanguma
  - Department of Molecular Genetics and Biotechnology, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Biochemistry, Faculty of Biological Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
|
14
|
Joshi PB. Navigating with chemometrics and machine learning in chemistry. Artif Intell Rev 2023; 56:1-26. [PMID: 36714038 PMCID: PMC9870782 DOI: 10.1007/s10462-023-10391-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/09/2023] [Indexed: 01/25/2023]
Abstract
Chemometrics and machine learning are artificial intelligence-based methods stirring a transformative change in chemistry. Organic synthesis, drug discovery and analytical techniques are incorporating machine learning techniques at an accelerated pace. However, machine-assisted chemistry faces challenges while solving critical problems in chemistry due to complex relationships in data sets. Even with increasing publishing volumes on machine learning, its application in areas of chemistry is not a straightforward endeavour. A particular concern in applying machine learning in chemistry is data availability and reproducibility. The present review article discusses the various chemometric methods, expert systems, and machine learning techniques developed for solving problems of organic synthesis and drug discovery with selected examples. Further, a concise discussion on chemometrics and ML deployed in analytical techniques such as, spectroscopy, microscopy and chromatography are presented. Finally, the review reflects the challenges, opportunities and future perspectives on machine learning and automation in chemistry. The review concludes by pondering on some tough questions on applying machine learning and their possibility of navigation in the different terrains of chemistry.
Affiliation(s)
- Payal B. Joshi
  - Operations and Method Development, Shefali Research Laboratories, Ambernath (East), Thane, Maharashtra 421501, India
|
15
|
Synthesis, cytotoxicity, Pan-HDAC inhibitory activity and docking study of new N-(2-aminophenyl)-2-methylquinoline-4-carboxamide and (E)-N-(2-aminophenyl)-2-styrylquinoline-4-carboxamide derivatives as anticancer agents. Med Chem Res 2023. [DOI: 10.1007/s00044-023-03018-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
|
16
|
Skoraczyński G, Kitlas M, Miasojedow B, Gambin A. Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. J Cheminform 2023; 15:6. [PMID: 36641473 PMCID: PMC9840255 DOI: 10.1186/s13321-023-00678-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/06/2022] [Accepted: 01/04/2023] [Indexed: 01/15/2023] Open
Abstract
Modern computer-assisted synthesis planning tools provide strong support for this problem. However, they are still limited by computational complexity. This limitation may be overcome by scoring the synthetic accessibility as a pre-retrosynthesis heuristic. A wide range of machine learning scoring approaches is available, however, their applicability and correctness were studied to a limited extent. Moreover, there is a lack of critical assessment of synthetic accessibility scores with common test conditions. In the present work, we assess if synthetic accessibility scores can reliably predict the outcomes of retrosynthesis planning. Using a specially prepared compounds database, we examine the outcomes of the retrosynthetic tool AiZynthFinder. We test whether synthetic accessibility scores: SAscore, SYBA, SCScore, and RAscore accurately predict the results of retrosynthesis planning. Furthermore, we investigate if synthetic accessibility scores can speed up retrosynthesis planning by better prioritizing explored partial synthetic routes and thus reducing the size of the search space. For that purpose, we analyze the AiZynthFinder partial solutions search trees, their structure, and complexity parameters, such as the number of nodes, or treewidth. We confirm that synthetic accessibility scores in most cases well discriminate feasible molecules from infeasible ones and can be potential boosters of retrosynthesis planning tools. Moreover, we show the current challenges of designing computer-assisted synthesis planning tools. We conclude that hybrid machine learning and human intuition-based synthetic accessibility scores can efficiently boost the effectiveness of computer-assisted retrosynthesis planning, however, they need to be carefully crafted for retrosynthesis planning algorithms. The source code of this work is publicly available at https://github.com/grzsko/ASAP.
Affiliation(s)
- Grzegorz Skoraczyński
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland
- Mateusz Kitlas
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland
- Błażej Miasojedow
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland
- Anna Gambin
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Stefana Banacha 2, Warsaw, Poland
|
17
|
Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 2023; 15:3. [PMID: 36609528 PMCID: PMC9824924 DOI: 10.1186/s13321-022-00672-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/17/2022] [Indexed: 01/07/2023] Open
Abstract
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus is comprised of around 60 million data points. This dataset contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how data can be filtered in a variety of ways and also perform some examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
Affiliation(s)
- O. J. M. Béquignon
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. J. Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- W. Jespers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- A. P. IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. van der Water
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- G. J. P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
|
18
|
Hawash M, Jaradat N, Abualhasan M, Qaoud MT, Joudeh Y, Jaber Z, Sawalmeh M, Zarour A, Mousa A, Arar M. Molecular docking studies and biological evaluation of isoxazole-carboxamide derivatives as COX inhibitors and antimicrobial agents. 3 Biotech 2022; 12:342. [PMID: 36345437 PMCID: PMC9636359 DOI: 10.1007/s13205-022-03408-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/12/2022] [Accepted: 10/23/2022] [Indexed: 11/06/2022] Open
Abstract
Non-steroidal anti-inflammatory drugs (NSAIDs) are considered one of the most commonly used medications globally. Seventeen isoxazole-containing compounds with various functional groups were evaluated in this work to identify which one was the most potent and which group was most selective toward COX-1 and COX-2 by using an in vitro COX inhibition assay kit. Their cytotoxicity was evaluated on the normal hepatic cell line (LX-2) utilizing the MTS assay. Moreover, these molecules' antibacterial and antifungal activities were evaluated using a microdilution assay against several bacterial and fungal species. In addition, molecular docking studies were conducted to identify the possible binding interactions between these compounds and their biological targets by using the X-ray crystal structure of the human COX enzyme and different proteins of bacterial and fungal strains. At the same time, the QiKProp module was used for ADME-T analysis. The results showed that all evaluated isoxazole derivatives showed moderate to potent activities against COX enzymes. The most potent compound against COX-1 and COX-2 enzymes was A13, with IC50 values of 64 and 13 nM, respectively, and a significant selectivity ratio of 4.63. It was clear that the 3,4-dimethoxy substitution on the first phenyl ring and the Cl atom on the other phenyl pushed the 5-methyl-isoxazole ring toward the secondary binding pocket and created the ideal binding interactions with the COX-2 enzyme in comparison with the other compounds. Compound A8 showed antibacterial and antifungal activities against Pseudomonas aeruginosa, Klebsiella pneumonia, and Candida albicans with MIC values of 2 mg/ml. In fact, this compound showed possible binding interactions with the elastase in P. aeruginosa and KPC-2 carbapenemase in K. pneumonia. 
Furthermore, for better understanding, molecular dynamics simulations were undertaken to study the change in dynamicity of the protein backbone and ligand after the ligand binds to the protein and to ensure the stability of ligand-protein complexes.
Supplementary Information: The online version contains supplementary material available at 10.1007/s13205-022-03408-8.
Affiliation(s)
- Mohammed Hawash
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
- Nidal Jaradat
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
- Murad Abualhasan
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
- Mohammed T. Qaoud
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Gazi University, 06330 Etiler, Ankara, Turkey
- Yara Joudeh
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
- Zeina Jaber
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
- Majd Sawalmeh
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
- Abdulraziq Zarour
  - Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, An-Najah National University, 00970 Nablus, Palestine
- Ahmed Mousa
  - Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, An-Najah National University, 00970 Nablus, Palestine
- Mohammed Arar
  - Department of Pharmacy, Faculty of Medicine and Health Sciences, An-Najah National University, Nablus, Palestine
|
19
|
Elbadawi MM, Eldehna WM, Abd El-Hafeez AA, Somaa WR, Albohy A, Al-Rashood ST, Agama KK, Elkaeed EB, Ghosh P, Pommier Y, Abe M. 2-Arylquinolines as novel anticancer agents with dual EGFR/FAK kinase inhibitory activity: synthesis, biological evaluation, and molecular modelling insights. J Enzyme Inhib Med Chem 2022; 37:349-372. [PMID: 34923887 PMCID: PMC8725837 DOI: 10.1080/14756366.2021.2015344] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/12/2021] [Revised: 11/29/2021] [Accepted: 12/01/2021] [Indexed: 01/15/2023] Open
Abstract
In this study, different assortments of 2-arylquinolines and 2,6-diarylquinolines have been developed. Recently, we have developed a new series of 6,7-dimethoxy-4-alkoxy-2-arylquinolines as Topoisomerase I (TOP1) inhibitors with potent anticancer activity. Utilising the SAR outputs from this study, we tried to enhance anticancer and TOP1 inhibitory activities. Though target quinolines demonstrated potent antiproliferative effect, specifically against colorectal cancer DLD-1 and HCT-116, they showed weak TOP1 inhibition which may be attributable to their non-coplanarity. Thereafter, screening against kinase panel revealed their dual inhibitory activity against EGFR and FAK. Quinolines 6f, 6h, 6i, and 20f were the most potent EGFR inhibitors (IC50s = 25.39, 20.15, 22.36, and 24.81 nM, respectively). Meanwhile, quinolines 6f, 6h, 6i, 16d, and 20f exerted the best FAK inhibition (IC50s = 22.68, 14.25, 18.36, 17.36, and 15.36 nM, respectively). Finally, molecular modelling was employed to justify the promising EGFR/FAK inhibition. The study outcomes afforded the first reported quinolines with potent EGFR/FAK dual inhibition.
Affiliation(s)
- Mostafa M. Elbadawi
  - Department of Chemistry, Graduate School of Science, Hiroshima University, Hiroshima, Japan
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, Egypt
- Wagdy M. Eldehna
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, Egypt
- Amer Ali Abd El-Hafeez
  - Pharmacology and Experimental Oncology Unit, Cancer Biology Department, National Cancer Institute, Cairo University, Cairo, Egypt
  - Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Warda R. Somaa
  - Faculty of Pharmacy, Kafrelsheikh University, Kafrelsheikh, Egypt
- Amgad Albohy
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, The British University in Egypt (BUE), Cairo, Egypt
- Sara T. Al-Rashood
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Keli K. Agama
  - Developmental Therapeutics Branch, Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Eslam B. Elkaeed
  - Department of Pharmaceutical Sciences, College of Pharmacy, AlMaarefa University, Riyadh, Saudi Arabia
- Pradipta Ghosh
  - Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Moores Comprehensive Cancer Center, University of California San Diego, La Jolla, CA, USA
  - Veterans Affairs Medical Center, La Jolla, CA, USA
- Yves Pommier
  - Developmental Therapeutics Branch, Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Manabu Abe
  - Department of Chemistry, Graduate School of Science, Hiroshima University, Hiroshima, Japan
|
20
|
Metabolite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic networking. Nat Commun 2022; 13:6656. [PMID: 36333358 PMCID: PMC9636193 DOI: 10.1038/s41467-022-34537-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 06/29/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022] Open
Abstract
Liquid chromatography - mass spectrometry (LC-MS) based untargeted metabolomics allows to measure both known and unknown metabolites in the metabolome. However, unknown metabolite annotation is a major challenge in untargeted metabolomics. Here, we develop an approach, namely, knowledge-guided multi-layer network (KGMN), to enable global metabolite annotation from knowns to unknowns in untargeted metabolomics. The KGMN approach integrates three-layer networks, including knowledge-based metabolic reaction network, knowledge-guided MS/MS similarity network, and global peak correlation network. To demonstrate the principle, we apply KGMN in an in vitro enzymatic reaction system and different biological samples, with ~100-300 putative unknowns annotated in each data set. Among them, >80% unknown metabolites are corroborated with in silico MS/MS tools. Finally, we validate 5 metabolites that are absent in common MS/MS libraries through repository mining and synthesis of chemical standards. Together, the KGMN approach enables efficient unknown annotations, and substantially advances the discovery of recurrent unknown metabolites for common biological samples from model organisms, towards deciphering dark matter in untargeted metabolomics.
|
21
|
Zheng S, Tan Y, Wang Z, Li C, Zhang Z, Sang X, Chen H, Yang Y. Accelerated rational PROTAC design via deep learning and molecular simulations. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00527-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
22
|
Omidkhah N, Eisvand F, Hadizadeh F, Zarghi A, Ghodsi R. Synthesis, Cytotoxicity, Pan‐HDAC Inhibitory Activity and Docking Study of N‐(2‐Aminophenyl)‐2‐arylquinoline‐4‐ and N‐(2‐Aminophenyl)‐2‐arylbenzo[h]quinoline‐4‐carboxamides. ChemistrySelect 2022. [DOI: 10.1002/slct.202201239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Negar Omidkhah
  - Student Research Committee, Mashhad University of Medical Science, Mashhad, Iran
  - Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Medicinal Chemistry, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
- Farhad Eisvand
  - Department of Pharmacodynamics and Toxicology, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
- Farzin Hadizadeh
  - Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Medicinal Chemistry, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
- Afshin Zarghi
  - Department of Pharmaceutical Chemistry, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Razieh Ghodsi
  - Biotechnology Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Medicinal Chemistry, School of Pharmacy, Mashhad University of Medical Sciences, Mashhad, Iran
|
23
|
Tan RK, Liu Y, Xie L. Reinforcement learning for systems pharmacology-oriented and personalized drug design. Expert Opin Drug Discov 2022; 17:849-863. [PMID: 35510835 PMCID: PMC9824901 DOI: 10.1080/17460441.2022.2072288] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/11/2023]
Abstract
Introduction: Many multi-genic systemic diseases such as neurological disorders, inflammatory diseases, and the majority of cancers do not have effective treatments yet. Reinforcement learning powered systems pharmacology is a potentially effective approach to designing personalized therapies for untreatable complex diseases. Areas covered: In this survey, state-of-the-art reinforcement learning methods and their latest applications to drug design are reviewed. The challenges on harnessing reinforcement learning for systems pharmacology and personalized medicine are discussed. Potential solutions to overcome the challenges are proposed. Expert opinion: In spite of successful application of advanced reinforcement learning techniques to target-based drug discovery, new reinforcement learning strategies are needed to address systems pharmacology-oriented personalized de novo drug design.
Affiliation(s)
- Ryan K. Tan
  - Department of Computer Science, Hunter College, The City University of New York
- Yang Liu
  - Department of Computer Science, Hunter College, The City University of New York
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York
  - Ph.D. Program in Computer Science, Biology & Biochemistry, The Graduate Center, The City University of New York
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University
  - Correspondence should be addressed to Lei Xie
|
24
|
Park S, Han H, Kim H, Choi S. Machine Learning Applications for Chemical Reactions. Chem Asian J 2022; 17:e202200203. [PMID: 35471772 PMCID: PMC9401034 DOI: 10.1002/asia.202200203] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/28/2022] [Revised: 04/26/2022] [Indexed: 11/30/2022]
Abstract
Machine learning (ML) approaches have enabled rapid and efficient molecular property predictions as well as the design of new novel materials. In addition to great success for molecular problems, ML techniques are applied to various chemical reaction problems that require huge costs to solve with the existing experimental and simulation methods. In this review, starting with basic representations of chemical reactions, we summarized recent achievements of ML studies on two different problems; predicting reaction properties and synthetic routes. The various ML models are used to predict physical properties related to chemical reaction properties (e.g., thermodynamic changes, activation barriers, and reaction rates). Furthermore, the predictions of reactivity, self-optimization of reaction, and designing retrosynthetic reaction paths are also tackled by ML approaches. Herein we illustrate various ML strategies utilized in the various context of chemical reaction studies.
Affiliation(s)
- Sanggil Park
  - Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
- Herim Han
  - Digital Bio R&D Center, Mediazen, Seoul 07789, Republic of Korea
  - Department of Polymer Science and Engineering, Dankook University, Yongin, Gyeonggi 16890, Republic of Korea
- Hyungjun Kim
  - Department of Chemistry, Incheon National University and Research Institute of Basic Sciences, Incheon 22012, Republic of Korea
- Sunghwan Choi
  - Division of National Supercomputing, Korea Institute of Science and Technology Information, Daejeon 34141, Republic of Korea
|
25
|
Urbina F, Lowden CT, Culberson JC, Ekins S. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS Omega 2022; 7:18699-18713. [PMID: 35694522 PMCID: PMC9178760 DOI: 10.1021/acsomega.2c01404] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 05/04/2023]
Abstract
Generative machine learning models have become widely adopted in drug discovery and other fields to produce new molecules and explore molecular space, with the goal of discovering novel compounds with optimized properties. These generative models are frequently combined with transfer learning or scoring of the physicochemical properties to steer generative design, yet often, they are not capable of addressing a wide variety of potential problems, as well as converge into similar molecular space when combined with a scoring function for the desired properties. In addition, these generated compounds may not be synthetically feasible, reducing their capabilities and limiting their usefulness in real-world scenarios. Here, we introduce a suite of automated tools called MegaSyn representing three components: a new hill-climb algorithm, which makes use of SMILES-based recurrent neural network (RNN) generative models, analog generation software, and retrosynthetic analysis coupled with fragment analysis to score molecules for their synthetic feasibility. We show that by deconstructing the targeted molecules and focusing on substructures, combined with an ensemble of generative models, MegaSyn generally performs well for the specific tasks of generating new scaffolds as well as targeted analogs, which are likely synthesizable and druglike. We now describe the development, benchmarking, and testing of this suite of tools and propose how they might be used to optimize molecules or prioritize promising lead compounds using these RNN examples provided by multiple test case examples.
Affiliation(s)
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Christopher T. Lowden
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- J. Christopher Culberson
  - Workflow Informatics Corporation, 9316 Bramden Court, Wake Forest, North Carolina 27587, United States
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
  - Phone: 215-687-1320
|
26
|
Flam-Shepherd D, Zhu K, Aspuru-Guzik A. Language models can learn complex molecular distributions. Nat Commun 2022; 13:3293. [PMID: 35672310 PMCID: PMC9174447 DOI: 10.1038/s41467-022-30839-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 12/06/2021] [Accepted: 05/16/2022] [Indexed: 11/09/2022] Open
Abstract
Deep generative models of molecules have grown immensely in popularity, trained on relevant datasets, these models are used to search through chemical space. The downstream utility of generative models for the inverse design of novel functional compounds, depends on their ability to learn a training distribution of molecules. The most simple example is a language model that takes the form of a recurrent neural network and generates molecules using a string representation. Since their initial use, subsequent work has shown that language models are very capable, in particular, recent research has demonstrated their utility in the low data regime. In this work, we investigate the capacity of simple language models to learn more complex distributions of molecules. For this purpose, we introduce several challenging generative modeling tasks by compiling larger, more complex distributions of molecules and we evaluate the ability of language models on each task. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions. Language models can accurately generate: distributions of the highest scoring penalized LogP molecules in ZINC15, multi-modal molecular distributions as well as the largest molecules in PubChem. The results highlight the limitations of some of the most popular and recent graph generative models- many of which cannot scale to these molecular distributions.
Affiliation(s)
- Daniel Flam-Shepherd
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
- Kevin Zhu
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Alán Aspuru-Guzik
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
  - Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
  - Department of Chemistry, University of Toronto, Toronto, ON, M5G 1Z8, Canada
  - Canadian Institute for Advanced Research, Toronto, ON, M5G 1Z8, Canada
|
27
|
Alam A, Agrawal GP, Khan S, Khalilullah H, Saifullah MK, Arshad MF. Towards the discovery of potential RdRp inhibitors for the treatment of COVID-19: structure guided virtual screening, computational ADME and molecular dynamics study. Struct Chem 2022; 33:1569-1583. [PMID: 35669792 PMCID: PMC9161180 DOI: 10.1007/s11224-022-01976-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 05/25/2022] [Indexed: 01/18/2023]
Abstract
Coronavirus disease 2019 (COVID-19) has become a major challenge affecting almost every corner of the world, with more than five million deaths worldwide. Despite several efforts, no drug or vaccine has shown the potential to check the ever-mutating SARS-CoV-2. The emergence of novel variants is a major concern, increasing the need for the discovery of novel therapeutics for the management of this pandemic. Out of several potential drug targets such as S protein, human ACE2, TMPRSS2 (transmembrane protease serine 2), 3CLpro, RdRp, and PLpro (papain-like protease), RNA-dependent RNA polymerase (RdRp) is a vital enzyme for viral RNA replication in the mammalian host cell and is one of the legitimate targets for the development of therapeutics against this disease. In this study, we have performed structure-based virtual screening to identify potential hit compounds against RdRp using molecular docking of a commercially available small molecule library of structurally diverse and drug-like molecules. Since non-optimal ADME properties create hurdles in the clinical development of drugs, we performed detailed in silico ADMET prediction to facilitate the selection of compounds for further studies. The results from the ADMET study indicated that most of the hit compounds had optimal properties. Moreover, to explore the conformational dynamics of protein-ligand interaction, we have performed an atomistic molecular dynamics simulation, which indicated a stable interaction throughout the simulation period. We believe that the current findings may assist in the discovery of drug candidates against SARS-CoV-2.
Affiliation(s)
- Aftab Alam
  - Department of Pharmacognosy, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al Kharj, 11942 Kingdom of Saudi Arabia
- Shamshir Khan
  - College of Dentistry and Pharmacy, Buraydah Private Colleges, Al-Qassim, Kingdom of Saudi Arabia
- Habibullah Khalilullah
  - Department of Pharmaceutical Chemistry and Pharmacognosy, Unaizah College of Pharmacy, Qassim University, Buraydah, Kingdom of Saudi Arabia
- Muhammed Khalid Saifullah
  - Department of Pharmaceutical Chemistry, College of Pharmacy, Umm-Al Qura University Makkah, Mecca, Kingdom of Saudi Arabia
- Mohammed Faiz Arshad
  - Department of Research and Scientific Communications, Isthmus Research and Publishing House, U-13, Near Badi Masjid, Pulpehlad Pur, New Delhi, 110044 India
|
28
|
Venkatasubramanian V, Mann V. Artificial intelligence in reaction prediction and chemical synthesis. Curr Opin Chem Eng 2022. [DOI: 10.1016/j.coche.2021.100749] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]
|
29
|
Morger A, Garcia de Lomana M, Norinder U, Svensson F, Kirchmair J, Mathea M, Volkamer A. Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data. Sci Rep 2022; 12:7244. [PMID: 35508546 PMCID: PMC9068909 DOI: 10.1038/s41598-022-09309-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/28/2021] [Accepted: 03/17/2022] [Indexed: 11/09/2022]
Abstract
Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
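The recalibration idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nonconformity scores are synthetic Gaussian draws, the drift is modelled as a simple upward shift of the scores, and the names `conformal_p_value` and `error_rate` are hypothetical. It only shows that a conformal predictor calibrated on the original data rejects the true label too often on drifted data, and that swapping in a calibration set resembling the drifted data brings the error rate back near the chosen significance level, without retraining any model.

```python
import random

def conformal_p_value(cal_scores, test_score):
    """Share of calibration nonconformity scores >= the test score (with +1 smoothing)."""
    n_ge = sum(1 for s in cal_scores if s >= test_score)
    return (n_ge + 1) / (len(cal_scores) + 1)

def error_rate(cal_scores, test_scores, significance):
    """Fraction of test items whose true label is rejected at this significance level."""
    rejected = sum(
        1 for s in test_scores if conformal_p_value(cal_scores, s) <= significance
    )
    return rejected / len(test_scores)

# Synthetic nonconformity scores of the TRUE label (assumption: Gaussian, for illustration only).
rng = random.Random(0)
old_cal = [rng.gauss(0.0, 1.0) for _ in range(500)]        # calibration set from the original data
drifted_test = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # drifted data: the model fits worse
new_cal = [rng.gauss(1.0, 1.0) for _ in range(500)]        # calibration set resembling the drifted data

eps = 0.2  # predefined significance level (target error rate)
print("old calibration set:", error_rate(old_cal, drifted_test, eps))
print("updated calibration set:", error_rate(new_cal, drifted_test, eps))
```

The sketch covers validity only. The loss of efficiency that the abstract mentions (more inconclusive predictions) would need scores for more than one label to show.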
Affiliation(s)
- Andrea Morger
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
- Marina Garcia de Lomana
  - BASF SE, 67056, Ludwigshafen, Germany
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Ulf Norinder
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, 751 24, Sweden
  - Dept Computer and Systems Sciences, Stockholm University, Kista, 164 07, Sweden
  - MTM Research Centre, School of Science and Technology, 701 82, Örebro, Sweden
- Fredrik Svensson
  - Alzheimer's Research UK UCL Drug Discovery Institute, London, WC1E 6BT, UK
- Johannes Kirchmair
  - Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
|
30
|
Virtual screening, optimization and molecular dynamics analyses highlighting a pyrrolo[1,2-a]quinazoline derivative as a potential inhibitor of DNA gyrase B of Mycobacterium tuberculosis. Sci Rep 2022; 12:4742. [PMID: 35304513 PMCID: PMC8933452 DOI: 10.1038/s41598-022-08359-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/23/2021] [Accepted: 02/28/2022] [Indexed: 11/09/2022]
Abstract
Tuberculosis is a disease that remains a significant threat to public health worldwide, mainly due to the selection of increasingly resistant strains of Mycobacterium tuberculosis, its causative agent. One of the validated targets for the development of new antibiotics is DNA gyrase. This enzyme is a type II topoisomerase responsible for regulating DNA topology and is essential in bacteria. To contribute to the search for new molecules with potential to act as competitive inhibitors at the active site of M. tuberculosis DNA gyrase B, the present work explored a dataset of 20,098 natural products that were filtered using the FAF-Drugs4 server to obtain a total of 5462 structures that were subsequently used in virtual screenings. The consensus score analysis between LeDock and AutoDock Vina software showed that ZINC000040309506 (a pyrrolo[1,2-a]quinazoline derivative) exhibits the best binding energy with the enzyme. In addition, its subsequent optimization generated the derivative described as PQPNN, which shows better binding energy in docking analysis, more stability in molecular dynamics simulations and improved pharmacokinetic and toxicological profiles, compared to the parent compound. Taken together, the pyrrolo[1,2-a]quinazoline derivative described for the first time in the present work shows promising potential to inhibit DNA gyrase B of M. tuberculosis.
|
31
|
Kim JH, Kim H, Kim WY. Effect of molecular representation on deep learning performance for prediction of molecular electronic properties. Bull Korean Chem Soc 2022. [DOI: 10.1002/bkcs.12516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Jun Hyeong Kim
  - Department of Chemistry, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Hyeonsu Kim
  - Department of Chemistry, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Woo Youn Kim
  - Department of Chemistry, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
  - KI for Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
|
32
|
Abstract
Artificial intelligence (AI) offers new possibilities for hit and lead finding in medicinal chemistry. Several instances of AI have been used for prospective de novo drug design. Among these, chemical language models have been shown to perform well in various experimental scenarios. In this study, we provide a hands-on introduction to chemical language modeling. A technique based on recurrent neural networks is discussed in detail, together with a step-by-step guide to applying this AI method for focused compound library design. The program code is freely available at github.com/ETHmodlab/de_novo_design_RNN.
Affiliation(s)
- Francesca Grisoni
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
  - Eindhoven University of Technology, Department of Biomedical Engineering, Eindhoven, Netherlands
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, RETHINK, Zurich, Switzerland
|
33
|
Ranjan A, Shukla S, Datta D, Misra R. Generating novel molecule for target protein (SARS-CoV-2) using drug-target interaction based on graph neural network. Network Modeling and Analysis in Health Informatics and Bioinformatics 2021; 11:6. [PMID: 34956815 PMCID: PMC8683294 DOI: 10.1007/s13721-021-00351-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/16/2021] [Revised: 10/26/2021] [Accepted: 12/03/2021] [Indexed: 12/23/2022]
Abstract
The spread of the coronavirus SARS-CoV-2 has resulted in a significant rise in global mortality. Owing to the lack of effective treatment, our aim is to generate a highly potent active molecule that can bind to the protein structure of SARS-CoV-2. Different machine learning and deep learning approaches have been proposed for molecule generation; however, most of these approaches represent the drug molecule and protein structure as 1D sequences, ignoring the fact that molecules are by nature 3D structures, so that many critical properties are lost. In this work, a framework is proposed that takes account of both tertiary and sequential representations of molecules and proteins using a Gated Graph Neural Network (GGNN), a knowledge graph, and an early fusion approach. The molecules generated by the GGNN are screened using the knowledge graph to reduce the search space by discarding the non-binding molecules before being fed into the early fusion model. Further, the binding affinity score of each generated molecule is predicted using the early fusion approach. Experimental results show that our framework generates valid and unique molecules with high accuracy while preserving the chemical properties. Screening with the knowledge graph reduced the generated set of molecules by roughly 96% while retaining more than 85% of the desirable, well-binding molecules and rejecting more than 99% of the unproductive ones. Additionally, the framework was tested with two of the SARS-CoV-2 viral proteins: RNA-dependent RNA polymerase (RdRp) and 3C-like protease (3CLpro).
Affiliation(s)
- Amit Ranjan
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103 India
- Shivansh Shukla
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103 India
- Deepanjan Datta
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103 India
- Rajiv Misra
  - Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, 801103 India
|
34
|
Omar ÖH, Del Cueto M, Nematiaram T, Troisi A. High-throughput virtual screening for organic electronics: a comparative study of alternative strategies. J Mater Chem C 2021; 9:13557-13583. [PMID: 34745630 PMCID: PMC8515942 DOI: 10.1039/d1tc03256a] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/12/2021] [Accepted: 09/13/2021] [Indexed: 06/01/2023]
Abstract
We present a review of the field of high-throughput virtual screening for organic electronics materials focusing on the sequence of methodological choices that determine each virtual screening protocol. These choices are present in all high-throughput virtual screenings and addressing them systematically will lead to optimised workflows and improve their applicability. We consider the range of properties that can be computed and illustrate how their accuracy can be determined depending on the quality and size of the experimental datasets. The approaches to generate candidates for virtual screening are also extremely varied and their relative strengths and weaknesses are discussed. The analysis of high-throughput virtual screening is almost never limited to the identification of top candidates and often new patterns and structure-property relations are the most interesting findings of such searches. The review reveals a very dynamic field constantly adapting to match an evolving landscape of applications, methodologies and datasets.
Affiliation(s)
- Ömer H Omar
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK
- Marcos Del Cueto
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK
- Alessandro Troisi
  - Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK
|
35
|
Kpanou R, Osseni MA, Tossou P, Laviolette F, Corbeil J. On the robustness of generalization of drug-drug interaction models. BMC Bioinformatics 2021; 22:477. [PMID: 34607569 PMCID: PMC8489092 DOI: 10.1186/s12859-021-04398-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/16/2021] [Accepted: 09/10/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Deep learning methods are a proven commodity in many fields and endeavors. One of these endeavors is predicting the presence of adverse drug-drug interactions (DDIs). The models generated can predict, with reasonable accuracy, the phenotypes arising from the drug interactions using their molecular structures. Nevertheless, this task requires improvement to be truly useful. Given the complexity of the predictive task, an extensive benchmarking on structure-based models for DDIs prediction was performed to evaluate their drawbacks and advantages. RESULTS: We rigorously tested various structure-based models that predict drug interactions using different splitting strategies to simulate different real-world scenarios. In addition to the effects of different training and testing setups on the robustness and generalizability of the models, we then explore the contribution of traditional approaches such as multitask learning and data augmentation. CONCLUSION: Structure-based models tend to generalize poorly to unseen drugs despite their ability to identify new DDIs among drugs seen during training accurately. Indeed, they efficiently propagate information between known drugs and could be valuable for discovering new DDIs in a database. However, these models will most probably fail when exposed to unknown drugs. While multitask learning does not help in our case to solve the problem, the use of data augmentation does at least mitigate it. Therefore, researchers must be cautious of the bias of the random evaluation scheme, especially if their goal is to discover new DDIs.
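The contrast between a random evaluation scheme and evaluation on unseen drugs can be sketched as follows. This is an illustrative assumption about what such splitting strategies might look like, not the benchmark's actual code: the drug names are placeholders, and `random_split` and `unseen_drug_split` are hypothetical helpers. The random split shuffles interaction pairs, so every test drug is usually also present in training. The unseen-drug split holds out whole drugs, so each test pair contains at least one drug the model never saw.

```python
import random
from itertools import combinations

def random_split(pairs, test_fraction, seed=0):
    """Shuffle interaction pairs and cut off a test set; drugs may appear on both sides."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def unseen_drug_split(pairs, test_fraction, seed=0):
    """Hold out whole drugs: any pair touching a held-out drug goes to the test set."""
    drugs = sorted({drug for pair in pairs for drug in pair})
    random.Random(seed).shuffle(drugs)
    n_held = max(1, int(len(drugs) * test_fraction))
    held_out = set(drugs[:n_held])
    train = [p for p in pairs if not (set(p) & held_out)]
    test = [p for p in pairs if set(p) & held_out]
    return train, test

# Placeholder drug identifiers (hypothetical), with every pair treated as one DDI example.
pairs = list(combinations(["A", "B", "C", "D", "E", "F"], 2))

train_r, test_r = random_split(pairs, 0.25)
train_u, test_u = unseen_drug_split(pairs, 0.25)
train_drugs = {drug for pair in train_u for drug in pair}
print(len(train_r), len(test_r), len(train_u), len(test_u))
```

A model scored only on `test_r` can look accurate while still failing on `test_u`, which is the bias the conclusion warns about.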
Affiliation(s)
- Rogia Kpanou
  - Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
  - InVivo AI, Mila - 180 Corporate Lab L, 6650, 01 Rue Saint-Urbain, Montreal, CA H2S 3G9 Canada
- Mazid Abiodoun Osseni
  - Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
- Prudencio Tossou
  - Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
  - InVivo AI, Mila - 180 Corporate Lab L, 6650, 01 Rue Saint-Urbain, Montreal, CA H2S 3G9 Canada
- Francois Laviolette
  - Computer Science and Software Engineering, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
- Jacques Corbeil
  - Department of Molecular Medicine, Université Laval, 1065, av. de la Médecine, Quebec, CA Canada
|
36
|
Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022]
Abstract
In recent years, drug synthesis powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in industry. We also discuss the disadvantages of the existing models and outline potential future trends, so that more newcomers can quickly understand and participate in the field of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Mingyi Zhao
  - Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
|
37
|
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Jacqueline R M A Maasch
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
|
38
|
Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, Varnek A. Reaction Data Curation I: Chemical Structures and Transformations Standardization. Mol Inform 2021; 40:e2100119. [PMID: 34427989 DOI: 10.1002/minf.202100119] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/04/2021] [Accepted: 08/13/2021] [Indexed: 12/11/2022]
Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python 3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning).
Affiliation(s)
- Timur R Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Arkadii Lin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
- Valentina A Afonina
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Dinar Batyrshin
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Ramil I Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Tagir Akhmetshin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
- Jonas Verhoeven
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Joerg Wegner
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Hugo Ceulemans
  - Janssen Pharmaceutica, 30, Turnhoutseweg str., 2340, Beerse, Belgium
- Andrey Gedich
  - Arcadia Inc., Bol'shoy Sampsoniyevskiy Prospekt, 28 korpus 2, 194044, St Petersburg, Russia
- Timur I Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
- Alexandre Varnek
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, 001-0021, Sapporo, Japan
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 4, Blaise Pascal str., 67081, Strasbourg, France
|
39
|
Lei B, Zang Y, Xue Z, Ge Y, Li W, Zhai Q, Jiao L. [Ensemble hologram quantitative structure activity relationship model of the chromatographic retention index of aldehydes and ketones]. Se Pu 2021; 39:331-337. [PMID: 34227314 PMCID: PMC9403813 DOI: 10.3724/sp.j.1123.2020.06011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
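The chromatographic retention index (RI) is an important parameter in chromatographic analysis, and different compounds show different retention behavior on stationary phases of different polarity. Aldehydes and ketones are numerous, and measuring their RI values experimentally is costly in both time and money. In this paper, ensemble modeling was combined with the hologram quantitative structure-activity relationship (HQSAR) method to study quantitative structure-activity relationship (QSAR) models of the chromatographic retention indices of aldehydes and ketones on two stationary phases (DB-210 and HP-Innowax). The predictive ability of the models was assessed by external test set validation and leave-one-out cross-validation. First, individual HQSAR models were built for the 34 compounds studied. On the DB-210 stationary phase, the best individual model was obtained with the fragment distinction (FD) "donor/acceptor atoms (DA)" and a fragment size (FS) of 1-9; on the HP-Innowax stationary phase, the best individual model was obtained with the FD "DA" and an FS of 4-7. For these two models, the cross-validated correlation coefficients ($q^2_{cv}$) were 0.935 and 0.909, the external validation correlation coefficients ($q^2_{ext}$) were 0.925 and 0.927, the concordance correlation coefficients (CCC) were 0.953 and 0.960, the predictive squared correlation coefficients F2 ($Q^2_{F2}$) were 0.922 and 0.918, and the predictive squared correlation coefficients F3 ($Q^2_{F3}$) were 0.931 and 0.927, respectively. These results indicate that a quantitative relationship exists between the molecular structure of aldehydes and ketones and their RI values, and that the HQSAR method can be used to build a QSAR model relating the two. Second, the four individual HQSAR models with the highest prediction accuracy were used as submodels to build an ensemble HQSAR model by arithmetic averaging. For the RI values of the studied compounds on the DB-210 and HP-Innowax stationary phases, the ensemble HQSAR model gave $q^2_{cv}$ values of 0.927 and 0.919, $q^2_{ext}$ values of 0.929 and 0.963, CCC values of 0.956 and 0.979, $Q^2_{F2}$ values of 0.927 and 0.958, and $Q^2_{F3}$ values of 0.935 and 0.963, respectively. Compared with the individual HQSAR models, the ensemble HQSAR model predicts more accurately. This shows that ensemble modeling is an effective way to improve the predictive ability of HQSAR models, and that HQSAR combined with ensemble modeling can be used to study and predict the RI values of aldehydes and ketones.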
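The arithmetic-mean ensemble and two of the validation statistics named in this abstract can be sketched as below. This is an illustration only: the retention index values and submodel predictions are invented, the function names are hypothetical, and the HQSAR submodels themselves are not reproduced. The concordance correlation coefficient follows Lin's definition, and $Q^2_{F2}$ uses the mean of the external test set as its reference. $Q^2_{F3}$ is omitted because it also needs the training set.

```python
from statistics import fmean

def ensemble_predict(submodel_predictions):
    """Arithmetic mean of the submodels' predictions, compound by compound."""
    return [fmean(values) for values in zip(*submodel_predictions)]

def ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    mo, mp = fmean(observed), fmean(predicted)
    cov = fmean([(o - mo) * (p - mp) for o, p in zip(observed, predicted)])
    var_o = fmean([(o - mo) ** 2 for o in observed])
    var_p = fmean([(p - mp) ** 2 for p in predicted])
    return 2 * cov / (var_o + var_p + (mo - mp) ** 2)

def q2_f2(observed, predicted):
    """Q^2_F2: 1 - PRESS / sum of squares around the external test set mean."""
    mo = fmean(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mo) ** 2 for o in observed)
    return 1 - press / total

# Invented RI values for four test compounds and four submodels (not data from the paper).
observed = [700.0, 820.0, 910.0, 1050.0]
submodels = [
    [715.0, 815.0, 905.0, 1060.0],
    [695.0, 830.0, 915.0, 1040.0],
    [705.0, 825.0, 900.0, 1055.0],
    [690.0, 810.0, 920.0, 1045.0],
]

ensemble = ensemble_predict(submodels)
print(ensemble, ccc(observed, ensemble), q2_f2(observed, ensemble))
```

Averaging tends to cancel the individual submodels' errors when they point in different directions, which is one plausible reason the ensemble scores above the individual models in the reported results.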
Affiliation(s)
- Bin Lei
  - College of Chemistry and Chemical Engineering, Xi'an Shiyou University, Xi'an 710065, China
- Yunlei Zang
  - College of Chemistry and Chemical Engineering, Xi'an Shiyou University, Xi'an 710065, China
- Zhiwei Xue
  - No. 203 Research Institute of Nuclear Industry, Xianyang 712000, China
- Yiqing Ge
  - Qing'an Group Co., Ltd., Xi'an 710077, China
- Wei Li
  - Qing'an Group Co., Ltd., Xi'an 710077, China
- Qian Zhai
  - College of Chemistry and Chemical Engineering, Xi'an Shiyou University, Xi'an 710065, China
- Long Jiao
  - College of Chemistry and Chemical Engineering, Xi'an Shiyou University, Xi'an 710065, China
|
40
|
Zaidman D, Gehrtz P, Filep M, Fearon D, Gabizon R, Douangamath A, Prilusky J, Duberstein S, Cohen G, Owen CD, Resnick E, Strain-Damerell C, Lukacik P, Barr H, Walsh MA, von Delft F, London N. An automatic pipeline for the design of irreversible derivatives identifies a potent SARS-CoV-2 Mpro inhibitor. Cell Chem Biol 2021; 28:1795-1806.e5. [PMID: 34174194 PMCID: PMC8228784 DOI: 10.1016/j.chembiol.2021.05.018] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 11/26/2020] [Revised: 03/24/2021] [Accepted: 05/27/2021] [Indexed: 01/20/2023]
Abstract
Designing covalent inhibitors is increasingly important, although it remains challenging. Here, we present covalentizer, a computational pipeline for identifying irreversible inhibitors based on structures of targets with non-covalent binders. Through covalent docking of tailored focused libraries, we identify candidates that can bind covalently to a nearby cysteine while preserving the interactions of the original molecule. We found ∼11,000 cysteines proximal to a ligand across 8,386 complexes in the PDB. Of these, the protocol identified 1,553 structures with covalent predictions. In a prospective evaluation, five out of nine predicted covalent kinase inhibitors showed half-maximal inhibitory concentration (IC50) values between 155 nM and 4.5 μM. Application against an existing SARS-CoV Mpro reversible inhibitor led to an acrylamide inhibitor series with low micromolar IC50 values against SARS-CoV-2 Mpro. The docking was validated by 12 co-crystal structures. Together these examples hint at the vast number of covalent inhibitors accessible through our protocol.
Affiliation(s)
- Daniel Zaidman
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Paul Gehrtz
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Mihajlo Filep
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Daren Fearon
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK
| | - Ronen Gabizon
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Alice Douangamath
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK
| | - Jaime Prilusky
- Life Sciences Core Facilities, Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Shirly Duberstein
- Wohl Institute for Drug Discovery of the Nancy and Stephen Grand Israel National Center for Personalized Medicine, The Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Galit Cohen
- Wohl Institute for Drug Discovery of the Nancy and Stephen Grand Israel National Center for Personalized Medicine, The Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - C David Owen
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK; Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, UK
| | - Efrat Resnick
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Claire Strain-Damerell
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK; Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, UK
| | - Petra Lukacik
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK; Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, UK
| | | | - Haim Barr
- Wohl Institute for Drug Discovery of the Nancy and Stephen Grand Israel National Center for Personalized Medicine, The Weizmann Institute of Science, 7610001 Rehovot, Israel
| | - Martin A Walsh
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK; Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, UK
| | - Frank von Delft
- Diamond Light Source Ltd., Harwell Science and Innovation Campus, Didcot OX11 0QX, UK; Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot OX11 0FA, UK; Structural Genomics Consortium, University of Oxford, Old Road Campus, Roosevelt Drive, Headington OX3 7DQ, UK; Department of Biochemistry, University of Johannesburg, Auckland Park 2006, South Africa
| | - Nir London
- Department of Chemical and Structural Biology, Weizmann Institute of Science, 7610001 Rehovot, Israel.
|
41
|
Daley SK, Cordell GA. Natural Products, the Fourth Industrial Revolution, and the Quintuple Helix. Nat Prod Commun 2021. [DOI: 10.1177/1934578x211003029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023] Open
Abstract
The profound interconnectedness of the sciences and technologies embodied in the Fourth Industrial Revolution is discussed in terms of the global role of natural products, and how that interplays with the development of sustainable and climate-conscious practices of cyberecoethnopharmacolomics within the Quintuple Helix for the promotion of a healthier planet and society.
Affiliation(s)
| | - Geoffrey A. Cordell
- Natural Products Inc., Evanston, IL, USA
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, FL, USA
|
42
|
Biziukova NY, Tarasova OA, Rudik AV, Filimonov DA, Poroikov VV. Automatic Recognition of Chemical Entity Mentions in Texts of Scientific Publications. Automatic Documentation and Mathematical Linguistics 2021. [DOI: 10.3103/s0005105520060023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
43
|
Williams CM, Dallaston MA. The Future of Retrosynthesis and Synthetic Planning: Algorithmic, Humanistic or the Interplay? Aust J Chem 2021. [DOI: 10.1071/ch20371] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
Abstract
The practice of deploying and teaching retrosynthesis is on the cusp of considerable change, which in turn forces practitioners and educators to contemplate whether this impending change will advance or erode the efficiency and elegance of organic synthesis in the future. A short treatise is presented herein that covers the concept of retrosynthesis, along with exemplified methods and theories, and an attempt to comprehend the impact of artificial intelligence in an era when freely and commercially available retrosynthetic and forward synthesis planning programs are increasingly prevalent. Will the computer ever compete with human retrosynthetic design and the art of organic synthesis?
|
44
|
Genheden S, Thakkar A, Chadimová V, Reymond JL, Engkvist O, Bjerrum E. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J Cheminform 2020; 12:70. [PMID: 33292482 PMCID: PMC7672904 DOI: 10.1186/s13321-020-00472-1] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 07/03/2020] [Accepted: 10/24/2020] [Indexed: 11/11/2022] Open
Abstract
We present the open-source AiZynthFinder software that can be readily used in retrosynthetic planning. The algorithm is based on a Monte Carlo tree search that recursively breaks down a molecule to purchasable precursors. The tree search is guided by an artificial neural network policy that suggests possible precursors by utilizing a library of known reaction templates. The software is fast and can typically find a solution in less than 10 s and perform a complete search in less than 1 min. Moreover, the development of the code was guided by a range of software engineering principles such as automatic testing, system design and continuous integration leading to robust software with high maintainability. Finally, the software is well documented to make it suitable for beginners. The software is available at http://www.github.com/MolecularAI/aizynthfinder.
Affiliation(s)
- Samuel Genheden
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden.
| | - Amol Thakkar
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden; Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Veronika Chadimová
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | - Esben Bjerrum
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden.
|
45
|
Verhellen J, Van den Abeele J. Illuminating elite patches of chemical space. Chem Sci 2020; 11:11485-11491. [PMID: 34094392 PMCID: PMC8162856 DOI: 10.1039/d0sc03544k] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/26/2020] [Accepted: 09/15/2020] [Indexed: 11/21/2022] Open
Abstract
In the past few years, there has been considerable activity in both academic and industrial research to develop innovative machine learning approaches to locate novel, high-performing molecules in chemical space. Here we describe a new and fundamentally different type of approach that provides a holistic overview of how high-performing molecules are distributed throughout a search space. Based on an open-source, graph-based implementation [J. H. Jensen, Chem. Sci., 2019, 10, 3567-3572] of a traditional genetic algorithm for molecular optimisation, and influenced by state-of-the-art concepts from soft robot design [J. B. Mouret and J. Clune, Proceedings of the Artificial Life Conference, 2012, pp. 593-594], we provide an algorithm that (i) produces a large diversity of high-performing, yet qualitatively different molecules, (ii) illuminates the distribution of optimal solutions, and (iii) improves search efficiency compared to both machine learning and traditional genetic algorithm approaches.
Affiliation(s)
- Jonas Verhellen
- Centre for Integrative Neuroplasticity, University of Oslo, N-0316 Oslo, Norway
|
46
|
David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform 2020; 12:56. [PMID: 33431035 PMCID: PMC7495975 DOI: 10.1186/s13321-020-00460-5] [Citation(s) in RCA: 150] [Impact Index Per Article: 37.5] [Received: 01/11/2020] [Accepted: 09/05/2020] [Indexed: 02/08/2023] Open
Abstract
The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and visualization of bioactive molecules. For this purpose, it became necessary to represent molecules in a syntax that would be readable by computers and understandable by scientists of various fields. A large number of chemical representations have been developed over the years, their numerosity being due to the fast development of computers and the complexity of producing a representation that encompasses all structural and chemical characteristics. We present here some of the most popular electronic molecular and macromolecular representations used in drug discovery, many of which are based on graph representations. Furthermore, we describe applications of these representations in AI-driven drug discovery. Our aim is to provide a brief guide on structural representations that are essential to the practice of AI in drug discovery. This review serves as a guide for researchers who have little experience with the handling of chemical representations and plan to work on applications at the interface of these fields.
Affiliation(s)
- Laurianne David
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
| | - Amol Thakkar
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| | - Rocío Mercado
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca Gothenburg, Sweden
|
47
|
Shibukawa R, Ishida S, Yoshizoe K, Wasa K, Takasu K, Okuno Y, Terayama K, Tsuda K. CompRet: a comprehensive recommendation framework for chemical synthesis planning with algorithmic enumeration. J Cheminform 2020; 12:52. [PMID: 33431005 PMCID: PMC7465358 DOI: 10.1186/s13321-020-00452-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/24/2020] [Accepted: 08/08/2020] [Indexed: 01/21/2023] Open
Abstract
In computer-assisted synthesis planning (CASP) programs, providing as many chemical synthetic routes as possible is essential for considering optimal and alternative routes in a chemical reaction network. As the majority of CASP programs have been designed to provide one or a few optimal routes, it is likely that the desired one will not be included. To avoid this, an exact algorithm that lists possible synthetic routes within the chemical reaction network is required, alongside a recommendation of synthetic routes that meet specified criteria based on the chemist’s objectives. Herein, we propose a chemical-reaction-network-based synthetic route recommendation framework called “CompRet” with a mathematically guaranteed enumeration algorithm. In a preliminary experiment, CompRet was shown to successfully provide alternative routes for a known antihistaminic drug, cetirizine. CompRet is expected to promote desirable enumeration-based chemical synthesis searches and aid the development of an interactive CASP framework for chemists.
Affiliation(s)
- Ryosuke Shibukawa
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Shoichi Ishida
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, 606-8501, Kyoto, Japan
| | - Kazuki Yoshizoe
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
| | | | - Kiyosei Takasu
- Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, 606-8501, Kyoto, Japan
| | - Yasushi Okuno
- Graduate School of Medicine, Kyoto University, Kyoto, Japan; Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan
| | - Kei Terayama
- RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; Graduate School of Medicine, Kyoto University, Kyoto, Japan; Medical Sciences Innovation Hub Program, RIKEN, Kanagawa, Japan; Graduate School of Medical Life Science, Yokohama City University, Kanagawa, Japan
| | - Koji Tsuda
- Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan; Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Kyoto, Japan
|
48
|
Correlation in plant volatile metabolites: physiochemical properties as a proxy for enzymatic pathways and an alternative metric of biosynthetic constraint. Chemoecology 2020. [DOI: 10.1007/s00049-020-00322-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
|
49
|
Struble TJ, Alvarez JC, Brown SP, Chytil M, Cisar J, DesJarlais RL, Engkvist O, Frank SA, Greve DR, Griffin DJ, Hou X, Johannes JW, Kreatsoulas C, Lahue B, Mathea M, Mogk G, Nicolaou CA, Palmer AD, Price DJ, Robinson RI, Salentin S, Xing L, Jaakkola T, Green WH, Barzilay R, Coley CW, Jensen KF. Current and Future Roles of Artificial Intelligence in Medicinal Chemistry Synthesis. J Med Chem 2020; 63:8667-8682. [PMID: 32243158 PMCID: PMC7457232 DOI: 10.1021/acs.jmedchem.9b02120] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Indexed: 12/20/2022]
Abstract
Artificial intelligence and machine learning have demonstrated their potential role in predictive chemistry and synthetic planning of small molecules; there are at least a few reports of companies employing in silico synthetic planning into their overall approach to accessing target molecules. A data-driven synthesis planning program is one component being developed and evaluated by the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, comprising MIT and 13 chemical and pharmaceutical company members. Together, we wrote this perspective to share how we think predictive models can be integrated into medicinal chemistry synthesis workflows, how they are currently used within MLPDS member companies, and the outlook for this field.
Affiliation(s)
- Thomas J Struble
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
| | - Juan C Alvarez
- Computational and Structural Chemistry, Merck & Co. Inc., Kenilworth, New Jersey 07033, United States
| | - Scott P Brown
- Sunovion Pharmaceuticals Inc., Marlborough, Massachusetts 01752, United States
| | - Milan Chytil
- Sunovion Pharmaceuticals Inc., Marlborough, Massachusetts 01752, United States
| | - Justin Cisar
- Janssen Research & Development LLC, Spring House, Pennsylvania 19477, United States
| | - Renee L DesJarlais
- Janssen Research & Development LLC, Spring House, Pennsylvania 19477, United States
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, 431 83 Mölndal, Sweden
| | - Scott A Frank
- Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Daniel R Greve
- LEO Pharma A/S, Industriparken 55, DK-2750 Ballerup, Denmark
| | | | - Xinjun Hou
- Pfizer Inc., Cambridge, Massachusetts 02139, United States
| | - Jeffrey W Johannes
- Medicinal Chemistry, Early Oncology, Oncology R&D, AstraZeneca, Boston, Massachusetts 02451, United States
| | | | - Brian Lahue
- Computational and Structural Chemistry, Merck & Co. Inc., Kenilworth, New Jersey 07033, United States
| | - Miriam Mathea
- BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen am Rhein, Germany
| | | | | | - Andrew D Palmer
- BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen am Rhein, Germany
| | - Daniel J Price
- GlaxoSmithKline, Collegeville, Pennsylvania 19426, United States
| | - Richard I Robinson
- Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139, United States
| | | | - Li Xing
- WuXi AppTec, Cambridge, Massachusetts 02142, United States
| | - Tommi Jaakkola
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
| | - William H Green
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
| | - Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
| | - Connor W Coley
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
| | - Klavs F Jensen
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
|
50
|
Nicolaou CA, Watson IA, LeMasters M, Masquelin T, Wang J. Context Aware Data-Driven Retrosynthetic Analysis. J Chem Inf Model 2020; 60:2728-2738. [DOI: 10.1021/acs.jcim.9b01141] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
Affiliation(s)
- Christos A. Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Ian A. Watson
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Mark LeMasters
- Research Chemistry IT, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Thierry Masquelin
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Jibo Wang
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
|