1
|
Boichenko DS, Kolomoets NI, Boiko DA, Galushko AS, Posvyatenko AV, Kolesnikov AE, Egorova KS, Ananikov VP. Build-a-Bio-Strip: An Online Platform for Rapid Toxicity Assessment in Chemical Synthesis. J Chem Inf Model 2024; 64:8373-8378. [PMID: 39488853 DOI: 10.1021/acs.jcim.4c01381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2024]
Abstract
The increasing need to understand and control the environmental impact of chemical processes has revealed the challenge in efficient evaluation of toxicity of the vast number of chemical compounds and their varying effects on biological systems. In this study, we introduce "Build-a-bio-Strip", a novel online service designed to carry out a quick initial analysis of the toxic impact of chemical processes. This platform enables users to automatically generate toxicity characteristics of chemical reactions using their own data on cytotoxicity or median lethal doses of the substances involved or computational predictions based on SMILES strings. The service calculates the toxicity metrics such as bio-Factors and cytotoxicity potentials, which can be used to identify the substances with significant contributions to the overall toxicity of a particular process. This facilitates the selection of safer synthetic routes and the optimization of chemical processes from a toxicity perspective. "Build-a-bio-Strip" represents a step toward safer and more sustainable chemical practices. It is available free-of-charge at http://app.ananikovlab.ai:8080/.
Affiliation(s)
- Dmitry S Boichenko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
  - Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory GSP-1, 1-3, Moscow 119991, Russia
- Nikita I Kolomoets
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Daniil A Boiko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Alexey S Galushko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Alexandra V Posvyatenko
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
  - Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Ministry of Health of Russian Federation, Moscow 117198, Russia
- Andrey E Kolesnikov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Ksenia S Egorova
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
- Valentine P Ananikov
  - Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow 119991, Russia
|
2
|
Fan F, Wu G, Yang Y, Liu F, Qian Y, Yu Q, Ren H, Geng J. A Graph Neural Network Model with a Transparent Decision-Making Process Defines the Applicability Domain for Environmental Estrogen Screening. Environ Sci Technol 2023; 57:18236-18245. [PMID: 37749748 DOI: 10.1021/acs.est.3c04571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2023]
Abstract
The application of deep learning (DL) models for screening environmental estrogens (EEs) for the sound management of chemicals has garnered significant attention. However, the currently available DL model for screening EEs lacks both a transparent decision-making process and effective applicability domain (AD) characterization, making the reliability of its prediction results uncertain and limiting its practical applications. To address this issue, a graph neural network (GNN) model was developed to screen EEs, achieving accuracy rates of 88.9% and 92.5% on the internal and external test sets, respectively. The decision-making process of the GNN model was explored through the network-like similarity graphs (NSGs) based on the model features (FT). We discovered that the accuracy of the predictions is dependent on the feature distribution of compounds in NSGs. An AD characterization method called ADFT was proposed, which excludes predictions falling outside of the model's prediction range, leading to a 15% improvement in the F1 score of the GNN model. The GNN model with the AD method may serve as an efficient tool for screening EEs, identifying 800 potential EEs in the Inventory of Existing Chemical Substances of China. Additionally, this study offers new insights into comprehending the decision-making process of DL models.
Affiliation(s)
- Fan Fan
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Gang Wu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yining Yang
  - School of Life Sciences, Tsinghua University, Beijing 100084, China
- Fu Liu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Yuli Qian
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Qingmiao Yu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
  - Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
- Hongqiang Ren
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
- Jinju Geng
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210023, Jiangsu, P. R. China
  - Key Laboratory of the Three Gorges Reservoir Region's Eco-Environment, Ministry of Education, Chongqing University, Chongqing 400044, China
|
3
|
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023] [Open Access]
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most studies only report fragments of the wider range of bioactivities wherein each model typically focuses on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best performing models that achieved test set AUC values of 0.62-0.99 have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
Affiliation(s)
- Vishwesh Venkatraman
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
|
4
|
Abstract
The problem of human trust is one of the most fundamental problems in applied artificial intelligence in drug discovery. In silico models have been widely used to accelerate the process of drug discovery in recent years. However, most of these models can only give reliable predictions within a limited chemical space that the training set covers (applicability domain). Predictions of samples falling outside the applicability domain are unreliable and sometimes dangerous for the drug-design decision-making process. Uncertainty quantification accordingly has drawn great attention to enable autonomous drug designing. By quantifying the confidence level of model predictions, the reliability of the predictions can be quantitatively represented to assist researchers in their molecular reasoning and experimental design. Here we summarize the state-of-the-art approaches to uncertainty quantification and underline how they can be used for drug design and discovery projects. Furthermore, we also outline four representative application scenarios of uncertainty quantification in drug discovery.
Affiliation(s)
- Jie Yu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
  - University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
|
5
|
Brigo A, Naga D, Muster W. Increasing the Value of Data Within a Large Pharmaceutical Company Through In Silico Models. Methods Mol Biol 2022; 2425:637-674. [PMID: 35188649 DOI: 10.1007/978-1-0716-1960-5_24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
The present contribution describes how in silico models and methods are applied at different stages of the drug discovery process in the pharmaceutical industry. A description of the most relevant computational methods and tools is given along with an evaluation of their performance in the assessment of potential genotoxic impurities and the prediction of off-target in vitro pharmacology. The challenges of predicting the outcome of highly complex in vivo studies are discussed followed by considerations on how novel ways to manage, store, exchange, and analyze data may advance knowledge and facilitate modeling efforts. In this context, the current status of broad data sharing initiatives, namely, eTOX and eTransafe, will be described along with related projects that could significantly reduce the use of animals in drug discovery in the future.
Affiliation(s)
- Alessandro Brigo
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Centre Basel, Basel, Switzerland
- Doha Naga
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Centre Basel, Basel, Switzerland
  - Department of Pharmaceutical Chemistry, Group of Pharmacoinformatics, University of Vienna, Wien, Austria
- Wolfgang Muster
  - Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Centre Basel, Basel, Switzerland
|
6
|
Thomas M, Boardman A, Garcia-Ortegon M, Yang H, de Graaf C, Bender A. Applications of Artificial Intelligence in Drug Design: Opportunities and Challenges. Methods Mol Biol 2021; 2390:1-59. [PMID: 34731463 DOI: 10.1007/978-1-0716-1787-8_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 10/19/2022]
Abstract
Artificial intelligence (AI) has undergone rapid development in recent years and has been successfully applied to real-world problems such as drug design. In this chapter, we review recent applications of AI to problems in drug design including virtual screening, computer-aided synthesis planning, and de novo molecule generation, with a focus on the limitations of the application of AI therein and opportunities for improvement. Furthermore, we discuss the broader challenges imposed by AI in translating theoretical practice to real-world drug design, including quantifying prediction uncertainty and explaining model behavior.
Affiliation(s)
- Morgan Thomas
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andrew Boardman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Miguel Garcia-Ortegon
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
  - Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Hongbin Yang
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
|
7
|
Menke J, Massa J, Koch O. Natural product scores and fingerprints extracted from artificial neural networks. Comput Struct Biotechnol J 2021; 19:4593-4602. [PMID: 34584636 PMCID: PMC8445839 DOI: 10.1016/j.csbj.2021.07.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 11/21/2022] [Open Access]
Abstract
Due to their desirable properties, natural products are an important ligand class for medicinal chemists. However, due to their structural distinctiveness, traditional cheminformatic approaches, like ligand-based virtual screening, often perform worse for natural products. Based on our recent work, we evaluated the ability of neural networks to generate fingerprints more appropriate for use with natural products. A manually curated dataset of natural products and synthetic decoys was used to train a multi-layer perceptron network and an autoencoder-like network. In-depth analysis showed that the extracted natural product-specific neural fingerprint outperforms traditional as well as natural product-specific fingerprints on three datasets. Further, we explored how the activations from the output layer of a network can work as a novel natural product likeness score. Overall, two natural product-specific datasets were generated, which are publicly available together with the code to create the fingerprints and the novel natural product likeness score.
Affiliation(s)
- Janosch Menke
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
- Joana Massa
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
- Oliver Koch
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
  - Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Corrensstraße 48, 48149 Münster, Germany
|
8
|
Ahmad F, Mahmood A, Muhmood T. Machine learning-integrated omics for the risk and safety assessment of nanomaterials. Biomater Sci 2021; 9:1598-1608. [PMID: 33443512 DOI: 10.1039/d0bm01672a] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/14/2022]
Abstract
With advances in nanotechnology, nanoproducts have spread into everything from basic necessities to advanced electronics, health care products and medicines. Nanoproducts, however, can have negative side effects and must be strictly monitored to avoid negative outcomes. Future toxicity and safety challenges regarding nanomaterial incorporation into consumer products, including rapid addition of nanomaterials with diverse functionalities and attributes, highlight the limitations of traditional safety evaluation tools. Currently, artificial intelligence and machine learning algorithms are envisioned for enhancing and improving the nano-bio-interaction simulation and modeling, and they extend to the post-marketing surveillance of nanomaterials in the real world. Thus, hyphenation of machine learning with biology and nanomaterials could provide exclusive insights into the perturbations of delicate biological functions after integration with nanomaterials. In this review, we discuss the potential of combining integrative omics with machine learning in profiling nanomaterial safety and risk assessment and provide guidance for regulatory authorities as well.
Affiliation(s)
- Farooq Ahmad
  - College of Engineering and Applied Sciences, Nanjing National Laboratory of Microstructures, Jiangsu Key Laboratory of Artificial Functional Materials, Nanjing University, Nanjing, Jiangsu 210093, China
- Asif Mahmood
  - Beijing Key Laboratory of Photoelectronic/Electrophotonic Conversion Materials, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing, 100081, China
- Tahir Muhmood
  - State Key Lab of Metal Matrix Composites, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
|
9
|
Kovács DP, McCorkindale W, Lee AA. Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias. Nat Commun 2021; 12:1695. [PMID: 33727552 PMCID: PMC7966799 DOI: 10.1038/s41467-021-21895-w] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 10/08/2020] [Accepted: 02/10/2021] [Indexed: 12/30/2022] [Open Access]
Abstract
Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.
Affiliation(s)
- Alpha A Lee
  - Cavendish Laboratory, University of Cambridge, Cambridge, UK
|
10
|
Wedlake AJ, Allen TEH, Goodman JM, Gutsell S, Kukic P, Russell PJ. Confidence in Inactive and Active Predictions from Structural Alerts. Chem Res Toxicol 2020; 33:3010-3022. [PMID: 33295767 DOI: 10.1021/acs.chemrestox.0c00332] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Having a measure of confidence in computational predictions of biological activity from in silico tools is vital when making predictions for new chemicals, for example, in chemical risk assessment. Where predictions of biological activity are used as an indicator of a potential hazard, false-negative predictions are the most concerning prediction; however, assigning confidence in inactive predictions is particularly challenging. How can one confidently identify the absence of activating features? In this study, we present methods for assigning confidence to both active and inactive predictions from structural alerts for protein-binding molecular initiating events (MIEs). Structural alerts were derived through an iterative statistical method. Confidence in the activity predictions is assigned by measuring the Tanimoto similarity between Morgan fingerprints of chemicals in the test set to relevant chemicals in the training set, and suitable cutoff values have been defined to give different confidence categories. To avoid a potential compound series bias in the test set and hence overestimate the performance of the method, we measured the biological activity of 27 compounds with 24 proteins, which gave us an additional 648 experimental measurements; many of the measurements are currently nonexistent in the literature and databases. This data set was complemented with newly measured biological activities published in ChEMBL25 and formed a combined independent validation data set. Applying the confidence categories to the computational predictions for the new data leads to the identification of chemicals for which one should be confident of either an inactive or active prediction, allowing model predictions to be used responsibly.
Affiliation(s)
- Andrew J Wedlake
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Timothy E H Allen
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - MRC Toxicology Unit, University of Cambridge, Gleeson Building, Tennis Court Road, Cambridge CB2 1QR, United Kingdom
- Jonathan M Goodman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Steve Gutsell
  - Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, United Kingdom
- Predrag Kukic
  - Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, United Kingdom
- Paul J Russell
  - Unilever Safety and Environmental Assurance Centre, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, United Kingdom
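Illustrative note on entry 10: the abstract describes assigning confidence by the Tanimoto similarity between Morgan fingerprints of a test chemical and relevant training chemicals, with cutoffs defining confidence categories. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes fingerprints are already available as sets of on-bit indices (in practice they might come from a cheminformatics toolkit), and the function names and the cutoff values 0.7 and 0.4 are hypothetical placeholders, not the values defined in the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints share no information; treat as dissimilar.
        return 0.0
    return len(a & b) / union


def confidence_category(query_fp, training_fps, high_cutoff=0.7, low_cutoff=0.4):
    """Label a prediction by the query's nearest-neighbour similarity to the training set.

    The cutoffs are hypothetical; a real application would calibrate them on data.
    """
    best = max((tanimoto(query_fp, fp) for fp in training_fps), default=0.0)
    if best >= high_cutoff:
        label = "high"
    elif best >= low_cutoff:
        label = "medium"
    else:
        label = "low"
    return label, best


if __name__ == "__main__":
    training = [{1, 2, 3, 8}, {2, 3, 4}, {7, 9}]
    print(confidence_category({1, 2, 3}, training))
```

A query that closely resembles a training chemical lands in the higher category, while one with no close neighbour is flagged as low confidence, which is the situation the abstract highlights for inactive predictions.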
|
11
|
Mervin LH, Johansson S, Semenova E, Giblin KA, Engkvist O. Uncertainty quantification in drug design. Drug Discov Today 2020; 26:474-489. [PMID: 33253918 DOI: 10.1016/j.drudis.2020.11.027] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/21/2020] [Revised: 07/13/2020] [Accepted: 11/23/2020] [Indexed: 01/03/2023]
Abstract
Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions. Uncertainty quantification will become a key component in autonomous decision making and will be crucial for integrating machine learning and chemistry automation to create an autonomous design-make-test-analyse cycle. This review covers the empirical, frequentist and Bayesian approaches to uncertainty quantification, and outlines how they can be used for drug design. We also outline the impact of uncertainty quantification on decision making.
Affiliation(s)
- Lewis H Mervin
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Simon Johansson
  - Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Elizaveta Semenova
  - Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Kathryn A Giblin
  - Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Ola Engkvist
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
|