1
Whitehead TM, Strickland J, Conduit GJ, Borrel A, Mucs D, Baskerville-Abraham I. Quantifying the Benefits of Imputation over QSAR Methods in Toxicology Data Modeling. J Chem Inf Model 2024; 64:2624-2636. [PMID: 38091381 DOI: 10.1021/acs.jcim.3c01695]
Abstract
Imputation machine learning (ML) surpasses traditional approaches in modeling toxicity data. The method was tested on an open-source data set comprising approximately 2500 ingredients with limited in vitro and in vivo data obtained from the OECD QSAR Toolbox. By leveraging the relationships between different toxicological end points, imputation extracts more valuable information from each data point compared to well-established single end point methods, such as ML-based Quantitative Structure Activity Relationship (QSAR) approaches, providing a final improvement of up to around 0.2 in the coefficient of determination. A significant aspect of this methodology is its resilience to the inclusion of extraneous chemical or experimental data. While additional data typically introduces a considerable level of noise and can hinder performance of single end point QSAR modeling, imputation models remain unaffected. This implies a reduction in the need for laborious manual preprocessing tasks such as feature selection, thereby making data preparation for ML analysis more efficient. This successful test, conducted on open-source data, validates the efficacy of imputation approaches in toxicity data analysis. This work opens the way for applying similar methods to other types of sparse toxicological data matrices, and so we discuss the development of regulatory authority guidelines to accept imputation models, a key aspect for the wider adoption of these methods.
Affiliation(s)
- Thomas M Whitehead
  - Intellegens Ltd., The Studio, Chesterton Mill, Cambridge CB4 3NP, United Kingdom
- Joel Strickland
  - Intellegens Ltd., The Studio, Chesterton Mill, Cambridge CB4 3NP, United Kingdom
- Gareth J Conduit
  - Intellegens Ltd., The Studio, Chesterton Mill, Cambridge CB4 3NP, United Kingdom
- Alexandre Borrel
  - Inotiv, Research Triangle Park, North Carolina 27560, United States
- Daniel Mucs
  - Scientific and Regulatory Affairs, JT International SA, 8, rue Kazem Radjavi, 1202 Geneva, Switzerland
- Irene Baskerville-Abraham
  - Scientific and Regulatory Affairs, JT International SA, 8, rue Kazem Radjavi, 1202 Geneva, Switzerland
2
Tse EG, Aithani L, Anderson M, Cardoso-Silva J, Cincilla G, Conduit GJ, Galushka M, Guan D, Hallyburton I, Irwin BWJ, Kirk K, Lehane AM, Lindblom JCR, Lui R, Matthews S, McCulloch J, Motion A, Ng HL, Öeren M, Robertson MN, Spadavecchio V, Tatsis VA, van Hoorn WP, Wade AD, Whitehead TM, Willis P, Todd MH. An Open Drug Discovery Competition: Experimental Validation of Predictive Models in a Series of Novel Antimalarials. J Med Chem 2021; 64:16450-16463. [PMID: 34748707 DOI: 10.1021/acs.jmedchem.1c00313]
Abstract
The Open Source Malaria (OSM) consortium is developing compounds that kill the human malaria parasite, Plasmodium falciparum, by targeting PfATP4, an essential ion pump on the parasite surface. The structure of PfATP4 has not been determined. Here, we describe a public competition created to develop a predictive model for the identification of PfATP4 inhibitors, thereby reducing project costs associated with the synthesis of inactive compounds. Competition participants could see all entries as they were submitted. In the final round, featuring private sector entrants specializing in machine learning methods, the best-performing models were used to predict novel inhibitors, of which several were synthesized and evaluated against the parasite. Half possessed biological activity, with one featuring a motif that the human chemists familiar with this series would have dismissed as "ill-advised". Since all data and participant interactions remain in the public domain, this research project "lives" and may be improved by others.
Affiliation(s)
- Edwin G Tse
  - School of Pharmacy, University College London, London WC1N 1AX, U.K.
- Laksh Aithani
  - Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Mark Anderson
  - Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee DD1 5EH, U.K.
- Jonathan Cardoso-Silva
  - Department of Informatics, Faculty of Natural and Mathematical Sciences, King's College London, London WC2B 4BG, U.K.
- Gareth J Conduit
  - Intellegens Ltd., Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, U.K.
  - Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.
- Davy Guan
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Irene Hallyburton
  - Drug Discovery Unit, Division of Biological Chemistry and Drug Discovery, School of Life Sciences, University of Dundee, Dundee DD1 5EH, U.K.
- Benedict W J Irwin
  - Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.
  - Optibrium Ltd., Blenheim House, Denny End Road, Cambridge CB25 9QE, U.K.
- Kiaran Kirk
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Adele M Lehane
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Julia C R Lindblom
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Raymond Lui
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Slade Matthews
  - School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- James McCulloch
  - Kellerberrin, 6 Wharf Rd, Balmain, Sydney, NSW 2041, Australia
- Alice Motion
  - School of Chemistry, The University of Sydney, Sydney, NSW 2006, Australia
- Ho Leung Ng
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas 66506, United States
- Mario Öeren
  - Optibrium Ltd., Blenheim House, Denny End Road, Cambridge CB25 9QE, U.K.
- Murray N Robertson
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow G4 0RE, U.K.
- Vasileios A Tatsis
  - Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Willem P van Hoorn
  - Exscientia Ltd., The Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K.
- Alexander D Wade
  - Theory of Condensed Matter Group, Cavendish Laboratories, University of Cambridge, Cambridge CB3 0HE, U.K.
- Paul Willis
  - Medicines for Malaria Venture, PO Box 1826, 20 rte de Pre-Bois, 1215 Geneva 15, Switzerland
- Matthew H Todd
  - School of Pharmacy, University College London, London WC1N 1AX, U.K.
4
Whitehead TM, Chen F, Daly C, Conduit GJ. Accelerating the Design of Automotive Catalyst Products Using Machine Learning. Johnson Matthey Technology Review 2021. [DOI: 10.1595/205651322x16270488736796]
Abstract
The design of catalyst products to reduce harmful emissions is currently an intensive process of expert-driven discovery, taking several years to develop a product. Machine learning can accelerate this timescale, leveraging historic experimental data from related products to guide which new formulations and experiments will enable a project to most directly reach its targets. We used machine learning to accurately model 16 key performance targets for catalyst products, enabling detailed understanding of the factors governing catalyst performance and realistic suggestions of future experiments to rapidly develop more effective products. The proposed formulations are currently undergoing experimental validation.
Affiliation(s)
- Flora Chen
  - Johnson Matthey, Orchard Road, Royston, Hertfordshire, SG8 5HE, UK
- Christopher Daly
  - Johnson Matthey, Orchard Road, Royston, Hertfordshire, SG8 5HE, UK
- Gareth J. Conduit
  - Intellegens Ltd, Eagle Labs, Chesterton Road, Cambridge, UK
  - Theory of Condensed Matter, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK
5
Irwin BWJ, Levell JR, Whitehead TM, Segall MD, Conduit GJ. Practical Applications of Deep Learning To Impute Heterogeneous Drug Discovery Data. J Chem Inf Model 2020; 60:2848-2857. [PMID: 32478517 DOI: 10.1021/acs.jcim.0c00443]
Abstract
Contemporary deep learning approaches still struggle to bring a useful improvement in the field of drug discovery because of the challenges of sparse, noisy, and heterogeneous data that are typically encountered in this context. We use a state-of-the-art deep learning method, Alchemite, to impute data from drug discovery projects, including multitarget biochemical activities, phenotypic activities in cell-based assays, and a variety of absorption, distribution, metabolism, and excretion (ADME) endpoints. The resulting model gives excellent predictions for activity and ADME endpoints, offering an average increase in R² of 0.22 versus quantitative structure-activity relationship methods. The model accuracy is robust to combining data across uncorrelated endpoints and projects with different chemical spaces, enabling a single model to be trained for all compounds and endpoints. We demonstrate improvements in accuracy on the latest chemistry and data when updating models with new data as an ongoing medicinal chemistry project progresses.
Affiliation(s)
- Benedict W J Irwin
  - Optibrium Limited, Cambridge Innovation Park, Denny End Rd, Cambridge CB25 9PB, U.K.
  - Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- Julian R Levell
  - Constellation Pharmaceuticals Inc., 215 First St Suite 200, Cambridge, Massachusetts 02142, United States
- Thomas M Whitehead
  - Intellegens Limited, Eagle Labs, 28 Chesterton Road, Cambridge CB4 3AZ, U.K.
- Matthew D Segall
  - Optibrium Limited, Cambridge Innovation Park, Denny End Rd, Cambridge CB25 9PB, U.K.
- Gareth J Conduit
  - Intellegens Limited, Eagle Labs, 28 Chesterton Road, Cambridge CB4 3AZ, U.K.
  - Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
6
Abstract
We describe a novel deep learning neural network method and its application to impute assay pIC50 values. Unlike conventional machine learning approaches, this method is trained on sparse bioactivity data as input, typical of that found in public and commercial databases, enabling it to learn directly from correlations between activities measured in different assays. In two case studies on public domain data sets we show that the neural network method outperforms traditional quantitative structure-activity relationship (QSAR) models and other leading approaches. Furthermore, by focusing on only the most confident predictions the accuracy is increased to R² > 0.9 using our method, as compared to R² = 0.44 when reporting all predictions.
Affiliation(s)
- T M Whitehead
  - Intellegens, Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, United Kingdom
- B W J Irwin
  - Optibrium, F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, United Kingdom
- P Hunt
  - Optibrium, F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, United Kingdom
- M D Segall
  - Optibrium, F5-6 Blenheim House, Cambridge Innovation Park, Denny End Road, Cambridge CB25 9PB, United Kingdom
- G J Conduit
  - Intellegens, Eagle Labs, Chesterton Road, Cambridge CB4 3AZ, United Kingdom
  - Cavendish Laboratory, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, United Kingdom