1
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024] Open
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performances, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performances. We also propose a workflow to curate aqueous solubility data, aiming at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
Affiliation(s)
- P Llompart
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
2
Ghahremanpour MM, Saar A, Tirado-Rives J, Jorgensen WL. Ensemble Geometric Deep Learning of Aqueous Solubility. J Chem Inf Model 2023; 63:7338-7349. [PMID: 37990484 DOI: 10.1021/acs.jcim.3c01536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Geometric deep learning is one of the main workhorses for harnessing the power of big data to predict molecular properties such as aqueous solubility, which is key to the pharmacokinetic improvement of drug candidates. Two ensembles of graph neural network architectures were built, one based on spectral convolution and the other on spatial convolution. The pretrained models, denoted respectively as SolNet-GCN and SolNet-GAT, significantly outperformed the existing neural networks benchmarked on a validation set of 207 molecules. The SolNet-GCN model demonstrated the best performance on both the training and validation sets, with RMSE values of 0.53 and 0.72 log molar unit and Pearson r2 values of 0.95 and 0.75, respectively. Further, the ranking power of the SolNet models agreed well with a QM-based thermodynamic cycle approach at the PBE-vdW level of theory on a series of benzophenylurea derivatives and a series of benzodiazepine derivatives. Nevertheless, testing the resultant models on a set of inhibitors of the macrophage migration inhibitory factor (MIF) illustrated that the inclusion of atomic attributes to discriminate atoms with a higher tendency to form intermolecular hydrogen bonds in the crystalline state and to identify planar or nonplanar substructures can be beneficial for the prediction of aqueous solubility.
Affiliation(s)
- Anastasia Saar
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
- Julian Tirado-Rives
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
- William L Jorgensen
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
3
Obrezanova O, Martinsson A, Whitehead T, Mahmoud S, Bender A, Miljković F, Grabowski P, Irwin B, Oprisiu I, Conduit G, Segall M, Smith GF, Williamson B, Winiwarter S, Greene N. Prediction of In Vivo Pharmacokinetic Parameters and Time-Exposure Curves in Rats Using Machine Learning from the Chemical Structure. Mol Pharm 2022; 19:1488-1504. [PMID: 35412314 DOI: 10.1021/acs.molpharmaceut.2c00027] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 11/30/2022]
Abstract
Animal pharmacokinetic (PK) data as well as human and animal in vitro systems are utilized in drug discovery to define the rate and route of drug elimination. Accurate prediction and mechanistic understanding of drug clearance and disposition in animals provide a degree of confidence for extrapolation to humans. In addition, prediction of in vivo properties can be used to improve design during drug discovery, help select compounds with better properties, and reduce the number of in vivo experiments. In this study, we generated machine learning models able to predict rat in vivo PK parameters and concentration-time PK profiles based on the molecular chemical structure and either measured or predicted in vitro parameters. The models were trained on internal in vivo rat PK data for over 3000 diverse compounds from multiple projects and therapeutic areas, and the predicted endpoints include clearance and oral bioavailability. We compared the performance of various traditional machine learning algorithms and deep learning approaches, including graph convolutional neural networks. The best models for PK parameters achieved R2 = 0.63 [root mean squared error (RMSE) = 0.26] for clearance and R2 = 0.55 (RMSE = 0.46) for bioavailability. The models provide a fast and cost-efficient way to guide the design of molecules with optimal PK profiles, to enable the prediction of virtual compounds at the point of design, and to drive prioritization of compounds for in vivo assays.
Affiliation(s)
- Olga Obrezanova
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge CB4 0FZ, U.K.
- Anton Martinsson
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg SE-43183, Sweden
- Tom Whitehead
- Intellegens Ltd., Eagle Labs, Cambridge CB4 3AZ, U.K.
- Samar Mahmoud
- Optibrium Ltd., Cambridge Innovation Park, Cambridge CB25 9PB, U.K.
- Andreas Bender
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge CB4 0FZ, U.K.; Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge CB2 1EW, U.K.
- Filip Miljković
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg SE-43183, Sweden
- Piotr Grabowski
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge CB4 0FZ, U.K.
- Ben Irwin
- Optibrium Ltd., Cambridge Innovation Park, Cambridge CB25 9PB, U.K.
- Ioana Oprisiu
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg SE-43183, Sweden
- Matthew Segall
- Optibrium Ltd., Cambridge Innovation Park, Cambridge CB25 9PB, U.K.
- Graham F Smith
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Cambridge CB4 0FZ, U.K.
- Beth Williamson
- Drug Metabolism and Pharmacokinetics, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge CB10 1XL, U.K.
- Susanne Winiwarter
- Drug Metabolism and Pharmacokinetics, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), Biopharmaceutical R&D, AstraZeneca, Gothenburg SE-43183, Sweden
- Nigel Greene
- Imaging and Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Waltham, Massachusetts 02451, United States
4
5
Mervin LH, Johansson S, Semenova E, Giblin KA, Engkvist O. Uncertainty quantification in drug design. Drug Discov Today 2020; 26:474-489. [PMID: 33253918 DOI: 10.1016/j.drudis.2020.11.027] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/21/2020] [Revised: 07/13/2020] [Accepted: 11/23/2020] [Indexed: 01/03/2023]
Abstract
Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions. Uncertainty quantification will become a key component in autonomous decision making and will be crucial for integrating machine learning and chemistry automation to create an autonomous design-make-test-analyse cycle. This review covers the empirical, frequentist and Bayesian approaches to uncertainty quantification, and outlines how they can be used for drug design. We also outline the impact of uncertainty quantification on decision making.
Affiliation(s)
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Simon Johansson
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden; Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Elizaveta Semenova
- Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Kathryn A Giblin
- Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
6
Abstract
INTRODUCTION: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. AREAS COVERED: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. EXPERT OPINION: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia; A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- David Winkler
- CSIRO Manufacturing, Clayton, VIC, Australia; Monash Institute for Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia; Latrobe Institute for Molecular Science, Bundoora, VIC, Australia; School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA, Australia
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Neuherberg, Germany; BigChem GmbH, Neuherberg, Germany
7
8
Quantification of frequent-hitter behavior based on historical high-throughput screening data. Future Med Chem 2015; 6:1113-26. [PMID: 25078133 DOI: 10.4155/fmc.14.72] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 01/09/2023] Open
Abstract
AIM: We mine historical high-throughput data to identify and characterize 'frequent hitters', hits that are potentially false-positive results. BACKGROUND: A key problem in the field of high-throughput screening (HTS) is recognition of frequent hitters, which are false-positive or otherwise anomalous compounds that tend to crop up across many screens. Follow-up of such compounds constitutes a waste of resource and decreases efficiency. METHODOLOGY: We describe a systematic retrospective approach to identify anomalous hitter behavior using historical screening data. We take into account the uncertainty that arises if not enough screen data are available and extend implementation to target and technology classes. CONCLUSION: Use of the descriptor in analyzing high-throughput screen results frees up resource for follow-up of more likely true hits in the downstream hit-deconvolution cascade, thereby increasing efficiency of screen delivery. Although effective, historical data bias can affect the annotation, and we exemplify cases where this happened.
9
Varadharajan S, Winiwarter S, Carlsson L, Engkvist O, Anantha A, Kogej T, Fridén M, Stålring J, Chen H. Exploring In Silico Prediction of the Unbound Brain-to-Plasma Drug Concentration Ratio: Model Validation, Renewal, and Interpretation. J Pharm Sci 2015; 104:1197-206. [DOI: 10.1002/jps.24301] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/07/2014] [Revised: 11/14/2014] [Accepted: 11/18/2014] [Indexed: 01/13/2023]
10
Dearden JC, Rowe PH. Use of artificial neural networks in the QSAR prediction of physicochemical properties and toxicities for REACH legislation. Methods Mol Biol 2015; 1260:65-88. [PMID: 25502376 DOI: 10.1007/978-1-4939-2239-0_5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 05/25/2023]
Abstract
With the introduction of the REACH legislation in the European Union, there is a requirement for property and toxicity data on chemicals produced in or imported into the EU at levels of 1 tonne/year or more. This has meant an increase in the in silico prediction of such data. One of the chief predictive approaches is QSAR (quantitative structure-activity relationships), which is widely used in many fields. A QSAR approach that is increasingly being used is that of artificial neural networks (ANNs), and this chapter discusses its application to the range of physicochemical properties and toxicities required by REACH. ANNs generally outperform the main QSAR approach of multiple linear regression (MLR), although other approaches such as support vector machines sometimes outperform ANNs. Most ANN QSARs reported to date comply with only two of the five OECD Guidelines for the Validation of (Q)SARs.
Affiliation(s)
- John C Dearden
- School of Pharmacy & Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, UK
11
Wenlock MC, Carlsson LA. How Experimental Errors Influence Drug Metabolism and Pharmacokinetic QSAR/QSPR Models. J Chem Inf Model 2014; 55:125-34. [DOI: 10.1021/ci500535s] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 11/30/2022]
Affiliation(s)
- Mark C. Wenlock
- Drug Safety & Metabolism, AstraZeneca R&D Alderley Park, Macclesfield, Cheshire, SK10 4TF, U.K
- Lars A. Carlsson
- Drug Safety & Metabolism, AstraZeneca R&D Mölndal, 431 83, Mölndal, Sweden
12
Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H. Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 2014; 12:948-62. [PMID: 24287782 DOI: 10.1038/nrd4128] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Indexed: 11/09/2022]
Abstract
The 'quality' of small-molecule drug candidates, encompassing aspects including their potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics, is a key factor influencing the chances of success in clinical trials. Importantly, such characteristics are under the control of chemists during the identification and optimization of lead compounds. Here, we discuss the application of computational methods, particularly quantitative structure-activity relationships (QSARs), in guiding the selection of higher-quality drug candidates, as well as cultural factors that may have affected their use and impact.
Affiliation(s)
- John G Cumming
- Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield SK10 4TG, UK
13
Soars MG, Barton P, Elkin LL, Mosure KW, Sproston JL, Riley RJ. Application of an in vitro OAT assay in drug design and optimization of renal clearance. Xenobiotica 2014; 44:657-65. [DOI: 10.3109/00498254.2013.879625] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
14
Chen H, Carlsson L, Eriksson M, Varkonyi P, Norinder U, Nilsson I. Beyond the Scope of Free-Wilson Analysis: Building Interpretable QSAR Models with Machine Learning Algorithms. J Chem Inf Model 2013; 53:1324-36. [DOI: 10.1021/ci4001376] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 11/29/2022]
Affiliation(s)
- Ulf Norinder
- CNSP Innovative Medicines, AstraZeneca R&D Södertälje, Sweden
15
Leach AG, McCoull W, Bailey A, Barton P, Mee C, Rosevere E. Experimental Testing of Quantum Mechanical Predictions of Mutagenicity: Aminopyrazoles. Chem Res Toxicol 2013; 26:703-9. [DOI: 10.1021/tx3005136] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Andrew G. Leach
- AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
- William McCoull
- AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
- Andrew Bailey
- AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
- Peter Barton
- AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
- Christine Mee
- AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
- Eleanor Rosevere
- AstraZeneca, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom
16
Davis AM, Wood DJ. Quantitative Structure–Activity Relationship Models That Stand the Test of Time. Mol Pharm 2013; 10:1183-90. [DOI: 10.1021/mp300466n] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 02/01/2023]
Affiliation(s)
- Andrew M. Davis
- AstraZeneca R&D Mölndal, Pepparedsleden 1, Mölndal, 431 83 Sweden
- David J. Wood
- AstraZeneca R&D Alderley Park, Alderley Edge, Cheshire, United Kingdom
17
Norinder U, Boström H. Introducing Uncertainty in Predictive Modeling—Friend or Foe? J Chem Inf Model 2012; 52:2815-22. [DOI: 10.1021/ci3003446] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Affiliation(s)
- Ulf Norinder
- AstraZeneca R&D Södertälje, Sweden
- Department of Pharmacy, Uppsala University, Sweden
- Henrik Boström
- Department of Computer and Systems Sciences, Stockholm University, Sweden
18
Warner DJ, Chen H, Cantin LD, Kenna JG, Stahl S, Walker CL, Noeske T. Mitigating the inhibition of human bile salt export pump by drugs: opportunities provided by physicochemical property modulation, in silico modeling, and structural modification. Drug Metab Dispos 2012; 40:2332-41. [PMID: 22961681 DOI: 10.1124/dmd.112.047068] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 12/13/2022] Open
Abstract
The human bile salt export pump (BSEP) is a membrane protein expressed on the canalicular plasma membrane domain of hepatocytes, which mediates active transport of unconjugated and conjugated bile salts from liver cells into bile. BSEP activity therefore plays an important role in bile flow. In humans, genetically inherited defects in BSEP expression or activity cause cholestatic liver injury, and many drugs that cause cholestatic drug-induced liver injury (DILI) in humans have been shown to inhibit BSEP activity in vitro and in vivo. These findings suggest that inhibition of BSEP activity by drugs could be one of the mechanisms that initiate human DILI. To gain insight into the chemical features responsible for BSEP inhibition, we have used a recently described in vitro membrane vesicle BSEP inhibition assay to quantify transporter inhibition for a set of 624 compounds. The relationship between BSEP inhibition and molecular physicochemical properties was investigated, and our results show that lipophilicity and molecular size are significantly correlated with BSEP inhibition. This data set was further used to build predictive BSEP classification models through multiple quantitative structure-activity relationship modeling approaches. The highest level of predictive accuracy was provided by a support vector machine model (accuracy = 0.87, κ = 0.74). These analyses highlight the potential value that can be gained by combining computational methods with experimental efforts in early stages of drug discovery projects to minimize the propensity of drug candidates to inhibit BSEP.
Affiliation(s)
- Daniel J Warner
- Department of Medicinal Chemistry, AstraZeneca R&D Montreal, Montreal, Quebec, Canada
19
Onuki Y, Kawai S, Arai H, Maeda J, Takagaki K, Takayama K. Contribution of the Physicochemical Properties of Active Pharmaceutical Ingredients to Tablet Properties Identified by Ensemble Artificial Neural Networks and Kohonen's Self-Organizing Maps. J Pharm Sci 2012; 101:2372-81. [DOI: 10.1002/jps.23134] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/23/2011] [Revised: 02/22/2012] [Accepted: 03/08/2012] [Indexed: 11/11/2022]
20
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Indexed: 01/19/2023]
Abstract
This paper is focused on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of the modeling with respect to input/output matching, data types, models duality, and models inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions of common problems in chemoinformatics: improvement of predictive performance of structure-property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
21
Soars MG, Barton P, Ismair M, Jupp R, Riley RJ. The Development, Characterization, and Application of an OATP1B1 Inhibition Assay in Drug Discovery. Drug Metab Dispos 2012; 40:1641-8. [DOI: 10.1124/dmd.111.042382] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/04/2023] Open
22
Yang Y, Engkvist O, Llinàs A, Chen H. Beyond Size, Ionization State, and Lipophilicity: Influence of Molecular Topology on Absorption, Distribution, Metabolism, Excretion, and Toxicity for Druglike Compounds. J Med Chem 2012; 55:3667-77. [DOI: 10.1021/jm201548z] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.9] [Indexed: 11/30/2022]
Affiliation(s)
- Yidong Yang
- Discovery Sciences, Computational Sciences, Computational Chemistry, and R&I iMED, In Vitro & In Vivo ADME, AstraZeneca R&D Mölndal, SE-431 83 Mölndal, Sweden
- Ola Engkvist
- Discovery Sciences, Computational Sciences, Computational Chemistry, and R&I iMED, In Vitro & In Vivo ADME, AstraZeneca R&D Mölndal, SE-431 83 Mölndal, Sweden
- Antonio Llinàs
- Discovery Sciences, Computational Sciences, Computational Chemistry, and R&I iMED, In Vitro & In Vivo ADME, AstraZeneca R&D Mölndal, SE-431 83 Mölndal, Sweden
- Hongming Chen
- Discovery Sciences, Computational Sciences, Computational Chemistry, and R&I iMED, In Vitro & In Vivo ADME, AstraZeneca R&D Mölndal, SE-431 83 Mölndal, Sweden
23
Wood DJ, Buttar D, Cumming JG, Davis AM, Norinder U, Rodgers SL. Automated QSAR with a Hierarchy of Global and Local Models. Mol Inform 2011; 30:960-72. [DOI: 10.1002/minf.201100107] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 07/12/2011] [Accepted: 10/13/2011] [Indexed: 11/06/2022]
24
Luker T, Alcaraz L, Chohan KK, Blomberg N, Brown DS, Butlin RJ, Elebring T, Griffin AM, Guile S, St-Gallay S, Swahn BM, Swallow S, Waring MJ, Wenlock MC, Leeson PD. Strategies to improve in vivo toxicology outcomes for basic candidate drug molecules. Bioorg Med Chem Lett 2011; 21:5673-9. [DOI: 10.1016/j.bmcl.2011.07.074] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 07/04/2011] [Accepted: 07/18/2011] [Indexed: 11/25/2022]
25
Griffen E, Leach AG, Robb GR, Warner DJ. Matched Molecular Pairs as a Medicinal Chemistry Tool. J Med Chem 2011; 54:7739-50. [DOI: 10.1021/jm200452d] [Citation(s) in RCA: 204] [Impact Index Per Article: 15.7] [Indexed: 11/28/2022]
Affiliation(s)
- Ed Griffen
- Oncology Innovative Medicines Unit, AstraZeneca Pharmaceuticals, Mereside, Alderley Park, Macclesfield, SK10 4TG, U.K
- Andrew G. Leach
- Cardiovascular and Gastrointestinal Innovative Medicines Unit, AstraZeneca Pharmaceuticals, 30S373 Mereside, Alderley Park, Macclesfield, SK10 4TG, U.K
- Graeme R. Robb
- Cardiovascular and Gastrointestinal Innovative Medicines Unit, AstraZeneca Pharmaceuticals, 30S373 Mereside, Alderley Park, Macclesfield, SK10 4TG, U.K
- Daniel J. Warner
- Department of Medicinal Chemistry, AstraZeneca R&D Montreal, Montreal, Quebec, H4S 1Z9, Canada
26
In silico prediction of unbound brain-to-plasma concentration ratio using machine learning algorithms. J Mol Graph Model 2011; 29:985-95. [DOI: 10.1016/j.jmgm.2011.04.004] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 02/21/2011] [Revised: 04/08/2011] [Accepted: 04/12/2011] [Indexed: 11/22/2022]
27
Trade-off between accuracy and interpretability for predictive in silico modeling. Future Med Chem 2011; 3:647-63. [DOI: 10.4155/fmc.11.23] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 11/17/2022] Open
Abstract
Background: Accuracy concerns the ability of a model to make correct predictions, while interpretability concerns to what degree the model allows for human understanding. Models exhibiting the former property are many times more complex and opaque, while interpretable models may lack the necessary accuracy. The trade-off between accuracy and interpretability for predictive in silico modeling is investigated. Method: A number of state-of-the-art methods for generating accurate models are compared with state-of-the-art methods for generating transparent models. Conclusion: Results on 16 biopharmaceutical classification tasks demonstrate that, although the opaque methods generally obtain higher accuracies than the transparent ones, one often only has to pay a quite limited penalty in terms of predictive performance when choosing an interpretable model.
28
Chen H, Engkvist O, Blomberg N. Combinatorial library design from reagent pharmacophore fingerprints. Methods Mol Biol 2011; 685:135-152. [PMID: 20981522 DOI: 10.1007/978-1-60761-931-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
Abstract
Combinatorial and parallel chemical synthesis technologies are powerful tools in early drug discovery projects. Over the past couple of years an increased emphasis on targeted lead generation libraries and focussed screening libraries in the pharmaceutical industry has driven a surge in computational methods to explore molecular frameworks to establish new chemical equity. In this chapter we describe a complementary technique in the library design process, termed ProSAR, to effectively cover the accessible pharmacophore space around a given scaffold. With this method reagents are selected such that each R-group on the scaffold has an optimal coverage of pharmacophoric features. This is achieved by optimising the Shannon entropy, i.e. the information content, of the topological pharmacophore distribution for the reagents. As this method enumerates compounds with a systematic variation of user-defined pharmacophores to the attachment point on the scaffold, the enumerated compounds may serve as a good starting point for deriving a structure-activity relationship (SAR).
Affiliation(s)
- Hongming Chen
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Mölndal, Sweden.
|
29
|
Takagaki K, Arai H, Takayama K. Creation of a tablet database containing several active ingredients and prediction of their pharmaceutical characteristics based on ensemble artificial neural networks. J Pharm Sci 2010; 99:4201-14. [PMID: 20310024 DOI: 10.1002/jps.22135] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
A tablet database containing several active ingredients for a standard tablet formulation was created. Tablet tensile strength (TS) and disintegration time (DT) were measured before and after storage for 30 days at 40 °C and 75% relative humidity. An ensemble artificial neural network (EANN) was used to predict responses to differences in quantities of excipients and physical-chemical properties of active ingredients in tablets. Most classical neural networks involve a tedious trial and error approach, but EANNs automatically determine basal key parameters, which ensure that an optimal structure is rapidly obtained. We compared the predictive abilities of EANNs in which the following kinds of training algorithms were used: linear, radial basis function, general regression (GR), and multilayer perceptron. The GR EANN predicted pharmaceutical responses such as TS and DT most accurately, as evidenced by high correlation coefficients in a leave-some-out cross-validation procedure. When used in conjunction with a tablet database, the GR EANN is capable of identifying acceptable candidate tablet formulations.
Affiliation(s)
- Keisuke Takagaki
- Department of Pharmaceutics, Hoshi University, 2-4-41 Ebara, Shinagawa-ku, Tokyo 142-8501, Japan
|
30
|
Hecht D. Applications of machine learning and computational intelligence to drug discovery and development. Drug Dev Res 2010. [DOI: 10.1002/ddr.20402] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
Affiliation(s)
- David Hecht
- Southwestern College, Chula Vista, California
|
31
|
Paine SW, Barton P, Bird J, Denton R, Menochet K, Smith A, Tomkinson NP, Chohan KK. A rapid computational filter for predicting the rate of human renal clearance. J Mol Graph Model 2010; 29:529-37. [DOI: 10.1016/j.jmgm.2010.10.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 04/13/2010] [Revised: 10/08/2010] [Accepted: 10/12/2010] [Indexed: 11/30/2022]
|
32
|
Cooper A, Potter T, Luker T. Prediction of Efficacious Inhalation Lung Doses via the Use of In Silico Lung Retention Quantitative Structure-Activity Relationship Models and In Vitro Potency Screens. Drug Metab Dispos 2010; 38:2218-25. [DOI: 10.1124/dmd.110.034462] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022] Open
|
33
|
Katritzky AR, Kuanar M, Slavov S, Hall CD, Karelson M, Kahn I, Dobchev DA. Quantitative Correlation of Physical and Chemical Properties with Chemical Structure: Utility for Prediction. Chem Rev 2010; 110:5714-89. [DOI: 10.1021/cr900238d] [Citation(s) in RCA: 386] [Impact Index Per Article: 27.6] [Indexed: 11/28/2022]
Affiliation(s)
- Alan R. Katritzky
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611
- Minati Kuanar
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611
- Svetoslav Slavov
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611
- C. Dennis Hall
- Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611
- Mati Karelson
- Institute of Chemistry, Tallinn University of Technology, Akadeemia tee 15, Tallinn 19086, Estonia, and MolCode, Ltd., Soola 8, Tartu 51013, Estonia
- Iiris Kahn
- Institute of Chemistry, Tallinn University of Technology, Akadeemia tee 15, Tallinn 19086, Estonia, and MolCode, Ltd., Soola 8, Tartu 51013, Estonia
- Dimitar A. Dobchev
- Institute of Chemistry, Tallinn University of Technology, Akadeemia tee 15, Tallinn 19086, Estonia, and MolCode, Ltd., Soola 8, Tartu 51013, Estonia
|
34
|
Su BH, Shen MY, Esposito EX, Hopfinger AJ, Tseng YJ. In Silico Binary Classification QSAR Models Based on 4D-Fingerprints and MOE Descriptors for Prediction of hERG Blockage. J Chem Inf Model 2010; 50:1304-18. [DOI: 10.1021/ci100081j] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/29/2022]
Affiliation(s)
- Bo-Han Su
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, exeResearch, LLC, 32 University Drive, East Lansing, Michigan 48823, Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, The Chem21 Group, Inc., 1780 Wilson Drive, Lake Forest, Illinois 60045, and College of Pharmacy MSC09 5360, 1 University of New Mexico, Albuquerque, New Mexico
- Meng-yu Shen
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, exeResearch, LLC, 32 University Drive, East Lansing, Michigan 48823, Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, The Chem21 Group, Inc., 1780 Wilson Drive, Lake Forest, Illinois 60045, and College of Pharmacy MSC09 5360, 1 University of New Mexico, Albuquerque, New Mexico
- Emilio Xavier Esposito
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, exeResearch, LLC, 32 University Drive, East Lansing, Michigan 48823, Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, The Chem21 Group, Inc., 1780 Wilson Drive, Lake Forest, Illinois 60045, and College of Pharmacy MSC09 5360, 1 University of New Mexico, Albuquerque, New Mexico
- Anton J. Hopfinger
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, exeResearch, LLC, 32 University Drive, East Lansing, Michigan 48823, Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, The Chem21 Group, Inc., 1780 Wilson Drive, Lake Forest, Illinois 60045, and College of Pharmacy MSC09 5360, 1 University of New Mexico, Albuquerque, New Mexico
- Yufeng J. Tseng
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, exeResearch, LLC, 32 University Drive, East Lansing, Michigan 48823, Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106, The Chem21 Group, Inc., 1780 Wilson Drive, Lake Forest, Illinois 60045, and College of Pharmacy MSC09 5360, 1 University of New Mexico, Albuquerque, New Mexico
|
35
|
Sakiyama Y. The use of machine learning and nonlinear statistical tools for ADME prediction. Expert Opin Drug Metab Toxicol 2010; 5:149-69. [PMID: 19239395 DOI: 10.1517/17425250902753261] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 01/31/2023]
Abstract
Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning as well as nonlinear statistical tools has been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it would be a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machine displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.
Affiliation(s)
- Yojiro Sakiyama
- Pharmacokinetics Dynamics Metabolism, Pfizer Global Research and Development, Sandwich Laboratories, Kent, UK.
|
36
|
Ibrić S, Jovanović M, Djurić Z, Parojcić J, Solomun L, Lucić B. Generalized regression neural networks in prediction of drug stability. J Pharm Pharmacol 2010; 59:745-50. [PMID: 17524242 DOI: 10.1211/jpp.59.5.0017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 10/31/2022]
Abstract
This study had two aims. Firstly, we wanted to model the effects of the percentage of Eudragit RS PO and compression pressure as the most important process and formulation variables on the time course of drug release from extended-release matrix aspirin tablets. Secondly, we investigated the possibility of predicting drug stability and shelf-life using an artificial neural network (ANN). Ten types of matrix aspirin tablets were prepared as model formulations and were stored in stability chambers at 60°C, 50°C, 40°C and 30°C and controlled humidity. Samples were removed at predefined time points and analysed for acetylsalicylic acid (ASA) and salicylic acid (SA) content using stability-indicating HPLC. The decrease in aspirin content followed apparent zero-order kinetics. The amount of Eudragit RS PO and compression pressure were selected as causal factors. The apparent zero-order rate constants for each temperature were chosen as output variables for the ANN. A set of output parameters and causal factors were used as training data for the generalized regression neural network (GRNN). For two additional test formulations, Arrhenius plots were constructed from the experimentally observed and GRNN-predicted results. The slopes of experimentally observed and predicted Arrhenius plots were tested for significance using Student's t-test. For test formulations, the shelf life (t95%) was then calculated from experimentally observed values (t95% 82.90 weeks), as well as from GRNN-predicted values (t95% 81.88 weeks). These results demonstrate that GRNN networks can be used to predict ASA content and shelf life without stability testing for formulations in which the amount of polymer and tablet hardness are within the investigated range.
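The kinetic reasoning in this abstract (apparent zero-order degradation, Arrhenius plots, shelf life t95%) can be sketched as follows. This is an illustrative calculation only, not the authors' procedure: the function names, the default initial content of 100%, and the synthetic rate constants are assumptions. Under zero-order kinetics C(t) = C0 - k·t, so the content falls to 95% of C0 at t95% = 0.05·C0/k; the Arrhenius relation ln k = ln A - Ea/(R·T) is fitted by ordinary least squares on (1/T, ln k).

```python
import math

R = 8.314  # gas constant, J/(mol K)


def fit_arrhenius(temps_c, rate_constants):
    """Least-squares fit of ln k = ln A - Ea/(R*T); returns (ln_A, Ea in J/mol)."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, -slope * R


def rate_at(temp_c, ln_a, ea):
    """Rate constant predicted by the Arrhenius relation at temp_c (Celsius)."""
    return math.exp(ln_a - ea / (R * (temp_c + 273.15)))


def shelf_life_t95(k, c0=100.0):
    """Zero-order kinetics: time for the content to fall from c0 to 95% of c0."""
    return 0.05 * c0 / k


# Synthetic (hypothetical) rate constants at the four storage temperatures
# named in the abstract, generated from assumed Arrhenius parameters.
temps = [30.0, 40.0, 50.0, 60.0]
ks = [rate_at(t, 30.0, 80000.0) for t in temps]
ln_a, ea = fit_arrhenius(temps, ks)
# 25 degrees C is an assumed target temperature, not taken from the abstract.
print(shelf_life_t95(rate_at(25.0, ln_a, ea)))
```

The time unit of the result follows the unit of the rate constants supplied.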
Affiliation(s)
- Svetlana Ibrić
- Department of Pharmaceutical Technology, Faculty of Pharmacy, University of Belgrade, Vojvode Stepe 450, 11221 Belgrade, Serbia.
|
37
|
|
38
|
Ghavami R, Najafi A, Hemmateenejad B. QSPR studies on normal boiling points and molar refractivities of organic compounds by correlation-ranking-based PCR and PC–ANN analyses of new topological indices. Can J Chem 2009. [DOI: 10.1139/v09-109] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/13/2023]
Abstract
The new topological indices (Sh indices) based on the distance sum and connectivity of a molecular graph, previously developed by our team, were extended to predict two physicochemical properties, normal boiling point (NBP) and molar refractivity (MR), of a large set of organic compounds consisting of alkanes, alkenes, ethers, amines, alcohols, alkylbenzenes, and alkyl halides. The sets of molecular descriptors were derived directly from the two-dimensional molecular structure of the compounds based on graph theory. Both linear and nonlinear modeling were implemented using principal component regression (PCR) and principal component – artificial neural network (PC–ANN) with a back-propagation learning algorithm, respectively. Eigenvalue and correlation-ranking procedures were used to rank the principal components and enter them into the models. Principal component analysis of the Sh data matrix showed that the respective six and seven PCs could explain 97.49% and 99.22% of the variance in the Sh indices. PCR analysis of the NBP and MR data demonstrated that the proposed Sh indices could explain about 97.52% and 99.52% of variations, while the variations explained by the PC–ANN modeling were more than 99.00% and 99.82%, respectively. The predictive ability of the models was evaluated using an external test set for NBP and MR of the molecules, with the respective root-mean-square errors lower than 9.69 K and 0.660 cm³ mol⁻¹ for the linear model and 6.17 K and 0.416 cm³ mol⁻¹ for the nonlinear model.
Affiliation(s)
- Raouf Ghavami
- Department of Chemistry, Faculty of Science, University of Kurdistan, P. O. Box 416, Sanandaj, Iran
- Department of Chemistry, Shiraz University, Shiraz 71454, Iran
- Amir Najafi
- Department of Chemistry, Faculty of Science, University of Kurdistan, P. O. Box 416, Sanandaj, Iran
- Department of Chemistry, Shiraz University, Shiraz 71454, Iran
- Bahram Hemmateenejad
- Department of Chemistry, Faculty of Science, University of Kurdistan, P. O. Box 416, Sanandaj, Iran
- Department of Chemistry, Shiraz University, Shiraz 71454, Iran
|
39
|
Behzadi SS, Prakasvudhisarn C, Klocker J, Wolschann P, Viernstein H. Comparison between two types of Artificial Neural Networks used for validation of pharmaceutical processes. Powder Technol 2009. [DOI: 10.1016/j.powtec.2009.05.025] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 10/20/2022]
|
40
|
Goudarzi N, Goodarzi M. Prediction of the acidic dissociation constant (pKa) of some organic compounds using linear and nonlinear QSPR methods. Mol Phys 2009. [DOI: 10.1080/00268970902950394] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
|
41
|
Oliferenko PV, Oliferenko AA, Poda G, Palyulin VA, Zefirov NS, Katritzky AR. New Developments in Hydrogen Bonding Acidity and Basicity of Small Organic Molecules for the Prediction of Physical and ADMET Properties. Part 2. The Universal Solvation Equation. J Chem Inf Model 2009; 49:634-46. [DOI: 10.1021/ci800323q] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Polina V. Oliferenko
- Department of Chemistry, Moscow State University, Moscow, 119992 Russia, Structural & Computational Chemistry Group, Pfizer Global Research & Development, Chesterfield, Missouri 63017, and Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Alexander A. Oliferenko
- Department of Chemistry, Moscow State University, Moscow, 119992 Russia, Structural & Computational Chemistry Group, Pfizer Global Research & Development, Chesterfield, Missouri 63017, and Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Gennadiy Poda
- Department of Chemistry, Moscow State University, Moscow, 119992 Russia, Structural & Computational Chemistry Group, Pfizer Global Research & Development, Chesterfield, Missouri 63017, and Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Vladimir A. Palyulin
- Department of Chemistry, Moscow State University, Moscow, 119992 Russia, Structural & Computational Chemistry Group, Pfizer Global Research & Development, Chesterfield, Missouri 63017, and Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Nikolay S. Zefirov
- Department of Chemistry, Moscow State University, Moscow, 119992 Russia, Structural & Computational Chemistry Group, Pfizer Global Research & Development, Chesterfield, Missouri 63017, and Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
- Alan R. Katritzky
- Department of Chemistry, Moscow State University, Moscow, 119992 Russia, Structural & Computational Chemistry Group, Pfizer Global Research & Development, Chesterfield, Missouri 63017, and Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
|
42
|
Chen H, Börjesson U, Engkvist O, Kogej T, Svensson MA, Blomberg N, Weigelt D, Burrows JN, Lange T. ProSAR: A New Methodology for Combinatorial Library Design. J Chem Inf Model 2009; 49:603-14. [DOI: 10.1021/ci800231d] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Affiliation(s)
- Hongming Chen
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Ulf Börjesson
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Ola Engkvist
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Thierry Kogej
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Mats A. Svensson
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Niklas Blomberg
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Dirk Weigelt
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Jeremy N. Burrows
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
- Tim Lange
- DECS GCS Computational Chemistry, AstraZeneca R&D Mölndal, Pepparedsleden 1, SE-43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca R&D Södertälje, SE-151 85 Södertälje, Sweden
|
43
|
The application of artificial neural networks in the prediction of microemulsion phase boundaries in PEG-8 caprylic/capric glycerides based systems. Int J Pharm 2008; 361:41-6. [DOI: 10.1016/j.ijpharm.2008.05.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 02/29/2008] [Revised: 05/05/2008] [Accepted: 05/07/2008] [Indexed: 11/19/2022]
|
44
|
The importance of the domain of applicability in QSAR modeling. J Mol Graph Model 2008; 26:1315-26. [DOI: 10.1016/j.jmgm.2008.01.002] [Citation(s) in RCA: 211] [Impact Index Per Article: 13.2] [Received: 11/21/2007] [Revised: 01/11/2008] [Accepted: 01/11/2008] [Indexed: 11/19/2022]
|
45
|
Lamanna C, Bellini M, Padova A, Westerberg G, Maccari L. Straightforward Recursive Partitioning Model for Discarding Insoluble Compounds in the Drug Discovery Process. J Med Chem 2008; 51:2891-7. [DOI: 10.1021/jm701407x] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Indexed: 11/29/2022]
Affiliation(s)
- Marta Bellini
- Siena Biotech S.p.A., Via Fiorentina 1, 53100, Siena, Italy
- Laura Maccari
- Siena Biotech S.p.A., Via Fiorentina 1, 53100, Siena, Italy
|
46
|
|
47
|
Abstract
Bayesian regularized artificial neural networks (BRANNs) are more robust than standard back-propagation nets and can reduce or eliminate the need for lengthy cross-validation. Bayesian regularization is a mathematical process that converts a nonlinear regression into a "well-posed" statistical problem in the manner of a ridge regression. The advantage of BRANNs is that the models are robust and the validation process, which scales as O(N²) in normal regression methods, such as back propagation, is unnecessary. These networks provide solutions to a number of problems that arise in QSAR modeling, such as choice of model, robustness of model, choice of validation set, size of validation effort, and optimization of network architecture. They are difficult to overtrain, since evidence procedures provide an objective Bayesian criterion for stopping training. They are also difficult to overfit, because the BRANN calculates and trains on a number of effective network parameters or weights, effectively turning off those that are not relevant. This effective number is usually considerably smaller than the number of weights in a standard fully connected back-propagation neural net. Automatic relevance determination (ARD) of the input variables can be used with BRANNs, and this allows the network to "estimate" the importance of each input. The ARD method ensures that irrelevant or highly correlated indices used in the modeling are neglected, and it shows which variables are most important for modeling the activity data. This chapter outlines the equations that define the BRANN method plus a flowchart for producing a BRANN-QSAR model. Some results of the use of BRANNs on a number of data sets are illustrated and compared with other linear and nonlinear models.
|
48
|
Maikut OM, Makitra RG, Pal’chikova EY. Application of multiparameter equations to calculation of the mutual solubility in water-organic solvent binary systems. Russ J Gen Chem 2007. [DOI: 10.1134/s1070363207120195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
49
|
Generation of in-silico cytochrome P450 1A2, 2C9, 2C19, 2D6, and 3A4 inhibition QSAR models. J Comput Aided Mol Des 2007; 21:559-73. [DOI: 10.1007/s10822-007-9139-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 07/17/2007] [Accepted: 10/04/2007] [Indexed: 01/22/2023]
|
50
|
Schroeter TS, Schwaighofer A, Mika S, Ter Laak A, Suelzle D, Ganzer U, Heinrich N, Müller KR. Predicting Lipophilicity of Drug-Discovery Molecules using Gaussian Process Models. ChemMedChem 2007; 2:1265-7. [PMID: 17576646 DOI: 10.1002/cmdc.200700041] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Affiliation(s)
- Timon S Schroeter
- Intelligent Data Analysis Group, Fraunhofer FIRST, Kekulestrasse 7, 12489 Berlin, Germany.
|