1
Gheta SKO, Bonin A, Gerlach T, Göller AH. Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state. J Comput Aided Mol Des 2023; 37:765-789. [PMID: 37878216 DOI: 10.1007/s10822-023-00538-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 10/02/2023] [Indexed: 10/26/2023]
Abstract
In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial-liquid-state (super-cooled liquid), and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature T = T¯ ([Formula: see text]) and mixing the artificially liquid solute into the solvent ([Formula: see text]). In this approach [Formula: see text] is predicted using machine learning models, and the [Formula: see text] is obtained from COSMO-RS calculations; (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data is available at physiological pH of 6.5 and ambient temperature. We also evaluated our models using external datasets from a solubility challenge. Our models present great improvements compared to the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We are furthermore able to demonstrate the superiority of QM-derived descriptors compared to cheminformatics descriptors. We finally present low-cost alternative models using fragment-based COSMOquick calculations with only marginal reduction in the quality of predicted solubility.
Affiliation(s)
- Sadra Kashef Ol Gheta
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Anne Bonin
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Thomas Gerlach
  - Bayer AG, Crop Science, R&D, Digital Transformation, 40789, Monheim, Germany
  - Bayer AG, Engineering & Technology, Thermal Separation Technologies, 51368, Leverkusen, Germany
- Andreas H Göller
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
2
Syed TA, Ansari KB, Banerjee A, Wood DA, Khan MS, Al Mesfer MK. Machine‐learning predictions of caffeine co‐crystal formation accompanying experimental and molecular validations. J Food Process Eng 2022. [DOI: 10.1111/jfpe.14230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Affiliation(s)
- Tanweer A. Syed
  - Department of Chemical Engineering, Institute of Chemical Technology, Mumbai, Maharashtra, India
- Khursheed B. Ansari
  - Department of Chemical Engineering, Zakir Husain College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
- Arghya Banerjee
  - Department of Chemical Engineering, Indian Institute of Technology Ropar, Punjab, India
- Mohd Shariq Khan
  - Department of Chemical Engineering, College of Engineering, Dhofar University, Salalah, Oman
3
Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of Artificial Intelligence and Machine Learning Algorithms to Crystallization. Chem Rev 2022; 122:13006-13042. [PMID: 35759465 DOI: 10.1021/acs.chemrev.2c00141] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Artificial intelligence and specifically machine learning applications are nowadays used in a variety of scientific applications and cutting-edge technologies, where they have a transformative impact. Such an assembly of statistical and linear algebra methods making use of large data sets is becoming more and more integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, as well as contribute to high throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness in issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, as well as the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Affiliation(s)
- Christos Xiouras
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
- Fabio Cameli
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Gustavo Lunardon Quilló
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
  - Chemical and BioProcess Technology and Control, Department of Chemical Engineering, Faculty of Engineering Technology, KU Leuven, Gebroeders de Smetstraat 1, 9000 Ghent, Belgium
- Mihail E Kavousanakis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
- Dionisios G Vlachos
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Georgios D Stefanidis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
  - Laboratory for Chemical Technology, Ghent University; Tech Lane Ghent Science Park 125, B-9052 Ghent, Belgium
4
Shin HK, Kim S, Yoon S. Use of size-dependent electron configuration fingerprint to develop general prediction models for nanomaterials. NanoImpact 2021; 21:100298. [PMID: 35559785 DOI: 10.1016/j.impact.2021.100298] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/17/2020] [Revised: 01/18/2021] [Accepted: 01/18/2021] [Indexed: 06/15/2023]
Abstract
Due to the lack of nano descriptors that can appropriately represent the wide chemical space of engineered nanomaterials (ENMs), applicability domain of nano-quantitative structure-activity relationship models are limited to certain types of ENMs, such as metal oxides, metals, carbon-based nanomaterials, or quantum dots. In this study, a size-dependent electron configuration fingerprint (SDEC FP) was introduced to estimate the quantity of electrons based on the core, doping, and coating materials of ENMs in different sizes. SDEC FP was used in prediction model development and nanostructure similarity analysis on datasets including metal and carbon-based nanomaterials with and without surface modifications. Cytotoxicity and zeta potential prediction models developed with SDEC FP achieved good prediction accuracies on test set. Nanostructure similarity analysis was performed through principal component analysis which showed that structural similarity between ENMs measured by SDEC FP was highly correlated with their properties.
Affiliation(s)
- Hyun Kil Shin
  - Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Soojin Kim
  - Molecular Toxicology Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Seokjoo Yoon
  - Molecular Toxicology Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
5
Esaki T, Ohashi R, Watanabe R, Natsume-Kitatani Y, Kawashima H, Nagao C, Komura H, Mizuguchi K. Constructing an In Silico Three-Class Predictor of Human Intestinal Absorption With Caco-2 Permeability and Dried-DMSO Solubility. J Pharm Sci 2019; 108:3630-3639. [DOI: 10.1016/j.xphs.2019.07.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/15/2019] [Revised: 07/06/2019] [Accepted: 07/17/2019] [Indexed: 01/03/2023]
6
Ruiz IL, Gómez-Nieto MÁ. Study of the Applicability Domain of the QSAR Classification Models by Means of the Rivality and Modelability Indexes. Molecules 2018; 23:2756. [PMID: 30356020 PMCID: PMC6278359 DOI: 10.3390/molecules23112756] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2018] [Revised: 10/14/2018] [Accepted: 10/22/2018] [Indexed: 11/30/2022]
Abstract
The reliability of a QSAR classification model depends on its capacity to achieve confident predictions of new compounds not considered in the building of the model. The results of this external validation process show the applicability domain (AD) of the QSAR model and, therefore, the robustness of the model to predict the property/activity of new molecules. In this paper we propose the use of the rivality and modelability indexes for the study of the characteristics of the datasets to be correctly modeled by a QSAR algorithm and to predict the reliability of the built model to prognosticate the property/activity of new molecules. The calculation of these indexes has a very low computational cost, not requiring the building of a model, thus being good tools for the analysis of the datasets in the first stages of the building of QSAR classification models. In our study, we have selected two benchmark datasets with similar number of molecules but with very different modelability and we have corroborated the capacity of the predictability of the rivality and modelability indexes regarding the classification models built using Support Vector Machine and Random Forest algorithms with 5-fold cross-validation and leave-one-out techniques. The results have shown the excellent ability of both indexes to predict outliers and the applicability domain of the QSAR classification models. In all cases, these values accurately predicted the statistic parameters of the QSAR models generated by the algorithms.
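To illustrate how a modelability score can be computed from the data alone, without building a classifier, here is a minimal sketch. The abstract does not define the rivality or modelability indexes, so this uses a generic nearest-neighbour formulation of the kind found in the QSAR literature, which may differ from the authors' definition. The function name `modelability_index` and the use of Euclidean distance are assumptions made for illustration.

```python
from math import dist

def modelability_index(X, y):
    """Nearest-neighbour modelability score (illustrative, not the paper's index).

    For each class, take the fraction of compounds whose nearest neighbour
    (excluding the compound itself) belongs to the same class, then average
    those fractions over the classes. Values near 1 suggest an easy data set.
    """
    same = {}   # class -> compounds whose nearest neighbour shares the class
    total = {}  # class -> number of compounds in the class
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Index of the closest other compound in descriptor space.
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: dist(xi, X[k]))
        total[yi] = total.get(yi, 0) + 1
        same[yi] = same.get(yi, 0) + (1 if y[j] == yi else 0)
    return sum(same[c] / total[c] for c in total) / len(total)


# Two well-separated clusters versus strictly alternating labels on a line.
separated = modelability_index(
    [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)],
    [0, 0, 0, 1, 1, 1],
)
alternating = modelability_index([(0,), (1,), (2,), (3,)], [0, 1, 0, 1])
```

The cost is one pass over pairwise distances, which is why such indexes are cheap compared with training and cross-validating a model.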
Affiliation(s)
- Irene Luque Ruiz
  - Department of Computing and Numerical Analysis, Campus Universitario de Rabanales, Albert Einstein Building, University of Córdoba, E-14071 Córdoba, Spain
- Miguel Ángel Gómez-Nieto
  - Department of Computing and Numerical Analysis, Campus Universitario de Rabanales, Albert Einstein Building, University of Córdoba, E-14071 Córdoba, Spain
7
Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling. Methods Mol Biol 2018; 1800:141-169. [PMID: 29934891 DOI: 10.1007/978-1-4939-7899-1_6] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Indexed: 12/03/2022]
Abstract
In the context of human safety assessment through quantitative structure-activity relationship (QSAR) modeling, the concept of applicability domain (AD) has an enormous role to play. The Organization of Economic Co-operation and Development (OECD) for QSAR model validation recommended as principle 3 "A defined domain of applicability" to be present for a predictive QSAR model. The study of AD allows estimating the uncertainty in the prediction for a particular molecule based on how similar it is to the training compounds which are used in the model development. In the current scenario, AD represents an active research topic, and many methods have been designed to estimate the competence of a model and the confidence in its outcome for a given prediction task. Thus, characterization of interpolation space is significant in defining the AD. The diverse set of reported AD methods was constructed through different hypotheses and algorithms. These multiplicities of methodologies mystify the end users and make the comparison of the AD for different models a complex issue to address. We have attempted to summarize in this chapter the important concepts of AD including particulars of the available methods to compute the AD along with their thresholds and criteria for estimating AD through training set interpolation in the descriptor space. The idea about transparent domain and decision domain are also discussed. To help readers determine the AD in their projects, practical examples together with available open source software tools are provided.
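As a concrete instance of estimating the AD through training set interpolation in the descriptor space, the following minimal sketch implements a range-based (bounding-box) check, one of the simplest approaches in this family. It is not taken from the chapter; the function names `fit_bounding_box` and `in_domain` and the toy descriptor values are hypothetical.

```python
def fit_bounding_box(train):
    """Return the (min, max) range of each descriptor over the training set."""
    columns = list(zip(*train))  # transpose rows of descriptors into columns
    return [(min(col), max(col)) for col in columns]

def in_domain(box, x):
    """True if every descriptor of query x lies within the training range."""
    return all(lo <= value <= hi for (lo, hi), value in zip(box, x))


# Toy training set with two descriptors per compound (made-up values).
train = [(1.0, 10.0), (2.0, 30.0), (3.0, 20.0)]
box = fit_bounding_box(train)

inside = in_domain(box, (2.5, 15.0))   # both descriptors within range
outside = in_domain(box, (2.5, 35.0))  # second descriptor exceeds the maximum
```

A bounding box ignores correlations between descriptors and empty regions inside the ranges, which is one reason distance-based and density-based alternatives exist.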
8
Lombardo F, Desai PV, Arimoto R, Desino KE, Fischer H, Keefer CE, Petersson C, Winiwarter S, Broccatelli F. In Silico Absorption, Distribution, Metabolism, Excretion, and Pharmacokinetics (ADME-PK): Utility and Best Practices. An Industry Perspective from the International Consortium for Innovation through Quality in Pharmaceutical Development. J Med Chem 2017; 60:9097-9113. [DOI: 10.1021/acs.jmedchem.7b00487] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Indexed: 01/28/2023]
Affiliation(s)
- Franco Lombardo
  - Alkermes Inc., 852 Winter Street, Waltham, Massachusetts 02451, United States
- Prashant V. Desai
  - Computational ADME, Drug Disposition, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
- Rieko Arimoto
  - Vertex Pharmaceuticals Inc., 50 Northern Avenue, Boston, Massachusetts 02210, United States
- Holger Fischer
  - Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Innovation Center Basel, F. Hoffmann-La Roche Ltd., 4070 Basel, Switzerland
- Carl Petersson
  - Discovery Drug Disposition, Biopharma, R&D Global Early Development, EMD Serono, Frankfurter Strasse 250 I Postcode D39/001, 64293 Darmstadt, Germany
- Susanne Winiwarter
  - Drug Safety and Metabolism, AstraZeneca R&D Gothenburg, 431 83 Mölndal, Sweden
- Fabio Broccatelli
  - Genentech Inc., South San Francisco, California 94080, United States
9
Abstract
Quantitative Structure-Activity Relationship (QSAR) models have manifold applications in drug discovery, environmental fate modeling, risk assessment, and property prediction of chemicals and pharmaceuticals. One of the principles recommended by the Organization of Economic Co-operation and Development (OECD) for model validation requires defining the Applicability Domain (AD) for QSAR models, which allows one to estimate the uncertainty in the prediction of a compound based on how similar it is to the training compounds, which are used in the model development. The AD is a significant tool to build a reliable QSAR model, which is generally limited in use to query chemicals structurally similar to the training compounds. Thus, characterization of interpolation space is significant in defining the AD. An attempt is made in this chapter to address the important concepts and methodology of the AD as well as criteria for estimating AD through training set interpolation in the descriptor space.
10
Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inform 2016; 35:160-80. [PMID: 27492083 DOI: 10.1002/minf.201501019] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 11/12/2015] [Accepted: 01/20/2016] [Indexed: 11/08/2022]
Abstract
Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables, which encode the respective molecular structures (i.e. molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the "normal" objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects or as detecting outliers. Currently two different types of measures are in use. The first one defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second type defines the applicability domain in terms of the expected reliability of the predictions which is referred to as confidence estimation. Both types are systematically differentiated here and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built-in confidence scores. Since confidence estimation uses information of the class labels for computing the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which solely uses the information of the explanatory variables.
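The distinction between the two types of measures can be sketched with a k-nearest-neighbour classifier. This is an illustrative example under stated assumptions, not a method from the review: the function name `knn_predict`, the vote fraction as the confidence score, and the mean neighbour distance as the novelty score are all choices made here.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Return (predicted class, confidence, novelty) for query x.

    confidence: fraction of the k nearest neighbours voting for the winning
                class (uses the class labels).
    novelty:    mean distance from x to its k nearest neighbours
                (uses the descriptors only).
    """
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    winner, count = votes.most_common(1)[0]
    confidence = count / len(nearest)
    novelty = sum(dist(point, x) for point, _ in nearest) / len(nearest)
    return winner, confidence, novelty


# Two toy clusters in a 2-D descriptor space (made-up values).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]

near = knn_predict(X, y, (0.2, 0.2))  # close to cluster "a"
far = knn_predict(X, y, (100, 100))   # far from all training compounds
```

In this toy case both queries get a unanimous vote, but the second has a very large novelty score, so the two quantities capture different information about a prediction.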
Affiliation(s)
- Miriam Mathea
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Waldemar Klingspohn
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
11
Gawehn E, Hiss JA, Schneider G. Deep Learning in Drug Discovery. Mol Inform 2015; 35:3-14. [PMID: 27491648 DOI: 10.1002/minf.201501008] [Citation(s) in RCA: 309] [Impact Index Per Article: 34.3] [Received: 10/25/2015] [Accepted: 12/01/2015] [Indexed: 12/18/2022]
Abstract
Artificial neural networks had their first heyday in molecular informatics and drug discovery approximately two decades ago. Currently, we are witnessing renewed interest in adapting advanced neural network architectures for pharmaceutical research by borrowing from the field of "deep learning". Compared with some of the other life sciences, their application in drug discovery is still limited. Here, we provide an overview of this emerging field of molecular informatics, present the basic concepts of prominent deep learning methods and offer motivation to explore these techniques for their usefulness in computer-assisted drug discovery and design. We specifically emphasize deep neural networks, restricted Boltzmann machine networks and convolutional networks.
Affiliation(s)
- Erik Gawehn
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
- Jan A Hiss
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
- Gisbert Schneider
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
13
Salazar J, Ghanem A, Müller RH, Möschwitzer JP. Nanocrystals: comparison of the size reduction effectiveness of a novel combinative method with conventional top-down approaches. Eur J Pharm Biopharm 2012; 81:82-90. [PMID: 22233547 DOI: 10.1016/j.ejpb.2011.12.015] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 03/29/2011] [Revised: 12/21/2011] [Accepted: 12/22/2011] [Indexed: 10/14/2022]
Abstract
Nanosizing is a non-specific approach to improve the oral bioavailability of poorly soluble drugs. The decreased particle size of these compounds results in an increase in surface area. The outcome is an increased rate of dissolution, which can lead to a better oral absorption. Standard approaches are bottom-up and top-down techniques. Combinative technologies are relatively new approaches, and they can be described as a combination of a bottom-up process followed by a top-down step. The work presented in this paper can be described as a combination of a non-aqueous freeze drying step (bottom-up), followed by wet ball milling or high pressure homogenization (top-down) to produce fine drug nanocrystals. The crystal habit of the model drug glibenclamide was modified by freeze drying from dimethyl sulfoxide (DMSO)/tert-butanol (TBA) solvent mixtures using different ratios. The resulting drug powders were characterized by scanning electron microscopy (SEM) as well as by X-ray powder diffraction (XRPD) and differential scanning calorimetry (DSC). It was shown that the combinative approach can significantly improve the particle size reduction effectiveness of both top-down methods over conventional approaches. Drug lyophilization using DMSO:TBA in 25:75 and 10:90 v/v ratios resulted in a highly porous and breakable material. The milling time to achieve nanosuspensions was reduced from 24h with the jet-milled glibenclamide to only 1h with the modified starting material. The number of homogenization cycles was decreased from 20 with unmodified API to only 5 with the modified drug. The smallest particle size, achieved on modified samples, was 160nm by wet ball milling after 24h and 355nm by high pressure homogenization after 20 homogenization cycles at 1500bar.
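The chain from smaller particles to larger surface area to faster dissolution can be written out with the Noyes-Whitney (Nernst-Brunner) relation. The abstract does not name this equation, so treat it as a standard textbook rationalization. The second formula assumes monodisperse spherical particles of diameter $d$ and density $\rho$, which is an idealization.

```latex
% Dissolution rate: D = diffusion coefficient, A = wetted surface area,
% h = diffusion layer thickness, C_s = saturation solubility, C = bulk concentration
\frac{dm}{dt} = \frac{D\,A}{h}\,\bigl(C_s - C\bigr)

% Specific surface area of monodisperse spheres (surface \pi d^2, mass \rho \pi d^3 / 6)
\frac{A}{m} = \frac{\pi d^{2}}{\rho\,\pi d^{3}/6} = \frac{6}{\rho\,d}

% Ratio for the two smallest sizes reported in the abstract
\frac{(A/m)_{160\,\mathrm{nm}}}{(A/m)_{355\,\mathrm{nm}}} = \frac{355}{160} \approx 2.2
```

Under these assumptions, the 160 nm material would offer roughly 2.2 times the surface area per unit mass of the 355 nm material, and a proportionally higher initial dissolution rate if all other terms are equal.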
Affiliation(s)
- Jaime Salazar
  - Dept. of Pharmaceutics, Biopharmaceutics and NutriCosmetics, Freie Universität Berlin, Berlin, Germany
14
Soto AJ, Vazquez GE, Strickert M, Ponzoni I. Target-Driven Subspace Mapping Methods and Their Applicability Domain Estimation. Mol Inform 2011; 30:779-89. [DOI: 10.1002/minf.201100053] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 04/05/2011] [Accepted: 05/26/2011] [Indexed: 11/06/2022]
15
Garkani-Nejad Z, Poshteh-Shirani M. Modeling of 13C NMR chemical shifts of benzene derivatives using the RC–PC–ANN method: A comparative study of original molecular descriptors and multivariate image analysis descriptors. Can J Chem 2011. [DOI: 10.1139/v11-041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The primary goal of a quantitative structure–property relationship study is to identify a set of structurally based numerical descriptors that can be mathematically linked to a property of interest. In this work, two main groups of descriptors have been used to predict 13C NMR chemical shifts of ipso, ortho, meta, and para positions in a series of 113 monosubstituted benzenes. First, two groups of descriptors — original molecular descriptors (constitutional, topological, electronic, and geometrical) and multivariate image analysis (MIA) descriptors — were calculated. Then, calculated descriptors were subjected to principal component analysis and the most significant principal components were extracted. Finally, more correlated principal components were used as inputs of artificial neural networks. The results obtained using the rank correlation–principal component–artificial neural network (RC–PC–ANN) modeling method show high ability to predict 13C NMR chemical shifts. Also, comparison of the results indicates that MIA descriptors show better ability to predict 13C NMR chemical shifts than the original molecular descriptors.
Affiliation(s)
- Zahra Garkani-Nejad
  - Chemistry Department, Faculty of Science, Vali-e-Asr University, Rafsanjan, Iran
16
Garkani-Nejad Z, Ahmadvand M. Comparative QSRR Modeling of Nitrobenzene Derivatives Based on Original Molecular Descriptors and Multivariate Image Analysis Descriptors. Chromatographia 2011. [DOI: 10.1007/s10337-011-1969-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
17
Cheng T, Li Q, Wang Y, Bryant SH. Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. J Chem Inf Model 2011; 51:229-36. [PMID: 21214224 PMCID: PMC3047290 DOI: 10.1021/ci100364a] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 02/05/2023]
Abstract
Aqueous solubility is recognized as a critical parameter in both the early- and late-stage drug discovery. Therefore, in silico modeling of solubility has attracted extensive interests in recent years. Most previous studies have been limited in using relatively small data sets with limited diversity, which in turn limits the predictability of derived models. In this work, we present a support vector machines model for the binary classification of solubility by taking advantage of the largest known public data set that contains over 46 000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating it could be a practical tool to select soluble compounds for screening, purchasing, and synthesizing. Moreover, our work may be used for comparative evaluation of solubility classification studies ascribe to the use of completely public resources.
Affiliation(s)
- Tiejun Cheng
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
18
Suenderhauf C, Hammann F, Maunz A, Helma C, Huwyler J. Combinatorial QSAR Modeling of Human Intestinal Absorption. Mol Pharm 2010; 8:213-24. [DOI: 10.1021/mp100279d] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 01/12/2023]
Affiliation(s)
- Claudia Suenderhauf
  - Division of Pharmaceutical Technology, Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, CH-4056 Basel, Switzerland, Freiburger Zentrum für Datenanalyse und Modellbildung, University Freiburg, Hermann Herder Strasse 3a, D-70104 Freiburg, Germany, and In silico toxicology, Altkircherstrasse 3a, CH-4054 Basel, Switzerland
- Felix Hammann
  - Division of Pharmaceutical Technology, Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, CH-4056 Basel, Switzerland, Freiburger Zentrum für Datenanalyse und Modellbildung, University Freiburg, Hermann Herder Strasse 3a, D-70104 Freiburg, Germany, and In silico toxicology, Altkircherstrasse 3a, CH-4054 Basel, Switzerland
- Andreas Maunz
  - Division of Pharmaceutical Technology, Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, CH-4056 Basel, Switzerland, Freiburger Zentrum für Datenanalyse und Modellbildung, University Freiburg, Hermann Herder Strasse 3a, D-70104 Freiburg, Germany, and In silico toxicology, Altkircherstrasse 3a, CH-4054 Basel, Switzerland
- Christoph Helma
  - Division of Pharmaceutical Technology, Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, CH-4056 Basel, Switzerland, Freiburger Zentrum für Datenanalyse und Modellbildung, University Freiburg, Hermann Herder Strasse 3a, D-70104 Freiburg, Germany, and In silico toxicology, Altkircherstrasse 3a, CH-4054 Basel, Switzerland
- Jörg Huwyler
  - Division of Pharmaceutical Technology, Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, CH-4056 Basel, Switzerland, Freiburger Zentrum für Datenanalyse und Modellbildung, University Freiburg, Hermann Herder Strasse 3a, D-70104 Freiburg, Germany, and In silico toxicology, Altkircherstrasse 3a, CH-4054 Basel, Switzerland
19
Hecht D. Applications of machine learning and computational intelligence to drug discovery and development. Drug Dev Res 2010. [DOI: 10.1002/ddr.20402] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
Affiliation(s)
- David Hecht
  - Southwestern College, Chula Vista, California
20
Garkani-Nejad Z, Poshteh-Shirani M. Application of multivariate image analysis in QSPR study of 13C chemical shifts of naphthalene derivatives: A comparative study. Talanta 2010; 83:225-32. [DOI: 10.1016/j.talanta.2010.09.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/29/2010] [Revised: 09/06/2010] [Accepted: 09/08/2010] [Indexed: 11/26/2022]
21
Sushko I, Novotarskyi S, Körner R, Pandey AK, Cherkasov A, Li J, Gramatica P, Hansen K, Schroeter T, Müller KR, Xi L, Liu H, Yao X, Öberg T, Hormozdiari F, Dao P, Sahinalp C, Todeschini R, Polishchuk P, Artemenko A, Kuz’min V, Martin TM, Young DM, Fourches D, Muratov E, Tropsha A, Baskin I, Horvath D, Marcou G, Muller C, Varnek A, Prokopenko VV, Tetko IV. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set. J Chem Inf Model 2010; 50:2094-111. [DOI: 10.1021/ci100253r] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Indexed: 02/02/2023]
Affiliation(s)
- Iurii Sushko, Sergii Novotarskyi, Robert Körner, Anil Kumar Pandey, Artem Cherkasov, Jiazhong Li, Paola Gramatica, Katja Hansen, Timon Schroeter, Klaus-Robert Müller, Lili Xi, Huanxiang Liu, Xiaojun Yao, Tomas Öberg, Farhad Hormozdiari, Phuong Dao, Cenk Sahinalp, Roberto Todeschini, Pavel Polishchuk, Anatoliy Artemenko, Victor Kuz’min, Todd M. Martin, Douglas M. Young, Denis Fourches, Eugene Muratov, Alexander Tropsha, Igor Baskin, Dragos Horvath, Gilles Marcou, Christophe Muller, Alexander Varnek, Volodymyr V. Prokopenko, Igor V. Tetko
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany, University of British Columbia, Vancouver Prostate Centre, 2660 Oak str., Vancouver, BC, V6H 3Z6, Canada, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Structural and Functional Biology, University of Insubria, Via Dunant 3, Varese 21100, Italy, Machine Learning Department, Technical
22
Kramer C, Beck B, Clark T. Insolubility classification with accurate prediction probabilities using a MetaClassifier. J Chem Inf Model 2010; 50:404-14. [PMID: 20088498 DOI: 10.1021/ci900377e] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Insolubility is a crucial issue in drug design because insoluble compounds are often measured to be inactive although they might be active if they were soluble. We provide and analyze various insolubility classification models based on a recently published data set and compounds measured in-house at Boehringer-Ingelheim. The 2D descriptor sets from pharmacophore fingerprints and MOE and the 3D descriptor sets from ParaSurf and VolSurf were examined in conjunction with support vector machines, Bayesian regularized neural networks, and random forests. We introduce a classifier-fusion strategy, called metaclassifier, which improves upon the best single prediction and at the same time avoids descriptor selection, a potential source of overfitting. The metaclassifier strategy is compared to the simpler fusion strategies of maximum vote and highest probability picking. A prediction accuracy of 72.6% on a three class model is achieved with the metaclassifier, with nearly perfect separation of soluble and insoluble compounds and prediction as good as our calculated maximum possible agreement with experiment.
Affiliation(s)
- Christian Kramer
- Department of Lead Discovery, Boehringer-Ingelheim Pharma GmbH & Co. KG, Biberach, Germany
23
Sushko I, Novotarskyi S, Pandey AK, Körner R, Tetko I. Applicability domain for classification problems. J Cheminform 2010. [PMCID: PMC2867176 DOI: 10.1186/1758-2946-2-s1-p41] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
24
Oyarzabal J, Pastor J, Howe TJ. Optimizing the Performance of In Silico ADMET General Models According to Local Requirements: MARS Approach. Solubility Estimations As Case Study. J Chem Inf Model 2009; 49:2837-50. [DOI: 10.1021/ci900308u] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Julen Oyarzabal, Joaquin Pastor, Trevor J. Howe
- Departments of Molecular Informatics and Medicinal Chemistry, Johnson & Johnson Pharmaceutical Research and Development, Jarama 75, 45007 Toledo, Spain and Department of Molecular Informatics, Johnson & Johnson Pharmaceutical Research and Development, Turnhoutseweg 30, 2340 Beerse, Belgium
25
26
Kramer C, Heinisch T, Fligge T, Beck B, Clark T. A Consistent Dataset of Kinetic Solubilities for Early-Phase Drug Discovery. ChemMedChem 2009; 4:1529-36. [DOI: 10.1002/cmdc.200900205] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
27
Three-class classification models of logS and logP derived by using GA–CG–SVM approach. Mol Divers 2009; 13:261-8. [DOI: 10.1007/s11030-009-9108-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/28/2008] [Accepted: 01/09/2009] [Indexed: 10/21/2022]
28
Simmons K, Kinney J, Owens A, Kleier DA, Bloch K, Argentar D, Walsh A, Vaidyanathan G. Practical Outcomes of Applying Ensemble Machine Learning Classifiers to High-Throughput Screening (HTS) Data Analysis and Screening. J Chem Inf Model 2008; 48:2196-206. [DOI: 10.1021/ci800164u] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 01/13/2023]
Affiliation(s)
- Kirk Simmons, John Kinney, Aaron Owens, Daniel A. Kleier, Karen Bloch, Dave Argentar, Alicia Walsh, Ganesh Vaidyanathan
- Simmons Consulting, 52 Windybush Way, Titusville, New Jersey 08560, DuPont Stine Haskell Research Laboratories, 1090 Elkton Road, Newark, Delaware 19711, DuPont Engineering Research and Technology, POB 80249, Wilmington, Delaware 19880, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, Sun Edge, LLC, 147 Tuckahoe Lane, Bear, Delaware 19701, and Quantum Leap Innovations, 3 Innovation Way, Suite 100, Newark, Delaware 19711
29
Highly correlating distance/connectivity-based topological indices. J Mol Graph Model 2008; 27:506-11. [DOI: 10.1016/j.jmgm.2008.09.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/15/2008] [Revised: 08/16/2008] [Accepted: 09/02/2008] [Indexed: 11/20/2022]
30
Johnson SR, Chen XQ, Murphy D, Gudmundsson O. A Computational Model for the Prediction of Aqueous Solubility That Includes Crystal Packing, Intrinsic Solubility, and Ionization Effects. Mol Pharm 2007; 4:513-23. [PMID: 17539661 DOI: 10.1021/mp070030+] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
The optimization of aqueous solubility is an important step along the route to bringing a new therapeutic to market. We describe the development of an empirical computational model to rank the pH-dependent aqueous solubility of drug candidates. The model consists of three core components to describe aqueous solubility. The first is a multivariate QSAR model for the prediction of the intrinsic solubility of the neutral solute. The second facet of the approach is the consideration of ionization using a predicted pKa and the Henderson-Hasselbalch equation. The third aspect of the model is a novel method for assessing the effects of crystal packing on solubility through a series of short molecular dynamics simulations of an actual or hypothetical small molecule crystal structure at escalating temperatures. The model also includes a Monte Carlo error function that considers the variability of each of the underlying components of the model to estimate the 90% confidence interval of estimation.
31
Fredsted B, Brockhoff P, Vind C, Padkjær S, Refsgaard H. In Silico Classification of Solubility using Binary k-Nearest Neighbor and Physicochemical Descriptors. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200610099] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
32
Maniyar DM, Nabney IT, Williams BS, Sewing A. Data visualization during the early stages of drug discovery. J Chem Inf Model 2006; 46:1806-18. [PMID: 16859312 DOI: 10.1021/ci050471a] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software.
Affiliation(s)
- Dharmesh M Maniyar
- Neural Computing Research Group, Information Engineering, Aston University, Birmingham, B4 7ET, United Kingdom.
33
Abstract
This Review describes some of the approaches and techniques used today to derive in silico models for the prediction of ADMET properties. The article also discusses some of the fundamental requirements for deriving statistically sound and predictive ADMET relationships as well as some of the pitfalls and problems encountered during these investigations. It is the intention of the authors to make the reader aware of some of the challenges involved in deriving useful in silico ADMET models for drug development.
Affiliation(s)
- Ulf Norinder
- AstraZeneca Research and Development Södertälje, Södertälje, Sweden.
34
Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI. Can we estimate the accuracy of ADME–Tox predictions? Drug Discov Today 2006; 11:700-7. [PMID: 16846797 DOI: 10.1016/j.drudis.2006.06.013] [Citation(s) in RCA: 162] [Impact Index Per Article: 9.0] [Received: 01/27/2006] [Revised: 04/07/2006] [Accepted: 06/16/2006] [Indexed: 11/26/2022]
Abstract
There have recently been developments in the methods used to assess the accuracy of the prediction and applicability domain of absorption, distribution, metabolism, excretion and toxicity models, and also in the methods used to predict the physicochemical properties of compounds in the early stages of drug development. The methods are classified into two main groups: those based on the analysis of similarity of molecules, and those based on the analysis of calculated properties. An analysis of octanol-water distribution coefficients is used to exemplify the consistency of estimated and calculated accuracy of the ALOGPS program (http://www.vcclab.org) to predict in-house and publicly available datasets.
Affiliation(s)
- Igor V Tetko
- Institute for Bioinformatics, GSF--National Research Centre for Environment and Health, Neuherberg, D-85764, Germany.
|
35
|
|
36
|
Johnson SR, Zheng W. Recent progress in the computational prediction of aqueous solubility and absorption. AAPS J 2006; 8:E27-40. [PMID: 16584131 PMCID: PMC2751421 DOI: 10.1208/aapsj080104] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
The computational prediction of aqueous solubility and/or human absorption has been the goal of many researchers in recent years. Such an in silico counterpart to the biopharmaceutical classification system (BCS) would have great utility. This review focuses on recent developments in the computational prediction of aqueous solubility, P-glycoprotein transport, and passive absorption. We find that, while great progress has been achieved, models that can reliably affect chemistry and development are still lacking. We briefly discuss aspects of emerging scientific understanding that may lead to breakthroughs in the computational modeling of these properties.
Affiliation(s)
- Stephen R. Johnson
- Computer-Assisted Drug Design, Bristol-Myers Squibb Pharmaceutical Research Institute, PO Box 4000, 08543 Princeton, NJ
- Weifan Zheng
- Division of Medicinal Chemistry, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC
|
37
|
Triballeau N, Acher F, Brabet I, Pin JP, Bertrand HO. Virtual Screening Workflow Development Guided by the “Receiver Operating Characteristic” Curve Approach. Application to High-Throughput Docking on Metabotropic Glutamate Receptor Subtype 4. J Med Chem 2005; 48:2534-47. [PMID: 15801843 DOI: 10.1021/jm049092j] [Citation(s) in RCA: 458] [Impact Index Per Article: 24.1] [Indexed: 11/30/2022]
Abstract
The "receiver operating characteristic" (ROC) curve method is a well-recognized metric used as an objective way to evaluate the ability of a given test to discriminate between two populations. This facilitates decision-making in a plethora of fields in which a wrong judgment may have serious consequences, including clinical diagnosis, public safety, travel security, and economic strategies. When virtual screening is used to speed up the drug discovery process in pharmaceutical research, making the right decision when selecting or discarding a molecule prior to in vitro evaluation is of paramount importance. Characterizing both the ability of a virtual screening workflow to select active molecules and the ability to discard inactive ones, the ROC curve approach is well suited for this critical decision gate. As a case study, the first virtual screening workflow focused on metabotropic glutamate receptor subtype 4 (mGlu4R) agonists is reported here. Six compounds out of 38 selected and tested in vitro were shown to have agonist activity on this target of therapeutic interest.
Affiliation(s)
- Nicolas Triballeau
- Laboratoire de Chimie et Biochimie Pharmacologiques et Toxicologiques, UMR8601-CNRS, Université René Descartes-Paris V, 75270 Paris Cedex 06, France.
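The ROC evaluation described in this abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the function names `roc_points` and `auc`, the toy docking scores, and the convention that a higher score means "more likely active" are all assumptions made for illustration.

```python
from itertools import groupby


def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    scores: one number per compound; higher is assumed to mean "more likely active".
    labels: 1 for a known active, 0 for a known inactive.
    Compounds with identical scores are added together, so ties give one point.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive compound")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, group in groupby(ranked, key=lambda p: p[0]):
        for _, y in group:
            if y:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical screening result: three actives, three inactives.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(roc_points(scores, labels)))
```

An area of 1.0 means every active is ranked above every inactive, while 0.5 corresponds to a ranking that does not discriminate between the two populations.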
|
38
|
von Korff M, Steger M. GPCR-tailored pharmacophore pattern recognition of small molecular ligands. J Chem Inf Comput Sci 2005; 44:1137-47. [PMID: 15154783 DOI: 10.1021/ci0303013] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
The goal of our work was to differentiate between patterns that are responsible for the activity of small molecular ligands binding to G-protein coupled receptors (GPCRs) and molecules that are pharmacologically active on other target classes. Second, the aim was to go one step further and analyze the chemical space occupied by the GPCR active ligands themselves, to distinguish between the actives of different subclasses or even cluster ligands for single receptors. To achieve these objectives, we have built a database of small organic molecules that bind to GPCRs. Once this crucial foundation for pattern recognition had been laid, we needed to find a descriptor that is able to detect the compulsory features responsible for activity within a molecule. In this matter we found that the well-accepted pharmacophore descriptor served us well. Finally, we needed to find a method to display the clustering or separation of the specific ligands. We found that self-organizing maps (SOMs) perform excellently in this task. We herein present the analysis of the chemical space of active compounds, depending on their biological target, the GPCRs. We will also discuss the techniques used to create the chemical spaces. The findings can be applied and have an impact at various stages of the drug discovery process.
|
39
|
Wegner JK, Fröhlich H, Zell A. Feature selection for descriptor based classification models. 1. Theory and GA-SEC algorithm. J Chem Inf Comput Sci 2005; 44:921-30. [PMID: 15154758 DOI: 10.1021/ci0342324] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
The paper describes different aspects of classification models based on molecular data sets with the focus on feature selection methods. Especially model quality and avoiding a high variance on unseen data (overfitting) will be discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the extension for classification problems using boosting.
Affiliation(s)
- Jörg K Wegner
- Zentrum für Bioinformatik Tübingen (ZBIT), Universität Tübingen, Sand 1, D-72076 Tübingen, Germany.
|
40
|
Ford MG, Pitt WR, Whitley DC. Selecting compounds for focused screening using linear discriminant analysis and artificial neural networks. J Mol Graph Model 2004; 22:467-72. [PMID: 15182805 DOI: 10.1016/j.jmgm.2004.03.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Accepted: 03/03/2004] [Indexed: 10/26/2022]
Abstract
Linear discriminant analysis and a committee of neural networks have been applied to recognise compounds that act at biological targets belonging to a specific gene family, protein kinases. The MDDR database was used to provide compounds targeted against this family and sets of randomly selected molecules. BCUT parameters were employed as input descriptors that encode structural properties and information relevant to ligand-receptor interactions. The technique was applied to purchasing compounds from external suppliers. These compounds achieved hit rates on a par with those achieved using known actives for related targets when tested for the ability to inhibit kinases at a single concentration. This approach is intended as one of a series of filters in the selection of screening candidates, compound purchases and the application of synthetic priorities to combinatorial libraries.
Affiliation(s)
- M G Ford
- Centre for Molecular Design, IBBS, University of Portsmouth, King Henry Building, King Henry I St., Portsmouth PO1 2DY, UK.
|
41
|
Biological data mining with neural networks: implementation and application of a flexible decision tree extraction algorithm to genomic problem domains. Neurocomputing 2004. [DOI: 10.1016/j.neucom.2003.10.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
|
42
|
Wegner JK, Zell A. Prediction of aqueous solubility and partition coefficient optimized by a genetic algorithm based descriptor selection method. J Chem Inf Comput Sci 2003; 43:1077-84. [PMID: 12767167 DOI: 10.1021/ci034006u] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Indexed: 11/30/2022]
Abstract
The paper describes a fast and flexible descriptor selection method using a genetic algorithm variant (GA-SEC). The relevance of the descriptors will be measured using Shannon entropy (SE) and differential Shannon entropy (DSE), which have very sparse memory requirements and allow the processing of huge data sets. A small quantity of the most important descriptors will be used automatically to build a value prediction model. The most important descriptors are not a linear combination of other descriptors, but transparent, pure descriptors. We used an artificial neural network (ANN) model to predict the aqueous solubility logS and the octanol/water partition coefficient logP. The logS data set was divided into a training set of 1016 compounds and a test set of 253 compounds. A correlation coefficient of 0.93 and an empirical standard deviation of 0.54 were achieved. The logP data set was divided into a training set of 1853 compounds and a test set of 138 compounds. A correlation coefficient of 0.92 and an empirical standard deviation of 0.44 were achieved.
Affiliation(s)
- Jörg K Wegner
- Zentrum für Bioinformatik Tübingen, Universität Tübingen, Sand 1, D-72076 Tübingen, Germany.
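The abstract above measures descriptor relevance with Shannon entropy. As a rough illustration only, the sketch below computes the Shannon entropy of a single descriptor after equal-width binning. It is not the GA-SEC algorithm and does not cover differential Shannon entropy; the function name `shannon_entropy`, the binning scheme, and the default of 10 bins are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of a descriptor after equal-width binning.

    values: descriptor values for a set of compounds.
    n_bins: number of equal-width bins (an illustrative choice).
    A descriptor that is constant across the data set carries no
    information and gets an entropy of 0.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0

    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical descriptor columns for four compounds.
    print(shannon_entropy([0.0, 1.0, 2.0, 3.0], n_bins=4))  # evenly spread
    print(shannon_entropy([5.0, 5.0, 5.0, 5.0]))            # constant
```

Binned entropy needs only a histogram per descriptor, which is consistent with the low memory footprint the abstract mentions for processing large data sets.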
|