101
Aniceto N, Freitas AA, Bender A, Ghafourian T. Simultaneous Prediction of four ATP-binding Cassette Transporters’ Substrates Using Multi-label QSAR. Mol Inform 2016; 35:514-528. [DOI: 10.1002/minf.201600036] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/09/2016] [Accepted: 07/11/2016] [Indexed: 12/21/2022]
Affiliation(s)
- Natália Aniceto
- Medway School of Pharmacy; Universities of Kent and Greenwich; Anson Building, Central Avenue, Chatham Maritime, Chatham, Kent ME4 4TB UK
- Alex A. Freitas
- School of Computing; University of Kent; Canterbury CT2 7NF UK
- Andreas Bender
- Centre for Molecular Science Informatics, Department of Chemistry; University of Cambridge; Lensfield Road Cambridge CB2 1EW UK
- Taravat Ghafourian
- School of Life Sciences, JMS Building; University of Sussex; Brighton BN1 9QG UK
102
Zhang H, Cao ZX, Li M, Li YZ, Peng C. Novel naïve Bayes classification models for predicting the carcinogenicity of chemicals. Food Chem Toxicol 2016; 97:141-149. [PMID: 27597133 DOI: 10.1016/j.fct.2016.09.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 01/18/2016] [Revised: 08/02/2016] [Accepted: 09/01/2016] [Indexed: 02/05/2023]
Abstract
The carcinogenicity prediction has become a significant issue for the pharmaceutical industry. The purpose of this investigation was to develop a novel prediction model of carcinogenicity of chemicals by using a naïve Bayes classifier. The established model was validated by the internal 5-fold cross validation and external test set. The naïve Bayes classifier gave an average overall prediction accuracy of 90 ± 0.8% for the training set and 68 ± 1.9% for the external test set. Moreover, five simple molecular descriptors (e.g., AlogP, Molecular weight (MW), No. of H donors, Apol and Wiener) considered as important for the carcinogenicity of chemicals were identified, and some substructures related to the carcinogenicity were achieved. Thus, we hope the established naïve Bayes prediction model could be applied to filter early-stage molecules for this potential carcinogenicity adverse effect; and the identified five simple molecular descriptors and substructures of carcinogens would give a better understanding of the carcinogenicity of chemicals, and further provide guidance for medicinal chemists in the design of new candidate drugs and lead optimization, ultimately reducing the attrition rate in later stages of drug development.
Affiliation(s)
- Hui Zhang
- College of Life Science, Northwest Normal University, Lanzhou, Gansu, 730070, PR China; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu, Sichuan, 610041, PR China.
- Zhi-Xing Cao
- Pharmacy College, Chengdu University of Traditional Chinese Medicine, Key Laboratory of Systematic Research, Development and Utilization of Chinese Medicine Resources in Sichuan Province-key Laboratory Breeding Base of Co-founded by Sichuan Province and MOST, Chengdu, Sichuan, PR China; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu, Sichuan, 610041, PR China
- Meng Li
- College of Life Science, Northwest Normal University, Lanzhou, Gansu, 730070, PR China
- Yu-Zhi Li
- Pharmacy College, Chengdu University of Traditional Chinese Medicine, Key Laboratory of Systematic Research, Development and Utilization of Chinese Medicine Resources in Sichuan Province-key Laboratory Breeding Base of Co-founded by Sichuan Province and MOST, Chengdu, Sichuan, PR China
- Cheng Peng
- Pharmacy College, Chengdu University of Traditional Chinese Medicine, Key Laboratory of Systematic Research, Development and Utilization of Chinese Medicine Resources in Sichuan Province-key Laboratory Breeding Base of Co-founded by Sichuan Province and MOST, Chengdu, Sichuan, PR China
103
Automatically updating predictive modeling workflows support decision-making in drug design. Future Med Chem 2016; 8:1779-96. [DOI: 10.4155/fmc-2016-0070] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022] Open
Abstract
Using predictive models for early decision-making in drug discovery has become standard practice. We suggest that model building needs to be automated with minimum input and low technical maintenance requirements. Models perform best when tailored to answering specific compound optimization related questions. If qualitative answers are required, 2-bin classification models are preferred. Integrating predictive modeling results with structural information stimulates better decision making. For in silico models supporting rapid structure–activity relationship cycles the performance deteriorates within weeks. Frequent automated updates of predictive models ensure best predictions. Consensus between multiple modeling approaches increases the prediction confidence. Combining qualified and nonqualified data optimally uses all available information. Dose predictions provide a holistic alternative to multiple individual property predictions for reaching complex decisions.
104
Yosipof A, Shimanovich K, Senderowitz H. Materials Informatics: Statistical Modeling in Material Science. Mol Inform 2016; 35:568-579. [DOI: 10.1002/minf.201600047] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/06/2016] [Accepted: 07/11/2016] [Indexed: 01/01/2023]
Affiliation(s)
- Abraham Yosipof
- Department of Business Administration; Peres Academic Center; Rehovot 76102 Israel
- College of Law & Business; Ramat-Gan 26 Ben Gurion Street Israel
- Klimentiy Shimanovich
- Department of Chemistry; Bar Ilan University; Ramat-Gan 5290002 Israel
- Department of Physical Electronics, School of Electrical Engineering, Faculty of Engineering; Tel Aviv University; Ramat Aviv 69978 Israel
105
Trush MM, Kovalishyn VV, Blagodatnyi VM, Brovarets VS, Pilyo SG, Prokopenko VM, Hodyna DM, Metelytsia LO. QSAR studies and antimicrobial potential of 1,3-thiazolylphosphonium salts. UKRAINIAN BIOCHEMICAL JOURNAL 2016; 88:57-65. [PMID: 29235765 DOI: 10.15407/ubj88.04.057] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022] Open
Abstract
The regression QSAR models were built to predict the antimicrobial activity of new thiazole derivatives. Compounds with high predicting activity were synthesized and evaluated against Gram-positive and Gram-negative bacteria and fungi. 1,3-Thiazole-4-ylphosphonium salts 4 and 5 displayed good antibacterial properties and high antifungal activity. The predictions are in a good agreement with the experiment results, which indicate the good predictive power of the created QSAR models.
106
Kaneko H, Funatsu K. Applicability Domains and Consistent Structure Generation. Mol Inform 2016; 36. [DOI: 10.1002/minf.201600032] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/02/2016] [Accepted: 04/25/2016] [Indexed: 11/08/2022]
Affiliation(s)
- Hiromasa Kaneko
- Department of Chemical System Engineering The University of Tokyo 7-3-1 Hongo Bunkyo-ku, Tokyo 113-8656 Japan
- Kimito Funatsu
- Department of Chemical System Engineering The University of Tokyo 7-3-1 Hongo Bunkyo-ku, Tokyo 113-8656 Japan
107
Norinder U, Boyer S. Conformal Prediction Classification of a Large Data Set of Environmental Chemicals from ToxCast and Tox21 Estrogen Receptor Assays. Chem Res Toxicol 2016; 29:1003-10. [DOI: 10.1021/acs.chemrestox.6b00037] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Affiliation(s)
- Ulf Norinder
- Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden
- Scott Boyer
- Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden
108
Novotarskyi S, Abdelaziz A, Sushko Y, Körner R, Vogt J, Tetko IV. ToxCast EPA in Vitro to in Vivo Challenge: Insight into the Rank-I Model. Chem Res Toxicol 2016; 29:768-75. [PMID: 27120770 PMCID: PMC5413193 DOI: 10.1021/acs.chemrestox.5b00481] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
The ToxCast EPA challenge was managed by TopCoder in Spring 2014. The goal of the challenge was to develop a model to predict the lowest effect level (LEL) concentration based on in vitro measurements and calculated in silico descriptors. This article summarizes the computational steps used to develop the Rank-I model, which calculated the lowest prediction error for the secret test data set of the challenge. The model was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), and it is freely available at http://ochem.eu/article/68104. Surprisingly, this model does not use any in vitro measurements. The logic of the decision steps used to develop the model and the reason to skip inclusion of in vitro measurements is described. We also show that inclusion of in vitro assays would not improve the accuracy of the model.
Affiliation(s)
- Ahmed Abdelaziz
- Rosettastein Consulting (UG), D-85354 Freising, Germany; Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, TUM-Technische Universität München, Freising, Germany
- Yurii Sushko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
- Robert Körner
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
- Joachim Vogt
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich, Germany
- Igor V Tetko
- Helmholtz Zentrum München - Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1 b. 60w, D-85764 Neuherberg, Germany; BigChem GmbH, Ingolstädter Landstraße 1 b. 60w, D-85764 Neuherberg, Germany
109
Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inform 2016; 35:160-80. [PMID: 27492083 DOI: 10.1002/minf.201501019] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 11/12/2015] [Accepted: 01/20/2016] [Indexed: 11/08/2022]
Abstract
Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables, which encode the respective molecular structures (i.e. molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the "normal" objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects or as detecting outliers. Currently two different types of measures are in use. The first one defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second type defines the applicability domain in terms of the expected reliability of the predictions which is referred to as confidence estimation. Both types are systematically differentiated here and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built-in confidence scores. Since confidence estimation uses information of the class labels for computing the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which solely uses the information of the explanatory variables.
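The distinction this abstract draws between novelty detection (descriptor space only) and confidence estimation (which also uses class labels) can be illustrated with a minimal sketch. This is not the method of Mathea et al. The k-nearest-neighbour scores, the function names `knn_novelty` and `knn_confidence`, and the toy descriptors are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy training set: 2-D "descriptors" with binary class labels.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

def knn_novelty(x, X, k=3):
    """Novelty score: mean distance to the k nearest training objects.
    Uses only the descriptors; larger values mean further from the training domain."""
    dists = sorted(math.dist(x, t) for t in X)
    return sum(dists[:k]) / k

def knn_confidence(x, X, y, k=3):
    """Confidence score: share of the k nearest neighbours in the majority class.
    Uses the class labels of the training objects."""
    nearest = sorted(zip((math.dist(x, t) for t in X), y))[:k]
    labels = [label for _, label in nearest]
    majority = max(set(labels), key=labels.count)
    return majority, labels.count(majority) / k

inside = (0.5, 0.5)     # well inside the class-0 cluster
between = (2.6, 2.6)    # between the two clusters
far = (20, 20)          # far from every training object

print(knn_novelty(inside, train_X), knn_confidence(inside, train_X, train_y))
print(knn_novelty(between, train_X), knn_confidence(between, train_X, train_y))
print(knn_novelty(far, train_X), knn_confidence(far, train_X, train_y))
```

In this toy setting the point between the clusters receives a lower label-based confidence, whereas the distant point is flagged only by the distance-based novelty score. Which of the two measures reduces the error rate more on real data is the question the review addresses.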
Affiliation(s)
- Miriam Mathea
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Waldemar Klingspohn
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany.
110
Lei T, Li Y, Song Y, Li D, Sun H, Hou T. ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform 2016; 8:6. [PMID: 26839598 PMCID: PMC4736633 DOI: 10.1186/s13321-016-0117-7] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 11/05/2015] [Accepted: 01/20/2016] [Indexed: 01/31/2023] Open
Abstract
Background
Determination of acute toxicity, expressed as median lethal dose (LD50), is one of the most important steps in drug discovery pipeline. Because in vivo assays for oral acute toxicity in mammals are time-consuming and costly, there is thus an urgent need to develop in silico prediction models of oral acute toxicity.
Results
In this study, based on a comprehensive data set containing 7314 diverse chemicals with rat oral LD50 values, relevance vector machine (RVM) technique was employed to build the regression models for the prediction of oral acute toxicity in rats, which were compared with those built using other six machine learning approaches, including k-nearest-neighbor regression, random forest (RF), support vector machine, local approximate Gaussian process, multilayer perceptron ensemble, and eXtreme gradient boosting. A subset of the original molecular descriptors and structural fingerprints (PubChem or SubFP) was chosen by the Chi squared statistics. The prediction capabilities of individual QSAR models, measured by q²ext for the test set containing 2376 molecules, ranged from 0.572 to 0.659.
Conclusion
Considering the overall prediction accuracy for the test set, RVM with Laplacian kernel and RF were recommended to build in silico models with better predictivity for rat oral acute toxicity. By combining the predictions from individual models, four consensus models were developed, yielding better prediction capabilities for the test set (q²ext = 0.669–0.689). Finally, some essential descriptors and substructures relevant to oral acute toxicity were identified and analyzed, and they may be served as property or substructure alerts to avoid toxicity. We believe that the best consensus model with high prediction accuracy can be used as a reliable virtual screening tool to filter out compounds with high rat oral acute toxicity.
Figure: Workflow of combinatorial QSAR modelling to predict rat oral acute toxicity
Affiliation(s)
- Tailong Lei
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
- Youyong Li
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, 215123 Jiangsu People's Republic of China
- Yunlong Song
- Department of Medicinal Chemistry, School of Pharmacy, Second Military Medical University, Shanghai, 200433 People's Republic of China
- Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
- Huiyong Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, 310058 Zhejiang People's Republic of China
111
Tetko IV, Lowe DM, Williams AJ. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS. J Cheminform 2016; 8:2. [PMID: 26807157 PMCID: PMC4724158 DOI: 10.1186/s13321-016-0113-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 08/31/2015] [Accepted: 01/08/2016] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Melting point (MP) is an important property in regards to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules while lots more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if it were available in the appropriate form, could potentially be used to develop predictive models. RESULTS We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melting, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
Affiliation(s)
- Igor V. Tetko
- Institute of Structural Biology, Helmholtz Zentrum München für Gesundheit und Umwelt (HMGU), Ingolstädter Landstraße 1, b. 60w, 85764 Neuherberg, Germany
- BigChem GmbH, 85764 Neuherberg, Germany
- Daniel M. Lowe
- NextMove Software Limited, Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY UK
112
Gaspar HA, Sidorov P, Horvath D, Baskin II, Marcou G, Varnek A. Generative Topographic Mapping Approach to Chemical Space Analysis. FRONTIERS IN MOLECULAR DESIGN AND CHEMICAL INFORMATION SCIENCE - HERMAN SKOLNIK AWARD SYMPOSIUM 2015: JÜRGEN BAJORATH 2016. [DOI: 10.1021/bk-2016-1222.ch011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]
Affiliation(s)
- Héléna A. Gaspar
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Pavel Sidorov
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Dragos Horvath
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Igor I. Baskin
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Gilles Marcou
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Alexandre Varnek
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
113
Mombelli E, Raitano G, Benfenati E. In Silico Prediction of Chemically Induced Mutagenicity: How to Use QSAR Models and Interpret Their Results. Methods Mol Biol 2016; 1425:87-105. [PMID: 27311463 DOI: 10.1007/978-1-4939-3609-0_5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/06/2023]
Abstract
Information on genotoxicity is an essential piece of information gathering for a comprehensive toxicological characterization of chemicals. Several QSAR models that can predict Ames genotoxicity are freely available for download from the Internet and they can provide relevant information for the toxicological profiling of chemicals. Indeed, they can be straightforwardly used for predicting the presence or absence of genotoxic hazards associated with the interactions of chemicals with DNA. Nevertheless, and despite the ease of use of these models, the scientific challenge is to assess the reliability of information that can be obtained from these tools. This chapter provides instructions on how to use freely available QSAR models and on how to interpret their predictions.
Affiliation(s)
- Enrico Mombelli
- INERIS-Institut National de l'Environnement Industriel et des Risques, Parc technologique ALATA - B.P. n°2, 5, rue Taffanel, 60550, Verneuil-en-Halatte, France.
- Giuseppa Raitano
- Laboratory of Environmental Chemistry and Toxicology, IRCCS-Istituto di Ricerche Farmacologiche "Mario Negri", Milano, Italy
- Emilio Benfenati
- Mario Negri Institute for Pharmacological Research, IRCCS, Milano, Italy
114
Gawehn E, Hiss JA, Schneider G. Deep Learning in Drug Discovery. Mol Inform 2015; 35:3-14. [PMID: 27491648 DOI: 10.1002/minf.201501008] [Citation(s) in RCA: 309] [Impact Index Per Article: 34.3] [Received: 10/25/2015] [Accepted: 12/01/2015] [Indexed: 12/18/2022]
Abstract
Artificial neural networks had their first heyday in molecular informatics and drug discovery approximately two decades ago. Currently, we are witnessing renewed interest in adapting advanced neural network architectures for pharmaceutical research by borrowing from the field of "deep learning". Compared with some of the other life sciences, their application in drug discovery is still limited. Here, we provide an overview of this emerging field of molecular informatics, present the basic concepts of prominent deep learning methods and offer motivation to explore these techniques for their usefulness in computer-assisted drug discovery and design. We specifically emphasize deep neural networks, restricted Boltzmann machine networks and convolutional networks.
Affiliation(s)
- Erik Gawehn
- Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
- Jan A Hiss
- Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
- Gisbert Schneider
- Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38.
115
Salmina ES, Haider N, Tetko IV. Extended Functional Groups (EFG): An Efficient Set for Chemical Characterization and Structure-Activity Relationship Studies of Chemical Compounds. Molecules 2015; 21:E1. [PMID: 26703557 PMCID: PMC6273096 DOI: 10.3390/molecules21010001] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 10/29/2015] [Revised: 12/09/2015] [Accepted: 12/15/2015] [Indexed: 11/16/2022] Open
Abstract
The article describes a classification system termed “extended functional groups” (EFG), which are an extension of a set previously used by the CheckMol software, that covers in addition heterocyclic compound classes and periodic table groups. The functional groups are defined as SMARTS patterns and are available as part of the ToxAlerts tool (http://ochem.eu/alerts) of the On-line CHEmical database and Modeling (OCHEM) environment platform. The article describes the motivation and the main ideas behind this extension and demonstrates that EFG can be efficiently used to develop and interpret structure-activity relationship models.
Affiliation(s)
- Elena S Salmina
- Institute for Organic Chemistry, Technical University Bergakademie Freiberg, Leipziger Str. 29, Freiberg D-09596, Germany.
- Norbert Haider
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, Vienna A-1090, Austria.
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, b. 60w, Neuherberg D-85764, Germany.
- BigChem GmbH, Ingolstädter Landstraße 1, b. 60w, Neuherberg D-85764, Germany.
116
Computational assessment of environmental hazards of nitroaromatic compounds: influence of the type and position of aromatic ring substituents on toxicity. Struct Chem 2015. [DOI: 10.1007/s11224-015-0715-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
117
Nieto-Draghi C, Fayet G, Creton B, Rozanska X, Rotureau P, de Hemptinne JC, Ungerer P, Rousseau B, Adamo C. A General Guidebook for the Theoretical Prediction of Physicochemical Properties of Chemicals for Regulatory Purposes. Chem Rev 2015; 115:13093-164. [PMID: 26624238 DOI: 10.1021/acs.chemrev.5b00215] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Indexed: 12/13/2022]
Affiliation(s)
- Carlos Nieto-Draghi
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Guillaume Fayet
- INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
- Benoit Creton
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Xavier Rozanska
- Materials Design S.A.R.L., 18, rue de Saisset, 92120 Montrouge, France
- Patricia Rotureau
- INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
- Philippe Ungerer
- Materials Design S.A.R.L., 18, rue de Saisset, 92120 Montrouge, France
- Bernard Rousseau
- Laboratoire de Chimie-Physique, Université Paris Sud, UMR 8000 CNRS, Bât. 349, 91405 Orsay Cedex, France
- Carlo Adamo
- Institut de Recherche Chimie Paris, PSL Research University, CNRS, Chimie Paristech, 11 rue P. et M. Curie, F-75005 Paris, France; Institut Universitaire de France, 103 Boulevard Saint Michel, F-75005 Paris, France
118
Polishchuk PG, Samoylenko GV, Khristova TM, Krysko OL, Kabanova TA, Kabanov VM, Kornylov AY, Klimchuk O, Langer T, Andronati SA, Kuz'min VE, Krysko AA, Varnek A. Design, Virtual Screening, and Synthesis of Antagonists of αIIbβ3 as Antiplatelet Agents. J Med Chem 2015; 58:7681-94. [PMID: 26367138 DOI: 10.1021/acs.jmedchem.5b00865] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/16/2023]
Abstract
This article describes design, virtual screening, synthesis, and biological tests of novel αIIbβ3 antagonists, which inhibit platelet aggregation. Two types of αIIbβ3 antagonists were developed: those binding either closed or open form of the protein. At the first step, available experimental data were used to build QSAR models and ligand- and structure-based pharmacophore models and to select the most appropriate tool for ligand-to-protein docking. Virtual screening of publicly available databases (BioinfoDB, ZINC, Enamine data sets) with developed models resulted in no hits. Therefore, small focused libraries for two types of ligands were prepared on the basis of pharmacophore models. Their screening resulted in four potential ligands for open form of αIIbβ3 and four ligands for its closed form followed by their synthesis and in vitro tests. Experimental measurements of affinity for αIIbβ3 and ability to inhibit ADP-induced platelet aggregation (IC50) showed that two designed ligands for the open form 4c and 4d (IC50 = 6.2 nM and 25 nM, respectively) and one for the closed form 12b (IC50 = 11 nM) were more potent than commercial antithrombotic Tirofiban (IC50 = 32 nM).
Affiliation(s)
- Pavel G Polishchuk
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Georgiy V Samoylenko
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Tetiana M Khristova
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Lustdorfskaya doroga 86, Odessa 65080, Ukraine; Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), University of Strasbourg, 1, rue B. Pascal, Strasbourg 67000, France
| | - Olga L Krysko
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Tatyana A Kabanova
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Vladimir M Kabanov
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Alexander Yu Kornylov
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Olga Klimchuk
- Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), University of Strasbourg , 1, rue B. Pascal, Strasbourg 67000, France
| | - Thierry Langer
- Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna , Althanstraße 14, 1090 Vienna, Austria
| | - Sergei A Andronati
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Victor E Kuz'min
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Andrei A Krysko
- A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine , Lustdorfskaya doroga 86, Odessa 65080, Ukraine
| | - Alexandre Varnek
- Laboratory of Chemoinformatics (UMR 7140 CNRS/UniStra), University of Strasbourg , 1, rue B. Pascal, Strasbourg 67000, France
| |
|
119
|
Castillo-González D, Mergny JL, De Rache A, Pérez-Machado G, Cabrera-Pérez MA, Nicolotti O, Introcaso A, Mangiatordi GF, Guédin A, Bourdoncle A, Garrigues T, Pallardó F, Cordeiro MNDS, Paz-y-Miño C, Tejera E, Borges F, Cruz-Monteagudo M. Harmonization of QSAR Best Practices and Molecular Docking Provides an Efficient Virtual Screening Tool for Discovering New G-Quadruplex Ligands. J Chem Inf Model 2015; 55:2094-110. [DOI: 10.1021/acs.jcim.5b00415] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Daimel Castillo-González
- ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
- ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
| | - Jean-Louis Mergny
- ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
- ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
| | - Aurore De Rache
- ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
- ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
| | - Gisselle Pérez-Machado
- Molecular Simulation and Drug Design Group, Centro de Bioactivos Químicos (CBQ), Central University of Las Villas, Santa Clara, Villa Clara 54830, Cuba
- Department of Physiology, Faculty of Medicine, University of Valencia, Valencia 46010, Valencia, Spain
- Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
| | - Miguel Angel Cabrera-Pérez
- Molecular Simulation and Drug Design Group, Centro de Bioactivos Químicos (CBQ), Central University of Las Villas, Santa Clara, Villa Clara 54830, Cuba
- Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
- Department of Engineering, Area of Pharmacy and Pharmaceutical Technology, Miguel Hernández University, 03550 Sant Joan d’Alacant, Alicante, Alicante, Spain
| | - Orazio Nicolotti
- Dipartimento di Farmacia-Scienze, Università degli Studi di Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Bari, Italy
| | - Antonellina Introcaso
- Dipartimento di Farmacia-Scienze, Università degli Studi di Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Bari, Italy
| | - Giuseppe Felice Mangiatordi
- Dipartimento di Farmacia-Scienze, Università degli Studi di Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Bari, Italy
| | - Aurore Guédin
- ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
- ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
| | - Anne Bourdoncle
- ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
- ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
| | - Teresa Garrigues
- Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
| | - Federico Pallardó
- Department of Physiology, Faculty of Medicine, University of Valencia, Valencia 46010, Valencia, Spain
| | | | - Cesar Paz-y-Miño
- Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Pichincha, Ecuador
| | - Eduardo Tejera
- Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Pichincha, Ecuador
| | | | - Maykel Cruz-Monteagudo
- Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Pichincha, Ecuador
| |
|
120
|
Kaneko H, Funatsu K. Strategy of Structure Generation within Applicability Domains with One-Class Support Vector Machine. Bull Chem Soc Jpn 2015. [DOI: 10.1246/bcsj.20150054] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Affiliation(s)
- Hiromasa Kaneko
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo
| | - Kimito Funatsu
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo
| |
|
121
|
Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model 2015; 55:1098-107. [DOI: 10.1021/acs.jcim.5b00110] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 11/29/2022]
Affiliation(s)
- Robert P. Sheridan
- Cheminformatics Department, RY800B-305, Merck Research Laboratories, Rahway, New Jersey 07065, United States
| |
|
122
|
Norinder U, Carlsson L, Boyer S, Eklund M. Introducing conformal prediction in predictive modeling for regulatory purposes. A transparent and flexible alternative to applicability domain determination. Regul Toxicol Pharmacol 2015; 71:279-84. [DOI: 10.1016/j.yrtph.2014.12.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/29/2014] [Revised: 12/23/2014] [Accepted: 12/24/2014] [Indexed: 10/24/2022]
|
123
|
Benfenati E, Manganelli S, Giordano S, Raitano G, Manganaro A. Hierarchical Rules for Read-Across and In Silico Models of Mutagenicity. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 2015; 33:385-403. [PMID: 26403277 DOI: 10.1080/10590501.2015.1096881] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
A broad set of rules has been implemented within the ToxRead software for read-across of chemicals for bacterial mutagenicity. These rules were obtained by manually analyzing more than 6000 chemicals and the associated chemical classes. A hierarchy of rules was established to identify those most specifically relating to the target compounds, linked in sequence to the other, more generic ones, which may match with the target compound. Rules related to both mutagenicity and lack of mutagenicity were found. Some of the latter are exceptions to the mutagenicity rules, while others are modulators of activity. These rules can also be used to predict mutagenicity, offering good performance.
Affiliation(s)
- Emilio Benfenati
- a IRCCS-Istituto di Ricerche Farmacologiche 'Mario Negri,' Department of Environmental Health Sciences , Milan , Italy
| | - Serena Manganelli
- a IRCCS-Istituto di Ricerche Farmacologiche 'Mario Negri,' Department of Environmental Health Sciences , Milan , Italy
| | - Sabrina Giordano
- a IRCCS-Istituto di Ricerche Farmacologiche 'Mario Negri,' Department of Environmental Health Sciences , Milan , Italy
| | - Giuseppa Raitano
- a IRCCS-Istituto di Ricerche Farmacologiche 'Mario Negri,' Department of Environmental Health Sciences , Milan , Italy
| | - Alberto Manganaro
- a IRCCS-Istituto di Ricerche Farmacologiche 'Mario Negri,' Department of Environmental Health Sciences , Milan , Italy
| |
|
124
|
Sushko Y, Novotarskyi S, Körner R, Vogt J, Abdelaziz A, Tetko IV. Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process. J Cheminform 2014; 6:48. [PMID: 25544551 PMCID: PMC4272757 DOI: 10.1186/s13321-014-0048-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 09/02/2014] [Accepted: 11/07/2014] [Indexed: 11/24/2022] [Open Access]
Abstract
Background: QSAR is an established and powerful method for cheap in silico assessment of physicochemical properties and biological activities of chemical compounds. However, QSAR models are rather complex mathematical constructs that cannot easily be interpreted. Medicinal chemists would benefit from practical guidance regarding which molecules to synthesize. Another possible approach is analysis of pairs of very similar molecules, so-called matched molecular pairs (MMPs). Such an approach allows identification of molecular transformations that affect particular activities (e.g. toxicity). In contrast to QSAR, chemical interpretation of these transformations is straightforward. Furthermore, such transformations can give medicinal chemists useful hints for the hit-to-lead optimization process. Results: The current study suggests a combination of QSAR and MMP approaches by finding MMP transformations based on QSAR predictions for large chemical datasets. The study shows that such an approach, referred to as prediction-driven MMP analysis, is a useful tool for medicinal chemists, allowing identification of large numbers of “interesting” transformations that can be used to drive the molecular optimization process. All the methodological developments have been implemented as software products available online as part of OCHEM (http://ochem.eu/). Conclusions: The prediction-driven MMP methodology was exemplified by two use cases: modelling of aquatic toxicity and CYP3A4 inhibition. This approach helped us to interpret QSAR models and allowed identification of a number of “significant” molecular transformations that affect the desired properties. This can facilitate drug design as a part of the molecular optimization process. Graphical abstract: Molecular matched pairs and transformation graphs facilitate an interpretable molecular optimisation process.
Affiliation(s)
- Yurii Sushko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | | | - Robert Körner
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | - Joachim Vogt
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | - Ahmed Abdelaziz
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany
| | - Igor V Tetko
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Munich Germany ; Helmholtz-Zentrum München - German Research Centre for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany ; A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya St. 18, 420008 Kazan, Russia
| |
|
125
|
Tetko IV, Sushko Y, Novotarskyi S, Patiny L, Kondratov I, Petrenko AE, Charochkina L, Asiri AM. How accurately can we predict the melting points of drug-like compounds? J Chem Inf Model 2014; 54:3320-9. [PMID: 25489863 PMCID: PMC4702524 DOI: 10.1021/ci5005288] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Indexed: 01/31/2023]
Abstract
This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50,250]°C. The final model achieved an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was achieved using a consensus model that was significantly more accurate than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. The three distance-to-model measures analyzed did not allow us to flag molecules whose MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at http://ochem.eu/article/55638.
Affiliation(s)
- Igor V Tetko
- Helmholtz-Zentrum München - German Research Centre for Environmental Health (GmbH), Institute of Structural Biology , Munich 85764, Germany
| | | | | | | | | | | | | | | |
|
126
|
Anger LT, Wolf A, Schleifer KJ, Schrenk D, Rohrer SG. Generalized Workflow for Generating Highly Predictive in Silico Off-Target Activity Models. J Chem Inf Model 2014; 54:2411-22. [DOI: 10.1021/ci500342q] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Affiliation(s)
- Lennart T. Anger
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
- Food Chemistry and Toxicology, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, 67663 Kaiserslautern, Germany
| | - Antje Wolf
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
| | - Klaus-Juergen Schleifer
- Computational Chemistry and Biology, BASF SE, Carl-Bosch-Strasse 38, 67056 Ludwigshafen, Germany
| | - Dieter Schrenk
- Food Chemistry and Toxicology, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, 67663 Kaiserslautern, Germany
| | - Sebastian G. Rohrer
- Mechanistic Biology Fungicides, BASF SE, Speyerer Strasse 2, 67117 Limburgerhof, Germany
| |
|
127
|
Kaneko H, Funatsu K. Applicability Domain Based on Ensemble Learning in Classification and Regression Analyses. J Chem Inf Model 2014; 54:2469-82. [DOI: 10.1021/ci500364e] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/29/2022]
Affiliation(s)
- Hiromasa Kaneko
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| | - Kimito Funatsu
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| |
|
128
|
Effect of information leakage and method of splitting (rational and random) on external predictive ability and behavior of different statistical parameters of QSAR model. Med Chem Res 2014. [DOI: 10.1007/s00044-014-1193-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022]
|
129
|
Liu Z, Zheng M, Yan X, Gu Q, Gasteiger J, Tijhuis J, Maas P, Li J, Xu J. ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability. J Comput Aided Mol Des 2014; 28:941-50. [PMID: 25031075 DOI: 10.1007/s10822-014-9778-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/08/2014] [Accepted: 07/09/2014] [Indexed: 11/26/2022]
Abstract
Predicting compound chemical stability is important because unstable compounds can lead to either false positive or false negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the data set, each ACF is assigned an expected stable probability (p(s)) and an unstable probability (p(uns)). In total, 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to derive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated, with Bayes' theorem, based upon the p(s) and p(uns) values of the compound ACFs. We were able to achieve performance with an AUC value of 84% and a tenfold cross validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the prediction of the stabilities of organic compounds.
Affiliation(s)
- Zhihong Liu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou, 510006, China
| | | | | | | | | | | | | | | | | |
|
130
|
Clark RD, Liang W, Lee AC, Lawless MS, Fraczkiewicz R, Waldman M. Using beta binomials to estimate classification uncertainty for ensemble models. J Cheminform 2014; 6:34. [PMID: 24987464 PMCID: PMC4076254 DOI: 10.1186/1758-2946-6-34] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/05/2014] [Accepted: 06/16/2014] [Indexed: 12/14/2022] [Open Access]
Abstract
Background: Quantitative structure-activity relationship (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions. Results: Submodels in an ensemble model which have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification (one using vote tallies and the other averaging individual network outputs), we have found that the distribution of predictions across positive vote tallies can be reasonably well-modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprising logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distribution of predictions and errors for large external validation sets, even when the numbers of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool. Conclusions: Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
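As an illustrative aside on the abstract above, the beta binomial distribution it refers to can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names and the shape parameters `alpha` and `beta` are assumptions chosen for illustration, and no fitting procedure is shown.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """Probability of k positive votes out of n submodels under a beta binomial
    with shape parameters alpha and beta (illustrative names, not from the paper)."""
    log_ratio = log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return math.comb(n, k) * math.exp(log_ratio)


def vote_tally_distribution(n, alpha, beta):
    """Modeled distribution over all possible vote tallies 0..n."""
    return [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
```

With `alpha = beta = 1` the tallies are uniformly distributed, which gives a quick sanity check; shape parameters below 1 put more mass on near-unanimous votes.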
Affiliation(s)
- Robert D Clark
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
| | - Wenkel Liang
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
| | - Adam C Lee
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
| | - Michael S Lawless
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
| | - Robert Fraczkiewicz
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
| | - Marvin Waldman
- Department of Life Sciences, Simulations Plus, Inc., 45205 10th Street West, Lancaster, CA 93534, USA
| |
|
131
|
Norinder U, Carlsson L, Boyer S, Eklund M. Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. J Chem Inf Model 2014; 54:1596-603. [PMID: 24797111 DOI: 10.1021/ci5001168] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 11/30/2022]
Abstract
Conformal prediction is introduced as an alternative approach to domain applicability estimation. The advantages of using conformal prediction are as follows: First, the approach is based on a consistent and well-defined mathematical framework. Second, the understanding of the confidence level concept in conformal predictions is straightforward, e.g. a confidence level of 0.8 means that the conformal predictor will commit, at most, 20% errors (i.e., true values outside the assigned prediction range). Third, the confidence level can be varied depending on the situation where the model is to be applied and the consequences of such changes are readily understandable, i.e. prediction ranges are increased or decreased, and the changes can immediately be inspected. We demonstrate the usefulness of conformal prediction by applying it to 10 publicly available data sets.
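The confidence-level guarantee described in this abstract can be illustrated with a minimal split (inductive) conformal regression sketch. It is written under stated assumptions and is not the authors' code: `toy_model`, the synthetic data, and all function names are hypothetical, and the guarantee relies on calibration and test data being exchangeable.

```python
import math
import random


def conformal_halfwidth(calibration_residuals, confidence):
    """Return q such that [prediction - q, prediction + q] contains the true
    value with probability >= confidence, assuming exchangeable data."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * confidence)  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this confidence
    return scores[k - 1]


def toy_model(x):
    """Hypothetical fitted model; stands in for any QSAR regressor."""
    return 2.0 * x


def empirical_coverage(confidence, n_calibration=500, n_test=2000, seed=0):
    """Fraction of synthetic test points whose true value falls in the range."""
    rng = random.Random(seed)

    def sample():
        x = rng.uniform(0.0, 10.0)
        return x, 2.0 * x + rng.gauss(0.0, 1.0)  # synthetic noisy response

    calibration = [sample() for _ in range(n_calibration)]
    q = conformal_halfwidth([y - toy_model(x) for x, y in calibration], confidence)
    test = [sample() for _ in range(n_test)]
    hits = sum(abs(y - toy_model(x)) <= q for x, y in test)
    return hits / n_test
```

Calling `empirical_coverage(0.8)` should give a value close to 0.8, matching the reading that a confidence level of 0.8 corresponds to at most about 20% of true values falling outside the prediction range; raising the confidence widens the range.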
Affiliation(s)
- Ulf Norinder
- H. Lundbeck A/S, Ottiliavej 9, 2500 Valby, Denmark
| | | | | | | |
|
132
|
Ovchinnikova SI, Bykov AA, Tsivadze AY, Dyachkov EP, Kireeva NV. Supervised extensions of chemography approaches: case studies of chemical liabilities assessment. J Cheminform 2014; 6:20. [PMID: 24868246 PMCID: PMC4018504 DOI: 10.1186/1758-2946-6-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/25/2013] [Accepted: 04/28/2014] [Indexed: 12/04/2022] [Open Access]
Abstract
Chemical liabilities, such as adverse effects and toxicity, play a significant role in the modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach that combines several classification and chemography methods to predict chemical liabilities and to interpret the results in terms of the impact of structural changes in compounds on their pharmacological profile. To our knowledge, this is the first time that a supervised extension of Generative Topographic Mapping has been proposed as an effective new chemography method. A new approach for mapping new data using supervised Isomap without rebuilding models from scratch is also proposed. Two approaches for estimating a model's applicability domain are used in our study, to our knowledge for the first time in chemoinformatics. Structural alerts responsible for the negative characteristics of the pharmacological profile of chemical compounds were found as a result of model interpretation.
Affiliation(s)
- Svetlana I Ovchinnikova
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
| | - Arseniy A Bykov
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
| | - Aslan Yu Tsivadze
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
| | - Evgeny P Dyachkov
- Kurnakov Institute of General and Inorganic Chemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russia
| | - Natalia V Kireeva
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
| |
|
133
|
Lewis RA, Wood D. Modern 2D QSAR for drug discovery. Wiley Interdiscip Rev Comput Mol Sci 2014. [DOI: 10.1002/wcms.1187] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Affiliation(s)
- Richard A. Lewis
- Novartis Institutes for BioMedical Research; Novartis Pharma AG; Basel Switzerland
| | - David Wood
- Novartis Institutes for BioMedical Research; Novartis Horsham Research Centre; Horsham UK
| |
|
134
|
Low YS, Sedykh AY, Rusyn I, Tropsha A. Integrative approaches for predicting in vivo effects of chemicals from their structural descriptors and the results of short-term biological assays. Curr Top Med Chem 2014; 14:1356-64. [PMID: 24805064 PMCID: PMC5344042 DOI: 10.2174/1568026614666140506121116] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/19/2014] [Revised: 02/05/2014] [Accepted: 02/05/2014] [Indexed: 12/22/2022]
Abstract
Cheminformatics approaches such as Quantitative Structure Activity Relationship (QSAR) modeling have been used traditionally for predicting chemical toxicity. In recent years, high throughput biological assays have been increasingly employed to elucidate mechanisms of chemical toxicity and predict toxic effects of chemicals in vivo. The data generated in such assays can be considered as biological descriptors of chemicals that can be combined with molecular descriptors and employed in QSAR modeling to improve the accuracy of toxicity prediction. In this review, we discuss several approaches for integrating chemical and biological data for predicting biological effects of chemicals in vivo and compare their performance across several data sets. We conclude that while no method consistently shows superior performance, the integrative approaches rank consistently among the best yet offer enriched interpretation of models over those built with either chemical or biological data alone. We discuss the outlook for such interdisciplinary methods and offer recommendations to further improve the accuracy and interpretability of computational models that predict chemical toxicity.
Affiliation(s)
| | | | | | - Alexander Tropsha
- 100K Beard Hall, Campus Box 7568, University of North Carolina, Chapel Hill, NC 27599-7568, USA.
| |
|
135
|
Cassano A, Raitano G, Mombelli E, Fernández A, Cester J, Roncaglioni A, Benfenati E. Evaluation of QSAR models for the prediction of Ames genotoxicity: a retrospective exercise on the chemical substances registered under the EU REACH regulation. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 2014; 32:273-298. [PMID: 25226221 DOI: 10.1080/10590501.2014.938955] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 06/03/2023]
Abstract
We evaluated the performance of seven freely available quantitative structure-activity relationship models predicting Ames genotoxicity using a dataset of chemicals registered under the EU Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) regulation. The performance of the models was estimated according to Cooper's statistics and the Matthews correlation coefficient (MCC). The Benigni/Bossa rule base, originally implemented in Toxtree and re-implemented within the Virtual models for property Evaluation of chemicals within a Global Architecture (VEGA) platform, displayed the best performance (accuracy = 92%, sensitivity = 83%, specificity = 93%, MCC = 0.68), indicating that this rule base provides a reliable tool for the identification of genotoxic chemicals. Finally, we elaborated a consensus model that was more accurate than the individual models.
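For readers unfamiliar with the metrics named in this abstract, the following minimal Python sketch computes accuracy, sensitivity, specificity (Cooper's statistics), and MCC from a binary confusion matrix. The function name and the example counts are made up for illustration; they are not the study's data, and the study's class balance is not given here.

```python
import math


def cooper_statistics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts: 100 mutagens and 1000 non-mutagens.
example = cooper_statistics(tp=83, tn=930, fp=70, fn=17)
```

Because MCC depends on class balance, two models with the same sensitivity and specificity can have different MCC values on datasets with different proportions of positives.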
Affiliation(s)
- Antonio Cassano
- a Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO) , Institut National de l'Environnement Industriel et des Risques (INERIS) , Verneuil en Halatte , France
| | | | | | | | | | | | | |
|
136
|
Kulkarni SA, Barton-Maclaren TS. Performance of (Q)SAR models for predicting Ames mutagenicity of aryl azo and benzidine based compounds. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 2014; 32:46-82. [PMID: 24598040 DOI: 10.1080/10590501.2014.877648] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
Regulatory agencies worldwide are committed to the objectives of the Strategic Approach to International Chemicals Management to ensure that by 2020 chemicals are used and produced in ways that lead to the minimization of significant adverse effects on human health and the environment. Under the Government of Canada's Chemicals Management Plan, the commitment to address a large number of substances, many with limited data, has highlighted the importance of pursuing alternative hazard assessment methodologies that are able to accommodate chemicals with varying toxicological information. One such method is (Quantitative) Structure Activity Relationships ((Q)SAR) models. The current investigation into the predictivity of 20 (Q)SAR tools designed to model bacterial reverse mutation in Salmonella typhimurium is one of the first of this magnitude to be carried out using an external validation set comprised mainly of industrial chemicals which represent a diverse group of aromatic and benzidine-based azo dyes and pigments. Overall, this study highlights the value in challenging the predictivity of existing models using a small but representative subset of data-rich chemicals. Furthermore, external validation revealed that only a handful of models satisfactorily predicted for the test chemical space. The exercise also provides insight into using the Organisation for Economic Co-operation and Development (Q)SAR Toolbox as a read across tool.
Affiliation(s)
- Sunil A Kulkarni
- Existing Substances Risk Assessment Bureau, Health Canada, Ottawa, Ontario, Canada
|
137
|
Vorberg S, Tetko IV. Modeling the Biodegradability of Chemical Compounds Using the Online CHEmical Modeling Environment (OCHEM). Mol Inform 2013; 33:73-85. [PMID: 27485201 PMCID: PMC5175213 DOI: 10.1002/minf.201300030] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 02/21/2013] [Accepted: 10/11/2013] [Indexed: 11/10/2022]
Abstract
Biodegradability describes the capacity of substances to be mineralized by free‐living bacteria. It is a crucial property in estimating a compound’s long‐term impact on the environment. The ability to reliably predict biodegradability would reduce the need for laborious experimental testing. However, this endpoint is difficult to model due to unavailability or inconsistency of experimental data. Our approach makes use of the Online Chemical Modeling Environment (OCHEM) and its rich supply of machine learning methods and descriptor sets to build classification models for ready biodegradability. These models were analyzed to determine the relationship between characteristic structural properties and biodegradation activity. The distinguishing feature of the developed models is their ability to estimate the accuracy of prediction for each individual compound. The models developed using seven individual descriptor sets were combined in a consensus model, which provided the highest accuracy. The identified overrepresented structural fragments can be used by chemists to improve the biodegradability of new chemical compounds. The consensus model, the datasets used, and the calculated structural fragments are publicly available at http://ochem.eu/article/31660.
Affiliation(s)
- Susann Vorberg
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany tel: +49-89-3187-3575; fax: +49-89-3187-3585
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany tel: +49-89-3187-3575; fax: +49-89-3187-3585
- Chemistry Department, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia
- eADMET GmbH, Lichtenbergstraße 8, D-85748 Garching, Germany
|
138
|
Sheridan RP. Using random forest to model the domain applicability of another random forest model. J Chem Inf Model 2013; 53:2837-50. [PMID: 24152204 DOI: 10.1021/ci400482e] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/28/2022]
Abstract
In QSAR, a statistical model is generated from a training set of molecules (represented by chemical descriptors) and their biological activities. We will call this traditional type of QSAR model an "activity model". The activity model can be used to predict the activities of molecules not in the training set. A relatively new subfield for QSAR is domain applicability. The aim is to estimate the reliability of prediction of a specific molecule on a specific activity model. A number of different metrics have been proposed in the literature for this purpose. It is desirable to build a quantitative model of reliability against one or more of these metrics. We can call this an "error model". A previous publication from our laboratory (Sheridan J. Chem. Inf. Model., 2012, 52, 814-823.) suggested the simultaneous use of three metrics would be more discriminating than any one metric. An error model could be built in the form of a three-dimensional set of bins. When the number of metrics exceeds three, however, the bin paradigm is not practical. An obvious solution for constructing an error model using multiple metrics is to use a QSAR method, in our case random forest. In this paper we demonstrate the usefulness of this paradigm, specifically for determining whether a useful error model can be built and which metrics are most useful for a given problem. For the ten data sets and for the seven metrics we examine here, it appears that it is possible to construct a useful error model using only two metrics (TREE_SD and PREDICTED). These do not require calculating similarities/distances between the molecules being predicted and the molecules used to build the activity model, which can be rate-limiting.
Affiliation(s)
- Robert P Sheridan
- Cheminformatics Department, Merck Research Laboratories, RY800-D133, Rahway, New Jersey 07065, United States
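The two metrics singled out in this entry's abstract can be illustrated with a minimal sketch. It assumes that PREDICTED denotes the ensemble prediction and TREE_SD the standard deviation of the per-tree predictions; the abstract names the metrics but does not define them, and the function name and input format here are hypothetical.

```python
from statistics import mean, stdev


def error_model_features(per_tree_predictions):
    """Return (PREDICTED, TREE_SD) for a single molecule.

    per_tree_predictions: the activity predicted by each tree of a
    random forest for one molecule (hypothetical input format).
    """
    # Assumption: PREDICTED is the ensemble (mean) prediction.
    predicted = mean(per_tree_predictions)
    # Assumption: TREE_SD is the sample standard deviation across trees.
    tree_sd = stdev(per_tree_predictions)
    return predicted, tree_sd


predicted, tree_sd = error_model_features([1.0, 2.0, 3.0])
print(predicted, tree_sd)
```

Neither value needs a similarity or distance calculation against the training set, which matches the practical advantage the abstract points out.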
|
139
|
Cruz-Monteagudo M, Ancede-Gallardo E, Jorge M, Dias Soeiro Cordeiro MN. Chemoinformatics Profiling of Ionic Liquids—Automatic and Chemically Interpretable Cytotoxicity Profiling, Virtual Screening, and Cytotoxicophore Identification. Toxicol Sci 2013; 136:548-65. [DOI: 10.1093/toxsci/kft209] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/26/2023] Open
|
140
|
Fourches D, Tropsha A. Using Graph Indices for the Analysis and Comparison of Chemical Datasets. Mol Inform 2013; 32:827-42. [PMID: 27480235 DOI: 10.1002/minf.201300076] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/26/2013] [Accepted: 08/05/2013] [Indexed: 12/13/2022]
Abstract
In cheminformatics, compounds are represented as points in multidimensional space of chemical descriptors. When all pairs of points found within certain distance threshold in the original high dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or Randic connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications.
Affiliation(s)
- Denis Fourches
- Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA
- Alexander Tropsha
- Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA.
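The Dataset Graph construction described in this entry's abstract can be sketched as follows. This is an illustration only, assuming Euclidean distance in descriptor space and a hypothetical function name; it does not reproduce the ADDAGRA program.

```python
from itertools import combinations
from math import dist


def average_vertex_degree(points, threshold):
    """Average vertex degree of a Dataset Graph.

    points: descriptor vectors, one per compound.
    threshold: two compounds are connected when their Euclidean
    distance (an assumed metric) is at most this value.
    """
    degree = [0] * len(points)
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            degree[i] += 1
            degree[j] += 1
    return sum(degree) / len(points)


# Two close compounds and one outlier: only one edge is formed.
compounds = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(average_vertex_degree(compounds, threshold=1.5))
```

A training set and a test set drawn from the same chemical space would be expected to give similar index values at the same threshold, which is the kind of comparison the abstract proposes.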
|
141
|
Enhanced QSAR model performance by integrating structural and gene expression information. Molecules 2013; 18:10789-801. [PMID: 24008242 PMCID: PMC6270197 DOI: 10.3390/molecules180910789] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/14/2013] [Revised: 07/20/2013] [Accepted: 07/26/2013] [Indexed: 11/29/2022] Open
Abstract
Despite decades of intensive research and a number of demonstrable successes, quantitative structure-activity relationship (QSAR) models still fail to yield predictions with reasonable accuracy in some circumstances, especially when the QSAR paradox occurs. In this study, to avoid the QSAR paradox, we proposed a novel integrated approach to improve the model performance through using both structural and biological information from compounds. As a proof-of-concept, the integrated models were built on a toxicological dataset to predict non-genotoxic carcinogenicity of compounds, using not only the conventional molecular descriptors but also expression profiles of significant genes selected from microarray data. For test set data, our results demonstrated that the prediction accuracy of QSAR model was dramatically increased from 0.57 to 0.67 with incorporation of expression data of just one selected signature gene. Our successful integration of biological information into classic QSAR model provided a new insight and methodology for building predictive models especially when QSAR paradox occurred.
|
142
|
Tetko IV, Novotarskyi S, Sushko I, Ivanov V, Petrenko AE, Dieden R, Lebon F, Mathieu B. Development of dimethyl sulfoxide solubility models using 163,000 molecules: using a domain applicability metric to select more reliable predictions. J Chem Inf Model 2013; 53:1990-2000. [PMID: 23855787 PMCID: PMC3760295 DOI: 10.1021/ci400213d] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/02/2022]
Abstract
The dimethyl sulfoxide (DMSO) solubility data from Enamine and two UCB pharma compound collections were analyzed using 8 different machine learning methods and 12 descriptor sets. The analyzed data sets were highly imbalanced with 1.7–5.8% nonsoluble compounds. The libraries’ enrichment by soluble molecules from the set of 10% of the most reliable predictions was used to compare prediction performances of the methods. The highest accuracies were calculated using a C4.5 decision classification tree, random forest, and associative neural networks. The performances of the methods developed were estimated on individual data sets and their combinations. The developed models provided on average a 2-fold decrease of the number of nonsoluble compounds amid all compounds predicted as soluble in DMSO. However, a 4–9-fold enrichment was observed if only 10% of the most reliable predictions were considered. The structural features influencing compounds to be soluble or nonsoluble in DMSO were also determined. The best models developed with the publicly available Enamine data set are freely available online at http://ochem.eu/article/33409.
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München-German Research Center for Environmental Health-GmbH, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany.
|
143
|
Brandmaier S, Novotarskyi S, Sushko I, Tetko IV. From descriptors to predicted properties: experimental design by using applicability domain estimation. Altern Lab Anim 2013; 41:33-47. [PMID: 23614543 DOI: 10.1177/026119291304100106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
The importance of reliable methods for representative sub-sampling in terms of experimental design and risk assessment within the European Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) system is crucial. We developed experimental design approaches, by utilising predicted properties and the 'distance to model' parameter, to estimate the benefits of certain compounds to the quality of a resulting model. A statistical evaluation of four regression data sets and one classification data set showed that the adaptive concept of iteratively refining the representation of the chemical space contributes to a more efficient and more reliable selection in comparison to traditional approaches. The evaluation of compounds with regard to the uncertainty and the correlation of prediction is beneficial, and in particular, for regression data sets of sufficient size, whereas the use of predicted properties to define the chemical space is beneficial for classification models.
Affiliation(s)
- Stefan Brandmaier
- Helmholtz-Zentrum München - German Research Centre for Environmental Health (GmbH), Institute of Structural Biology, Munich, Germany.
|
144
|
Masand VH, Mahajan DT, Hadda TB, Jawarkar RD, Chavan H, Bandgar BP, Chauhan H. Molecular docking and quantitative structure–activity relationship (QSAR) analyses of indolylarylsulfones as HIV-1 non-nucleoside reverse transcriptase inhibitors. Med Chem Res 2013. [DOI: 10.1007/s00044-013-0647-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
145
|
Kolumbin O, Ognichenko L, Artemenko A, Polischuk P, Kulinsky M, Muratov E, Kuz’min V, Bobeica V. Nonexperimental Screening of the Water Solubility, Lipophilicity, Bioavailability, Mutagenicity and Toxicity of Various Pesticides with QSAR Models Aid. Chemistry Journal of Moldova 2013. [DOI: 10.19261/cjm.2013.08(1).12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
|
146
|
Weidlich IE, Filippov IV, Brown J, Kaushik-Basu N, Krishnan R, Nicklaus MC, Thorpe IF. Inhibitors for the hepatitis C virus RNA polymerase explored by SAR with advanced machine learning methods. Bioorg Med Chem 2013; 21:3127-37. [PMID: 23608107 PMCID: PMC3653294 DOI: 10.1016/j.bmc.2013.03.032] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/08/2013] [Revised: 03/10/2013] [Accepted: 03/18/2013] [Indexed: 12/30/2022]
Abstract
Hepatitis C virus (HCV) is a global health challenge, affecting approximately 200 million people worldwide. In this study we developed SAR models with advanced machine learning classifiers Random Forest and k Nearest Neighbor Simulated Annealing for 679 small molecules with measured inhibition activity for NS5B genotype 1b. The activity was expressed as a binary value (active/inactive), where actives were considered molecules with IC50 ≤0.95 μM. We applied our SAR models to various drug-like databases and identified novel chemical scaffolds for NS5B inhibitors. Subsequent in vitro antiviral assays suggested a new activity for an existing prodrug, Candesartan cilexetil, which is currently used to treat hypertension and heart failure but has not been previously tested for anti-HCV activity. We also identified NS5B inhibitors with two novel non-nucleoside chemical motifs.
Affiliation(s)
- Iwona E. Weidlich
- Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250
- Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, DHHS, Frederick National Laboratory for Cancer Research, 376 Boyles Street, Frederick, MD 21702
- Computational Drug Design Systems (CODDES) LLC, Rockville, MD
- Igor V. Filippov
- Chemical Biology Laboratory, Center for Cancer Research, SAIC-Frederick, Inc., 376 Boyles Street, Frederick, MD 21702
- Jodian Brown
- Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250
- Neerja Kaushik-Basu
- Department of Biochemistry and Molecular Biology, UMDNJ New Jersey Medical School, 185 South Orange Ave, Newark, NJ 07103
- Ramalingam Krishnan
- Department of Biochemistry and Molecular Biology, UMDNJ New Jersey Medical School, 185 South Orange Ave, Newark, NJ 07103
- Marc C. Nicklaus
- Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, DHHS, Frederick National Laboratory for Cancer Research, 376 Boyles Street, Frederick, MD 21702
- Ian F. Thorpe
- Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250
|
147
|
Sheridan RP. Time-split cross-validation as a method for estimating the goodness of prospective prediction. J Chem Inf Model 2013; 53:783-90. [PMID: 23521722 DOI: 10.1021/ci400084k] [Citation(s) in RCA: 160] [Impact Index Per Article: 14.5] [Indexed: 11/28/2022]
Abstract
Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R²) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending how compounds in the test set are selected. Here, we show that time-split selection gives an R² that is more like that of true prospective prediction than the R² from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
Affiliation(s)
- Robert P Sheridan
- Cheminformatics Department, Merck Research Laboratories, Rahway, New Jersey 07065, USA.
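The time-split selection discussed in this entry's abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: the record layout (a dict with a `date` key), the function names, and the 25% test fraction are hypothetical, and R² is computed here as the coefficient of determination, which may differ from the definition used in the paper.

```python
from datetime import date


def time_split(records, test_fraction=0.25):
    """Train on the earliest compounds, hold out the most recent ones.

    records: dicts with a "date" key (hypothetical layout).
    """
    ordered = sorted(records, key=lambda r: r["date"])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


def r_squared(observed, predicted):
    """Coefficient of determination (one common definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Eight hypothetical compounds registered in successive months.
records = [{"id": i, "date": date(2012, i + 1, 1)} for i in range(8)]
train, test = time_split(records)
print([r["id"] for r in train], [r["id"] for r in test])
```

A random split would instead shuffle `records` before slicing, which lets close analogs of test compounds leak into the training set and is one plausible reason for the optimism the abstract reports.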
|
148
|
Wood DJ, Carlsson L, Eklund M, Norinder U, Stålring J. QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality. J Comput Aided Mol Des 2013; 27:203-19. [PMID: 23504478 PMCID: PMC3639359 DOI: 10.1007/s10822-013-9639-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 12/23/2012] [Accepted: 03/05/2013] [Indexed: 11/29/2022]
Abstract
We propose that quantitative structure–activity relationship (QSAR) predictions should be explicitly represented as predictive (probability) distributions. If both predictions and experimental measurements are treated as probability distributions, the quality of a set of predictive distributions output by a model can be assessed with Kullback–Leibler (KL) divergence: a widely used information theoretic measure of the distance between two probability distributions. We have assessed a range of different machine learning algorithms and error estimation methods for producing predictive distributions with an analysis against three of AstraZeneca’s global DMPK datasets. Using the KL-divergence framework, we have identified a few combinations of algorithms that produce accurate and valid compound-specific predictive distributions. These methods use reliability indices to assign predictive distributions to the predictions output by QSAR models so that reliable predictions have tight distributions and vice versa. Finally we show how valid predictive distributions can be used to estimate the probability that a test compound has properties that hit single- or multi- objective target profiles.
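As an illustration of the KL-divergence idea in the abstract above, the sketch below uses the closed form for two univariate Gaussians. Treating both the experimental and the predictive distribution as Gaussian, and taking the experimental one as the reference P, are assumptions made here for illustration; the abstract does not state the distribution family or the direction of the divergence.

```python
from math import log


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sd_p^2), Q = N(mu_q, sd_q^2)."""
    return (
        log(sd_q / sd_p)
        + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
        - 0.5
    )


# Hypothetical experimental measurement (P) versus two predictions (Q).
experimental = (6.0, 0.3)
print(kl_gaussian(*experimental, 6.1, 0.3))  # close and tight: small divergence
print(kl_gaussian(*experimental, 7.0, 0.3))  # far and tight: large divergence
```

The divergence is zero only when the two distributions coincide, and it is asymmetric, so the choice of reference distribution matters when scoring a set of predictions.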
|
149
|
He Y, Chong FHT, Lim J, Lee RJT, Yap CW. Determination of the Potential of Drug Candidates to Cause Severe Skin Disorders Using Computational Modeling. Mol Inform 2013; 32:303-12. [PMID: 27481525 DOI: 10.1002/minf.201200086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2012] [Accepted: 02/20/2013] [Indexed: 11/11/2022]
Abstract
Efficient and accurate prediction for drugs' potential to cause rare and severe adverse drug reactions (ADRs) is needed to facilitate the evaluation of risk-benefit ratio of drug candidates during drug development. Severe skin disorders like the Stevens Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN), which are life-threatening dermatological conditions, are such ADRs that have not received sufficient attention so far. In this study, a total of 1127 marketed drugs were screened for their potential to cause SJS/TEN, of which 255 were found to cause SJS/TEN and 239 were unlikely to cause SJS/TEN. One-class classification method was used to develop multiple prediction models. An applicability domain was determined to define the applicability of the model. Ensemble method was used to develop ensemble models to improve prediction ability. The final ensemble model achieved a sensitivity and specificity of 81 % and 67.4 %, respectively, when estimated using the external 5-fold cross validation method, and a sensitivity of 66.7 % when assessed using an external positive set. The results suggest the methods used in this study are potentially useful for facilitating the prediction of rare and severe ADRs.
Affiliation(s)
- Yuye He
- Pharmaceutical Data Exploration Laboratory, Department of Pharmacy, National University of Singapore, Singapore tel: 065-65165971, fax: 065-67791554
- Chun Wei Yap
- Pharmaceutical Data Exploration Laboratory, Department of Pharmacy, National University of Singapore, Singapore tel: 065-65165971, fax: 065-67791554.
|
150
|
Piir G, Sild S, Maran U. Comparative analysis of local and consensus quantitative structure-activity relationship approaches for the prediction of bioconcentration factor. SAR and QSAR in Environmental Research 2013; 24:175-199. [PMID: 23410132 DOI: 10.1080/1062936x.2012.762426] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/01/2023]
Abstract
Quantitative structure-activity relationships (QSARs) are broadly classified as global or local, depending on their molecular constitution. Global models use large and diverse training sets covering a wide range of chemical space. Local models focus on smaller structurally or chemically similar subsets that are conventionally selected by human experts or alternatively using clustering analysis. The current study focuses on the comparative analysis of different clustering algorithms (expectation-maximization, K-means and hierarchical) for seven different descriptor sets as structural characteristics and two rule-based approaches to select subsets for designing local QSAR models. A total of 111 local QSAR models are developed for predicting bioconcentration factor. Predictions from local models were compared with corresponding predictions from the global model. The comparison of coefficients of determination (r²) and standard deviations for local models with similar subsets from the global model show improved prediction quality in 97% of cases. The descriptor content of derived QSARs is discussed and analyzed. Local QSAR models were further consolidated within the framework of consensus approach. All different consensus approaches increased performance over the global and local models. The consensus approach reduced the number of strongly deviating predictions by evening out prediction errors, which were produced by some local QSARs.
Affiliation(s)
- G Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
|