1
König C, Vellido A. Understanding predictions of drug profiles using explainable machine learning models. BioData Min 2024; 17:25. [PMID: 39090651 PMCID: PMC11293102 DOI: 10.1186/s13040-024-00378-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Accepted: 07/26/2024] [Indexed: 08/04/2024] Open
Abstract
PURPOSE: The analysis of absorption, distribution, metabolism, and excretion (ADME) molecular properties is of relevance to drug design, as they directly influence the drug's effectiveness at its target location. This study concerns their prediction, using explainable Machine Learning (ML) models. The aim of the study is to find which molecular features are relevant to the prediction of the different ADME properties and measure their impact on the predictive model.
METHODS: The relative relevance of individual features for ADME activity is gauged by estimating feature importance in ML models' predictions. Feature importance is calculated using feature permutation and the individual impact of features is measured by SHAP additive explanations.
RESULTS: The study reveals the relevance of specific molecular descriptors for each ADME property and quantifies their impact on the ADME property prediction.
CONCLUSION: The reported research illustrates how explainable ML models can provide detailed insights about the individual contributions of molecular features to the final prediction of an ADME property, as an effort to support experts in the process of drug candidate selection through a better understanding of the impact of molecular features.
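The feature-permutation idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy model, the three synthetic "descriptors", and the names `permutation_importance`, `toy_model` and `scores` are assumptions made purely for illustration. The importance of a feature is estimated here as the average increase in mean squared error when that feature's column is shuffled.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(mean(increases))
    return importances


# Hypothetical model: strong dependence on descriptor 0, weak on 1, none on 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
for index, score in enumerate(scores):
    print(f"descriptor {index}: importance {score:.4f}")
```

In this toy setting the unused descriptor receives an importance of zero, while the ranking of the other two follows the size of their coefficients. SHAP values, which the abstract uses for per-prediction attributions, are a separate technique and are not sketched here.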
Affiliation(s)
- Caroline König
  - Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Centre, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain.
  - Department of Computer Science, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain.
- Alfredo Vellido
  - Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Centre, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
  - Department of Computer Science, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Jordi Girona 1-3, Barcelona, 08034, Catalonia, Spain
2
Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of Artificial Intelligence and Machine Learning Algorithms to Crystallization. Chem Rev 2022; 122:13006-13042. [PMID: 35759465 DOI: 10.1021/acs.chemrev.2c00141] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Artificial intelligence and specifically machine learning applications are nowadays used in a variety of scientific applications and cutting-edge technologies, where they have a transformative impact. Such an assembly of statistical and linear algebra methods making use of large data sets is becoming more and more integrated into chemistry and crystallization research workflows. This review aims to present, for the first time, a holistic overview of machine learning and cheminformatics applications as a novel, powerful means to accelerate the discovery of new crystal structures, predict key properties of organic crystalline materials, simulate, understand, and control the dynamics of complex crystallization process systems, as well as contribute to high throughput automation of chemical process development involving crystalline materials. We critically review the advances in these new, rapidly emerging research areas, raising awareness in issues such as the bridging of machine learning models with first-principles mechanistic models, data set size, structure, and quality, as well as the selection of appropriate descriptors. At the same time, we propose future research at the interface of applied mathematics, chemistry, and crystallography. Overall, this review aims to increase the adoption of such methods and tools by chemists and scientists across industry and academia.
Affiliation(s)
- Christos Xiouras
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
- Fabio Cameli
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Gustavo Lunardon Quilló
  - Chemical Process R&D, Crystallization Technology Unit, Janssen R&D, Turnhoutseweg 30, 2340 Beerse, Belgium
  - Chemical and BioProcess Technology and Control, Department of Chemical Engineering, Faculty of Engineering Technology, KU Leuven, Gebroeders de Smetstraat 1, 9000 Ghent, Belgium
- Mihail E Kavousanakis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
- Dionisios G Vlachos
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States
- Georgios D Stefanidis
  - School of Chemical Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
  - Laboratory for Chemical Technology, Ghent University; Tech Lane Ghent Science Park 125, B-9052 Ghent, Belgium
3
Li T, Zhang C, Li X. Machine learning for flow batteries: opportunities and challenges. Chem Sci 2022; 13:4740-4752. [PMID: 35655893 PMCID: PMC9067567 DOI: 10.1039/d2sc00291d] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 04/06/2022] [Indexed: 11/22/2022] Open
Abstract
With the increased computational power of modern computers, the rapid development of mathematical algorithms and the continuous establishment of material databases, artificial intelligence (AI) has shown tremendous potential in chemistry. Machine learning (ML), as one of the most important branches of AI, plays an important role in accelerating the discovery and design of key materials for flow batteries (FBs), and the optimization of FB systems. In this perspective, we first provide a fundamental understanding of the workflow of ML in FBs. Moreover, recent progress in applications of state-of-the-art ML to both organic FBs and vanadium FBs is discussed. Finally, the challenges and future directions of ML research in FBs are proposed.
Affiliation(s)
- Tianyu Li
  - Division of Energy Storage, Dalian National Laboratory for Clean Energy (DNL), Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Zhongshan Road 457, Dalian 116023, China
- Changkun Zhang
  - Division of Energy Storage, Dalian National Laboratory for Clean Energy (DNL), Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Zhongshan Road 457, Dalian 116023, China
- Xianfeng Li
  - Division of Energy Storage, Dalian National Laboratory for Clean Energy (DNL), Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Zhongshan Road 457, Dalian 116023, China
4
Hu P, Jiao Z, Zhang Z, Wang Q. Development of Solubility Prediction Models with Ensemble Learning. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c02142] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/31/2023]
Affiliation(s)
- Pingfan Hu
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Zeren Jiao
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Zhuoran Zhang
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Qingsheng Wang
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
5
Artificial intelligence in drug design: algorithms, applications, challenges and ethics. FUTURE DRUG DISCOVERY 2021. [DOI: 10.4155/fdd-2020-0028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] Open
Abstract
The discovery paradigm of drugs is rapidly growing due to advances in machine learning (ML) and artificial intelligence (AI). This review covers myriad faces of AI and ML in drug design. There is a plethora of AI algorithms, the most common of which are summarized in this review. In addition, AI is fraught with challenges that are highlighted along with plausible solutions to them. Examples are provided to illustrate the use of AI and ML in drug discovery and in predicting drug properties such as binding affinities and interactions, solubility, toxicology, blood–brain barrier permeability and chemical properties. The review also includes examples depicting the implementation of AI and ML in tackling intractable diseases such as COVID-19, cancer and Alzheimer’s disease. Ethical considerations and future perspectives of AI are also covered in this review.
6
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
7
Hussin SK, Abdelmageid SM, Alkhalil A, Omar YM, Marie MI, Ramadan RA. Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms. COMPLEXITY 2021; 2021:1-15. [DOI: 10.1155/2021/6675279] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 09/02/2023]
Abstract
Virtual screening is the most critical process in drug discovery, and it relies on machine learning to facilitate the screening process. It enables the discovery of molecules that bind to a specific protein to form a drug. Despite its benefits, virtual screening generates enormous data and suffers from drawbacks such as high dimensionality and imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for the minority class. When a dataset is classified without considering its imbalanced nature, most classification methods tend to have high predictive accuracy for the majority category but significantly poorer accuracy for the minority category. The paper proposes a K-means algorithm coupled with the Synthetic Minority Oversampling Technique (SMOTE) to overcome the problem of imbalanced datasets. The proposed algorithm is named KSMOTE. Using KSMOTE, minority data can be identified with high accuracy and detected with high precision. A large set of experiments was implemented on Apache Spark using numeric PaDEL and fingerprint descriptors. The proposed solution was compared with both a no-sampling method and SMOTE on the same datasets. Experimental results showed that the proposed solution outperformed the other methods.
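The oversampling step that SMOTE contributes can be sketched as follows. This is only the generic SMOTE interpolation idea, not the KSMOTE algorithm of the paper: the K-means stage and the Apache Spark implementation are not shown, and the function name `smote`, its parameters, and the toy data are assumptions for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class points in a two-descriptor space.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
synthetic_points = smote(minority_points, n_new=20, k=3)
print(len(synthetic_points), "synthetic samples, first:", synthetic_points[0])
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies, which is the main design reason for interpolating rather than duplicating samples.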
Affiliation(s)
- Sahar K. Hussin
  - Communication and Computers Engineering Department, Alshrouck Academy, Cairo, Egypt
- Salah M. Abdelmageid
  - Computer Engineering Department, College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
- Adel Alkhalil
  - College of Computer Science and Engineering, University of Hai’l, Hai’l, Saudi Arabia
- Yasser M. Omar
  - Arab Academy for Science Technology and Maritime Transport, Cairo, Egypt
- Mahmoud I. Marie
  - Computer and System Engineering Department, Al-Azhar University, Cairo, Egypt
- Rabie A. Ramadan
  - College of Computer Science and Engineering, University of Hai’l, Hai’l, Saudi Arabia
  - Computer Engineering Department, Cairo University, Cairo, Egypt
8
Falcón-Cano G, Molina C, Cabrera-Pérez MÁ. ADME prediction with KNIME: In silico aqueous solubility consensus model based on supervised recursive random forest approaches. ADMET AND DMPK 2020; 8:251-273. [PMID: 35300309 PMCID: PMC8915604 DOI: 10.5599/admet.852] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/18/2020] [Revised: 08/01/2020] [Indexed: 12/12/2022] Open
Abstract
In-silico prediction of aqueous solubility plays an important role during the drug discovery and development processes. For many years, the limited performance of in-silico solubility models has been attributed to the lack of high-quality solubility data for pharmaceutical molecules. However, some studies suggest that the poor accuracy of solubility prediction is not related to the quality of the experimental data and that more precise methodologies (algorithms and/or set of descriptors) are required for predicting aqueous solubility for pharmaceutical molecules. In this study a large and diverse database was generated with aqueous solubility values collected from two public sources; two new recursive machine-learning approaches were developed for data cleaning and variable selection, and a consensus model based on regression and classification algorithms was created. The modeling protocol, which includes the curation of chemical and experimental data, was implemented in KNIME, with the aim of obtaining an automated workflow for the prediction of new databases. Finally, we compared several methods or models available in the literature with our consensus model, showing results comparable or even outperforming previous published models.
Affiliation(s)
- Gabriela Falcón-Cano
  - Unit of Modeling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central “Marta Abreu” de las Villas. Santa Clara 54830, Villa Clara, Cuba
- Miguel Ángel Cabrera-Pérez
  - Unit of Modeling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central “Marta Abreu” de las Villas. Santa Clara 54830, Villa Clara, Cuba
  - Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
  - Department of Engineering, Area of Pharmacy and Pharmaceutical Technology, Miguel Hernández University, 03550 Sant Joan d'Alacant, Alicante, Spain
9
Toropov AA, Toropova AP, Marzo M, Benfenati E. Use of the index of ideality of correlation to improve aquatic solubility model. J Mol Graph Model 2020; 96:107525. [DOI: 10.1016/j.jmgm.2019.107525] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/05/2019] [Revised: 11/27/2019] [Accepted: 12/23/2019] [Indexed: 12/18/2022]
10
A micro-XRT image analysis and machine learning methodology for the characterisation of multi-particulate capsule formulations. INTERNATIONAL JOURNAL OF PHARMACEUTICS-X 2020; 2:100041. [PMID: 32025658 PMCID: PMC6997304 DOI: 10.1016/j.ijpx.2020.100041] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/02/2019] [Revised: 11/18/2019] [Accepted: 11/18/2019] [Indexed: 11/30/2022]
Abstract
The application of X-ray microtomography for quantitative structural analysis of pharmaceutical multi-particulate systems was demonstrated for commercial capsules, each containing approximately 300 formulated ibuprofen pellets. The implementation of a marker-supported watershed transformation enabled the reliable segmentation of the pellet population for the 3D analysis of individual pellets. Isolated translation- and rotation-invariant object cross-sections expanded the applicability to additional 2D image analysis techniques. The full structural characterisation gave access to over 200 features quantifying aspects of the pellets' size, shape, porosity, surface and orientation. The extracted features were assessed using a ReliefF feature selection method and a supervised Support Vector Machine learning algorithm to build a model for the detection of broken pellets within each capsule. Data of three features from distinct structure-related categories were used to build classification models with an accuracy of more than 99.55% and a minimum precision of 86.20% validated with a test dataset of 886 pellets. This approach to extract quantitative information on particle quality attributes combined with advanced data analysis strategies has clear potential to directly inform manufacturing processes, accelerating development and optimisation.
Highlights
- Coupling micro-XRT analysis with feature selection and machine learning for advanced pharmaceutical product characterisation.
- Information on particle 3D-orientation was utilised to extract translation- and rotation-invariant object cross-sections.
- Successful extraction of over 200 quantitative pellet descriptors linked to size, shape, porosity, surface and orientation.
- Sensitivity analysis and ReliefF feature selection approach to identify predictive features for pellet classification.
- Feature-based binary SVM classification model for the detection of broken pellets within the formulated system.
Key Words
- Abbreviation, Description
- Classification model
- Feature selection
- IEV, Translation- and rotation-invariant cross-section
- Machine learning
- Micro-XRT particle analysis
- OC-SVM, One-class support vector machine
- OSH, Optimal separating hyperplane
- Pharmaceutical formulation
- RBF, Radial basis function
- ROI, Region-of-interest
- Sensitivity analysis
- TC-SVM, Two-class support vector machine
- V, Single pellet
- V_CP, Pellet population
- V_CP_Poros, Pellet population porosity
- V_CP_ROI, Pellet population region-of-interest
- V_CS, Capsule shell
- V_CS_InV, Capsule shell internal volume
- V_CS_Poros, Capsule shell void
- V_CS_ROI, Capsule shell region-of-interest
- V_ROI, Single pellet region-of-interest
- Watershed image segmentation
11
Modeling Physico-Chemical ADMET Endpoints with Multitask Graph Convolutional Networks. Molecules 2019; 25:molecules25010044. [PMID: 31877719 PMCID: PMC6982787 DOI: 10.3390/molecules25010044] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 12/01/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 11/19/2022] Open
Abstract
Simple physico-chemical properties, like logD, solubility, or melting point, can reveal a great deal about how a compound under development might later behave. These data are typically measured for most compounds in drug discovery projects in a medium throughput fashion. Collecting and assembling all the Bayer in-house data related to these properties allowed us to apply powerful machine learning techniques to predict the outcome of those assays for new compounds. In this paper, we report our finding that, especially for predicting physicochemical ADMET endpoints, a multitask graph convolutional approach appears a highly competitive choice. For seven endpoints of interest, we compared the performance of that approach to fully connected neural networks and different single task models. The new model shows increased predictive performance compared to previous modeling methods and will allow early prioritization of compounds even before they are synthesized. In addition, our model follows the generalized solubility equation without being explicitly trained under this constraint.
12
Doerr FJS, Florence AJ. WITHDRAWN: A micro-XRT Image Analysis and Machine Learning Methodology for the Characterisation of Multi-Particulate Capsule Formulations. Int J Pharm 2019:118897. [PMID: 31836483 DOI: 10.1016/j.ijpharm.2019.118897] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Revised: 11/18/2019] [Accepted: 11/18/2019] [Indexed: 11/22/2022]
Abstract
The Publisher regrets that this article is an accidental duplication of a published article, https://doi.org/10.1016/j.ijpx.2020.100041. The duplicate article has therefore been withdrawn. The full Elsevier Policy on Article Withdrawal can be found at https://www.elsevier.com/about/our-business/policies/article-withdrawal.
Affiliation(s)
- Frederik J S Doerr
  - EPSRC CMAC Future Manufacturing Research Hub, Technology and Innovation Centre, 99 George Street, Glasgow, G1 1RD, UK; Strathclyde Institute of Pharmacy & Biomedical Sciences (SIPBS), University of Strathclyde, Glasgow, G4 0RE, UK
- Alastair J Florence
  - EPSRC CMAC Future Manufacturing Research Hub, Technology and Innovation Centre, 99 George Street, Glasgow, G1 1RD, UK; Strathclyde Institute of Pharmacy & Biomedical Sciences (SIPBS), University of Strathclyde, Glasgow, G4 0RE, UK. http://www.cmac.ac.uk
13
Esaki T, Ohashi R, Watanabe R, Natsume-Kitatani Y, Kawashima H, Nagao C, Komura H, Mizuguchi K. Constructing an In Silico Three-Class Predictor of Human Intestinal Absorption With Caco-2 Permeability and Dried-DMSO Solubility. J Pharm Sci 2019; 108:3630-3639. [DOI: 10.1016/j.xphs.2019.07.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/15/2019] [Revised: 07/06/2019] [Accepted: 07/17/2019] [Indexed: 01/03/2023]
14
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 351] [Impact Index Per Article: 70.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
15
Sun H, Shah P, Nguyen K, Yu KR, Kerns E, Kabir M, Wang Y, Xu X. Predictive models of aqueous solubility of organic compounds built on A large dataset of high integrity. Bioorg Med Chem 2019; 27:3110-3114. [PMID: 31176566 DOI: 10.1016/j.bmc.2019.05.037] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/28/2019] [Revised: 05/09/2019] [Accepted: 05/25/2019] [Indexed: 11/18/2022]
Abstract
Aqueous solubility is one of the most important properties in drug discovery, as it has profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy. Both kinetic and thermodynamic solubilities are determined during different stages of drug discovery and development. Since kinetic solubility is more relevant in preclinical drug discovery research, especially during the structure optimization process, we have developed predictive models for kinetic solubility with in-house data generated from 11,780 compounds collected from over 200 NCATS intramural research projects. This represents one of the largest kinetic solubility datasets of high quality and integrity. Based on the customized atom type descriptors, the support vector classification (SVC) models were trained on 80% of the whole dataset, and exhibited high predictive performance for estimating the solubility of the remaining 20% compounds within the test set. The values of the area under the receiver operating characteristic curve (AUC-ROC) for the compounds in the test sets reached 0.93 and 0.91, when the threshold for insoluble compounds was set to 10 and 50 μg/mL respectively. The predictive models of aqueous solubility can be used to identify insoluble compounds in drug discovery pipeline, provide design ideas for improving solubility by analyzing the atom types associated with poor solubility and prioritize compound libraries to be purchased or synthesized.
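The evaluation described in the abstract above, binarising solubility at a threshold and scoring a classifier by AUC-ROC, can be illustrated with a small sketch. The data, the function names `label_insoluble` and `auc_roc`, and the choice to treat values strictly below the threshold as insoluble are all assumptions for illustration; the paper's SVC models and atom type descriptors are not reproduced. AUC is computed here as the fraction of (insoluble, soluble) pairs in which the insoluble compound receives the higher score, counting ties as one half.

```python
def label_insoluble(solubility_ug_per_ml, threshold=10.0):
    """Return 1 for compounds below the threshold (assumed 'insoluble'), else 0."""
    return [1 if s < threshold else 0 for s in solubility_ug_per_ml]


def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical measured solubilities (ug/mL) and model scores for "insoluble".
solubility = [2.0, 8.0, 15.0, 40.0, 60.0, 5.0]
predicted_score = [0.9, 0.6, 0.7, 0.2, 0.1, 0.8]

labels_10 = label_insoluble(solubility, threshold=10.0)
labels_50 = label_insoluble(solubility, threshold=50.0)
auc_10 = auc_roc(labels_10, predicted_score)
auc_50 = auc_roc(labels_50, predicted_score)
print(f"AUC at 10 ug/mL: {auc_10:.3f}, AUC at 50 ug/mL: {auc_50:.3f}")
```

Changing the threshold changes which compounds count as positives, so the same scores can yield different AUC values, which is why the abstract reports one value per threshold.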
Affiliation(s)
- Hongmao Sun
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States.
- Pranav Shah
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
- Kimloan Nguyen
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
- Kyeong Ri Yu
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
- Ed Kerns
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
- Md Kabir
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
- Yuhong Wang
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States
- Xin Xu
  - National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Dr., Rockville, MD 20850, United States.
16
Raevsky OA, Grigorev VY, Polianczyk DE, Raevskaja OE, Dearden JC. Aqueous Drug Solubility: What Do We Measure, Calculate and QSPR Predict? Mini Rev Med Chem 2019; 19:362-372. [PMID: 30058484 DOI: 10.2174/1389557518666180727164417] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/06/2018] [Revised: 07/06/2018] [Accepted: 07/20/2018] [Indexed: 01/07/2023]
Abstract
This review presents a detailed critical analysis of publications devoted to QSPR of aqueous solubility, with discussion of four types of aqueous solubility (three different thermodynamic solubilities with unknown solute structure, intrinsic solubility, solubility in physiological media at pH=7.4 and kinetic solubility), the variety of molecular descriptors (from topological to quantum chemical), traditional statistical and machine learning methods, and original QSPR models.
Affiliation(s)
- Oleg A Raevsky
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- Veniamin Y Grigorev
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- Daniel E Polianczyk
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- Olga E Raevskaja
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- John C Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom

17
Raevsky OA, Grigorev VY, Polianczyk DE, Raevskaja OE, Dearden JC. Six global and local QSPR models of aqueous solubility at pH = 7.4 based on structural similarity and physicochemical descriptors. SAR QSAR Environ Res 2017; 28:661-676. [PMID: 28891683 DOI: 10.1080/1062936x.2017.1368704] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/14/2017] [Accepted: 08/14/2017] [Indexed: 06/07/2023]
Abstract
Aqueous solubility at pH = 7.4 is a very important property for medicinal chemists because this is the pH value of physiological media. The present work describes the application of three different methods (support vector machine (SVM), random forest (RF) and multiple linear regression (MLR)) and three local quantitative structure-property relationship (QSPR) models (regression corrected by nearest neighbours (RCNN), arithmetic mean property (AMP) and local regression property (LoReP)) to construct stable QSPRs with clear mechanistic interpretation. Our data set contained experimental values of aqueous solubility at pH = 7.4 of 387 chemicals (349 in the training set and 38 in the test set including 16 own measurements). The initial descriptor pool contained 210 physicochemical descriptors, calculated from the HYBOT, DRAGON, SYBYL and VolSurf+ programs. Six QSPRs with good statistics based on fundamentals of aqueous solubility and optimization of descriptor space were obtained. Those models have an RMSE close to experimental error (0.70), and are amenable to physical interpretation. The QSPR models developed in this study may be useful for medicinal chemists. Global MLR, RF and SVM models may be valuable for consideration of common factors that influence solubility. The RCNN, AMP and LoReP local models may be helpful for the optimization of aqueous solubility in small sets of related chemicals.
Affiliation(s)
- O A Raevsky
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- V Y Grigorev
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- D E Polianczyk
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- O E Raevskaja
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- J C Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK

18
Kim S, Jinich A, Aspuru-Guzik A. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes. J Chem Inf Model 2017; 57:657-668. [DOI: 10.1021/acs.jcim.6b00332] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/24/2022]
Affiliation(s)
- Sungjin Kim
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States
- Adrián Jinich
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States
- Alán Aspuru-Guzik
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, United States

19
Ekins S. The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res 2016; 33:2594-603. [PMID: 27599991 DOI: 10.1007/s11095-016-2029-7] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Received: 07/10/2016] [Accepted: 08/23/2016] [Indexed: 01/22/2023]
Abstract
Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use from internet searches, voice recognition, social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but to predict a molecule's properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernable edge in predictive performance. The time has come for a balanced review of this technique but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique.
Affiliation(s)
- Sean Ekins
- Collaborations Pharmaceuticals, Inc, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina, 27526, USA
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California, 94010, USA

20
Mining Chemical Activity Status from High-Throughput Screening Assays. PLoS One 2015; 10:e0144426. [PMID: 26658480 PMCID: PMC4682830 DOI: 10.1371/journal.pone.0144426] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/07/2015] [Accepted: 11/18/2015] [Indexed: 01/20/2023] Open
Abstract
High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and the frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE performs better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have a high probability of interacting with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at www.cbrc.kaust.edu.sa/dramote and can be found on Figshare.

21
Jasial S, Balfer J, Vogt M, Bajorath J. Determination of Meta-Parameters for Support Vector Machine Linear Combinations. Mol Inform 2015; 34:127-33. [DOI: 10.1002/minf.201400163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2014] [Accepted: 12/16/2014] [Indexed: 11/05/2022]

22
Abstract
The emphasis of this review is particularly on multivariate statistical methods currently used in quantitative structure–activity relationship (QSAR) studies.
Affiliation(s)
- Somayeh Pirhadi
- Drug Design in Silico Lab, Chemistry Faculty, K. N. Toosi University of Technology, Tehran, Iran
- Jahan B. Ghasemi
- Drug Design in Silico Lab, Chemistry Faculty, K. N. Toosi University of Technology, Tehran, Iran

23
Lavecchia A. Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2014; 20:318-31. [PMID: 25448759 DOI: 10.1016/j.drudis.2014.10.012] [Citation(s) in RCA: 358] [Impact Index Per Article: 35.8] [Received: 07/18/2014] [Revised: 09/27/2014] [Accepted: 10/24/2014] [Indexed: 12/19/2022]
Abstract
During the past decade, virtual screening (VS) has evolved from traditional similarity searching, which utilizes single reference compounds, into an advanced application domain for data mining and machine-learning approaches, which require large and representative training-set compounds to learn robust decision rules. The explosive growth in the amount of public domain-available chemical and biological data has generated huge effort to design, analyze, and apply novel learning methodologies. Here, I focus on machine-learning techniques within the context of ligand-based VS (LBVS). In addition, I analyze several relevant VS studies from recent publications, providing a detailed view of the current state-of-the-art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances.
Affiliation(s)
- Antonio Lavecchia
- Department of Pharmacy, Drug Discovery Laboratory, University of Napoli 'Federico II', via D. Montesano 49, I-80131 Napoli, Italy

24
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. Comput Methods Programs Biomed 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally performed better than a single SVM, while Subset Selection outperformed the other feature selection methods. We tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
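The correlation filter mentioned in this abstract can be sketched as below. This is only one plausible reading of the pre-processing step: the greedy keep-first strategy, the 0.9 cutoff, and the descriptor names and values are assumptions for illustration, since the abstract gives none of these details.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def correlation_filter(features, cutoff=0.9):
    """Greedily keep features whose |r| with every kept feature is <= cutoff.

    `features` maps a feature name to its list of values; the cutoff value
    is an assumed setting, not one taken from the abstract.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table for four compounds.
descriptors = {
    "mol_weight": [180.0, 250.0, 310.0, 420.0],
    "heavy_atoms": [13.0, 18.0, 22.0, 30.0],  # nearly collinear with mol_weight
    "logp": [1.2, -0.5, 3.1, 0.4],
}
print(correlation_filter(descriptors))
```

In this toy table the heavy-atom count is dropped because it is almost perfectly correlated with molecular weight, and the reduced feature set is what a downstream classifier would be trained on.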
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey

25
Hao M, Wang Y, Bryant SH. An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data. Anal Chim Acta 2014; 806:117-27. [PMID: 24331047 PMCID: PMC3884825 DOI: 10.1016/j.aca.2013.10.050] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 08/31/2013] [Revised: 10/25/2013] [Accepted: 10/28/2013] [Indexed: 01/28/2023]
Abstract
Imbalanced datasets are commonly generated from high-throughput screening (HTS). When the imbalanced nature of a dataset is not taken into account, most classification methods tend to produce high predictive accuracy for the majority class but significantly poorer performance for the minority class. In this work, an efficient algorithm, GLMBoost, coupled with the Synthetic Minority Over-sampling TEchnique (SMOTE) is developed and used to overcome this problem for several imbalanced datasets from PubChem BioAssay. By applying the proposed combined method, the rare samples (active compounds), for which poor results are usually generated, can be detected with high balanced accuracy (Gmean). As a comparison with GLMBoost, Random Forest (RF) combined with SMOTE is also adopted to classify the same datasets. Our results show that the former (GLMBoost+SMOTE) not only exhibits higher performance, as measured by the percentage of correct classification for the rare samples (Sensitivity) and Gmean, but also demonstrates greater computational efficiency than the latter (RF+SMOTE). Therefore, we hope that the proposed combined algorithm based on GLMBoost and SMOTE could be used extensively to tackle the imbalanced classification problem.
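To make the two ideas in this abstract concrete, the sketch below shows the core interpolation step of SMOTE and a G-mean score, using the common definition of G-mean as the geometric mean of sensitivity and specificity. It is a simplified illustration, not the authors' pipeline: the function names, the neighbour count `k`, the seed, and the toy points are assumptions, and GLMBoost itself is not reproduced.

```python
import random
from math import dist, sqrt


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


# Hypothetical 2D descriptors of the rare active compounds.
actives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote(actives, n_new=3))
print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because G-mean multiplies the two per-class rates, a classifier that ignores the minority class scores zero regardless of its overall accuracy, which is why it suits imbalanced screening data.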
Affiliation(s)
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

26
Zang Q, Rotroff DM, Judson RS. Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure–Activity Relationship and Machine Learning Methods. J Chem Inf Model 2013; 53:3244-61. [DOI: 10.1021/ci400527b] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/25/2022]
Affiliation(s)
- Daniel M. Rotroff
- Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, United States

27

28
Salahinejad M, Le TC, Winkler DA. Aqueous Solubility Prediction: Do Crystal Lattice Interactions Help? Mol Pharm 2013; 10:2757-66. [DOI: 10.1021/mp4001958] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/23/2022]
Affiliation(s)
- Maryam Salahinejad
- Faculty of Chemistry, Tarbiat Moallem University, Tehran 15719-14911, Iran
- CSIRO Materials Science & Engineering, Clayton 3168, Australia
- Monash Institute of Pharmaceutical Sciences, Parkville 3052, Australia
- Tu C. Le
- CSIRO Materials Science & Engineering, Clayton 3168, Australia
- David A. Winkler
- CSIRO Materials Science & Engineering, Clayton 3168, Australia
- Monash Institute of Pharmaceutical Sciences, Parkville 3052, Australia

29
Fingerprint design and engineering strategies: rationalizing and improving similarity search performance. Future Med Chem 2013; 4:1945-59. [PMID: 23088275 DOI: 10.4155/fmc.12.126] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023] Open
Abstract
Fingerprints (FPs) are bit or integer string representations of molecular structure and properties, and are popular descriptors for chemical similarity searching. A major goal of similarity searching is the identification of novel active compounds on the basis of known reference molecules. In this review, recent FP design and engineering strategies are discussed. New types of FPs continue to be introduced, often applying different design principles. FP engineering techniques have recently been introduced to further improve search performance and computational efficiency and to elucidate mechanisms by which FPs recognize active compounds. In addition, through feature selection and hybridization techniques, standard FPs have been transformed into compound class-specific versions with further increased search performance. Moreover, scaffold hopping mechanisms have been explored. FPs will continue to play an important role in the search for novel active compounds.

30
Yu P, Wild DJ. Discovering associations in biomedical datasets by link-based associative classifier (LAC). PLoS One 2012; 7:e51018. [PMID: 23227228 PMCID: PMC3515483 DOI: 10.1371/journal.pone.0051018] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/18/2012] [Accepted: 10/31/2012] [Indexed: 11/21/2022] Open
Abstract
Associative classification mining (ACM) can be used to provide predictive models with high accuracy as well as interpretability. However, traditional ACM ignores differences in significance among the features used for mining. Although weighted associative classification mining (WACM) addresses this issue by assigning different weights to features, most implementations can only be used when pre-assigned weights are available. In this paper, we propose a link-based approach to automatically derive weight information from a dataset using link-based models which treat the dataset as a bipartite model. By combining this link-based feature weighting method with a traditional ACM method, classification based on associations (CBA), a Link-based Associative Classifier (LAC) is developed. We then demonstrate the application of LAC to biomedical datasets for association discovery between chemical compounds and bioactivities or diseases. The results indicate that the novel link-based weighting method is comparable to the support vector machine (SVM) and RELIEF methods, and is capable of capturing significant features. Additionally, LAC is shown to produce models with high accuracy and to discover interesting associations which may otherwise remain unrevealed by traditional ACM.
Affiliation(s)
- Pulan Yu
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States of America
- David J. Wild
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States of America

31
Elder D, Holm R. Aqueous solubility: simple predictive methods (in silico, in vitro and bio-relevant approaches). Int J Pharm 2012; 453:3-11. [PMID: 23124107 DOI: 10.1016/j.ijpharm.2012.10.041] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 06/27/2012] [Revised: 10/18/2012] [Accepted: 10/24/2012] [Indexed: 11/28/2022]
Abstract
Aqueous solubility is a key physicochemical attribute required for the characterisation of an active pharmaceutical ingredient (API) during drug discovery and beyond. Furthermore, aqueous solubility is highly important for formulation selection and subsequent development processes. This review provides a summary of simple predictive methods used to assess aqueous solubility as well as an assessment of the more complex in silico methodologies and a review of the recent solubility challenge. In addition, a summary of experimental methods to determine solubility is included, with a discussion of some potential pitfalls.
Affiliation(s)
- David Elder
- GSK Pharmaceuticals, Park Road, Ware, Hertfordshire, SG12 0DP, United Kingdom

32
Cheng F, Li W, Zhou Y, Shen J, Wu Z, Liu G, Lee PW, Tang Y. admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties. J Chem Inf Model 2012; 52:3099-105. [PMID: 23092397 DOI: 10.1021/ci300367a] [Citation(s) in RCA: 1136] [Impact Index Per Article: 94.7] [Indexed: 01/12/2023]
Abstract
Absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties play key roles in the discovery/development of drugs, pesticides, food additives, consumer products, and industrial chemicals. This information is especially useful when conducting environmental and human hazard assessment. The most critical rate-limiting step in the chemical safety assessment workflow is the availability of high-quality data. This paper describes an ADMET structure-activity relationship database, abbreviated as admetSAR. It is an open source, text and structure searchable, and continually updated database that collects, curates, and manages available ADMET-associated properties data from the published literature. In admetSAR, over 210,000 ADMET annotated data points for more than 96,000 unique compounds with 45 kinds of ADMET-associated properties, proteins, species, or organisms have been carefully curated from a large and diverse body of literature. The database provides a user-friendly interface to query a specific chemical profile, using either CAS registry number, common name, or structure similarity. In addition, the database includes 22 qualitative classification and 5 quantitative regression models with high predictive accuracy, allowing estimation of ecological/mammalian ADMET properties for novel chemicals. AdmetSAR is accessible free of charge at http://www.admetexp.org.
Affiliation(s)
- Feixiong Cheng
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China

33
Vogt M, Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg Med Chem 2012; 20:5317-23. [DOI: 10.1016/j.bmc.2012.03.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 01/23/2012] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 12/18/2022]

34
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform 2011; 3:33. [PMID: 21982300 PMCID: PMC3198950 DOI: 10.1186/1758-2946-3-33] [Citation(s) in RCA: 5080] [Impact Index Per Article: 390.8] [Received: 06/27/2011] [Accepted: 10/07/2011] [Indexed: 02/08/2023] Open
Abstract
Background A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from http://openbabel.org.
Affiliation(s)
- Noel M O'Boyle
- University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA

35
O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform 2011. [PMID: 21982300 DOI: 10.1186/1758-2946-3-33] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. RESULTS We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. CONCLUSIONS Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from http://openbabel.org.
Affiliation(s)
- Noel M O'Boyle
- University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA

36
Hammann F, Suenderhauf C, Huwyler J. A binary ant colony optimization classifier for molecular activities. J Chem Inf Model 2011; 51:2690-6. [PMID: 21854036 DOI: 10.1021/ci200186m] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/02/2023]
Abstract
Chemical fingerprints encode the presence or absence of molecular features and are available in many large databases. Using a variation of the Ant Colony Optimization (ACO) paradigm, we describe a binary classifier based on feature selection from fingerprints. We discuss the algorithm and possible cross-validation procedures. As a real-world example, we use our algorithm to analyze a Plasmodium falciparum inhibition assay and contrast its performance with other machine learning paradigms in use today (decision tree induction, random forests, support vector machines, artificial neural networks). Our algorithm matches established paradigms in predictive power, yet supplies the medicinal chemist and basic researcher with easily interpretable results. Furthermore, models generated with our paradigm are easy to implement and can complement virtual screenings by additionally exploiting the precalculated fingerprint information.
Affiliation(s)
- Felix Hammann
- Division of Pharmaceutical Technology, Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50 4056, Basel, Switzerland

37
Guha R, Dexheimer TS, Kestranek AN, Jadhav A, Chervenak AM, Ford MG, Simeonov A, Roth GP, Thomas CJ. Exploratory analysis of kinetic solubility measurements of a small molecule library. Bioorg Med Chem 2011; 19:4127-34. [PMID: 21640593 PMCID: PMC3236531 DOI: 10.1016/j.bmc.2011.05.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 03/08/2011] [Revised: 04/29/2011] [Accepted: 05/04/2011] [Indexed: 11/20/2022]
Abstract
Kinetic solubility measurements using prototypical assay buffer conditions are presented for a ∼58,000 member library of small molecules. Analyses of the data based upon physical and calculated properties of each individual molecule were performed and resulting trends were considered in the context of commonly held opinions of how physicochemical properties influence aqueous solubility. We further analyze the data using a decision tree model for solubility prediction and via a multi-dimensional assessment of physicochemical relationships to solubility in the context of specific 'rule-breakers' relative to common dogma. The role of solubility as a determinant of assay outcome is also considered based upon each compound's cross-assay activity score for a collection of publicly available screening results. Further, the role of solubility as a governing factor for colloidal aggregation formation within a specified assay setting is examined and considered as a possible cause of a high cross-assay activity score. The results of this solubility profile should aid chemists during library design and optimization efforts and represent a useful training set for computational solubility prediction.
Affiliation(s)
- Rajarshi Guha
- NIH Chemical Genomics Center, National Human Genome Research Institute, NIH 9800 Medical Center Drive, MSC 3370 Bethesda, MD 20892-3370 USA
- Thomas S. Dexheimer
- NIH Chemical Genomics Center, National Human Genome Research Institute, NIH 9800 Medical Center Drive, MSC 3370 Bethesda, MD 20892-3370 USA
- Aimee N. Kestranek
- Analiza, Inc., 3615 Superior Avenue, Suite 4407B, Cleveland, OH 44114 USA
- Ajit Jadhav
- NIH Chemical Genomics Center, National Human Genome Research Institute, NIH 9800 Medical Center Drive, MSC 3370 Bethesda, MD 20892-3370 USA
- Michael G. Ford
- Analiza, Inc., 3615 Superior Avenue, Suite 4407B, Cleveland, OH 44114 USA
- Anton Simeonov
- NIH Chemical Genomics Center, National Human Genome Research Institute, NIH 9800 Medical Center Drive, MSC 3370 Bethesda, MD 20892-3370 USA
- Gregory P. Roth
- Sanford–Burnham Medical Research Institute at Lake Nona, Conrad Prebys Center for Chemical Genomics, 6400 Sanger Road, Orlando, Florida 32827
- Craig J. Thomas
- NIH Chemical Genomics Center, National Human Genome Research Institute, NIH 9800 Medical Center Drive, MSC 3370 Bethesda, MD 20892-3370 USA