1
de Oliveira LHD, Cruz JN, Dos Santos CBR, de Melo EB. Multivariate QSAR, similarity search and ADMET studies based in a set of methylamine derivatives described as dopamine transporter inhibitors. Mol Divers 2024; 28:2931-2946. [PMID: 37670118 DOI: 10.1007/s11030-023-10724-5] [Received: 07/10/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
The dopamine transporter (DAT), responsible for the regulation of dopaminergic neurotransmission, is implicated in the etiology of several neuropsychiatric disorders which, in turn, have contributed to high rates of disability and numerous deaths in recent years, significantly impacting the global health system. Although research into new drugs for the treatment of neuropsychiatric disorders has advanced in recent years, DAT-selective drugs that do not produce the psychostimulant effects observed with drugs of abuse remain scarce. Therefore, we performed a QSAR study based on a dataset of 36 methylamine derivatives described as DAT inhibitors. The model was built using only descriptors derived from 2D structures; it was validated and gave satisfactory results on the metrics used for internal and external validation. Subsequently, a virtual screening step, also based on 2D similarity, identified a total of 1157 compounds. After successive reductions of this set using toxicity filters, applicability domain evaluation, and in silico assessment of pharmacokinetic properties, seven hit compounds were selected as the most promising candidates for use, in future studies, as new scaffolds for the development of DAT inhibitors.
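The screening workflow described above (a 2D similarity search followed by successive filters) can be sketched as below. This is a minimal illustration, not the authors' pipeline: fingerprints are assumed to be Python sets of on-bit indices, the similarity cutoff is an arbitrary placeholder, and the filter predicates are hypothetical stand-ins for the toxicity, applicability-domain, and pharmacokinetic steps.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: set of on-bit indices

def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def similarity_search(queries: Sequence[Fingerprint],
                      library: Dict[str, Fingerprint],
                      threshold: float) -> List[str]:
    """Keep library compounds whose best similarity to any query reaches the threshold."""
    return [name for name, fp in library.items()
            if max(tanimoto(fp, q) for q in queries) >= threshold]

def funnel(candidates: List[str],
           filters: Sequence[Tuple[str, Callable[[str], bool]]]
           ) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Apply hypothetical filters in order, logging how many compounds survive each stage."""
    log = [("similarity", len(candidates))]
    for stage, keep in filters:
        candidates = [c for c in candidates if keep(c)]
        log.append((stage, len(candidates)))
    return candidates, log
```

Logging the survivor count after each stage mirrors how such studies report a shrinking set, for example from 1157 similarity hits down to seven final compounds.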
Affiliation(s)
- Luiz Henrique Dias de Oliveira
- Theoretical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (UNIOESTE), 2069 Universitária St., Cascavel, PR, 85819-110, Brazil
- Jorddy Neves Cruz
- Laboratory of Modeling and Computational Chemistry, Department of Biological and Health Sciences, Federal University of Amapá, Macapá, AP, 68902-280, Brazil
- Cleydson Breno Rodrigues Dos Santos
- Laboratory of Modeling and Computational Chemistry, Department of Biological and Health Sciences, Federal University of Amapá, Macapá, AP, 68902-280, Brazil
- Eduardo Borges de Melo
- Theoretical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (UNIOESTE), 2069 Universitária St., Cascavel, PR, 85819-110, Brazil
2
Kaneko H. Evaluation and Optimization Methods for Applicability Domain Methods and Their Hyperparameters, Considering the Prediction Performance of Machine Learning Models. ACS Omega 2024; 9:11453-11458. [PMID: 38496944 PMCID: PMC10938389 DOI: 10.1021/acsomega.3c08036] [Received: 10/13/2023] [Revised: 01/19/2024] [Accepted: 02/12/2024] [Indexed: 03/19/2024]
Abstract
In molecular, material, and process design and control, an applicability domain (AD) is constructed for a mathematical model y = f(x) that relates properties or activities y to features x. Because there are multiple AD methods, each with its own set of hyperparameters, an appropriate AD method and hyperparameters must be selected for each data set and mathematical model. However, no method has been available for optimizing the AD model. This study proposes a method for evaluating and optimizing the AD model for each combination of data set and mathematical model. Using the predictions of double cross-validation for all samples, the relationship between coverage and root-mean-squared error (RMSE) was calculated for every combination of AD method and hyperparameters, and the area under the coverage-RMSE curve (AUCR) was computed. The AD model with the lowest AUCR was selected as the optimal one for the mathematical model. The proposed method was validated on eight data sets, including molecules, materials, and spectra, and generated optimal AD models for all of them. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit.
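The coverage-RMSE curve and its area can be sketched as follows. This is a hedged illustration, not the dcekit implementation: it assumes each AD method yields a per-sample score in which larger values mean "farther from the training data", admits samples in order of increasing score, and integrates with the trapezoidal rule starting at the first admitted sample (coverage 1/n) rather than at zero. The candidate names passed to `select_ad_model` are hypothetical.

```python
from typing import Dict, Tuple
import numpy as np

def coverage_rmse_curve(y_true, y_pred, ad_score) -> Tuple[np.ndarray, np.ndarray]:
    """Coverage and cumulative RMSE as samples are admitted from most to least reliable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: a lower AD score means the sample lies closer to the training data.
    order = np.argsort(np.asarray(ad_score, dtype=float), kind="stable")
    sq_err = (y_true[order] - y_pred[order]) ** 2
    counts = np.arange(1, len(sq_err) + 1)
    coverage = counts / len(sq_err)
    rmse = np.sqrt(np.cumsum(sq_err) / counts)
    return coverage, rmse

def aucr(y_true, y_pred, ad_score) -> float:
    """Trapezoidal area under the coverage-RMSE curve (lower is better)."""
    coverage, rmse = coverage_rmse_curve(y_true, y_pred, ad_score)
    return float(np.sum(0.5 * (rmse[1:] + rmse[:-1]) * np.diff(coverage)))

def select_ad_model(y_true, y_pred, candidates: Dict[str, np.ndarray]) -> str:
    """Return the candidate (AD method plus hyperparameters) with the lowest AUCR."""
    return min(candidates, key=lambda name: aucr(y_true, y_pred, candidates[name]))
```

Here `y_pred` would hold out-of-fold predictions from double cross-validation, so that every sample contributes a prediction from a model that was not fitted on it. An AD score that pushes the largest errors to the end of the ordering keeps the curve low for most of its length and therefore yields a smaller AUCR.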
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
3
Han M, Jin B, Liang J, Huang C, Arp HPH. Developing machine learning approaches to identify candidate persistent, mobile and toxic (PMT) and very persistent and very mobile (vPvM) substances based on molecular structure. Water Res 2023; 244:120470. [PMID: 37595327 DOI: 10.1016/j.watres.2023.120470] [Received: 04/10/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023]
Abstract
Determining which substances on the global market could be classified as persistent, mobile and toxic (PMT) or very persistent and very mobile (vPvM) substances is essential to prevent or reduce drinking water contamination by them. This study developed machine learning models based on different molecular descriptors (MDs), with defined applicability domains, for the screening of PMT/vPvM substances. The models were trained on 3111 substances with expert weight-of-evidence-based PMT/vPvM hazard classifications that considered the highest-quality data available. The model was based on the hypothesis that PMT/vPvM substances share similar MDs that are representative of chemical structures resistant to degradation, are associated with low sorption (or high water solubility) and, in some cases, are associated with known toxic mechanisms. All possible model combinations were tested by integrating different molecular description methods, data balancing strategies and machine learning algorithms. Our model allows one-step prediction of candidate PMT/vPvM substances, and our method was compared with an approach that predicts P, M and T separately (i.e., three-step prediction). The results showed that the one-step model achieved a higher accuracy of 92% for PMT/vPvM identification (i.e., positive samples) on an internal test set, and also a higher accuracy of 90% on an external test set of chemical pollutants detected in Taihu Lake, China. Furthermore, the model's prediction mechanism was interpreted using Shapley additive explanations (SHAP). This work presents an advance in big-data in silico screening models for identifying substances that potentially meet the PMT/vPvM criteria.
Affiliation(s)
- Min Han
- State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Biao Jin
- State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Jun Liang
- School of Software, South China Normal University, Foshan, 528225, China
- Chen Huang
- State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou, 510640, China; CAS Center for Excellence in Deep Earth Science, Guangzhou, 510640, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Hans Peter H Arp
- Norwegian Geotechnical Institute (NGI), P.O. Box 3930 Ullevaal Stadion, Oslo, N-0806, Norway; Norwegian University of Science and Technology (NTNU), Trondheim, NO-7491, Norway
4
Li J, Wang C, Yue L, Chen F, Cao X, Wang Z. Nano-QSAR modeling for predicting the cytotoxicity of metallic and metal oxide nanoparticles: A review. Ecotoxicol Environ Saf 2022; 243:113955. [PMID: 35961199 DOI: 10.1016/j.ecoenv.2022.113955] [Received: 04/19/2022] [Revised: 07/11/2022] [Accepted: 08/03/2022] [Indexed: 06/15/2023]
Abstract
Given the rapid development of nanotechnology, it is crucial to understand the effects of nanoparticles on living organisms. However, performing toxicological tests on a case-by-case basis is laborious. Quantitative structure-activity relationship (QSAR) modeling is an effective computational technique because it saves time and cost and reduces animal use. This review therefore presents general procedures for constructing and applying nano-QSAR models of metal-based and metal-oxide nanoparticles (MBNPs and MONPs). We also provide an overview of available databases and common algorithms. The molecular descriptors and their roles in the toxicological interpretation of MBNPs and MONPs are systematically reviewed, and the future of nano-QSAR is discussed. Finally, we address the growing demand for novel nano-specific descriptors, new computational strategies to address data shortages, in situ data for regulatory purposes, a better understanding of how the physicochemical properties of NPs relate to their bioactivity, and, most importantly, the design of nano-QSAR models for real-life environmental predictions rather than laboratory simulations.
Affiliation(s)
- Jing Li
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Chuanxi Wang
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Le Yue
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Feiran Chen
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Xuesong Cao
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Zhenyu Wang
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
5
Korolev V, Nevolin I, Protsenko P. A universal similarity based approach for predictive uncertainty quantification in materials science. Sci Rep 2022; 12:14931. [PMID: 36056050 PMCID: PMC9440040 DOI: 10.1038/s41598-022-19205-5] [Received: 05/31/2022] [Accepted: 08/25/2022] [Indexed: 11/08/2022]
Abstract
Immense effort has been devoted in the materials informatics community to enhancing the accuracy of machine learning (ML) models; however, uncertainty quantification (UQ) for state-of-the-art algorithms also requires further development. Most prominent UQ methods are model-specific or rely on ensembles of models; there is therefore a need for a universal technique that can be readily applied to a single model from a diverse set of ML algorithms. In this study, we propose a new UQ measure, the Δ-metric, to address this issue. This quantitative criterion was inspired by the k-nearest neighbor approach adopted for applicability domain estimation in chemoinformatics. It outperforms several UQ methods in accurately ranking predictive errors and could be considered a low-cost alternative to the more advanced deep ensemble strategy. We also evaluated the performance of the Δ-metric on various classes of materials, ML algorithms, and types of input features, demonstrating its universality.
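A k-nearest-neighbor uncertainty score in the spirit described above can be sketched as follows. This is an assumption-laden illustration, not the published Δ-metric formula: it scores a query by the similarity-weighted mean of the absolute errors of its k nearest training points, with Euclidean distance and the weight 1/(1 + d) chosen here for simplicity. The training errors are assumed to be out-of-fold absolute errors, so that they reflect predictive error rather than fitting error.

```python
from typing import Sequence
import numpy as np

def knn_error_score(x_query, x_train, train_abs_err: Sequence[float], k: int = 5) -> np.ndarray:
    """Similarity-weighted mean absolute error of the k nearest training points.

    Larger scores flag queries whose neighbourhood was poorly predicted,
    i.e. predictions that are likely to be less reliable.
    """
    xq = np.atleast_2d(np.asarray(x_query, dtype=float))
    xt = np.asarray(x_train, dtype=float)
    errs = np.asarray(train_abs_err, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    d = np.linalg.norm(xq[:, None, :] - xt[None, :, :], axis=2)
    k = min(k, xt.shape[0])
    idx = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbours
    nn_d = np.take_along_axis(d, idx, axis=1)   # their distances
    w = 1.0 / (1.0 + nn_d)                      # assumed similarity weight
    return (w * errs[idx]).sum(axis=1) / w.sum(axis=1)
```

Because the score only needs feature vectors and stored training errors, it can be attached to a single already-trained model of any type, which is the model-agnostic property the abstract emphasizes.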
Affiliation(s)
- Vadim Korolev
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
- Iurii Nevolin
- Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, 119071, Russia
- Pavel Protsenko
- Department of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia
6
Abstract
Human trust is one of the most fundamental problems in applying artificial intelligence to drug discovery. In silico models have been widely used in recent years to accelerate drug discovery. However, most of these models can give reliable predictions only within the limited chemical space covered by the training set (the applicability domain). Predictions for samples falling outside the applicability domain are unreliable and sometimes dangerous for drug-design decision making. Uncertainty quantification has accordingly drawn great attention as a way to enable autonomous drug design. By quantifying the confidence level of model predictions, their reliability can be represented quantitatively to assist researchers in molecular reasoning and experimental design. Here we summarize state-of-the-art approaches to uncertainty quantification and highlight how they can be used in drug design and discovery projects. We also outline four representative application scenarios of uncertainty quantification in drug discovery.
Affiliation(s)
- Jie Yu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
- University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
7
Wang D, Yu J, Chen L, Li X, Jiang H, Chen K, Zheng M, Luo X. A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling. J Cheminform 2021; 13:69. [PMID: 34544485 PMCID: PMC8454160 DOI: 10.1186/s13321-021-00551-x] [Received: 07/16/2021] [Accepted: 09/05/2021] [Indexed: 11/24/2022]
Abstract
Reliable uncertainty quantification for statistical models is crucial in various downstream applications, especially in drug design and discovery, where mistakes can be very costly. This topic has therefore attracted much attention, and a plethora of methods have been proposed over the past years. The approaches reported so far fall mainly into two classes: distance-based approaches and Bayesian approaches. Although these methods have been widely used in many scenarios and have shown promising performance with their distinct strengths, overconfidence on out-of-distribution examples still hinders the deployment of these techniques in real-world applications. In this study we investigated a number of consensus strategies that combine distance-based and Bayesian approaches, together with post-hoc calibration, to improve uncertainty quantification in QSAR (Quantitative Structure-Activity Relationship) regression modeling. We employed a set of criteria to quantitatively assess the ranking and calibration ability of these models. Experiments on 24 bioactivity datasets were designed to critically compare the proposed model with other well-studied baseline models. Our findings indicate that the proposed hybrid framework robustly improves the model's ability to rank absolute errors. Together with post-hoc calibration on the validation set, we show that well-calibrated uncertainty quantification results can be obtained in domain-shift settings. The complementarity between different methods is also analyzed conceptually.
Affiliation(s)
- Dingyan Wang
- Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Jie Yu
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Lifan Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xutong Li
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Hualiang Jiang
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Kaixian Chen
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Mingyue Zheng
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xiaomin Luo
- Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China
- University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
8
Berenger F, Yamanishi Y. Ranking Molecules with Vanishing Kernels and a Single Parameter: Active Applicability Domain Included. J Chem Inf Model 2020; 60:4376-4387. [PMID: 32281797 DOI: 10.1021/acs.jcim.9b01075] [Indexed: 11/29/2022]
Abstract
In ligand-based virtual screening, high-throughput screening (HTS) data sets can be exploited to train classification models. Such models can be used to prioritize as-yet untested molecules, from the most likely active (against a protein target of interest) to the least likely active. In this study, a single-parameter ranking method with an applicability domain (AD) is proposed. Specifically, kernel density estimates (KDE) are revisited to improve their computational efficiency and to incorporate an AD. Two modifications are proposed: (i) using vanishing kernels (i.e., kernel functions with finite support) and (ii) using the Tanimoto distance between molecular fingerprints as a radial basis function. This construction is termed "Vanishing Ranking Kernels" (VRK). Using VRK on 21 HTS assays, it is shown that VRK can compete in performance with a graph convolutional deep neural network. VRK are conceptually simple and fast to train, requiring the optimization of a single parameter. A trained VRK model usually defines an active AD. Exploiting this AD can significantly increase the screening frequency of a VRK model. Software: https://github.com/UnixJunkie/rankers. Data sets: https://zenodo.org/record/1320776 and https://zenodo.org/record/3540423.
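The idea of a finite-support kernel over Tanimoto distance can be sketched as below. This is a simplified illustration, not the code in the rankers repository: fingerprints are modeled as Python sets of on-bit indices, the kernel is assumed to be triangular (1 at distance 0, falling linearly to 0 at a cutoff radius `r`, which plays the role of the single parameter), and the score is a plain mean kernel value over known actives. A molecule with no active within distance `r` scores exactly zero, which is one way such a kernel can induce an "active AD".

```python
from typing import FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: set of on-bit indices

def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - Tanimoto similarity."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)

def vanishing_kernel(d: float, r: float) -> float:
    """Assumed triangular kernel with finite support: exactly 0 for d >= r."""
    return max(0.0, 1.0 - d / r)

def active_density(query: Fingerprint, actives: Sequence[Fingerprint], r: float) -> float:
    """Mean kernel value between the query and every known active."""
    if r <= 0:
        raise ValueError("r must be positive")
    return sum(vanishing_kernel(tanimoto_distance(query, a), r) for a in actives) / len(actives)

def rank(candidates: Sequence[Tuple[str, Fingerprint]], actives: Sequence[Fingerprint],
         r: float) -> List[Tuple[str, float, bool]]:
    """Sort candidates by descending score and flag those inside the kernel's support."""
    scored = [(name, active_density(fp, actives, r)) for name, fp in candidates]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [(name, s, s > 0.0) for name, s in scored]
```

Because the kernel is exactly zero beyond `r`, only actives within that radius contribute to a score, and a candidate with no such neighbor is flagged as outside the domain rather than receiving a small but misleading positive value.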
Affiliation(s)
- Francois Berenger
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, 680-4 Iizuka, Japan
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, 680-4 Iizuka, Japan
9
Computational Models Using Multiple Machine Learning Algorithms for Predicting Drug Hepatotoxicity with the DILIrank Dataset. Int J Mol Sci 2020; 21:ijms21062114. [PMID: 32204453 PMCID: PMC7139829 DOI: 10.3390/ijms21062114] [Received: 02/12/2020] [Revised: 03/13/2020] [Accepted: 03/17/2020] [Indexed: 02/07/2023]
Abstract
Drug-induced liver injury (DILI) remains one of the challenges in the safety profile of both authorized and candidate drugs, and predicting hepatotoxicity from the chemical structure of a substance remains a task worth pursuing. Such an approach is consistent with the current trend toward replacing non-clinical tests with in vitro or in silico alternatives. In 2016, a group of FDA researchers published an improved annotated list of drugs with respect to their DILI risk, constituting “the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans” (DILIrank). This paper is one of the few attempting to predict liver toxicity using the DILIrank dataset. Molecular descriptors were computed with the Dragon 7.0 software, and a variety of feature selection and machine learning algorithms were implemented in the R computing environment. Nested (double) cross-validation was used to externally validate the selected models. A total of 78 models with reasonable performance were selected and stacked through several approaches, including the building of multiple meta-models. The performance of the stacked models was slightly superior to that of other published models. The models were applied in a virtual screening exercise on over 100,000 compounds from the ZINC database, and about 20% of them were predicted to be non-hepatotoxic.
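Nested (double) cross-validation separates hyperparameter selection (inner loop) from performance estimation (outer loop). The sketch below illustrates the idea in NumPy; it is not the authors' R workflow. As a simplifying assumption it uses closed-form ridge regression over a grid of penalties in place of the paper's classifiers, together with random K-fold splits.

```python
from typing import List, Sequence, Tuple
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> Tuple[np.ndarray, float]:
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, float(y_mean - x_mean @ w)

def cv_rmse(X: np.ndarray, y: np.ndarray, lam: float, folds: List[np.ndarray]) -> float:
    """RMSE of ridge(lam) over fixed folds, so every penalty is judged on the same splits."""
    sq = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        w, b = ridge_fit(X[train], y[train], lam)
        sq.append((X[test] @ w + b - y[test]) ** 2)
    return float(np.sqrt(np.concatenate(sq).mean()))

def nested_cv(X, y, lams: Sequence[float], k_outer: int = 5, k_inner: int = 5,
              seed: int = 0) -> Tuple[np.ndarray, List[float]]:
    """Out-of-fold predictions for every sample plus the penalty chosen in each outer fold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    preds, chosen = np.empty(len(y)), []
    for test in np.array_split(rng.permutation(len(y)), k_outer):
        train = np.setdiff1d(np.arange(len(y)), test)
        Xtr, ytr = X[train], y[train]
        # The inner loop sees only outer-training data, so the outer test fold
        # never influences hyperparameter selection.
        inner = np.array_split(rng.permutation(len(ytr)), k_inner)
        best = min(lams, key=lambda lam: cv_rmse(Xtr, ytr, lam, inner))
        w, b = ridge_fit(Xtr, ytr, best)
        preds[test] = X[test] @ w + b
        chosen.append(best)
    return preds, chosen
```

The outer-fold predictions are never produced by a model whose hyperparameters were tuned on them, which is why this scheme is described as an external validation of the selected models.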
10
Ancuceanu R, Tamba B, Stoicescu CS, Dinu M. Use of QSAR Global Models and Molecular Docking for Developing New Inhibitors of c-src Tyrosine Kinase. Int J Mol Sci 2019; 21:ijms21010019. [PMID: 31861445 PMCID: PMC6981969 DOI: 10.3390/ijms21010019] [Received: 10/22/2019] [Revised: 12/15/2019] [Accepted: 12/16/2019] [Indexed: 12/11/2022]
Abstract
The prototype of a family of at least nine members, cellular Src (c-Src) tyrosine kinase is a therapeutically interesting target because its inhibition may be beneficial not only in a number of malignancies but also in a diverse array of conditions, from neurodegenerative pathologies to certain viral infections. Computational methods in drug discovery are considerably cheaper than conventional methods and make it possible to screen very large numbers of compounds, on a scale that would simply be impossible in wet-lab experimental settings. We explored the use of global quantitative structure-activity relationship (QSAR) models and molecular ligand docking in the discovery of new c-src tyrosine kinase inhibitors. Using a dataset of 1038 compounds from the ChEMBL database, we developed over 350 QSAR classification models. A total of 49 models with reasonably good performance were selected, assembled by stacking with a simple majority vote, and used for the virtual screening of over 100,000 compounds. A total of 744 compounds were predicted to be active by at least 50% of the QSAR models; 147 compounds were both within the applicability domain and predicted to be active by at least 75% of the models. These 147 compounds were submitted to molecular ligand docking using AutoDock Vina and LeDock, and 89 were predicted to be active based on binding energy.
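The vote-based screening step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each model is assumed to output a binary 0/1 activity call, applicability-domain membership is assumed to be precomputed as a boolean per compound, and the compound names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (name, per-model 0/1 activity calls, inside the applicability domain?)
Compound = Tuple[str, Sequence[int], bool]

def vote_fraction(calls: Sequence[int]) -> float:
    """Fraction of models that call the compound active."""
    return sum(calls) / len(calls)

def screen(compounds: Sequence[Compound], threshold: float,
           require_ad: bool = False) -> List[str]:
    """Names whose active-vote fraction reaches `threshold`, optionally restricted to the AD."""
    return [name for name, calls, in_ad in compounds
            if vote_fraction(calls) >= threshold and (in_ad or not require_ad)]
```

Raising the vote threshold (for example from 0.5 to 0.75) and adding the AD requirement can only shrink the hit list, so the stricter set is always a subset of the looser one.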
Affiliation(s)
- Robert Ancuceanu
- Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020956 Bucharest, Romania
- Bogdan Tamba
- Advanced Research and Development Center for Experimental Medicine (CEMEX), Grigore T. Popa University of Medicine and Pharmacy of Iasi, 700115 Iasi, Romania
- Cristina Silvia Stoicescu
- Department of Chemical Thermodynamics, Institute of Physical Chemistry “Ilie Murgulescu”, 060021 Bucharest, Romania
- Mihaela Dinu
- Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020956 Bucharest, Romania
11
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105777] [Indexed: 12/20/2022]
12
Cortés-Ciriano I, Bender A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J Chem Inf Model 2019; 59:3330-3339. [PMID: 31241929 DOI: 10.1021/acs.jcim.9b00297] [Indexed: 12/14/2022]
Abstract
While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for neural networks prevents their use in guiding decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for neural networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single neural network using dropout and then applying it N times to both the validation and test sets, also employing dropout in this step. An ensemble of predictions is therefore generated for each instance in the validation and test sets. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower range of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout- and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for deep neural networks in drug discovery and beyond.
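The calibration step can be sketched as a normalized inductive conformal predictor. This is a hedged illustration rather than the paper's exact procedure: it takes the N dropout forward passes as given arrays (no network is trained here), assumes the nonconformity score is the absolute residual of the ensemble mean divided by the ensemble standard deviation, and uses the standard (n + 1) finite-sample quantile rank. The small `beta` term is an assumed guard against zero spread.

```python
import numpy as np

def conformal_intervals(val_ens, y_val, test_ens, confidence: float = 0.8,
                        beta: float = 1e-6):
    """Prediction intervals for test instances from dropout ensembles.

    val_ens, test_ens: arrays of shape (n_instances, N) holding the N
    stochastic forward passes per instance; y_val: validation labels.
    """
    val_ens = np.asarray(val_ens, dtype=float)
    test_ens = np.asarray(test_ens, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    val_mu = val_ens.mean(axis=1)
    val_sd = val_ens.std(axis=1) + beta
    # Nonconformity: absolute residual scaled by the ensemble spread.
    alphas = np.sort(np.abs(y_val - val_mu) / val_sd)
    n = len(alphas)
    rank = int(np.ceil((n + 1) * confidence))
    # With too few calibration points for this confidence, the interval is unbounded.
    q = np.inf if rank > n else alphas[rank - 1]
    test_mu = test_ens.mean(axis=1)
    test_sd = test_ens.std(axis=1) + beta
    return test_mu - q * test_sd, test_mu + q * test_sd
```

Because each half-width is `q` times that instance's own dropout spread, uncertain instances receive wider intervals, while the calibrated `q` keeps overall coverage close to the requested confidence level.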
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom