1
de Oliveira LHD, Cruz JN, Dos Santos CBR, de Melo EB. Multivariate QSAR, similarity search and ADMET studies based in a set of methylamine derivatives described as dopamine transporter inhibitors. Mol Divers 2023:10.1007/s11030-023-10724-5. [PMID: 37670118 DOI: 10.1007/s11030-023-10724-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
The dopamine transporter (DAT), responsible for the regulation of dopaminergic neurotransmission, is implicated in the etiology of several neuropsychiatric disorders which, in turn, have contributed to high rates of disability and numerous deaths in recent years, significantly impacting the global health system. Although the search for new drugs for the treatment of neuropsychiatric disorders has advanced in recent years, DAT-selective drugs that do not generate the same psychostimulant effects observed in drugs of abuse remain scarce. Therefore, we performed a QSAR study based on a dataset of 36 methylamine derivatives described as DAT inhibitors. The model was built using only descriptors derived from 2D structures, and it gave satisfactory results on the metrics used for internal and external validation. Subsequently, a virtual screening step, also based on 2D similarity, identified a total of 1157 compounds. After successive reductions of the set using toxicity filters, applicability domain evaluation, and in silico assessment of pharmacokinetic properties, seven hit compounds were selected as the most promising for use, in future studies, as new scaffolds for the development of new DAT inhibitors.
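The 2D similarity screening step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity metric or fingerprint was used; the Tanimoto coefficient on sets of "on" fingerprint bits is assumed here because it is a common choice, and the names `tanimoto`, `screen`, and the 0.7 threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(query: Set[int],
           library: Dict[str, Set[int]],
           threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return library compounds whose similarity to the query meets the threshold,
    sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In a real workflow the bit sets would come from a cheminformatics toolkit, and the threshold would be tuned to the fingerprint type.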
Affiliation(s)
- Luiz Henrique Dias de Oliveira
  - Theorical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (UNIOESTE), 2069 Universitária St., Cascavel, PR, 85819-110, Brazil
- Jorddy Neves Cruz
  - Laboratory of Modeling and Computational Chemistry, Department of Biological and Health Sciences, Federal University of Amapá, Macapá, AP, 68902-280, Brazil
- Cleydson Breno Rodrigues Dos Santos
  - Laboratory of Modeling and Computational Chemistry, Department of Biological and Health Sciences, Federal University of Amapá, Macapá, AP, 68902-280, Brazil
- Eduardo Borges de Melo
  - Theorical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (UNIOESTE), 2069 Universitária St., Cascavel, PR, 85819-110, Brazil
2
Borde C, Escargueil AE, Maréchal V. Shikonin, an inhibitor of inflammasomes, inhibits Epstein-Barr virus reactivation. Antiviral Res 2023; 217:105699. [PMID: 37549849 DOI: 10.1016/j.antiviral.2023.105699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 08/03/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]
Abstract
Epstein-Barr virus (EBV) is a highly prevalent human herpesvirus that persists for life in more than 95% of the adult population. EBV usually establishes an asymptomatic life-long infection, but it is also associated with malignancies affecting B lymphocytes and epithelial cells mainly. The virus alternates between a latent phase and a lytic phase, both of which contribute to the initiation of the tumor process. So far, there is only a limited number of antiviral molecules against the lytic phase, most of them targeting viral replication. Recent studies provided evidence that EBV uses components of the NLRP3 inflammasome to enter the productive phase of its cycle following activation in response to various stimuli. In the present work, we demonstrate that shikonin, a natural molecule with low toxicity which is known to inhibit inflammasome, can efficiently repress EBV reactivation. Similar results were obtained with apigenin and OLT 1177, two other NLRP3 inflammasome inhibitors. It is shown herein that shikonin repressed the transcription of reactivation-induced NLRP3 thereby inhibiting inflammasome activation and EBV lytic phase induction.
Affiliation(s)
- Chloé Borde
  - Sorbonne Université, INSERM, Centre de Recherche Saint-Antoine, F-75012, Paris, France
- Vincent Maréchal
  - Sorbonne Université, INSERM, Centre de Recherche Saint-Antoine, F-75012, Paris, France
3
Zhang Y, Menke J, He J, Nittinger E, Tyrchan C, Koch O, Zhao H. Similarity-based pairing improves efficiency of siamese neural networks for regression tasks and uncertainty quantification. J Cheminform 2023; 15:75. [PMID: 37649050 PMCID: PMC10469421 DOI: 10.1186/s13321-023-00744-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 08/10/2023] [Indexed: 09/01/2023] Open
Abstract
Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with the conventional exhaustive pairing, it reduces the algorithm complexity from O(n²) to O(n). It also results in a better prediction performance consistently on the three physicochemical datasets, using a multilayer perceptron with the circular fingerprint as a proof of concept. We further include into a Siamese neural network the transformer-based Chemformer, which extracts task-specific features from the simplified molecular-input line-entry system representation of compounds. Additionally, we propose a means to measure the prediction uncertainty by utilizing the variance in predictions from a set of reference compounds. Our results demonstrate that the high prediction accuracy correlates with the high confidence. Finally, we investigate implications of the similarity property principle in machine learning.
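The difference in pair counts between exhaustive and similarity-based pairing can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: the Tanimoto similarity on bit sets, the choice of the `k` most similar partners per compound, and the function names are all assumptions. Note that the naive neighbour search below still compares every compound with every other; what grows linearly is the number of training pairs.

```python
from itertools import permutations
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def exhaustive_pairs(n: int) -> List[Tuple[int, int]]:
    """All ordered pairs of distinct compounds: n * (n - 1) pairs."""
    return list(permutations(range(n), 2))


def similarity_pairs(fps: Sequence[Set[int]], k: int = 1) -> List[Tuple[int, int]]:
    """Pair each compound only with its k most similar partners: n * k pairs."""
    pairs = []
    for i, fp in enumerate(fps):
        others = [j for j in range(len(fps)) if j != i]
        # Rank the remaining compounds by similarity to compound i.
        ranked = sorted(others, key=lambda j: tanimoto(fp, fps[j]), reverse=True)
        pairs.extend((i, j) for j in ranked[:k])
    return pairs
```

For four compounds, exhaustive pairing yields 12 ordered pairs, while similarity-based pairing with `k=1` yields 4.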
Affiliation(s)
- Yumeng Zhang
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Janosch Menke
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, 48149, Münster, Germany
- Jiazhen He
  - Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Eva Nittinger
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Christian Tyrchan
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
- Oliver Koch
  - Institute of Pharmaceutical and Medicinal Chemistry, Westfälische Wilhelms-Universität Münster, 48149, Münster, Germany
- Hongtao Zhao
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183, Gothenburg, Sweden
4
Liu W, Wang Z, Chen J, Tang W, Wang H. Machine Learning Model for Screening Thyroid Stimulating Hormone Receptor Agonists Based on Updated Datasets and Improved Applicability Domain Metrics. Chem Res Toxicol 2023. [PMID: 37209109 DOI: 10.1021/acs.chemrestox.3c00074] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/22/2023]
Abstract
Machine learning (ML) models for screening endocrine-disrupting chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR) agonists, are essential for sound management of chemicals. Previous models for screening TSHR agonists were built on imbalanced datasets and lacked applicability domain (AD) characterization essential for regulatory application. Herein, an updated TSHR agonist dataset was built, for which the ratio of active to inactive compounds greatly increased to 1:2.6, and chemical spaces of structure-activity landscapes (SALs) were enhanced. Resulting models based on 7 molecular representations and 4 ML algorithms were proven to outperform previous ones. Weighted similarity density (ρs) and weighted inconsistency of activities (IA) were proposed to characterize the SALs, and a state-of-the-art AD characterization methodology ADSAL{ρs, IA} was established. An optimal classifier developed with PubChem fingerprints and the random forest algorithm, coupled with ADSAL{ρs ≥ 0.15, IA ≤ 0.65}, exhibited good performance on the validation set with the area under the receiver operating characteristic curve being 0.984 and balanced accuracy being 0.941 and identified 90 TSHR agonist classes that could not be found previously. The classifier together with the ADSAL{ρs, IA} may serve as efficient tools for screening EDCs, and the AD characterization methodology may be applied to other ML models.
Affiliation(s)
- Wenjia Liu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Weihao Tang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Haobo Wang
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
5
Wang D, Wu Z, Shen C, Bao L, Luo H, Wang Z, Yao H, Kong DX, Luo C, Hou T. Learning with uncertainty to accelerate the discovery of histone lysine-specific demethylase 1A (KDM1A/LSD1) inhibitors. Brief Bioinform 2023; 24:6961473. [PMID: 36573494 DOI: 10.1093/bib/bbac592] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/28/2022] Open
Abstract
Machine learning including modern deep learning models has been extensively used in drug design and screening. However, reliable prediction of molecular properties is still challenging when exploring out-of-domain regimes, even for deep neural networks. Therefore, it is important to understand the uncertainty of model predictions, especially when the predictions are used to guide further experiments. In this study, we explored the utility and effectiveness of evidential uncertainty in compound screening. The evidential Graphormer model was proposed for uncertainty-guided discovery of KDM1A/LSD1 inhibitors. The benchmarking results illustrated that (i) Graphormer exhibited comparative predictive power to state-of-the-art models, and (ii) evidential regression enabled well-ranked uncertainty estimates and calibrated predictions. Subsequently, we leveraged time-splitting on the curated KDM1A/LSD1 dataset to simulate out-of-distribution predictions. The retrospective virtual screening showed that the evidential uncertainties helped reduce false positives among the top-acquired compounds and thus enabled higher experimental validation rates. The trained model was then used to virtually screen an independent in-house compound set. The top 50 compounds ranked by two different ranking strategies were experimentally validated, respectively. In general, our study highlighted the importance to understand the uncertainty in prediction, which can be recognized as an interpretable dimension to model predictions.
Affiliation(s)
- Dong Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
- Chao Shen
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Lingjie Bao
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
- Hao Luo
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
- Hucheng Yao
  - State Key Laboratory of Agricultural Microbiology, Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- De-Xin Kong
  - State Key Laboratory of Agricultural Microbiology, Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Cheng Luo
  - The Center for Chemical Biology, Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203 China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, China
6
Yu J, Wang D, Zheng M. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? iScience 2022; 25:104814. [PMID: 35996575 PMCID: PMC9391523 DOI: 10.1016/j.isci.2022.104814] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022] Open
Abstract
The problem of human trust is one of the most fundamental problems in applied artificial intelligence in drug discovery. In silico models have been widely used to accelerate the process of drug discovery in recent years. However, most of these models can only give reliable predictions within a limited chemical space that the training set covers (applicability domain). Predictions of samples falling outside the applicability domain are unreliable and sometimes dangerous for the drug-design decision-making process. Uncertainty quantification accordingly has drawn great attention to enable autonomous drug designing. By quantifying the confidence level of model predictions, the reliability of the predictions can be quantitatively represented to assist researchers in their molecular reasoning and experimental design. Here we summarize the state-of-the-art approaches to uncertainty quantification and underline how they can be used for drug design and discovery projects. Furthermore, we also outline four representative application scenarios of uncertainty quantification in drug discovery.
7
Zhang K, Zhang H. Predicting Solute Descriptors for Organic Chemicals by a Deep Neural Network (DNN) Using Basic Chemical Structures and a Surrogate Metric. Environ Sci Technol 2022; 56:2054-2064. [PMID: 34995441 DOI: 10.1021/acs.est.1c05398] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/14/2023]
Abstract
Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, there are still substantial difficulties in obtaining these descriptors accurately and quickly for new organic chemicals. In this research, models (PaDEL-DNN) that require only SMILES of chemicals were built to satisfactorily estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors demonstrated good performance in modeling storage-lipid/water partitioning coefficient (log Kstorage-lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (freesolve). Then, assuming that the accuracy in the estimated values of widely available properties, e.g., logP (octanol-water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When using the pp-LFER descriptors to model log Kstorage-lipid/water, BCF, ESOL, and freesolve, we achieved around 0.1 log unit lower errors for chemicals whose estimated pp-LFER descriptors were deemed "accurate" by the surrogate metric. The interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) "similar" chemicals in the training data set was crucial for accurate estimation while the remaining less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric and newly estimated descriptors will greatly benefit chemical transfer modeling.
Affiliation(s)
- Kai Zhang
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Huichun Zhang
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
8
Hesping E, Chua MJ, Pflieger M, Qian Y, Dong L, Bachu P, Liu L, Kurz T, Fisher GM, Skinner-Adams TS, Reid RC, Fairlie DP, Andrews KT, Gorse ADJ. QSAR Classification Models for Prediction of Hydroxamate Histone Deacetylase Inhibitor Activity against Malaria Parasites. ACS Infect Dis 2022; 8:106-117. [PMID: 34985259 DOI: 10.1021/acsinfecdis.1c00355] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
Malaria, caused by Plasmodium parasites, results in >400,000 deaths annually. There is no effective vaccine, and new drugs with novel modes of action are needed because of increasing parasite resistance to current antimalarials. Histone deacetylases (HDACs) are epigenetic regulatory enzymes that catalyze post-translational protein deacetylation and are promising malaria drug targets. Here, we describe quantitative structure-activity relationship models to predict the antiplasmodial activity of hydroxamate-based HDAC inhibitors. The models incorporate P. falciparum in vitro activity data for 385 compounds containing a hydroxamic acid and were subject to internal and external validation. When used to screen 22 new hydroxamate-based HDAC inhibitors for antiplasmodial activity, model A7 (external accuracy 91%) identified three hits that were subsequently verified as having potent in vitro activity against P. falciparum parasites (IC50 = 6, 71, and 84 nM), with 8 to 51-fold selectivity for P. falciparum versus human cells.
Affiliation(s)
- Eva Hesping
  - Griffith Institute for Drug Discovery, Griffith University, Nathan 4111, Australia
- Ming Jang Chua
  - Griffith Institute for Drug Discovery, Griffith University, Nathan 4111, Australia
- Marc Pflieger
  - Institut für pharmazeutische und medizinische Chemie, Heinrich-Heine Universität, Dusseldorf 40225, Germany
- Yunan Qian
  - Griffith Institute for Drug Discovery, Griffith University, Nathan 4111, Australia
- Lilong Dong
  - Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- Prabhakar Bachu
  - Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- Ligong Liu
  - Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- Thomas Kurz
  - Institut für pharmazeutische und medizinische Chemie, Heinrich-Heine Universität, Dusseldorf 40225, Germany
- Gillian M. Fisher
  - Griffith Institute for Drug Discovery, Griffith University, Nathan 4111, Australia
- Robert C. Reid
  - Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- David P. Fairlie
  - Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- Katherine T. Andrews
  - Griffith Institute for Drug Discovery, Griffith University, Nathan 4111, Australia
- Alain-Dominique J.P. Gorse
  - QCIF Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Saint Lucia 4072, Australia
9
Chen D, Huang X, Fan Y. Thermodynamics-Based Model Construction for the Accurate Prediction of Molecular Properties From Partition Coefficients. Front Chem 2021; 9:737579. [PMID: 34589468 PMCID: PMC8473701 DOI: 10.3389/fchem.2021.737579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 08/20/2021] [Indexed: 11/17/2022] Open
Abstract
Developing models for predicting molecular properties of organic compounds is imperative for drug development and environmental safety; however, development of such models that have high predictive power and are independent of the compounds used is challenging. To overcome the challenges, we used a thermodynamics-based theoretical derivation to construct models for accurately predicting molecular properties. The free energy change that determines a property equals the sum of the free energy changes (ΔGFs) caused by the factors affecting the property. By developing or selecting molecular descriptors that are directly proportional to ΔGFs, we built a general linear free energy relationship (LFER) for predicting the property with the molecular descriptors as predictive variables. The LFER can be used to construct models for predicting various specific properties from partition coefficients. Validations show that the models constructed according to the LFER have high predictive power and their performance is independent of the compounds used, including the models for the properties having little correlation with partition coefficients. The findings in this study are highly useful for applications in drug development and environmental safety.
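The additive free-energy argument in the abstract can be written out as a short derivation. The symbols below ($D_i$ for descriptors, $k_i$ and $c_i$ for coefficients, $K$ for an equilibrium-type property) are notation assumed here for illustration and are not taken from the paper; the link between $\Delta G$ and $\log K$ is the standard thermodynamic relation and applies only to properties that behave like equilibrium constants. The intercept $c_0$ is a fitted term added as an assumption, as is usual in regression.

```latex
% Total free energy change as a sum over contributing factors F_i
\Delta G = \sum_{i} \Delta G_{F_i}
% Descriptors chosen to be directly proportional to each contribution
\Delta G_{F_i} = k_i D_i
\quad\Longrightarrow\quad
\Delta G = \sum_{i} k_i D_i
% Standard relation for an equilibrium-type property K
\log K = -\frac{\Delta G}{2.303\,RT}
\quad\Longrightarrow\quad
\log K = c_0 + \sum_{i} c_i D_i ,
\qquad c_i = -\frac{k_i}{2.303\,RT}
```

In this reading, partition coefficients serve as some of the descriptors $D_i$, and the coefficients $c_i$ are obtained by linear regression.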
Affiliation(s)
- Deliang Chen
  - Jiangxi Key Laboratory of Organo-Pharmaceutical Chemistry, Chemistry and Chemical Engineering College, Gannan Normal University, Ganzhou, China
- Xiaoqing Huang
  - Jiangxi Key Laboratory of Organo-Pharmaceutical Chemistry, Chemistry and Chemical Engineering College, Gannan Normal University, Ganzhou, China
- Yulan Fan
  - Jiangxi Key Laboratory of Organo-Pharmaceutical Chemistry, Chemistry and Chemical Engineering College, Gannan Normal University, Ganzhou, China
10
Wang D, Yu J, Chen L, Li X, Jiang H, Chen K, Zheng M, Luo X. A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling. J Cheminform 2021; 13:69. [PMID: 34544485 PMCID: PMC8454160 DOI: 10.1186/s13321-021-00551-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/16/2021] [Accepted: 09/05/2021] [Indexed: 11/24/2022] Open
Abstract
Reliable uncertainty quantification for statistical models is crucial in various downstream applications, especially for drug design and discovery where mistakes may incur a large amount of cost. This topic has therefore absorbed much attention and a plethora of methods have been proposed over the past years. The approaches that have been reported so far can be mainly categorized into two classes: distance-based approaches and Bayesian approaches. Although these methods have been widely used in many scenarios and shown promising performance with their distinct superiorities, being overconfident on out-of-distribution examples still poses challenges for the deployment of these techniques in real-world applications. In this study we investigated a number of consensus strategies in order to combine both distance-based and Bayesian approaches together with post-hoc calibration for improved uncertainty quantification in QSAR (Quantitative Structure-Activity Relationship) regression modeling. We employed a set of criteria to quantitatively assess the ranking and calibration ability of these models. Experiments based on 24 bioactivity datasets were designed to make critical comparison between the model we proposed and other well-studied baseline models. Our findings indicate that the hybrid framework proposed by us can robustly enhance the model ability of ranking absolute errors. Together with post-hoc calibration on the validation set, we show that well-calibrated uncertainty quantification results can be obtained in domain shift settings. The complementarity between different methods is also conceptually analyzed.
Affiliation(s)
- Dingyan Wang
  - Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Jie Yu
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Lifan Chen
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xutong Li
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Hualiang Jiang
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Kaixian Chen
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Mingyue Zheng
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xiaomin Luo
  - Shanghai Key Laboratory of Forensic Medicine, Academy of Forensic Science, Shanghai, 200063, China
  - University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing, 100049, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
11
Duan C, Liu F, Nandy A, Kulik HJ. Putting Density Functional Theory to the Test in Machine-Learning-Accelerated Materials Discovery. J Phys Chem Lett 2021; 12:4628-4637. [PMID: 33973793 DOI: 10.1021/acs.jpclett.1c00631] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 05/20/2023]
Abstract
Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.
Affiliation(s)
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
|
12
|
KC GB, Bocci G, Verma S, Hassan MM, Holmes J, Yang JJ, Sirimulla S, Oprea TI. A machine learning platform to estimate anti-SARS-CoV-2 activities. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00335-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/11/2022]
|
13
|
Vishwakarma G, Sonpal A, Hachmann J. Metrics for Benchmarking and Uncertainty Quantification: Quality, Applicability, and Best Practices for Machine Learning in Chemistry. Trends Chem 2021. [DOI: 10.1016/j.trechm.2020.12.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 11/26/2022]
|
14
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences, Part II: Outlook [Autonome Entdeckung in den chemischen Wissenschaften, Teil II: Ausblick]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
15
|
Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-00236-4] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Indexed: 12/17/2022]
|
16
|
Shamsara J. A Random Forest Model to Predict the Activity of a Large Set of Soluble Epoxide Hydrolase Inhibitors Solely Based on a Set of Simple Fragmental Descriptors. Comb Chem High Throughput Screen 2020; 22:555-569. [PMID: 31622216 DOI: 10.2174/1386207322666191016110232] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/08/2019] [Revised: 08/02/2019] [Accepted: 09/19/2019] [Indexed: 01/10/2023]
Abstract
BACKGROUND: Soluble epoxide hydrolase (sEH) is an enzyme expressed ubiquitously across tissues. Inhibition of sEH has shown promising results in treating hypertension and alleviating pain and inflammation. OBJECTIVE: This study uses machine learning to develop a predictive QSAR model for a large set of sEH inhibitors. METHODS: A random forest model was built to predict sEH inhibition. In addition, two new methods (the Treeinterpreter Python package and LIME, Local Interpretable Model-agnostic Explanations) were used to explain and interpret the model. RESULTS: The performance metrics of the model were R² = 0.831, Q² = 0.565, RMSE = 0.552, and R²(pred) = 0.595. The model also showed good predictive ability on two additional external test sets, at least in terms of ranking: the Spearman's rank correlation coefficients for external test sets 1 and 2 were 0.872 and 0.673, respectively. External test set 2 was diverse compared to the training set. The model could therefore be used in virtual screening to enrich potential sEH inhibitors in a diverse compound library. CONCLUSION: Because the model was built solely on a set of simple fragmental descriptors and was explained by two local interpretation algorithms, it could guide medicinal chemists in designing new sEH inhibitors. Moreover, the most important general descriptors (fragments) suggested by the model were consistent with the available crystallographic data. The model is available as an executable binary at http://www.pharm-sbg.com and https://github.com/shamsaraj.
Affiliation(s)
- Jamal Shamsara
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
|
17
|
Hirschfeld L, Swanson K, Yang K, Barzilay R, Coley CW. Uncertainty Quantification Using Neural Networks for Molecular Property Prediction. J Chem Inf Model 2020; 60:3770-3780. [PMID: 32702986 DOI: 10.1021/acs.jcim.0c00502] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Indexed: 12/17/2022]
Abstract
Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources. The need for UQ is especially acute for neural models, which are becoming increasingly standard yet are challenging to interpret. While several approaches to UQ have been proposed in the literature, there is no clear consensus on the comparative performance of these models. In this paper, we study this question in the context of regression tasks. We systematically evaluate several methods on five regression data sets using multiple complementary performance metrics. Our experiments show that none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple data sets. While we believe that these results show that existing UQ methods are not sufficient for all common use cases and further research is needed, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
Affiliation(s)
- Lior Hirschfeld
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- Kyle Swanson
- Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge CB3 0WB, U.K.
- Kevin Yang
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, California 94720, United States
- Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States
- Connor W Coley
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
|
18
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part II: Outlook. Angew Chem Int Ed Engl 2020; 59:23414-23436. [PMID: 31553509 DOI: 10.1002/anie.201909989] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 08/06/2019] [Indexed: 01/19/2023]
Abstract
This two-part Review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this Review defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress towards the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Natalie S Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
|
19
|
Griffen EJ, Dossetter AG, Leach AG. Chemists: AI Is Here; Unite To Get the Benefits. J Med Chem 2020; 63:8695-8704. [DOI: 10.1021/acs.jmedchem.0c00163] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/16/2022]
Affiliation(s)
- Edward J. Griffen
- MedChemica Ltd., Alderley Park, Macclesfield, Cheshire SK10 4TG, U.K.
- Andrew G. Leach
- MedChemica Ltd., Alderley Park, Macclesfield, Cheshire SK10 4TG, U.K.
|
20
|
Janet JP, Duan C, Yang T, Nandy A, Kulik HJ. A quantitative uncertainty metric controls error in neural network-driven chemical discovery. Chem Sci 2019; 10:7913-7922. [PMID: 31588334 PMCID: PMC6764470 DOI: 10.1039/c9sc02298h] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 05/11/2019] [Accepted: 07/11/2019] [Indexed: 12/14/2022]
Abstract
Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
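The latent-distance idea summarized in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the `latent` function is a hypothetical stand-in for the last hidden layer of a trained network (here it uses random weights), the distance is averaged over the `k` nearest training points, and the cutoff value is arbitrary.

```python
import numpy as np

def latent(X, W, b):
    """Hypothetical latent map: activations of a network's last hidden layer."""
    return np.tanh(X @ W + b)

def latent_distance(Z_query, Z_train, k=5):
    """Mean Euclidean distance from each query point to its k nearest
    training points, measured in latent space."""
    k = min(k, len(Z_train))
    # Pairwise distances, shape (n_query, n_train).
    d = np.linalg.norm(Z_query[:, None, :] - Z_train[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def trusted(Z_query, Z_train, cutoff, k=5):
    """Boolean mask: True where the latent distance is within the cutoff."""
    return latent_distance(Z_query, Z_train, k) <= cutoff

# Toy usage with random stand-in weights (not a trained model).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 4)) / np.sqrt(8), np.zeros(4)
Z_train = latent(rng.normal(size=(200, 8)), W, b)
Z_new = latent(rng.normal(size=(10, 8)), W, b)
dist = latent_distance(Z_new, Z_train)
# Arbitrary illustrative cutoff: keep the half of the queries closest to the data.
mask = trusted(Z_new, Z_train, cutoff=np.median(dist))
```

Lowering `cutoff` keeps fewer predictions but restricts them to points closer to the training data, which is the trade-off the abstract describes as tightening latent distance cutoffs.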
Affiliation(s)
- Jon Paul Janet
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Tzuhsiung Yang
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Tel: +1-617-253-4584
|
21
|
Cortés-Ciriano I, Bender A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J Chem Inf Model 2019; 59:3330-3339. [PMID: 31241929 DOI: 10.1021/acs.jcim.9b00297] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/14/2022]
Abstract
While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions are generated. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for Deep Neural Networks in drug discovery and beyond.
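The procedure outlined in this abstract (N stochastic forward passes with dropout left on, followed by conformal calibration on a validation set) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the tiny network uses random, untrained weights as a stand-in for a trained model, the nonconformity score (absolute error divided by the spread of the dropout ensemble) is one common normalized choice that may differ from the paper's, and all function names are hypothetical.

```python
import numpy as np

def dropout_forward(X, W1, W2, rng, p_drop=0.2):
    """One stochastic forward pass of a one-hidden-layer net with dropout on."""
    h = np.maximum(X @ W1, 0.0)                 # ReLU hidden layer
    keep = rng.random(h.shape) >= p_drop        # random dropout mask
    return ((h * keep / (1.0 - p_drop)) @ W2).ravel()

def mc_dropout(X, W1, W2, rng, n_passes=50, p_drop=0.2):
    """Mean and standard deviation over n_passes dropout predictions."""
    preds = np.stack([dropout_forward(X, W1, W2, rng, p_drop)
                      for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

def conformal_intervals(y_val, mean_val, std_val, mean_test, std_test,
                        confidence=0.8, eps=1e-8):
    """Inductive conformal intervals scaled by the dropout-ensemble spread."""
    scores = np.abs(y_val - mean_val) / (std_val + eps)   # assumed score
    n = len(scores)
    k = min(n, int(np.ceil((n + 1) * confidence)))        # calibration rank
    q = np.sort(scores)[k - 1]
    half = q * (std_test + eps)
    return mean_test - half, mean_test + half

# Toy usage: random stand-in weights and synthetic data (assumptions).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 16)), rng.normal(size=(16, 1))
X_val, X_test = rng.normal(size=(100, 5)), rng.normal(size=(20, 5))
y_val = (np.maximum(X_val @ W1, 0.0) @ W2).ravel() + rng.normal(scale=0.5, size=100)
mean_val, std_val = mc_dropout(X_val, W1, W2, rng)
mean_test, std_test = mc_dropout(X_test, W1, W2, rng)
lower, upper = conformal_intervals(y_val, mean_val, std_val,
                                   mean_test, std_test, confidence=0.8)
```

Only the extra forward passes are needed beyond the single network, which matches the computational argument made in the abstract; the width of each interval scales with how much the dropout predictions disagree for that instance.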
Affiliation(s)
- Isidro Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
22
|
Duan C, Janet JP, Liu F, Nandy A, Kulik HJ. Learning from Failure: Predicting Electronic Structure Calculation Outcomes with Machine Learning Models. J Chem Theory Comput 2019; 15:2331-2345. [DOI: 10.1021/acs.jctc.9b00057] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 12/27/2022]
|