1
Liu Q, He D, Fan M, Wang J, Cui Z, Wang H, Mi Y, Li N, Meng Q, Hou Y. Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. J Chem Inf Model 2024; 64:9306-9326. [PMID: 38949724 DOI: 10.1021/acs.jcim.4c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of drugs for the treatment of neurodegenerative diseases. However, due to the spatial complexity of phytochemicals, it becomes particularly important to evaluate the effectiveness of compounds while avoiding the mixing of cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the Murcko generic scaffold of each molecule. Under this split, three machine learning approaches were coupled with three kinds of molecular representation to construct microglia cytotoxicity classifiers, which were then compared and assessed by predictive accuracy (ACC), balanced accuracy (BA), F1-score, and Matthews correlation coefficient (MCC). Then, a dimension reduction method that integrates recursive feature elimination with a support vector machine (RFE-SVC) was applied to high-dimensional molecular fingerprints to further improve model performance. Among all the microglial cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification for the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructure SMARTS patterns were identified as structural alerts. Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and that SHAP can not only provide a rational explanation for microglia cytotoxicity predictions but also offer a guideline for subsequent molecular cytotoxicity modifications.
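The four test-set metrics quoted in this abstract (ACC, BA, F1-score, MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function names and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute ACC, BA, F1 and MCC from binary confusion-matrix counts."""
    sensitivity = safe_div(tp, tp + fn)  # recall on the positive class
    specificity = safe_div(tn, tn + fp)  # recall on the negative class
    precision = safe_div(tp, tp + fp)
    acc = safe_div(tp + tn, tp + fp + tn + fn)
    ba = (sensitivity + specificity) / 2
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc = safe_div(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"ACC": acc, "BA": ba, "F1": f1, "MCC": mcc}


# Hypothetical counts for illustration only; not data from the paper.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

Balanced accuracy and MCC are less sensitive to class imbalance than plain accuracy, which is presumably why the abstract reports them alongside ACC.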
Affiliation(s)
- Qing Liu
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Dakuo He
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Mengmeng Fan
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Jinpeng Wang
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Zeyu Cui
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Hao Wang
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Yan Mi
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Ning Li
- School of Traditional Chinese Materia Medica, Key Laboratory for TCM Material Basis Study and Innovative Drug Development of Shenyang City, Shenyang Pharmaceutical University, Shenyang 110016, P. R. China
- Qingqi Meng
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Yue Hou
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
2
Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2024; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges like high costs, long timelines, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite its potential, challenges like data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India.
3
Wang N, Li X, Xiao J, Liu S, Cao D. Data-driven toxicity prediction in drug discovery: Current status and future directions. Drug Discov Today 2024; 29:104195. [PMID: 39357621 DOI: 10.1016/j.drudis.2024.104195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 09/13/2024] [Accepted: 09/26/2024] [Indexed: 10/04/2024]
Abstract
Early toxicity assessment plays a vital role in the drug discovery process on account of its significant influence on the attrition rate of candidates. Recently, constant upgrading of information technology has greatly promoted the continuous development of toxicity prediction. To give an overview of the current state of data-driven toxicity prediction, we reviewed relevant studies and summarized them in three main respects: the features and difficulties of toxicity prediction, the evolution of modeling approaches, and the available tools for toxicity prediction. For each part, we expound the research status, existing challenges, and feasible solutions. Finally, several new directions and suggestions for toxicity prediction are also put forward.
Affiliation(s)
- Ningning Wang
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Xinliang Li
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China
- Jing Xiao
- Hunan Institute for Drug Control, Changsha 410001 Hunan, PR China
- Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; The Hunan Institute of Pharmacy Practice and Clinical Research, Changsha 410008 Hunan, PR China.
- Dongsheng Cao
- Department of Pharmacy, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008 Hunan, PR China; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, PR China.
4
Yu X, Chen Y, Chen L, Li W, Wang Y, Tang Y, Liu G. GCLmf: A Novel Molecular Graph Contrastive Learning Framework Based on Hard Negatives and Application in Toxicity Prediction. Mol Inform 2024:e202400169. [PMID: 39421969 DOI: 10.1002/minf.202400169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 09/23/2024] [Accepted: 09/24/2024] [Indexed: 10/19/2024]
Abstract
In silico methods for the prediction of chemical toxicity can decrease cost and increase efficiency in the early stage of drug discovery. However, because sufficient and reliable toxicity data are hard to obtain, constructing robust and accurate prediction models is challenging. Contrastive learning, a type of self-supervised learning, leverages large unlabeled data to obtain more expressive molecular representations, which can boost prediction performance on downstream tasks. While molecular graph contrastive learning has attracted growing attention, current models neglect the quality of the negative set. Here, we proposed a self-supervised pretraining deep learning framework named GCLmf. We first utilized molecular fragments that meet specific conditions as hard negative samples to boost the quality of the negative set and thus increase the difficulty of the proxy tasks during pre-training to learn informative representations. GCLmf has shown excellent predictive power on various molecular property benchmarks and demonstrates high performance in 33 toxicity tasks in comparison with multiple baselines. In addition, we further investigated the necessity of introducing hard negatives in model building and the impact of the proportion of hard negatives on the model.
Affiliation(s)
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yuanting Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Long Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yuhao Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai, 200237, China
5
Zhao Y, Zhang Z, Kong X, Wang K, Wang Y, Jia J, Li H, Tian S. Prediction of Drug-Induced Liver Injury: From Molecular Physicochemical Properties and Scaffold Architectures to Machine Learning Approaches. Chem Biol Drug Des 2024; 104:e14607. [PMID: 39179521 DOI: 10.1111/cbdd.14607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 07/24/2024] [Accepted: 08/01/2024] [Indexed: 08/26/2024]
Abstract
The process of developing new drugs is widely acknowledged as being time-intensive and requiring substantial financial investment. Despite ongoing efforts to reduce time and expenses in drug development, ensuring medication safety remains an urgent problem. One of the major problems involved in drug development is hepatotoxicity, specifically known as drug-induced liver injury (DILI). The popularity of new drugs often poses a significant barrier during development and frequently leads to their recall after launch. In silico methods have many advantages compared with traditional in vivo and in vitro assays. To establish a more precise and reliable prediction model, it is necessary to utilize an extensive and high-quality database consisting of information on drug molecule properties and structural patterns. In addition, we should also carefully select appropriate molecular descriptors that can be used to accurately depict compound characteristics. The aim of this study was to conduct a comprehensive investigation into the prediction of DILI. First, we conducted a comparative analysis of the physicochemical properties of extensively well-prepared DILI-positive and DILI-negative compounds. Then, we used classic substructure dissection methods to identify structural pattern differences between these two different types of chemical molecules. These findings indicate that it is not feasible to establish property or substructure-based rules for distinguishing between DILI-positive and DILI-negative compounds. Finally, we developed quantitative classification models for predicting DILI using the naïve Bayes classifier (NBC) and recursive partitioning (RP) machine learning techniques. The optimal DILI prediction model was obtained using NBC, which combines 21 physicochemical properties, the VolSurf descriptors and the LCFP_10 fingerprint set. 
This model achieved a global accuracy (GA) of 0.855 and an area under the curve (AUC) of 0.704 for the training set, while the corresponding values were 0.619 and 0.674 for the test set, respectively. Moreover, indicative substructural fragments favorable or unfavorable for DILI were identified from the best naïve Bayesian classification model. These findings may help prioritize lead compounds in the early stage of drug development pipelines.
Affiliation(s)
- Yulong Zhao
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- Zhoudong Zhang
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- Xiaotian Kong
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou, China
- Kai Wang
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- Yaxuan Wang
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- Jie Jia
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- Huanqiu Li
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- Sheng Tian
- College of Pharmaceutical Sciences, Soochow University, Suzhou, China
- College of Chemistry and Life Science, Beijing University of Technology, Beijing, China
6
Vinh T, Nguyen L, Trinh QH, Nguyen-Vo TH, Nguyen BP. Predicting Cardiotoxicity of Molecules Using Attention-Based Graph Neural Networks. J Chem Inf Model 2024; 64:1816-1827. [PMID: 38438914 DOI: 10.1021/acs.jcim.3c01286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2024]
Abstract
In drug discovery, the search for new and effective medications is often hindered by concerns about toxicity. Numerous promising molecules fail to pass the later phases of drug development due to strict toxicity assessments. This challenge significantly increases the cost, time, and human effort needed to discover new therapeutic molecules. Additionally, a considerable number of drugs already on the market have been withdrawn or re-evaluated because of their unwanted side effects. Among the various types of toxicity, drug-induced heart damage is a severe adverse effect commonly associated with several medications, especially those used in cancer treatments. Although a number of computational approaches have been proposed to identify the cardiotoxicity of molecules, the performance and interpretability of the existing approaches are limited. In our study, we proposed a more effective computational framework to predict the cardiotoxicity of molecules using an attention-based graph neural network. Experimental results indicated that the proposed framework outperformed the other methods. The stability of the model was also confirmed by our experiments. To assist researchers in evaluating the cardiotoxicity of molecules, we have developed an easy-to-use online web server that incorporates our model.
Affiliation(s)
- Tuan Vinh
- Department of Chemistry, Emory University, 201 Dowman Drive, Atlanta, Georgia 30322-1007, United States
- Loc Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand
- Quang H Trinh
- School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Thanh-Hoang Nguyen-Vo
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand
- School of Innovation, Design and Technology, Wellington Institute of Technology, 21 Kensington Avenue, Lower Hutt 5012, New Zealand
- Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand
7
Czeleń P, Jeliński T, Skotnicka A, Szefler B, Szupryczyński K. ADMET and Solubility Analysis of New 5-Nitroisatine-Based Inhibitors of CDK2 Enzymes. Biomedicines 2023; 11:3019. [PMID: 38002019 PMCID: PMC10669656 DOI: 10.3390/biomedicines11113019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/23/2023] [Revised: 11/03/2023] [Accepted: 11/06/2023] [Indexed: 11/26/2023] [Open Access]
Abstract
The development of new substances with the ability to interact with a biological target is only the first stage in the process of the creation of new drugs. The 5-nitroisatin derivatives considered in this study are new inhibitors of cyclin-dependent kinase 2 (CDK2) intended for anticancer therapy. The research, carried out based on the ADMET (absorption, distribution, metabolism, excretion, toxicity) methods, allowed a basic assessment of the physicochemical parameters of the tested drugs to be made. The collected data clearly showed the good oral absorption, membrane permeability, and bioavailability of the tested substances. The analysis of the metabolite activity and toxicity of the tested drugs did not show any critical hazards in terms of the toxicity of the tested substances. The substances' low solubility in water meant that extended studies of the tested compounds were required, which helped to select solvents with a high dissolving capacity for the examined substances, such as DMSO or NMP. The use of aqueous binary mixtures based on these two solvents allowed a relatively high solubility to be maintained with significantly reduced toxicity and environmental index compared to pure solvents, which is important in the context of the search for green solvents for pharmaceutical use.
Affiliation(s)
- Przemysław Czeleń
- Department of Physical Chemistry, Faculty of Pharmacy, Collegium Medicum, Nicolaus Copernicus University, Kurpinskiego 5, 85-096 Bydgoszcz, Poland
- Tomasz Jeliński
- Department of Physical Chemistry, Faculty of Pharmacy, Collegium Medicum, Nicolaus Copernicus University, Kurpinskiego 5, 85-096 Bydgoszcz, Poland
- Agnieszka Skotnicka
- Faculty of Chemical Technology and Engineering, Bydgoszcz University of Science and Technology, Seminaryjna 3, 85-326 Bydgoszcz, Poland
- Beata Szefler
- Department of Physical Chemistry, Faculty of Pharmacy, Collegium Medicum, Nicolaus Copernicus University, Kurpinskiego 5, 85-096 Bydgoszcz, Poland
- Kamil Szupryczyński
- Doctoral School of Medical and Health Sciences, Faculty of Pharmacy, Collegium Medicum, Nicolaus Copernicus University, Jagiellońska 13, 85-067 Bydgoszcz, Poland
8
Zhang Y, Xie L, Zhang D, Xu X, Xu L. Application of Machine Learning Methods to Predict the Air Half-Lives of Persistent Organic Pollutants. Molecules 2023; 28:7457. [PMID: 38005179 PMCID: PMC10673120 DOI: 10.3390/molecules28227457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/01/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023] [Open Access]
Abstract
Persistent organic pollutants (POPs) are ubiquitous and bioaccumulative, posing potential and long-term threats to human health and the ecological environment. Quantitative structure-activity relationship (QSAR) studies play a guiding role in analyzing the toxicity and environmental fate of different organic pollutants. In the current work, five molecular descriptors are utilized to construct QSAR models for predicting the mean and maximum air half-lives of POPs, including specifically the energy of the highest occupied molecular orbital (HOMO_Energy_DMol3), a component of the dipole moment along the z-axis (Dipole_Z), fragment contribution to SAscore (SAscore_Fragments), subgraph counts (SC_3_P), and structural information content (SIC). The QSAR models were achieved through the application of three machine learning methods: partial least squares (PLS), multiple linear regression (MLR), and genetic function approximation (GFA). The determination coefficients (R2) and relative errors (RE) for the mean air half-life of each model are 0.916 and 3.489% (PLS), 0.939 and 5.048% (MLR), 0.938 and 5.131% (GFA), respectively. Similarly, the determination coefficients (R2) and RE for the maximum air half-life of each model are 0.915 and 5.629% (PLS), 0.940 and 10.090% (MLR), 0.939 and 11.172% (GFA), respectively. Furthermore, the mechanisms that elucidate the significant factors impacting the air half-lives of POPs have been explored. The three regression models show good predictive and extrapolation abilities for POPs within the application domain.
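The abstract above reports a determination coefficient (R2) and a relative error (RE) for each regression model. The sketch below shows one common way to compute both from observed and predicted values. The abstract does not give the exact RE formula, so the mean absolute percentage form used here is an assumption, and the function names and sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_relative_error_percent(observed, predicted):
    """Mean of |predicted - observed| / |observed|, as a percentage.

    Assumed definition; the paper may define RE differently.
    """
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values for illustration only; not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))
print(round(mean_relative_error_percent(observed, predicted), 3))
```

A high R2 with a larger RE, as reported for the MLR and GFA models relative to PLS, is possible because R2 weighs squared absolute deviations while RE weighs deviations relative to each observed value.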
Affiliation(s)
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (Y.Z.); (D.Z.)
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China; (Y.Z.); (D.Z.)
9
Kurbanova M, Saravanan K, Ahmad S, Sadigova A, Askerov R, Magerramov A, Bakri YE. Computational Binding Analysis of Ethyl 3,3,5,5-Tetracyano-2-Hydroxy-2-Methyl-4,6-Diphenylcyclohexane-1-Carboxylate in Calf Thymus DNA. Appl Biochem Biotechnol 2023; 195:5338-5354. [PMID: 35195835 DOI: 10.1007/s12010-022-03849-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2021] [Accepted: 02/11/2022] [Indexed: 11/02/2022]
Abstract
In the present paper, several computational binding analyses were performed on ethyl 3,3,5,5-tetracyano-2-hydroxy-2-methyl-4,6-diphenylcyclohexane-1-carboxylate, which was newly synthesized by three-component condensation of benzaldehyde with ethyl acetoacetate and malononitrile in the presence of trichloroacetic acid; its structure was confirmed by X-ray analysis. Molecular interactions were visualized through Hirshfeld surface analysis and electrostatic potential (ESP) mapping. The atomic charges, HOMO, LUMO, and electrostatic potential were also studied to gain deeper insight into the molecule, and natural bonding orbitals (NBO) and non-linear optical (NLO) properties were then calculated to reveal the interactions between the filled and vacant orbitals. Afterwards, molecular docking studies predicted that the compound binds in the minor groove of DNA and remains bound through stable interactions, as validated by molecular dynamics simulations. The binding energy estimation also confirmed the dominance of van der Waals and electrostatic energies. Lastly, the compound was found to be a good drug-like molecule with a good pharmacokinetic profile, with the exception of toxic moieties.
Affiliation(s)
- Malahat Kurbanova
- Organic Chemistry Department, Baku State University, Z. Khalilov 23, Baku, AZ, 1148, Azerbaijan.
- Sajjad Ahmad
- Department of Health and Biological Sciences, Abasyn University, Peshawar, 25000, Pakistan
- Arzu Sadigova
- Organic Chemistry Department, Baku State University, Z. Khalilov 23, Baku, AZ, 1148, Azerbaijan
- Rizvan Askerov
- Organic Chemistry Department, Baku State University, Z. Khalilov 23, Baku, AZ, 1148, Azerbaijan
- Abel Magerramov
- Organic Chemistry Department, Baku State University, Z. Khalilov 23, Baku, AZ, 1148, Azerbaijan
- Youness El Bakri
- Department of Theoretical and Applied Chemistry, South Ural State University, Lenin prospect 76, Chelyabinsk, 454080, Russian Federation.
10
Takada Y, Kaneko K. Automated machine learning approach for developing a quantitative structure-activity relationship model for cardiac steroid inhibition of Na+/K+-ATPase. Pharmacol Rep 2023:10.1007/s43440-023-00508-x. [PMID: 37354314 DOI: 10.1007/s43440-023-00508-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 06/09/2023] [Accepted: 06/16/2023] [Indexed: 06/26/2023]
Abstract
BACKGROUND Quantitative structure-activity relationship (QSAR) modeling is a method of characterizing the relationship between chemical structures and biological activity. Automated machine learning enables computers to learn from large datasets and can be used for chemoinformatics. Cardiac steroids (CSs) inhibit the activity of Na+/K+-ATPase (NKA) in several species, including humans, since the binding pocket in which NKA binds to CSs is highly conserved. CSs are used to treat heart disease and have been developed into anticancer drugs for use in clinical trials. Novel CSs are, therefore, frequently synthesized and their activities evaluated. The purpose of this study is to develop a QSAR model via automated machine learning to predict the potential inhibitory activity of compounds without performing experiments. METHODS The chemical structures and inhibitory activities of 215 CS derivatives were obtained from the scientific literature. Predictive QSAR models were constructed using molecular descriptors, fingerprints, and biological activities. RESULTS The best predictive QSAR models were selected based on the LogLoss value. Using these models, the Matthews correlation coefficient, F1 score, and area under the curve of the test dataset were 0.6729, 0.8813, and 0.8812, respectively. Next, we showed automated construction of the predictive models for CS derivatives, which may be useful for identifying novel CSs suitable for candidate drug development. CONCLUSION The automated machine learning-based QSAR method developed here should be applicable for the time-efficient construction of predictive models using only a small number of compounds.
Affiliation(s)
- Yohei Takada
- Corporate Planning Department, Otsuka Holdings Co., Ltd, Shinagawa Grand Central Tower 2-16-4 Konan, Minato-ku, Tokyo, 108-8241, Japan.
- Kazuhiro Kaneko
- Headquarters of Clinical Development, Otsuka Pharmaceutical Co., Ltd, Shinagawa Grand Central Tower 2-16-4 Konan, Minato-ku, Tokyo, 108-8241, Japan
11
Fang C, Wang Y, Grater R, Kapadnis S, Black C, Trapa P, Sciabola S. Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective. J Chem Inf Model 2023. [PMID: 37216672 DOI: 10.1021/acs.jcim.3c00160] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 05/24/2023]
Abstract
Absorption, distribution, metabolism, and excretion (ADME), which collectively define the concentration profile of a drug at the site of action, are of critical importance to the success of a drug candidate. Recent advances in machine learning algorithms and the availability of larger proprietary as well as public ADME data sets have generated renewed interest within the academic and pharmaceutical science communities in predicting pharmacokinetic and physicochemical endpoints in early drug discovery. In this study, we collected 120 internal prospective data sets over 20 months across six ADME in vitro endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. A variety of machine learning algorithms in combination with different molecular representations were evaluated. Our results suggest that gradient boosting decision tree and deep learning models consistently outperformed random forest over time. We also observed better performance when models were retrained on a fixed schedule, and more frequent retraining generally resulted in increased accuracy, while hyperparameter tuning improved the prospective predictions only marginally.
Affiliation(s)
- Cheng Fang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
| | - Ye Wang
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
| | - Richard Grater
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
| | | | - Cheryl Black
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
| | - Patrick Trapa
- DMPK, Biogen, Cambridge, Massachusetts 02142, United States
| | - Simone Sciabola
- Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
| |
|
12
|
Yang SQ, Zhang LX, Ge YJ, Zhang JW, Hu JX, Shen CY, Lu AP, Hou TJ, Cao DS. In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences. J Cheminform 2023; 15:48. [PMID: 37088813 PMCID: PMC10123967 DOI: 10.1186/s13321-023-00720-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 04/08/2023] [Indexed: 04/25/2023] Open
Abstract
Identification and validation of bioactive small-molecule targets is a significant challenge in drug discovery. In recent years, various in-silico approaches have been proposed to expedite time- and resource-consuming experiments for target detection. Herein, we developed several chemogenomic models for target prediction based on multi-scale information of chemical structures and protein sequences. By combining the information of a compound with multiple protein targets together and putting these compound-target pairs into a well-established model, the scores to indicate whether there are interactions between compounds and targets can be derived, and thus a target prediction task can be completed by sorting the outputted scores. To improve the prediction performance, we constructed several chemogenomic models using multi-scale information of chemical structures and protein sequences, and the ensemble model with the best performance was used as our final model. The model was validated by various strategies and external datasets and the promising target prediction capability of the model, i.e., the fraction of known targets identified in the top-k (1 to 10) list of the potential target candidates suggested by the model, was confirmed. Compared with multiple state-of-art target prediction methods, our model showed equivalent or better predictive ability in terms of the top-k predictions. It is expected that our method can be utilized as a powerful computational tool to narrow down the potential targets for experimental testing.
Affiliation(s)
- Su-Qing Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Liu-Xia Zhang
- The First Hospital of Hunan University of Chinese Medicine, Changsha, 410007, Hunan, People's Republic of China
| | - You-Jin Ge
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Jin-Wei Zhang
- Departments of Biomedical Engineering and Pathology, School of Basic Medical Science, Central South University, Changsha, 410013, Hunan, People's Republic of China
| | - Jian-Xin Hu
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Cheng-Ying Shen
- Department of Pharmacy, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, Jiangxi, People's Republic of China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China
| | - Ting-Jun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China.
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, People's Republic of China.
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, People's Republic of China.
| |
|
13
|
Toropov AA, Barnes DA, Toropova AP, Roncaglioni A, Irvine AR, Masereeuw R, Benfenati E. CORAL Models for Drug-Induced Nephrotoxicity. TOXICS 2023; 11:293. [PMID: 37112520 PMCID: PMC10142465 DOI: 10.3390/toxics11040293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/20/2023] [Accepted: 03/21/2023] [Indexed: 06/19/2023]
Abstract
Drug-induced nephrotoxicity is a major cause of kidney dysfunction with potentially fatal consequences. The poor prediction of clinical responses based on preclinical research hampers the development of new pharmaceuticals. This emphasises the need for new methods for earlier and more accurate diagnosis to avoid drug-induced kidney injuries. Computational predictions of drug-induced nephrotoxicity are an attractive approach to facilitate such an assessment and such models could serve as robust and reliable replacements for animal testing. To provide the chemical information for computational prediction, we used the convenient and common SMILES format. We examined several versions of so-called optimal SMILES-based descriptors. We obtained the highest statistical values, considering the specificity, sensitivity and accuracy of the prediction, by applying recently suggested atoms pairs proportions vectors and the index of ideality of correlation, which is a special statistical measure of the predictive potential. Implementation of this tool in the drug development process might lead to safer drugs in the future.
Affiliation(s)
- Andrey A. Toropov
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (A.P.T.); (A.R.); (E.B.)
| | - Devon A. Barnes
- Utrecht Institute for Pharmaceutical Sciences, div. Pharmacology, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands; (D.A.B.); (A.R.I.); (R.M.)
| | - Alla P. Toropova
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (A.P.T.); (A.R.); (E.B.)
| | - Alessandra Roncaglioni
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (A.P.T.); (A.R.); (E.B.)
| | - Alasdair R. Irvine
- Utrecht Institute for Pharmaceutical Sciences, div. Pharmacology, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands; (D.A.B.); (A.R.I.); (R.M.)
| | - Rosalinde Masereeuw
- Utrecht Institute for Pharmaceutical Sciences, div. Pharmacology, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands; (D.A.B.); (A.R.I.); (R.M.)
| | - Emilio Benfenati
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy; (A.P.T.); (A.R.); (E.B.)
| |
|
14
|
Zhou H, Shan M, Qin LP, Cheng G. Reliable prediction of cannabinoid receptor 2 ligand by machine learning based on combined fingerprints. Comput Biol Med 2023; 152:106379. [PMID: 36502694 DOI: 10.1016/j.compbiomed.2022.106379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 11/15/2022] [Accepted: 11/28/2022] [Indexed: 12/02/2022]
Abstract
Cannabinoid receptors, as part of the family of the G protein-coupled receptors (GPCRs), are involved in various physiological functions. The cannabinoid receptor subtype 2 (CB2), mainly distributed in the periphery, is a crucial therapeutic target for anti-epileptic, anti-inflammation, anti-fibrosis, and bone metabolism regulation, and it regulates these physiological functions without psychiatric side effects. Recently, machine learning methods for predicting biophysical properties have attracted much attention. Successful application of machine learning usually depends heavily on an appropriate representation of the compounds. In this study, we comprehensively evaluated the performance of descriptor-based models (including XGBoost, Random Forest, and KNN) and two graph-based models (D-MPNN, MolMap) for the prediction of CB2 regulators, and found that XGBoost offers outstanding performance for both regression and classification tasks. Thirteen different molecular fingerprints and 12 descriptors, as well as their combinations, were further screened; AvalonFP + AtomPairFP + RDkitFP + MorganFP and AtomPairFP + MorganFP + AvalonFP were the optimum combinations for the regression task (R2 increased to 0.667) and the classification task (AUC-ROC increased to 0.933), respectively. Specifically, the best XGBoost regression model with optimum features achieves better performance than Mizera's QSAR model on the same dataset developed by Mizera (R2 0.664 versus 0.62). It also achieves optimal performance with an AUC-ROC of 0.917 on the external validation set. By comparison, MolMap and D-MPNN only provide 0.912 and 0.898. The Shapley additive explanation method was used to interpret the models, and feature importances were shown for both the regression and classification tasks. The XGBoost model equipped with the essential molecular fingerprint combinations in this paper may provide valuable clues for designing novel CB2 ligands and developing models for predicting other properties.
Affiliation(s)
- Hao Zhou
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China
| | - Mengyi Shan
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China
| | - Lu-Ping Qin
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China.
| | - Gang Cheng
- School of Pharmaceutical Sciences, Zhejiang Chinese Medical University Hangzhou, 310053, People's Republic of China.
| |
|
15
|
Sun L, Zhang M, Xie L, Gao Q, Xu X, Xu L. In silico prediction of boiling point, octanol-water partition coefficient, and retention time index of polycyclic aromatic hydrocarbons through machine learning. Chem Biol Drug Des 2023; 101:52-68. [PMID: 35852446 DOI: 10.1111/cbdd.14121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 07/14/2022] [Accepted: 07/17/2022] [Indexed: 12/15/2022]
Abstract
Polycyclic aromatic hydrocarbons (PAHs), a special class of persistent organic pollutants (POPs) with two or more aromatic rings, have received extensive attention owing to their carcinogenic, mutagenic, and teratogenic effects. Quantitative structure-property relationship (QSPR) modeling is a powerful chemometric method to correlate structural descriptors of PAHs with their physicochemical properties. In this manuscript, a QSPR study of PAHs was performed to predict their boiling point (bp), octanol-water partition coefficient (LogKow), and retention time index (RI). In addition to traditional molecular descriptors, structural fingerprints play an important role in the correlation of the above properties. Three regression methods, partial least squares (PLS), multiple linear regression (MLR), and genetic function approximation (GFA), were used to establish QSPR models for each property of PAHs. The correlation coefficient (R2 test) and root mean square error (RMSE) of the best models were 0.980 and 24.39% (PLS), 0.979 and 35.80% (GFA), and 0.926 and 22.90% (MLR) for bp, LogKow, and RI, respectively. The model proposed here can be used to estimate physicochemical properties and inform toxicity prediction of environmental chemicals.
Affiliation(s)
- Linkang Sun
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Min Zhang
- School of Computer Engineering, Jiangsu University of Technology, Changzhou, China
| | - Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Qian Gao
- School of Computer Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| |
|
16
|
An interpretable machine learning model for selectivity of small molecules against homologous protein family. Future Med Chem 2022; 14:1441-1453. [PMID: 36169035 DOI: 10.4155/fmc-2022-0075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Aim: In the early stages of drug discovery, various experimental and computational methods are used to measure the specificity of small molecules against a target protein. The selectivity of small molecules remains a challenge leading to off-target side effects. Methods: We have developed a multitask deep learning model for predicting the selectivity on closely related homologs of the target protein. The model has been tested on the Janus-activated kinase and dopamine receptor families of proteins. Results & conclusion: The feature-based representation (extended connectivity fingerprint 4) with Extreme Gradient Boosting performed better when compared with deep neural network models in most of the evaluation metrics. Both the Extreme Gradient Boosting and deep neural network models outperformed the graph-based models. Furthermore, to decipher the model decision on selectivity, the important fragments associated with each homologous protein were identified.
|
17
|
Li L, Lu Z, Liu G, Tang Y, Li W. In Silico Prediction of Human and Rat Liver Microsomal Stability via Machine Learning Methods. Chem Res Toxicol 2022; 35:1614-1624. [PMID: 36053050 DOI: 10.1021/acs.chemrestox.2c00207] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/29/2022]
Abstract
Liver microsomal stability is an important property considered for the screening of drug candidates in the early stage of drug development. Determination of hepatic metabolic stability can be performed by an in vitro assay, but it requires quite a few resources and time. In recent years, machine learning methods have made much progress. Therefore, development of computational models to predict liver microsomal stability is highly desirable in the drug discovery process. In this study, the in silico classification models for the prediction of the metabolic stability of compounds in rat and human liver microsomes were constructed by the conventional machine learning and deep learning methods. The performance of the models was evaluated using the test and external sets. For the rat liver microsomes (RLM) stability, the best model yielded the AUC values of 0.84 and 0.71 on the test and external validation sets, respectively. For the human liver microsome (HLM) stability, the best model exhibited the AUC values of 0.86 and 0.77 on the test and external validation sets, respectively. In addition, several important substructure fragments were detected using information gain and frequency substructure analysis methods. The applicability domain of the models was defined using the Euclidean distance-based method. We anticipate that our results would be helpful for the prediction of liver microsomal stability of compounds in the early stage of drug discovery.
Affiliation(s)
- Longqiang Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zhou Lu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| |
|
18
|
Prediction and Screening Model for Products Based on Fusion Regression and XGBoost Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4987639. [PMID: 35958779 PMCID: PMC9357736 DOI: 10.1155/2022/4987639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Revised: 06/14/2022] [Accepted: 06/27/2022] [Indexed: 11/18/2022]
Abstract
Performance prediction based on candidates and screening based on predicted performance value are the core of product development. For example, the performance prediction and screening of equipment components and parts are an important guarantee for the reliability of equipment products. The prediction and screening of drug bioactivity value and performance are the keys to pharmaceutical product development. The main reasons for the failure of pharmaceutical discovery are the low bioactivity of the candidate compounds and the deficiencies in their efficacy and safety, which are related to the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of the compounds. Therefore, it is necessary to perform systematic bioactivity value prediction and ADMET property evaluation for candidate compounds quickly and effectively in the early stage of drug discovery. In this paper, a data-driven screening and prediction model for pharmaceutical products is proposed to screen drug candidates with higher bioactivity value and better ADMET properties. First, a quantitative prediction method for bioactivity value is proposed using the fusion regression of LGBM and a neural network based on backpropagation (BP-NN). Then, an ADMET property prediction method is proposed using XGBoost. According to the predicted bioactivity value and ADMET properties, the BVAP method is defined to screen the drug candidates. The screening model is validated on a dataset of compounds with ERα antagonist activity, in which the mean square error (MSE) of fusion regression is 1.1496 and the XGBoost prediction accuracies for the ADMET properties are 94.0% for Caco-2, 95.7% for CYP3A4, 89.4% for hERG, 88.6% for HOB, and 96.2% for MN. Compared with the commonly used methods for ADMET properties such as SVM, RF, KNN, LDA, and NB, the XGBoost model in this paper has the highest prediction accuracy and AUC value, which gives it better guiding significance and can help screen pharmaceutical product candidates with good bioactivity, pharmacokinetic properties, and safety.
|
19
|
Shi Y, Hua Y, Wang B, Zhang R, Li X. In Silico Prediction and Insights Into the Structural Basis of Drug Induced Nephrotoxicity. Front Pharmacol 2022; 12:793332. [PMID: 35082675 PMCID: PMC8785686 DOI: 10.3389/fphar.2021.793332] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/12/2021] [Accepted: 11/23/2021] [Indexed: 12/11/2022] Open
Abstract
Drug induced nephrotoxicity is a major clinical challenge, and it is always associated with higher costs for the pharmaceutical industry due to detection during the late stages of drug development. To improve health outcomes for patients, it is desirable to distinguish nephrotoxic structures at an early stage of drug development. In this study, we focused on in silico prediction and insights into the structural basis of drug induced nephrotoxicity, based on reliable data on human nephrotoxicity. We collected 565 diverse chemical structures, including 287 drugs that are nephrotoxic in humans in real-world use and 278 non-nephrotoxic approved drugs. Several different machine learning and deep learning algorithms were employed for in silico model building. Then, a consensus model was developed based on the three best individual models (RFR_QNPR, XGBOOST_QNPR, and CNF). The consensus model performed much better than the individual models on internal validation, and it achieved a prediction accuracy of 86.24% on external validation. Analysis of the molecular property differences between nephrotoxic and non-nephrotoxic structures indicated that several key molecular properties differ significantly, including molecular weight (MW), molecular polar surface area (MPSA), AlogP, number of hydrogen bond acceptors (nHBA), molecular solubility (LogS), the number of rotatable bonds (nRotB), and the number of aromatic rings (nAR). These molecular properties may play an important part in the identification of nephrotoxic chemicals. Finally, 87 structural alerts for chemical nephrotoxicity were mined with f-score and positive rate analysis of substructures from the Klekota-Roth fingerprint (KRFP). These structural alerts identify nephrotoxic drug structures in the data set well. The in silico models and the structural alerts can be freely accessed via https://ochem.eu/article/140251 and http://www.sapredictor.cn, respectively. We hope the results will provide useful tools for early nephrotoxicity estimation in drug development.
Affiliation(s)
- Yinping Shi
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Yuqing Hua
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; School of Pharmacy, Shandong First Medical University, Tai'an, China
| | - Baobao Wang
- Department of Nephrology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Ruiqiu Zhang
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; School of Pharmacy, Shandong First Medical University, Tai'an, China
| | - Xiao Li
- Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China; Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
| |
|
20
|
Lin CY, Chien TW, Chen YH, Lee YL, Su SB. An app to classify a 5-year survival in patients with breast cancer using the convolutional neural networks (CNN) in Microsoft Excel: Development and usability study. Medicine (Baltimore) 2022; 101:e28697. [PMID: 35089226 PMCID: PMC8797502 DOI: 10.1097/md.0000000000028697] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/03/2021] [Accepted: 01/04/2022] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Breast cancer (BC) is the most common malignant cancer in women. A predictive model is required to predict the 5-year survival in patients with BC (5YSPBC) and improve the treatment quality by increasing their survival rate. However, there are no reports in the literature of apps developed and designed for medical practice to classify the 5YSPBC. This study aimed to build a model and develop an app for automatic and accurate classification of the 5YSPBC. METHODS A total of 1810 patients with BC were recruited in a hospital in Taiwan from the secondary data with codes on 53 characteristic variables that were endorsed by professional staff clerks as of December 31, 2019. Five models (i.e., convolutional neural network [CNN], artificial neural network, Naïve Bayes, K-nearest Neighbors Algorithm, and Logistic regression) and 3 tasks (i.e., extraction of feature variables, model comparison in accuracy [ACC] and stability, and app development) were performed to achieve the goal of developing an app to predict the 5YSPBC. The sensitivity, specificity, and receiver operating characteristic curve (area under ROC curve) on models across 2 scenarios of training (70%) and testing (30%) sets were compared. An app predicting the 5YSPBC was developed involving the model estimated parameters for a website assessment. RESULTS We observed that the 15-variable CNN model yields higher ACC rates (0.87 and 0.86) with area under ROC curves of 0.80 and 0.78 (95% confidence interval 0.78-0.82 and 0.74-0.81) based on 1357 training and 540 testing cases. An app for predicting the 5YSPBC for patients was successfully developed and demonstrated in this study. CONCLUSION The 15-variable CNN model with 38 parameters estimated using CNN for improving the ACC of the 5YSPBC has been demonstrated in Microsoft Excel. An app developed for helping clinicians assess the 5YSPBC in clinical settings is required for application in the future.
Affiliation(s)
- Cheng-Yao Lin
- Division of Hematology-Oncology, Department of Internal Medicine, Chi Mei Medical Center, Liouying, Tainan, Taiwan
- Department of Senior Welfare and Services, Southern Taiwan University of Science and Technology, Tainan, Taiwan
- Department of Environmental and Occupational Health, National Cheng Kung University, Tainan, Taiwan
| | - Tsair-Wei Chien
- Department of Medical Research, Chi-Mei Medical Center, Tainan, Taiwan
| | - Yen-Hsun Chen
- Division of Hematology-Oncology, Department of Internal Medicine, Chi Mei Center, Liouying, Tainan, Taiwan
| | - Yen-Ling Lee
- Department of Oncology, Tainan Hospital, Ministry of Health and Welfare, Tainan, Taiwan
- Department of Environmental and Occupational Health, College of Medicine, National Cheng Kung University, Tainan, Taiwan
| | - Shih-Bin Su
- Department of Occupational Medicine, Chi Mei Medical Center, Tainan, Taiwan
| |
|
21
|
Xing Y, Wang Z, Li X, Hou C, Chai J, Li X, Su J, Gao J, Xu H. A new method for predicting the acute toxicity of carbamate pesticides based on the perspective of binding information with carrier protein. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 264:120188. [PMID: 34358782 DOI: 10.1016/j.saa.2021.120188] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/03/2021] [Revised: 06/08/2021] [Accepted: 07/12/2021] [Indexed: 06/13/2023]
Abstract
Toxicity is one of the most important factors limiting the success of new drug development. In this paper, we built a fast and convenient new method (Carrier protein binding information-toxicity relationship, CPBITR) for predicting drug acute toxicity based on the perspective of binding information with carrier protein. First, we studied the binding information between carbamate pesticides and human serum albumin (HSA) through various spectroscopic methods and molecular docking. Then a total of 16 models were established to clarify the relationship between binding information with HSA and drug toxicity. The results showed that the binding information was related to toxicity. Finally, we obtained an effective toxicity prediction model for carbamate pesticides. The "Platform for Predicting Drug Toxicity Based on the Information of Binding with Carrier Protein" was established with the back-propagation neural network model. We proposed and proved that it is feasible to predict drug toxicity from this new perspective: binding with carrier protein. According to this new perspective, toxicity prediction models for other drugs can also be established. This new method is convenient and fast, and can be used to screen out low-toxicity drugs quickly in the early stage. It is helpful for drug research and development.
Affiliation(s)
- Yue Xing
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Zishi Wang
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Xiangshuai Li
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Chenxin Hou
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Jiashuang Chai
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Xiangfen Li
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Jing Su
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China
| | - Jinsheng Gao
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China.
| | - Hongliang Xu
- Engineering Research Center of Pesticide of Heilongjiang Province, College of Advanced Agriculture and Ecological Environment, Heilongjiang University, Harbin 150080, China.
| |
Collapse
|
22
|
Lan X, Wang X, Qi J, Chen H, Zeng X, Shi J, Liu D, Shen H, Zhang J. Application of machine learning with multiparametric dual-energy computed tomography of the breast to differentiate between benign and malignant lesions. Quant Imaging Med Surg 2022; 12:810-822. [PMID: 34993120 DOI: 10.21037/qims-21-39] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2021] [Accepted: 07/30/2021] [Indexed: 11/06/2022]
Abstract
BACKGROUND: Multiparametric dual-energy computed tomography (mpDECT) is widely used to differentiate various kinds of tumors; however, data on its diagnostic performance when combined with machine learning to diagnose breast tumors are limited. We evaluated univariate analysis and machine learning performance with mpDECT to distinguish between benign and malignant breast lesions. METHODS: In total, 172 patients with 214 breast lesions (55 benign and 159 malignant) who underwent preoperative dual-phase contrast-enhanced DECT were included in this retrospective study. Twelve quantitative features were extracted for each lesion, including CT attenuation (precontrast, arterial, and venous phases), the arterial-venous phase difference in normalized effective atomic number (nZeff), normalized iodine concentration (NIC), and slope of the spectral Hounsfield unit (HU) curve (λHu). Predictive models were developed using univariate analysis and eight machine learning methods [logistic regression, extreme gradient boosting (XGBoost), stochastic gradient descent (SGD), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), random forest (RF), decision tree, and linear support vector machine (SVM)]. Classification performances were assessed based on the area under the receiver operating characteristic curve (AUROC). The best performances of the conventional univariate analysis and machine learning methods were compared using the Delong test. RESULTS: The univariate analysis showed that the venous phase λHu had the highest AUROC (0.88). Machine learning with mpDECT achieved an excellent and stable diagnostic performance, as shown by the mean classification performances in the training (AUROC, 0.88-0.99) and testing (AUROC, 0.83-0.96) datasets. The performance of the AdaBoost model based on mpDECT was more stable than the other machine learning models and superior to the univariate analysis (AUROC, 0.96 vs. 0.88; P<0.001).
CONCLUSIONS: The performance of the AdaBoost classifier based on mpDECT data achieved the highest mean accuracy compared to the other machine learning models and univariate analysis in differentiating between benign and malignant breast lesions.
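The AUROC values quoted in this abstract have a simple probabilistic reading: the chance that a randomly chosen malignant lesion receives a higher score than a randomly chosen benign one. The following is a minimal sketch of that pairwise computation in plain Python. The function name `auroc` and the toy scores are hypothetical illustrations, not data or code from the study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (1 = malignant, 0 = benign).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)  # 8 of 9 pairs are ordered correctly
print(f"toy AUROC = {toy_auc:.3f}")
```

The quadratic pair loop is chosen for readability; for datasets of realistic size a rank-based formulation or a library routine would be the usual choice.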
Affiliation(s)
- Xiaosong Lan
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Xiaoxia Wang
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Jun Qi
  - Department of Thoracic Surgery, Chongqing University Cancer Hospital, School of Medicine, Chongqing University, Chongqing, China
- Huifang Chen
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Xiangfei Zeng
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Jinfang Shi
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Daihong Liu
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Hesong Shen
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
- Jiuquan Zhang
  - Department of Radiology, Chongqing University Cancer Hospital & Chongqing Cancer Institute & Chongqing Cancer Hospital, Chongqing, China
23
Bassan A, Alves VM, Amberg A, Anger LT, Beilke L, Bender A, Bernal A, Cronin MT, Hsieh JH, Johnson C, Kemper R, Mumtaz M, Neilson L, Pavan M, Pointon A, Pletz J, Ruiz P, Russo DP, Sabnis Y, Sandhu R, Schaefer M, Stavitskaya L, Szabo DT, Valentin JP, Woolley D, Zwickl C, Myatt GJ. In silico approaches in organ toxicity hazard assessment: Current status and future needs for predicting heart, kidney and lung toxicities. Computational Toxicology (Amsterdam, Netherlands) 2021; 20:100188. [PMID: 35721273 PMCID: PMC9205464 DOI: 10.1016/j.comtox.2021.100188] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 05/12/2023]
Abstract
The kidneys, heart and lungs are vital organ systems evaluated as part of acute or chronic toxicity assessments. New methodologies are being developed to predict these adverse effects based on in vitro and in silico approaches. This paper reviews the current state of the art in predicting these organ toxicities. It outlines the biological basis, processes and endpoints for kidney toxicity, pulmonary toxicity, respiratory irritation and sensitization as well as functional and structural cardiac toxicities. The review also covers current experimental approaches, including off-target panels from secondary pharmacology batteries. Current in silico approaches for prediction of these effects and mechanisms are described as well as obstacles to the use of in silico methods. Ultimately, a commonly accepted protocol for performing such assessment would be a valuable resource to expand the use of such approaches across different regulatory and industrial applications. However, a number of factors impede their widespread deployment including a lack of a comprehensive mechanistic understanding, limited in vitro testing approaches and limited in vivo databases suitable for modeling, a limited understanding of how to incorporate absorption, distribution, metabolism, and excretion (ADME) considerations into the overall process, a lack of in silico models designed to predict a safe dose and an accepted framework for organizing the key characteristics of these organ toxicants.
Affiliation(s)
- Arianna Bassan
  - Innovatune srl, Via Giulio Zanon 130/D, 35129 Padova, Italy
- Vinicius M. Alves
  - The National Institute of Environmental Health Sciences, Division of the National Toxicology Program, Research Triangle Park, NC 27709, United States
- Alexander Amberg
  - Sanofi, R&D Preclinical Safety Frankfurt, Industriepark Hoechst, D-65926 Frankfurt am Main, Germany
- Lennart T. Anger
  - Genentech, Inc., 1 DNA Way, South San Francisco, CA 94080, United States
- Lisa Beilke
  - Toxicology Solutions Inc., San Diego, CA, United States
- Andreas Bender
  - AI and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Mark T.D. Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, L3 3AF, UK
- Jui-Hua Hsieh
  - The National Institute of Environmental Health Sciences, Division of the National Toxicology Program, Research Triangle Park, NC 27709, United States
- Raymond Kemper
  - Nuvalent, One Broadway, 14th floor, Cambridge, MA 02142, United States
- Moiz Mumtaz
  - Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Atlanta, GA, United States
- Louise Neilson
  - Broughton Nicotine Services, Oak Tree House, West Craven Drive, Earby, Lancashire BB18 6JZ, UK
- Manuela Pavan
  - Innovatune srl, Via Giulio Zanon 130/D, 35129 Padova, Italy
- Amy Pointon
  - Functional and Mechanistic Safety, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK
- Julia Pletz
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, L3 3AF, UK
- Patricia Ruiz
  - Agency for Toxic Substances and Disease Registry, US Department of Health and Human Services, Atlanta, GA, United States
- Daniel P. Russo
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, United States
  - Department of Chemistry, Rutgers University, Camden, NJ 08102, United States
- Yogesh Sabnis
  - UCB Biopharma SRL, Chemin du Foriest, B-1420 Braine-l’Alleud, Belgium
- Reena Sandhu
  - SafeDose Ltd., 20 Dundas Street West, Suite 921, Toronto, Ontario M5G2H1, Canada
- Markus Schaefer
  - Sanofi, R&D Preclinical Safety Frankfurt, Industriepark Hoechst, D-65926 Frankfurt am Main, Germany
- Lidiya Stavitskaya
  - US Food and Drug Administration, Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
- David Woolley
  - ForthTox Limited, PO Box 13550, Linlithgow, EH49 7YU, UK
- Craig Zwickl
  - Transendix LLC, 1407 Moores Manor, Indianapolis, IN 46229, United States
- Glenn J. Myatt
  - Instem, 1393 Dublin Road, Columbus, OH 43215, United States
24
Kim T, You BH, Han S, Shin HC, Chung KC, Park H. Quantum Artificial Neural Network Approach to Derive a Highly Predictive 3D-QSAR Model for Blood-Brain Barrier Passage. Int J Mol Sci 2021; 22:ijms222010995. [PMID: 34681653 PMCID: PMC8537149 DOI: 10.3390/ijms222010995] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/06/2021] [Revised: 10/07/2021] [Accepted: 10/10/2021] [Indexed: 01/07/2023]
Abstract
A successful passage of the blood–brain barrier (BBB) is an essential prerequisite for the drug molecules designed to act on the central nervous system. The logarithm of blood–brain partitioning (LogBB) has served as an effective index of molecular BBB permeability. Using the three-dimensional (3D) distribution of the molecular electrostatic potential (ESP) as the numerical descriptor, a quantitative structure-activity relationship (QSAR) model termed AlphaQ was derived to predict the molecular LogBB values. To obtain the optimal atomic coordinates of the molecules under investigation, the pairwise 3D structural alignments were conducted in such a way to maximize the quantum mechanical cross correlation between the template and a target molecule. This alignment method has the advantage over the conventional atom-by-atom matching protocol in that the structurally diverse molecules can be analyzed as rigorously as the chemical derivatives with the same scaffold. The inaccuracy problem in the 3D structural alignment was alleviated in a large part by categorizing the molecules into the eight subsets according to the molecular weight. By applying the artificial neural network algorithm to associate the fully quantum mechanical ESP descriptors with the extensive experimental LogBB data, a highly predictive 3D-QSAR model was derived for each molecular subset with a squared correlation coefficient larger than 0.8. Due to the simplicity in model building and the high predictability, AlphaQ is anticipated to serve as an effective computational screening tool for molecular BBB permeability.
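For reference, LogBB is conventionally defined as the base-10 logarithm of the ratio of a compound's concentration in brain to its concentration in blood. The symbols below follow the usual textbook notation and are not taken from the abstract itself:

```latex
\[
\mathrm{LogBB} = \log_{10}\!\left(\frac{C_{\mathrm{brain}}}{C_{\mathrm{blood}}}\right)
\]
```

Under this convention, a positive LogBB corresponds to a higher concentration in brain than in blood, and a negative value to the reverse.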
Affiliation(s)
- Taeho Kim
  - Department of Bioscience and Biotechnology, Sejong University, Kwangjin-gu, Seoul 05006, Korea
- Byoung Hoon You
  - Whan In Pharmaceutical Co., Ltd., 11, Songpa-gu, Seoul 05855, Korea
- Songhee Han
  - Whan In Pharmaceutical Co., Ltd., 11, Songpa-gu, Seoul 05855, Korea
- Ho Chul Shin
  - Whan In Pharmaceutical Co., Ltd., 11, Songpa-gu, Seoul 05855, Korea
- Kee-Choo Chung
  - Department of Bioscience and Biotechnology, Sejong University, Kwangjin-gu, Seoul 05006, Korea
  - Correspondence: Tel.: +82-2-2963-1635 (K.-C.C.); +82-2-3408-3766 (H.P.); Fax: +82-2-3408-4334 (K.-C.C. & H.P.)
- Hwangseo Park
  - Department of Bioscience and Biotechnology, Sejong University, Kwangjin-gu, Seoul 05006, Korea
  - Correspondence: Tel.: +82-2-2963-1635 (K.-C.C.); +82-2-3408-3766 (H.P.); Fax: +82-2-3408-4334 (K.-C.C. & H.P.)
25
Venkatraman V. FP-ADMET: a compendium of fingerprint-based ADMET prediction models. J Cheminform 2021; 13:75. [PMID: 34583740 PMCID: PMC8479898 DOI: 10.1186/s13321-021-00557-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 04/07/2021] [Accepted: 09/20/2021] [Indexed: 12/11/2022]
Abstract
MOTIVATION: The absorption, distribution, metabolism, excretion, and toxicity (ADMET) of drugs plays a key role in determining which of the potential candidates are to be prioritized. In silico approaches based on machine learning methods are becoming increasingly popular, but are nonetheless limited by the availability of data. With a view to making both data and models available to the scientific community, we have developed FPADMET, a repository of molecular fingerprint-based predictive models for ADMET properties. In this article, we have examined the efficacy of fingerprint-based machine learning models for a large number of ADMET-related properties. The predictive ability of a set of 20 different binary fingerprints (based on substructure keys, atom pairs, local path environments, as well as custom fingerprints such as all-shortest paths) for over 50 ADMET and ADMET-related endpoints has been evaluated as part of the study. We find that for a majority of the properties, fingerprint-based random forest models yield comparable or better performance compared with traditional 2D/3D molecular descriptors. AVAILABILITY: The models are made available as part of open access software that can be downloaded from https://gitlab.com/vishsoft/fpadmet .
Affiliation(s)
- Vishwesh Venkatraman
  - Norwegian University of Science and Technology, Realfagbygget, Gløshaugen, Høgskoleringen, 7491, Trondheim, Norway
26
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 331] [Impact Index Per Article: 82.8] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug designing and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, time consumption, and high cost impose a hurdle and challenges that impact drug design and discovery. Further, complex and big data from genomics, proteomics, microarray data, and clinical trials also impose an obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technology play a crucial role in drug discovery and development. In other words, artificial neural networks and deep learning algorithms have modernized the area. Machine learning and deep learning algorithms have been implemented in several drug discovery processes such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship, drug repositioning, polypharmacology, and physiochemical activity. Evidence from the past strengthens the implementation of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques provided critical support to recently developed modeling algorithms. In summary, artificial intelligence and deep learning advancements provide an excellent opportunity for rational drug design and discovery process, which will eventually impact mankind. The primary concern associated with drug design and development is time consumption and production cost. Further, inefficiency, inaccurate target delivery, and inappropriate dosage are other hurdles that inhibit the process of drug delivery and development. With advancements in technology, computer-aided drug design integrating artificial intelligence algorithms can eliminate the challenges and hurdles of traditional drug design and development. 
Artificial intelligence is referred to as superset comprising machine learning, whereas machine learning comprises supervised learning, unsupervised learning, and reinforcement learning. Further, deep learning, a subset of machine learning, has been extensively implemented in drug design and development. The artificial neural network, deep neural network, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of the algorithms applied to the drug design and discovery process. Artificial intelligence has been applied to different areas of drug design and development process, such as from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationship to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive, monitoring drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, bioactivity identification and physiochemical properties, prediction of toxicity, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
27
Santana R, Onieva E, Zuluaga R, Duardo-Sánchez A, Gañán P. The Role of Machine Learning in Centralized Authorization Process of Nanomedicines in European Union. Curr Top Med Chem 2021; 21:828-838. [PMID: 33745436 DOI: 10.2174/1568026621666210319101847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2020] [Revised: 11/12/2020] [Accepted: 12/31/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Machine Learning (ML) has seen increasing use, given the possibilities it offers to expand the scientific knowledge of different disciplines, such as nanotechnology. This has allowed the creation of cheminformatic models capable of predicting biological activity and physicochemical characteristics of new components with high success rates in training and test partitions. Given the current gaps in scientific knowledge and the need for efficient application of medicinal products law, this paper analyzes the position of regulators for marketing medicinal nanoproducts in the European Union and the role of ML in the authorization process. METHODS: In terms of methodology, a dogmatic study of the European regulation and the guidance of the European Medicines Agency on the use of predictive models for nanomaterials was carried out. The study takes European Regulation 726/2004 as its framework of reference and focuses on how ML processes are contemplated in the regulations. RESULTS: As a result, we present a discussion of the information that must be provided in every case for simulation methods. The results show a favorable and flexible position toward the use of predictive models to complement the applicant's information. CONCLUSION: It is concluded that Machine Learning has the capacity to help improve the application of nanotechnology medicine products regulation. Future regulations should promote this kind of information given the advanced state of the art in terms of algorithms that are able to build accurate predictive models. This especially applies to methods such as Perturbation Theory Machine Learning (PTML), given that it is aligned with principles promoted by the standards of the Organization for Economic Co-operation and Development (OECD), European Union regulations, and European Authority Medicine.
To the best of our knowledge, this is the first study focused on nanotechnology medicine products and machine learning used to support technical European public assessment reports (EPAR) for complementary information.
Affiliation(s)
- Ricardo Santana
  - DeustoTech-Fundacion Deusto, Avda. Universidades, 24, 48007 Bilbao, Spain
- Enrique Onieva
  - DeustoTech-Fundacion Deusto, Avda. Universidades, 24, 48007 Bilbao, Spain
- Robin Zuluaga
  - Facultad de Ingeniería Agroindustrial, Universidad Pontificia Bolivariana UPB050031, Medellin, Colombia
- Aliuska Duardo-Sánchez
  - Department of Public Law, Law and the Human Genome Research Group, University of the Basque Country UPV/EHU 48940, Leioa, Biscay, Spain
- Piedad Gañán
  - Facultad de Ingenieria Quimica, Universidad Pontificia Bolivariana UPB050031, Medellin, Colombia
28
Predicting the 14-Day Hospital Readmission of Patients with Pneumonia Using Artificial Neural Networks (ANN). International Journal of Environmental Research and Public Health 2021; 18:ijerph18105110. [PMID: 34065894 PMCID: PMC8150657 DOI: 10.3390/ijerph18105110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/27/2021] [Revised: 04/22/2021] [Accepted: 04/29/2021] [Indexed: 12/02/2022]
Abstract
Unplanned patient readmission (UPRA) is frequent and costly in healthcare settings. No indicators during hospitalization have been suggested to clinicians as useful for identifying patients at high risk of UPRA. This study aimed to create a prediction model for the early detection of 14-day UPRA of patients with pneumonia. We downloaded the data of patients with pneumonia as the primary disease (e.g., ICD-10:J12*-J18*) at three hospitals in Taiwan from 2016 to 2018. A total of 21,892 cases (1208 (6%) for UPRA) were collected. Two models, namely, artificial neural network (ANN) and convolutional neural network (CNN), were compared using the training (n = 15,324; ≅70%) and test (n = 6568; ≅30%) sets to verify the model accuracy. An app was developed for the prediction and classification of UPRA. We observed that (i) the 17 feature variables extracted in this study yielded a high area under the receiver operating characteristic curve of 0.75 using the ANN model, (ii) the ANN exhibited better AUC (0.73) than the CNN (0.50), and (iii) a ready and available app for predicting UPRA was developed. The app could help clinicians predict UPRA of patients with pneumonia at an early stage and enable them to formulate preparedness plans near or after patient discharge from hospitalization.
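The sample sizes reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted above; the variable names are illustrative and do not come from the study.

```python
# Counts quoted in the abstract; variable names are illustrative only.
n_total = 21_892  # pneumonia cases collected
n_upra = 1_208    # cases with 14-day unplanned readmission (UPRA)
n_train = 15_324  # training set
n_test = 6_568    # test set

upra_rate = n_upra / n_total
train_frac = n_train / n_total
test_frac = n_test / n_total

# The two partitions should account for every collected case.
assert n_train + n_test == n_total

print(f"UPRA prevalence: {upra_rate:.1%}")
print(f"train/test split: {train_frac:.1%} / {test_frac:.1%}")
```

The prevalence works out to roughly 5.5%, which the abstract rounds to 6%, and the partitions match the stated 70%/30% split.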
29
Wu Z, Jiang D, Wang J, Hsieh CY, Cao D, Hou T. Mining Toxicity Information from Large Amounts of Toxicity Data. J Med Chem 2021; 64:6924-6936. [PMID: 33961429 DOI: 10.1021/acs.jmedchem.1c00421] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Indexed: 11/29/2022]
Abstract
Safety is a main reason for drug failures, and therefore, the detection of compound toxicity and potential adverse effects in the early stage of drug development is highly desirable. However, accurate prediction of many toxicity endpoints is extremely challenging due to low accessibility of sufficient and reliable toxicity data, as well as complicated and diversified mechanisms related to toxicity. In this study, we proposed the novel multitask graph attention (MGA) framework to learn the regression and classification tasks simultaneously. MGA has shown excellent predictive power on 33 toxicity data sets and has the capability to extract general toxicity features and generate customized toxicity fingerprints. In addition, MGA provides a new way to detect structural alerts and discover the relationship between different toxicity tasks, which will be quite helpful to mine toxicity information from large amounts of toxicity data.
Affiliation(s)
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
- Dejun Jiang
  - Innovation Institute for Artificial Intelligence in Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
- Jike Wang
  - Innovation Institute for Artificial Intelligence in Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
  - National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan 430072 Hubei, P. R. China
- Chang-Yu Hsieh
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057 Guangdong, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004 Hunan, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
  - State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
30
Nayarisseri A, Khandelwal R, Tanwar P, Madhavi M, Sharma D, Thakur G, Speck-Planche A, Singh SK. Artificial Intelligence, Big Data and Machine Learning Approaches in Precision Medicine & Drug Discovery. Curr Drug Targets 2021; 22:631-655. [PMID: 33397265 DOI: 10.2174/1389450122999210104205732] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 05/23/2020] [Revised: 08/21/2020] [Accepted: 09/14/2020] [Indexed: 11/22/2022]
Abstract
Artificial Intelligence is revolutionizing the drug development process: it can quickly identify potential biologically active compounds from millions of candidates within a short period. The present review gives an overview of some applications of Machine Learning based tools, such as GOLD, Deep PVP, LIB SVM, etc., and the algorithms involved, such as support vector machine (SVM), random forest (RF), decision tree, and Artificial Neural Network (ANN), at various stages of drug design and development. These techniques can be employed in SNP discoveries, drug repurposing, ligand-based drug design (LBDD), Ligand-based Virtual Screening (LBVS) and Structure-based Virtual Screening (SBVS), lead identification, quantitative structure-activity relationship (QSAR) modeling, and ADMET analysis. It is demonstrated that SVM exhibited better performance, indicating that the classification model will have great applications in human intestinal absorption (HIA) predictions. Successful cases have been reported which demonstrate the efficiency of SVM and RF models in identifying JFD00950 as a novel compound targeting a colon cancer cell line, DLD-1, by inhibition of FEN1 cytotoxic and cleavage activity. Furthermore, a QSAR model using ANN was also used to predict flavonoid inhibitory effects on AR activity as a potent treatment for diabetes mellitus (DM). Hence, in the era of big data, ML approaches have evolved into a powerful and efficient way to deal with the huge amounts of data generated by modern drug discovery, to model small-molecule drugs and gene biomarkers, and to identify novel drug targets for various diseases.
Affiliation(s)
- Anuraj Nayarisseri
  - In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Ravina Khandelwal
  - In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Poonam Tanwar
  - In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Maddala Madhavi
  - Department of Zoology, Nizam College, Osmania University, Hyderabad - 500001, Telangana State, India
- Diksha Sharma
  - In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Garima Thakur
  - In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Alejandro Speck-Planche
  - Programa Institucional de Fomento a la Investigacion, Desarrollo e Innovacion, Universidad Tecnologica Metropolitana, Ignacio Valdivieso 2409, P.O. 8940577, San Joaquin, Santiago, Chile
- Sanjeev Kumar Singh
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630003, Tamil Nadu, India
31
Chou PH, Chien TW, Yang TY, Yeh YT, Chou W, Yeh CH. Predicting Active NBA Players Most Likely to Be Inducted into the Basketball Hall of Famers Using Artificial Neural Networks in Microsoft Excel: Development and Usability Study. International Journal of Environmental Research and Public Health 2021; 18:ijerph18084256. [PMID: 33923846 PMCID: PMC8072800 DOI: 10.3390/ijerph18084256] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/26/2021] [Revised: 03/18/2021] [Accepted: 03/25/2021] [Indexed: 12/11/2022]
Abstract
Predicting whether active NBA players will be inducted into the Hall of Fame (HOF) is interesting and important. However, no such research has been published in the literature, particularly using the artificial neural network (ANN) technique. The aim of this study is to build an ANN model with an app for automatic prediction and classification of HOF for NBA players. We downloaded career stats and accolades for 4728 NBA players from basketball-reference.com. The training sample was collected from 85 HOF members and 113 retired Non-HOF players based on completed data and a longer career length (≥15 years). Featured variables were taken from the higher correlation coefficients (<0.1) with HOF and significant deviations apart from the two HOF/Non-HOF groups using logistic regression. Two models (i.e., ANN and convolutional neural network, CNN) were compared in model accuracy (e.g., sensitivity, specificity, area under the receiver operating characteristic curve, AUC). An app predicting HOF was then developed involving the model’s parameters. We observed that (1) 20 feature variables in the ANN model yielded a higher AUC of 0.93 (95% CI 0.93–0.97) based on the 198-case training sample, (2) the ANN performed better than CNN on the accuracy of AUC (= 0.91, 95% CI 0.87–0.95), and (3) a ready-to-use app for predicting HOF was successfully developed. A 20-variable ANN model, with 53 parameters estimated by the ANN, was developed to improve the accuracy of HOF prediction. The app can help NBA fans predict which players are likely to be inducted into the HOF and is not limited to active NBA players.
Affiliation(s)
- Po-Hsin Chou
- Department of Orthopedics and Traumatology, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- School of Medicine, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
| | - Tsair-Wei Chien
- Department of Medical Research, Chi-Mei Medical Center, Tainan 700, Taiwan;
| | - Ting-Ya Yang
- Medical Education Center, Chi-Mei Medical Center, Tainan 700, Taiwan;
- School of Medicine, College of Medicine, China Medical University, Taichung 400, Taiwan
| | - Yu-Tsen Yeh
- Medical School, St. George’s University of London, London SW17 0RE, UK;
| | - Willy Chou
- Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan 700, Taiwan
- Correspondence: (W.C.); (C.-H.Y.); Tel.: +886-6291-2811 (C.-H.Y.)
| | - Chao-Hung Yeh
- Department of Neurosurgery, Chi Mei Medical Center, Tainan 700, Taiwan
- Correspondence: (W.C.); (C.-H.Y.); Tel.: +886-6291-2811 (C.-H.Y.)
|
32
|
Ye Q, Chai X, Jiang D, Yang L, Shen C, Zhang X, Li D, Cao D, Hou T. Identification of active molecules against Mycobacterium tuberculosis through machine learning. Brief Bioinform 2021; 22:6209685. [PMID: 33822874 DOI: 10.1093/bib/bbab068] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/14/2020] [Revised: 01/23/2021] [Accepted: 02/09/2021] [Indexed: 11/14/2022] Open
Abstract
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb), and it has been one of the top 10 causes of death globally. Extensively drug-resistant tuberculosis (XDR-TB), which resists the commonly used first-line drugs, has emerged as a major challenge to TB treatment. Hence, novel drug candidates for TB treatment are needed. In this study, based on different types of molecular representations, four machine learning (ML) algorithms, including support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were used to develop classification models to distinguish Mtb inhibitors from noninhibitors. The results demonstrate that the XGBoost model exhibits the best prediction performance. Then, two consensus strategies were employed to integrate the predictions from multiple models. The evaluation results illustrate that the consensus model built by stacking the RF, XGBoost and DNN predictions offers the best predictions, with an area under the receiver operating characteristic curve of 0.842 and 0.942 for the 10-fold cross-validated training set and external test set, respectively. In addition, the association between the important descriptors and the bioactivities of molecules was interpreted using the Shapley additive explanations method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed, and it offers a freely available computational tool to detect potential Mtb inhibitors.
Affiliation(s)
- Qing Ye
- College of Pharmaceutical Sciences at Zhejiang University, China
| | - Xin Chai
- College of Pharmaceutical Sciences at Zhejiang University, China
| | - Dejun Jiang
- College of Pharmaceutical Sciences at Zhejiang University, China
| | - Liu Yang
- College of Pharmaceutical Sciences at Zhejiang University, China
| | - Chao Shen
- College of Pharmaceutical Sciences at Zhejiang University, China
| | - Xujun Zhang
- College of Pharmaceutical Sciences at Zhejiang University, China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences at Central South University, China
| | - Tingjun Hou
- College of Pharmaceutical Sciences at Zhejiang University, China
|
33
|
Yang ZY, Yang ZJ, Zhao Y, Yin MZ, Lu AP, Chen X, Liu S, Hou TJ, Cao DS. PySmash: Python package and individual executable program for representative substructure generation and application. Brief Bioinform 2021; 22:6168498. [PMID: 33709154 DOI: 10.1093/bib/bbab017] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/07/2020] [Revised: 01/06/2021] [Accepted: 01/12/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Substructure screening is widely applied to evaluate the molecular potency and ADMET properties of compounds in drug discovery pipelines, and it can also be used to interpret QSAR models for the design of new compounds with desirable physicochemical and biological properties. With the continuous accumulation of more experimental data, data-driven computational systems which can derive representative substructures from large chemical libraries attract more attention. Therefore, the development of an integrated and convenient tool to generate and implement representative substructures is urgently needed. RESULTS In this study, PySmash, a user-friendly and powerful tool to generate different types of representative substructures, was developed. The current version of PySmash provides both a Python package and an individual executable program, which achieves ease of operation and pipeline integration. Three types of substructure generation algorithms, including circular, path-based and functional group-based algorithms, are provided. Users can conveniently customize their own requirements for substructure size, accuracy and coverage, statistical significance and parallel computation during execution. Besides, PySmash provides the function for external data screening. CONCLUSION PySmash, a user-friendly and integrated tool for the automatic generation and implementation of representative substructures, is presented. Three screening examples, including toxicophore derivation, privileged motif detection and the integration of substructures with machine learning (ML) models, are provided to illustrate the utility of PySmash in safety profile evaluation, therapeutic activity exploration and molecular optimization, respectively. Its executable program and Python package are available at https://github.com/kotori-y/pySmash.
Affiliation(s)
- Zi-Yi Yang
- Department of Pharmacy, Xiangya Hospital, Central South University and the Xiangya School of Pharmaceutical Sciences, Central South University, Hunan, China
| | - Zhi-Jiang Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Hunan, China
| | - Yue Zhao
- Xiangya School of Pharmaceutical Sciences, Central South University (Changsha), Hunan, China
| | - Ming-Zhu Yin
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Hunan
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong
| | - Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Hunan
| | - Shao Liu
- Department of Pharmacy, Xiangya Hospital, Central South University, Hunan
| | - Ting-Jun Hou
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
|
34
|
Jiang D, Wu Z, Hsieh CY, Chen G, Liao B, Wang Z, Shen C, Cao D, Wu J, Hou T. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminform 2021; 13:12. [PMID: 33597034 PMCID: PMC7888189 DOI: 10.1186/s13321-020-00479-8] [Citation(s) in RCA: 203] [Impact Index Per Article: 50.8] [Received: 09/21/2020] [Accepted: 11/26/2020] [Indexed: 12/31/2022] Open
Abstract
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
Affiliation(s)
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China.,College of Computer Science and Technology, Zhejiang University, Hangzhou, China
| | - Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
| | - Guangyong Chen
- Shenzhen Institutes of Advanced Technology, Shenzhen, 518055, Guangdong, China
| | - Ben Liao
- Tencent Quantum Laboratory Tencent, Shenzhen, 518057, Guangdong, China
| | - Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, China.
| | - Jian Wu
- College of Computer Science and Technology, Zhejiang University, Hangzhou, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China. .,State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
|
35
|
Xie L, Xu L, Kong R, Chang S, Xu X. Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning. Front Pharmacol 2021; 11:606668. [PMID: 33488387 PMCID: PMC7819282 DOI: 10.3389/fphar.2020.606668] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/21/2020] [Accepted: 11/23/2020] [Indexed: 12/27/2022] Open
Abstract
The accurate prediction of physical properties and bioactivity of drug molecules in deep learning depends on how molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationships (QSAR/QSPR). However, each molecular descriptor is optimized for a specific application with an encoding preference. Considering that standalone featurization methods may cover only part of the information in a molecule, we proposed to build a conjoint fingerprint by combining two supplementary fingerprints. The impact of the conjoint fingerprint and each standalone fingerprint on predictive performance was systematically evaluated in predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity by using machine learning/deep learning (ML/DL) methods, including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model using two standalone fingerprints for four of the five examined methods. Given that the conjoint fingerprint scheme is easy to extend and widely applicable, we expect that the proposed conjoint scheme will create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
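As a rough illustration of the conjoint idea in this abstract, the sketch below joins two fingerprint vectors computed for the same molecule into one feature vector. Plain concatenation is an assumption here (the abstract says only that two fingerprints are combined), and the function name and toy bit vectors are hypothetical.

```python
from typing import List, Sequence


def conjoint_fingerprint(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Join two fingerprints of one molecule by concatenation (assumed scheme)."""
    return list(fp_a) + list(fp_b)


# Hypothetical toy vectors standing in for two different fingerprint types.
substructure_bits = [1, 0, 0, 1]
topological_bits = [0, 1, 1]

combined = conjoint_fingerprint(substructure_bits, topological_bits)
print(combined)  # [1, 0, 0, 1, 0, 1, 1]
```

The combined vector keeps every bit of both inputs, so any downstream model sees the two representations side by side rather than a vote between two separately trained models.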
Affiliation(s)
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China.,Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, China
| | - Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
|
36
|
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioproc E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data, such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized. In addition, the review provides an in-depth analysis of the remaining challenges and limitations, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
|
37
|
Idakwo G, Thangapandian S, Luttrell J, Li Y, Wang N, Zhou Z, Hong H, Yang B, Zhang C, Gong P. Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets. J Cheminform 2020; 12:66. [PMID: 33372637 PMCID: PMC7592558 DOI: 10.1186/s13321-020-00468-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 12/13/2019] [Accepted: 10/13/2020] [Indexed: 12/14/2022] Open
Abstract
The specificity of toxicant-target biomolecule interactions contributes to the very imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples, followed by cleaning the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consisted of 12 in vitro bioassays for > 10,000 chemicals that were distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F1 score, Matthews correlation coefficient and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods.
We also found that a strong negative correlation existed between the prediction accuracy and the imbalance ratio (IR), which is defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
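The imbalance ratio defined in this abstract, and the random undersampling baseline it mentions, can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions: labels are coded 1 for active and 0 for inactive, the function names and toy data are hypothetical, and SMOTE/ENN themselves are not implemented here.

```python
import random
from typing import List, Sequence


def imbalance_ratio(labels: Sequence[int]) -> float:
    """IR = number of inactive compounds / number of active compounds."""
    n_active = sum(1 for y in labels if y == 1)
    n_inactive = sum(1 for y in labels if y == 0)
    if n_active == 0:
        raise ValueError("no active compounds; IR is undefined")
    return n_inactive / n_active


def random_undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return indices keeping all actives plus an equal-sized random draw of inactives."""
    rng = random.Random(seed)
    active_idx = [i for i, y in enumerate(labels) if y == 1]
    inactive_idx = [i for i, y in enumerate(labels) if y == 0]
    kept = rng.sample(inactive_idx, min(len(active_idx), len(inactive_idx)))
    return sorted(active_idx + kept)


# Toy assay: 5 actives and 140 inactives.
labels = [1] * 5 + [0] * 140
print(imbalance_ratio(labels))  # 28.0

balanced_idx = random_undersample(labels)
print(imbalance_ratio([labels[i] for i in balanced_idx]))  # 1.0
```

Undersampling reaches IR = 1 only by discarding 135 of the 140 inactives in this toy case, which is the information loss the abstract describes.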
Affiliation(s)
- Gabriel Idakwo
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Sundar Thangapandian
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
| | - Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Yan Li
- Bennett Aerospace Inc, Cary, NC, 27518, USA
| | - Nan Wang
- Department of Computer Science, New Jersey City University, Jersey City, NJ, 07305, USA
| | - Zhaoxian Zhou
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Centre for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Bei Yang
- School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA.
| | - Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA.
|
38
|
Marchwiany ME, Birowska M, Popielski M, Majewski JA, Jastrzębska AM. Surface-Related Features Responsible for Cytotoxic Behavior of MXenes Layered Materials Predicted with Machine Learning Approach. Materials (Basel) 2020; 13:E3083. [PMID: 32664304 PMCID: PMC7412046 DOI: 10.3390/ma13143083] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/19/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 12/16/2022]
Abstract
To speed up the implementation of two-dimensional materials in potential biomedical applications, their toxicological effects on human health need to be addressed. Because the analysis is time-consuming and expensive, only part of the continuously expanding family of 2D materials can be tested in vitro. Machine learning methods can be used to extract new insights from available biological data sets and to provide further guidance for experimental studies. This study identifies the most relevant highly surface-specific features that might be responsible for the cytotoxic behavior of 2D materials, especially MXenes. In particular, two factors, namely the presence of transition metal oxides and of lithium atoms on the surface, are identified as cytotoxicity-generating features. The developed machine learning model succeeds in predicting toxicity for other 2D MXenes not previously tested in vitro and hence is able to complement the existing knowledge coming from in vitro studies. Thus, we claim that it might be one of the solutions for reducing the number of toxicological studies needed and for minimizing failures in future biological applications.
Affiliation(s)
- Maciej E. Marchwiany
- Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Pawińskiego 5a, 02-106 Warsaw, Poland;
| | - Magdalena Birowska
- Faculty of Physics, University of Warsaw, Pasteura 5, 00-092 Warsaw, Poland; (M.P.); (J.A.M.)
| | - Mariusz Popielski
- Faculty of Physics, University of Warsaw, Pasteura 5, 00-092 Warsaw, Poland; (M.P.); (J.A.M.)
| | - Jacek A. Majewski
- Faculty of Physics, University of Warsaw, Pasteura 5, 00-092 Warsaw, Poland; (M.P.); (J.A.M.)
| | - Agnieszka M. Jastrzębska
- Faculty of Materials Science and Engineering, Warsaw University of Technology, Wołoska 141, 02-507 Warsaw, Poland;
|
39
|
Shi C, Dong F, Zhao G, Zhu N, Lao X, Zheng H. Applications of machine-learning methods for the discovery of NDM-1 inhibitors. Chem Biol Drug Des 2020; 96:1232-1243. [PMID: 32418370 DOI: 10.1111/cbdd.13708] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/22/2020] [Revised: 04/25/2020] [Accepted: 05/06/2020] [Indexed: 12/11/2022]
Abstract
The emergence of New Delhi metallo-β-lactamase 1 (NDM-1)-producing bacteria and their worldwide spread pose great challenges for the treatment of drug-resistant bacterial infections. These bacteria can hydrolyze most β-lactam antibacterials. Unfortunately, there are no clinically useful NDM-1 inhibitors. In the current work, we manually collected NDM-1 inhibitors reported in the past decade and established the first NDM-1 inhibitor database. Four machine-learning models were constructed using the structural and property characteristics of the collected compounds as the input training set to discover potential NDM-1 inhibitors. In order to distinguish between highly active inhibitors and putative positive drugs, a three-class strategy was introduced in our study: the commonly used positive and negative divisions are converted into strongly active, weakly active, and inactive. The accuracy of the best prediction model designed based on this strategy reached 90.5%, compared with 69.14% achieved by the traditional docking-based virtual screening method. Consequently, the best model was used to virtually screen a natural product library. The safety of the selected compounds was analyzed by a machine learning-based ADMET prediction model. Seven novel NDM-1 inhibitors were identified, which will provide valuable clues for the discovery of NDM-1 inhibitors.
Affiliation(s)
- Cheng Shi
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
| | - Fanyi Dong
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
| | - Guiling Zhao
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
| | - Ning Zhu
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
| | - Xingzhen Lao
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
| | - Heng Zheng
- School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
|
40
|
Ye WL, Shen C, Xiong GL, Ding JJ, Lu AP, Hou TJ, Cao DS. Improving Docking-Based Virtual Screening Ability by Integrating Multiple Energy Auxiliary Terms from Molecular Docking Scoring. J Chem Inf Model 2020; 60:4216-4230. [PMID: 32352294 DOI: 10.1021/acs.jcim.9b00977] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 12/18/2022]
Abstract
Virtual screening (VS) based on molecular docking is an efficient method for retrieving novel hit compounds in drug discovery. However, the accuracy of current docking scoring functions (SFs) is usually insufficient. In this study, in order to improve the screening power of SFs, a novel approach named EAT-Score was proposed that directly utilizes the energy auxiliary terms (EAT) provided by molecular docking scoring through eXtreme Gradient Boosting (XGBoost). Here, EAT specifically refers to the output of the Molecular Operating Environment (MOE) scoring, including the energy scores of five different classical SFs and the Protein-Ligand Interaction Fingerprint (PLIF) terms. The performance of EAT-Score in discriminating actives from decoys was strictly validated on the DUD-E diverse subset by using different performance metrics. The results showed that EAT-Score performed much better than classical SFs in VS, with its AUC values exhibiting an improvement of around 0.3. Meanwhile, EAT-Score could achieve comparable or even better prediction performance than other state-of-the-art VS methods, such as some machine learning (ML)-based SFs and classical SFs implemented in docking programs, in terms of AUC, LogAUC, or BEDROC. Furthermore, Shapley additive explanations (SHAP) analysis shows that the EAT-Score model can capture important binding pattern information from protein-ligand complexes, which may be very helpful in interpreting the ligand binding mechanism for a certain target and thereby guiding drug design.
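The AUC used here to compare scoring functions has a simple rank interpretation: it is the fraction of active/decoy pairs in which the active receives the higher score. The sketch below computes it directly from that definition; the function name and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """AUC as the fraction of (active, decoy) pairs ranked correctly; ties count half."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Toy screening scores (higher means predicted more likely to bind).
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(actives, decoys), 3))  # 0.917
```

A value of 0.5 corresponds to random ranking, so an improvement of about 0.3 over a classical scoring function is large on this scale.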
Affiliation(s)
- Wen-Ling Ye
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
| | - Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Guo-Li Xiong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
| | - Jun-Jie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
| | - Ai-Ping Lu
- Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
| | - Ting-Jun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China.,Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
|
41
|
Yang ZY, Dong J, Yang ZJ, Lu AP, Hou TJ, Cao DS. Structural Analysis and Identification of False Positive Hits in Luciferase-Based Assays. J Chem Inf Model 2020; 60:2031-2043. [PMID: 32202787 DOI: 10.1021/acs.jcim.9b01188] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/07/2023]
Abstract
Luciferase-based bioluminescence detection techniques are highly favored in high-throughput screening (HTS), in which the firefly luciferase (FLuc) is the most commonly used variant. However, FLuc inhibitors can interfere with the activity of luciferase, which may result in false positive signals in HTS assays. In order to reduce the unnecessary cost of time and money, an in silico prediction model for FLuc inhibitors is highly desirable. In this study, we built an extensive data set consisting of 20 888 FLuc inhibitors and 198 608 noninhibitors, and then developed a group of classification models based on the combination of three machine learning (ML) algorithms and four types of molecular representations. The best prediction model based on XGBoost and ECFP4 and MOE2d descriptors yielded a balanced accuracy (BA) of 0.878 and an area under the receiver operating characteristic curve (AUC) value of 0.958 for the validation set, and a BA of 0.886 and an AUC of 0.947 for the test set. Three external validation sets, including set 1 (3231 FLuc inhibitors and 69 783 noninhibitors), set 2 (695 FLuc inhibitors and 75 913 noninhibitors), and set 3 (1138 FLuc inhibitors and 8155 noninhibitors), were used to verify the predictive ability of our models. The BA values for the three external validation sets given by the best model are 0.864, 0.845, and 0.791, respectively. In addition, the important features or structural fragments related to FLuc inhibitors were recognized by the Shapley additive explanations (SHAP) method along with their influences on predictions, which may provide valuable clues to detecting undesirable luciferase inhibitors. Based on the important and explanatory features, 16 rules were proposed for detecting FLuc inhibitors, which can achieve a correction rate of 70% for FLuc inhibitors. 
Furthermore, a comparison with existing prediction rules and models for FLuc inhibitors used in virtual screening verified the high reliability of the models and rules proposed in this study. We also used the model to screen three curated chemical databases, and almost 10% of the molecules in the evaluated databases were predicted as inhibitors, highlighting the potential risk of false positives in luciferase-based assays. Finally, a public web server called ChemFLuc was developed (http://admet.scbdd.com/chemfluc/index/), and it offers a freely available service to predict potential FLuc inhibitors.
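The abstract above reports balanced accuracy (BA) rather than plain accuracy, which matters for a data set with roughly ten noninhibitors per inhibitor. The following is a minimal sketch, not the authors' code: the class sizes come from the abstract, while the "trivial" classifier and the function names are assumptions made purely for illustration.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all compounds that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on inhibitors) and specificity (recall on noninhibitors)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class sizes reported in the abstract.
n_inhibitors, n_noninhibitors = 20_888, 198_608

# Hypothetical trivial classifier: every compound is called a noninhibitor.
trivial = dict(tp=0, fn=n_inhibitors, tn=n_noninhibitors, fp=0)

print(round(accuracy(**trivial), 3))   # 0.905
print(balanced_accuracy(**trivial))    # 0.5
```

Under this assumption, a model that never flags an inhibitor still reaches about 90% accuracy, while its BA stays at the chance level of 0.5, which is presumably why BA and AUC are the headline metrics for this kind of imbalanced screen.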
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Jie Dong
  - Central South University of Forestry and Technology, Changsha, 410004, P.R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P.R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P.R. China
|
42
|
Jiang D, Lei T, Wang Z, Shen C, Cao D, Hou T. ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning. J Cheminform 2020; 12:16. [PMID: 33430990 PMCID: PMC7059329 DOI: 10.1186/s13321-020-00421-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/27/2019] [Accepted: 02/20/2020] [Indexed: 12/14/2022] Open
Abstract
Breast cancer resistance protein (BCRP/ABCG2), an ATP-binding cassette (ABC) efflux transporter, plays a critical role in multi-drug resistance (MDR) to anti-cancer drugs and drug–drug interactions. The prediction of BCRP inhibition can facilitate evaluating potential drug resistance and drug–drug interactions in the early stages of drug discovery. Here we report a structurally diverse dataset consisting of 1098 BCRP inhibitors and 1701 non-inhibitors. Analysis of various physicochemical properties illustrates that BCRP inhibitors are more hydrophobic and aromatic than non-inhibitors. We then developed a series of quantitative structure–activity relationship (QSAR) models to discriminate between BCRP inhibitors and non-inhibitors. The optimal feature subset was determined by a wrapper feature selection method named rfSA (simulated annealing algorithm coupled with random forest), and the classification models were established by using seven machine learning approaches based on the optimal feature subset, including a deep learning method, two ensemble learning methods, and four classical machine learning methods. The statistical results demonstrated that three methods, including support vector machine (SVM), deep neural networks (DNN) and extreme gradient boosting (XGBoost), outperformed the others, and the SVM classifier yielded the best predictions (MCC = 0.812 and AUC = 0.958 for the test set). Then, a perturbation-based model-agnostic method was used to interpret our models and analyze the representative features for different models. The application domain analysis demonstrated the prediction reliability of our models. Moreover, the important structural fragments related to BCRP inhibition were identified by the information gain (IG) method along with the frequency analysis.
In conclusion, we believe that the classification models developed in this study can be regarded as simple and accurate tools to distinguish BCRP inhibitors from non-inhibitors in drug design and discovery pipelines.
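Several abstracts in this list, including the one above, rank models by the Matthews correlation coefficient (MCC). As a reminder of what that number measures, here is a small sketch computing MCC from a binary confusion matrix. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when a row or column of the
    matrix is empty, a common convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical test-set counts, for illustration only.
print(round(mcc(tp=90, fn=10, tn=160, fp=10), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when inhibitors and non-inhibitors are unevenly represented.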
Affiliation(s)
- Dejun Jiang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Tailong Lei
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Zhe Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, People's Republic of China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, People's Republic of China
|
43
|
Fu L, Liu L, Yang ZJ, Li P, Ding JJ, Yun YH, Lu AP, Hou TJ, Cao DS. Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis. J Chem Inf Model 2019; 60:63-76. [DOI: 10.1021/acs.jcim.9b00718] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Affiliation(s)
- Li Fu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Lu Liu
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
- Pan Li
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
- Yong-Huan Yun
  - College of Food Science and Engineering, Hainan University, Haikou 570228, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China
|
44
|
Ancuceanu R, Tamba B, Stoicescu CS, Dinu M. Use of QSAR Global Models and Molecular Docking for Developing New Inhibitors of c-src Tyrosine Kinase. Int J Mol Sci 2019; 21:ijms21010019. [PMID: 31861445 PMCID: PMC6981969 DOI: 10.3390/ijms21010019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/22/2019] [Revised: 12/15/2019] [Accepted: 12/16/2019] [Indexed: 12/11/2022] Open
Abstract
A prototype of a family of at least nine members, cellular Src tyrosine kinase is a therapeutically interesting target because its inhibition might be of interest not only in a number of malignancies, but also in a diverse array of conditions, from neurodegenerative pathologies to certain viral infections. Computational methods in drug discovery are considerably cheaper than conventional methods and offer opportunities of screening very large numbers of compounds in conditions that would be simply impossible within the wet lab experimental settings. We explored the use of global quantitative structure-activity relationship (QSAR) models and molecular ligand docking in the discovery of new c-src tyrosine kinase inhibitors. Using a dataset of 1038 compounds from ChEMBL database, we developed over 350 QSAR classification models. A total of 49 models with reasonably good performance were selected and the models were assembled by stacking with a simple majority vote and used for the virtual screening of over 100,000 compounds. A total of 744 compounds were predicted by at least 50% of the QSAR models as active, 147 compounds were within the applicability domain and predicted by at least 75% of the models to be active. The latter 147 compounds were submitted to molecular ligand docking using AutoDock Vina and LeDock, and 89 were predicted to be active based on the energy of binding.
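The consensus step described in this abstract (compounds called active by at least 50% or at least 75% of the selected models) can be illustrated with a short sketch. Everything below is an assumption for illustration: the compound identifiers, the four-model vote vectors, and the function names are invented, and the applicability-domain filter mentioned in the abstract is not modeled.

```python
def active_fraction(votes):
    """Fraction of models that voted 'active' (1) for one compound."""
    return sum(votes) / len(votes)


def consensus_screen(predictions, threshold):
    """Return ids of compounds whose active-vote fraction is at least `threshold`."""
    return [
        compound_id
        for compound_id, votes in predictions.items()
        if active_fraction(votes) >= threshold
    ]


# Hypothetical votes from four models (1 = predicted active, 0 = predicted inactive).
predictions = {
    "cpd_A": [1, 1, 1, 1],
    "cpd_B": [1, 1, 0, 0],
    "cpd_C": [1, 0, 0, 0],
}

print(consensus_screen(predictions, threshold=0.50))  # ['cpd_A', 'cpd_B']
print(consensus_screen(predictions, threshold=0.75))  # ['cpd_A']
```

Raising the threshold trades recall for confidence, which mirrors the drop from 744 to 147 compounds reported in the abstract (where the stricter set was additionally restricted to the applicability domain).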
Affiliation(s)
- Robert Ancuceanu
  - Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020956 Bucharest, Romania
- Bogdan Tamba
  - Advanced Research and Development Center for Experimental Medicine (CEMEX), Grigore T. Popa, University of Medicine and Pharmacy of Iasi, 700115 Iasi, Romania
  - Correspondence:
- Cristina Silvia Stoicescu
  - Department of Chemical Thermodynamics, Institute of Physical Chemistry “Ilie Murgulescu”, 060021 Bucharest, Romania
- Mihaela Dinu
  - Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020956 Bucharest, Romania
|
45
|
Wu Z, Lei T, Shen C, Wang Z, Cao D, Hou T. ADMET Evaluation in Drug Discovery. 19. Reliable Prediction of Human Cytochrome P450 Inhibition Using Artificial Intelligence Approaches. J Chem Inf Model 2019; 59:4587-4601. [DOI: 10.1021/acs.jcim.9b00801] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Indexed: 12/21/2022]
Affiliation(s)
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
|
46
|
Yang ZY, Yang ZJ, Dong J, Wang LL, Zhang LX, Ding JJ, Ding XQ, Lu AP, Hou TJ, Cao DS. Structural Analysis and Identification of Colloidal Aggregators in Drug Discovery. J Chem Inf Model 2019; 59:3714-3726. [DOI: 10.1021/acs.jcim.9b00541] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/21/2022]
Affiliation(s)
- Zi-Yi Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
- Zhi-Jiang Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
- Jie Dong
  - Central South University of Forestry and Technology, Changsha 410004, People’s Republic of China
- Liang-Liang Wang
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
- Liu-Xia Zhang
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
- Xiao-Qin Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, People’s Republic of China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region, People’s Republic of China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, People’s Republic of China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, People’s Republic of China
  - Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region, People’s Republic of China
|
47
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 369] [Impact Index Per Article: 61.5] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
|
48
|
Yang S, Shen Y, Lu W, Yang Y, Wang H, Li L, Wu C, Du G. Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription. Biomed Res Int 2019; 2019:6847685. [PMID: 31360720 PMCID: PMC6652039 DOI: 10.1155/2019/6847685] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/16/2019] [Revised: 05/13/2019] [Accepted: 05/26/2019] [Indexed: 12/18/2022]
Abstract
Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used as a therapeutic in the treatment of stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combination Bayesian models were feasible to predict neuroprotective compounds and to preliminarily demonstrate the pharmacological mechanisms of TCM.
Affiliation(s)
- Shilun Yang
  - School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, No. 103, Wen hua Road, Shenyang 110016, China
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
- Yanjia Shen
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
- Wendan Lu
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
- Yinglin Yang
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
- Haigang Wang
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
- Li Li
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
- Chunfu Wu
  - School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, No. 103, Wen hua Road, Shenyang 110016, China
- Guanhua Du
  - School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, No. 103, Wen hua Road, Shenyang 110016, China
  - Beijing Key Laboratory of Drug Targets Identification and Drug Screening, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 2, Nan wei Road, Beijing 100050, China
|
49
|
Sokolov A, Ashenden S, Sahin N, Lewis R, Erdem N, Ozaltan E, Bender A, Roth FP, Cokol M. Characterizing ABC-Transporter Substrate-Likeness Using a Clean-Slate Genetic Background. Front Pharmacol 2019; 10:448. [PMID: 31105571 PMCID: PMC6494965 DOI: 10.3389/fphar.2019.00448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Accepted: 04/08/2019] [Indexed: 12/02/2022] Open
Abstract
Mutations in ATP Binding Cassette (ABC)-transporter genes can have major effects on the bioavailability and toxicity of the drugs that are ABC-transporter substrates. Consequently, methods to predict if a drug is an ABC-transporter substrate are useful for drug development. Such methods traditionally relied on literature curated collections of ABC-transporter dependent membrane transfer assays. Here, we used a single large-scale dataset of 376 drugs with relative efficacy on an engineered yeast strain with all ABC-transporter genes deleted (ABC-16), to explore the relationship between a drug’s chemical structure and ABC-transporter substrate-likeness. We represented a drug’s chemical structure by an array of substructure keys and explored several machine learning methods to predict the drug’s efficacy in an ABC-16 yeast strain. Gradient-Boosted Random Forest models outperformed all other methods with an AUC of 0.723. We prospectively validated the model using new experimental data and found significant agreement with predictions. Our analysis expands the previously reported chemical substructures associated with ABC-transporter substrates and provides an alternative means to investigate ABC-transporter substrate-likeness.
Affiliation(s)
- Artem Sokolov
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, United States
- Stephanie Ashenden
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
  - Discovery Sciences, IMed Biotech Unit, AstraZeneca R&D, Cambridge, United Kingdom
- Nil Sahin
  - Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Richard Lewis
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Nurdan Erdem
  - Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
- Elif Ozaltan
  - Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Frederick P Roth
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Mt. Sinai Hospital, Canadian Institute for Advanced Research, Toronto, ON, Canada
- Murat Cokol
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, United States
  - Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
  - Donnelly Centre, University of Toronto, Toronto, ON, Canada
  - Axcella Health, Cambridge, MA, United States
|
50
|
Zhang Y, Wang Y, Zhou W, Fan Y, Zhao J, Zhu L, Lu S, Lu T, Chen Y, Liu H. A combined drug discovery strategy based on machine learning and molecular docking. Chem Biol Drug Des 2019; 93:685-699. [PMID: 30688405 DOI: 10.1111/cbdd.13494] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/01/2018] [Revised: 01/04/2019] [Accepted: 01/19/2019] [Indexed: 12/14/2022]
Abstract
Data mining methods based on machine learning play an increasingly important role in drug design and discovery. In the current work, eight machine learning methods including decision trees, k-Nearest neighbor, support vector machines, random forests, extremely randomized trees, AdaBoost, gradient boosting trees, and XGBoost were evaluated comprehensively through a case study of ACC inhibitor data sets. Internal and external data sets were employed for cross-validation of the eight machine learning methods. Results showed that the extremely randomized trees model performed best and was adopted as the first step of virtual screening. Together with structure-based virtual screening in the second step, this combined strategy obtained desirable results. This work indicates that the combination of machine learning methods with traditional structure-based virtual screening can effectively improve the ability to find potential hits from large compound databases for a given target.
Affiliation(s)
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuchen Wang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Weineng Zhou
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuanrong Fan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Junnan Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Lu Zhu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Shuai Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
|