1. Zhang Q, Mao D, Tu Y, Wu YY. A New Fingerprint and Graph Hybrid Neural Network for Predicting Molecular Properties. J Chem Inf Model 2024; 64:5853-5866. [PMID: 39052623 DOI: 10.1021/acs.jcim.4c00586] [Indexed: 07/27/2024]
Abstract
Machine learning helps accelerate drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Molecules are typically characterized by molecular fingerprints and molecular graphs, which are fed into a multilayer perceptron (MLP) and into variants of graph neural networks such as graph attention networks (GATs), respectively. Because fingerprints are diverse in type and high in dimensionality, models may contain many irrelevant or redundant features. Meanwhile, although the GAT excels at heterogeneous graph tasks, it cannot extract collaborative information from neighboring nodes and therefore cannot capture the joint influence of adjacent groups on an atom. To overcome these challenges, we introduce a hybrid model that combines an improved GAT with an MLP. In the improved GAT, a recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm based on the principle of maximizing relevance while minimizing redundancy. In experiments on 13 public data sets and 14 breast cell lines, our model outperforms state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments demonstrates the advantages of the improved version, as well as its antinoise capability and interpretability. These results indicate that our model holds promise for practical applications.
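The "maximize relevance, minimize redundancy" principle is the idea behind minimum-redundancy maximum-relevance (mRMR) feature selection. The sketch below is a generic greedy mRMR over discrete features such as fingerprint bits, using mutual information for both terms. It is not the authors' algorithm; the feature names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def mrmr_select(features, target, k):
    """Greedy mRMR: at each step pick the feature with the highest relevance
    I(f; target) minus its mean redundancy I(f; s) over already-selected s."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    A = [0, 0, 0, 0, 1, 1, 1, 1]
    B = [0, 0, 1, 1, 0, 0, 1, 1]
    target = [2 * a + b for a, b in zip(A, B)]             # depends on both A and B
    bits = {"A": A, "A2": list(A), "B": B, "C": [0] * 8}  # A2 duplicates A; C is constant
    print(mrmr_select(bits, target, k=2))  # → ['A', 'B']
```

Here `A`, `A2` and `B` are equally relevant on their own, but once `A` is chosen its duplicate `A2` carries only redundant information, so the redundancy term steers the second pick to `B`.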
Affiliation(s)
- Qingtian Zhang
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Dangxin Mao
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yusong Tu
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yuan-Yan Wu
- College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
2. Pérez KL, Jung V, Chen L, Huddleston K, Miranda-Quintana RA. Efficient clustering of large molecular libraries. bioRxiv 2024:2024.08.10.607459. [PMID: 39149242 PMCID: PMC11326248 DOI: 10.1101/2024.08.10.607459] [Indexed: 08/17/2024]
Abstract
The widespread use of machine learning (ML) techniques in chemical applications has created a pressing need to analyze extremely large molecular libraries. In particular, clustering remains one of the most common tools for dissecting chemical space. Unfortunately, most current approaches have unfavorable time and memory scaling, which makes them unsuitable for million- and billion-sized sets. Here, we propose to bypass these problems with a time- and memory-efficient clustering algorithm, BitBIRCH. This method uses a tree structure similar to the one in the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm to ensure O(N) time scaling. BitBIRCH leverages the instant similarity (iSIM) formalism to process binary fingerprints, allowing the use of Tanimoto similarity while reducing memory requirements. Our tests show that BitBIRCH is already more than 1,000 times faster than standard implementations of Taylor-Butina clustering for libraries of 1,500,000 molecules. BitBIRCH increases efficiency without compromising the quality of the resulting clusters. We also explore strategies for handling large sets, which we applied to cluster one billion molecules in under 5 hours using a parallel/iterative BitBIRCH approximation.
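Tanimoto similarity between two binary fingerprints is the number of shared on-bits divided by the number of bits that are on in either fingerprint. The sketch below shows that pairwise formula next to an iSIM-style "instant" Tanimoto for a whole set, which needs only the set size and the column sums of the fingerprints. `isim_tanimoto` follows our reading of the iSIM formalism and is illustrative, not the BitBIRCH implementation; the fingerprints are toy data.

```python
def tanimoto(fp_a, fp_b):
    """Pairwise Tanimoto similarity of two equal-length 0/1 fingerprints."""
    common = sum(a & b for a, b in zip(fp_a, fp_b))
    union = sum(fp_a) + sum(fp_b) - common
    return common / union if union else 1.0  # two all-zero fingerprints: treated as identical here


def isim_tanimoto(fingerprints):
    """iSIM-style instant Tanimoto of a whole set, computed from column sums only."""
    n = len(fingerprints)
    column_sums = [sum(column) for column in zip(*fingerprints)]
    # For a bit with k on-values: k*(k-1)/2 pairs share the on-bit,
    # and k*(n-k) pairs disagree on it.
    on_on = sum(k * (k - 1) / 2 for k in column_sums)
    mismatches = sum(k * (n - k) for k in column_sums)
    denominator = on_on + mismatches
    return on_on / denominator if denominator else 1.0


if __name__ == "__main__":
    a = [1, 1, 0, 1, 0]
    b = [1, 0, 0, 1, 1]
    print(tanimoto(a, b))         # → 0.5
    print(isim_tanimoto([a, b]))  # → 0.5 (for two fingerprints the formulas coincide)
```

Because `isim_tanimoto` depends only on `n` and the column-sum vector, merging two groups amounts to adding their sizes and column sums, which is the kind of constant-size summary a BIRCH-style tree can store per node.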
Affiliation(s)
- Lexin Chen
- Department of Chemistry & Quantum Theory Project, University of Florida, Gainesville, Florida 32611
- Kate Huddleston
- Department of Chemistry & Quantum Theory Project, University of Florida, Gainesville, Florida 32611
3. Schossler RT, Ojo S, Jiang Z, Hu J, Yu X. A novel interpretable machine learning model approach for the prediction of TiO2 photocatalytic degradation of air contaminants. Sci Rep 2024; 14:13070. [PMID: 38844551 PMCID: PMC11156991 DOI: 10.1038/s41598-024-62450-z] [Received: 09/05/2023] [Accepted: 05/16/2024] [Indexed: 06/09/2024]
Abstract
Air contaminants cause various environmental and health issues. Titanium dioxide (TiO2) offers the benefit of autogenous photocatalytic degradation of air contaminants. Its performance is commonly evaluated with laboratory experiments that determine the kinetics of the photocatalytic degradation rate, which are labor-intensive, time-consuming, and costly. In this study, machine learning (ML) models were developed to predict the photodegradation rate constants of airborne organic contaminants with TiO2 nanoparticles under ultraviolet irradiation. The ML models, whose hyperparameters were optimized, included an artificial neural network (ANN) with Bayesian optimization, a gradient boosting regressor (GBR) with Bayesian optimization, Extreme Gradient Boosting (XGBoost) optimized with Hyperopt, and CatBoost combined with AdaBoost. The organic contaminants were encoded as molecular fingerprints (MFs). An imputation method was applied to handle missing data. A generative ML model, a vanilla GAN, was used to create synthetic data to further augment the available dataset, and SHapley Additive exPlanations (SHAP) were employed for model interpretability. The results indicated that data imputation allowed full use of the limited dataset, leading to good prediction performance and preventing the overfitting common with small datasets. Additionally, augmenting the experimental data with synthetic data significantly improved prediction accuracy and considerably reduced overfitting. The resulting feature-importance ranking and the assessed impacts of the different experimental variables on the photodegradation rate were consistent with physicochemical laws.
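The abstract does not say which imputation method was used. Purely to illustrate the step, the sketch below fills each missing experimental variable with the mean of the observed values in the same column; column-mean imputation is an assumption chosen for simplicity and should not be read as the authors' choice, and the example rows are hypothetical.

```python
from statistics import mean


def mean_impute(rows):
    """Fill None entries with the mean of the observed values in the same column.

    rows: list of equal-length feature rows (floats, or None for a missing value).
    Columns with no observed values are filled with 0.0 (an arbitrary choice for this sketch).
    """
    column_means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        column_means.append(mean(observed) if observed else 0.0)
    return [
        [column_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical rows: each row is one experiment, each column one experimental variable.
    print(mean_impute([[1.0, None], [3.0, 4.0], [None, 8.0]]))
    # → [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```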
Affiliation(s)
- Rodrigo Teixeira Schossler
- Department of Civil and Environmental Engineering, Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
- Samuel Ojo
- Department of Civil and Environmental Engineering, Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
- Zhuoying Jiang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
- Jiajie Hu
- Department of Civil and Environmental Engineering, Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
- Xiong Yu
- Department of Civil and Environmental Engineering, Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
- Department of Electrical Engineering and Computer Science (courtesy appointment), Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
- Department of Mechanical and Aerospace Engineering (courtesy appointment), Case Western Reserve University, Bingham Building-Room 237, Cleveland, OH, 44106, USA
4. Wang Z, Huang S, Yin L, Wan J, Liu C, Liu T, Huang C. Chemodivergence in Fluorine Source-Controlled Cascade Reaction of Aryne Precursors to Synthesize Pyrrolo[3,4-b]indoles and 3-Arylated Maleimides. J Org Chem 2024; 89:5498-5510. [PMID: 38577943 DOI: 10.1021/acs.joc.3c02961] [Indexed: 04/06/2024]
Abstract
Reactions that allow chemodivergence are attractive strategies in synthetic organic chemistry. Herein, we describe a highly practical, transition-metal-free, highly regioselective, and chemodivergent cascade reaction controlled by fluorine sources, which proceeds via either a [3 + 2] cycloaddition or a C-arylation process between aryne precursors and 3-aminomaleimides. These two pathways give a wide scope of structurally diverse pyrrolo[3,4-b]indoles (19 examples) and 3-arylated maleimides (25 examples) in good-to-excellent yields. Furthermore, the reaction could be scaled up, and several synthetic transformations were accomplished to prepare functionalized molecules, which might provide new opportunities for the discovery of N-heterocyclic drugs.
Affiliation(s)
- Zhuoyu Wang
- National and Local Joint Engineering Research Center for Green Preparation Technology of Biobased Materials, School of Chemistry and Environment, Yunnan Minzu University, Kunming 650500, P. R. China
- Shuntao Huang
- National and Local Joint Engineering Research Center for Green Preparation Technology of Biobased Materials, School of Chemistry and Environment, Yunnan Minzu University, Kunming 650500, P. R. China
- Lu Yin
- National and Local Joint Engineering Research Center for Green Preparation Technology of Biobased Materials, School of Chemistry and Environment, Yunnan Minzu University, Kunming 650500, P. R. China
- Juan Wan
- National and Local Joint Engineering Research Center for Green Preparation Technology of Biobased Materials, School of Chemistry and Environment, Yunnan Minzu University, Kunming 650500, P. R. China
- Cheng Liu
- School of Chemistry and Chemical Engineering, Jinggangshan University, Ji'An, Jiangxi 343009, P. R. China
- Teng Liu
- School of Chemistry and Chemical Engineering, Jinggangshan University, Ji'An, Jiangxi 343009, P. R. China
- Chao Huang
- National and Local Joint Engineering Research Center for Green Preparation Technology of Biobased Materials, School of Chemistry and Environment, Yunnan Minzu University, Kunming 650500, P. R. China
5. Bhattacharjee A, Kar S, Ojha PK. First report on chemometrics-driven multilayered lead prioritization in addressing oxysterol-mediated overexpression of G protein-coupled receptor 183. Mol Divers 2024:10.1007/s11030-024-10811-1. [PMID: 38460065 DOI: 10.1007/s11030-024-10811-1] [Received: 11/21/2023] [Accepted: 01/12/2024] [Indexed: 03/11/2024]
Abstract
Contemporary research has convincingly demonstrated that upregulation of G protein-coupled receptor 183 (GPR183), orchestrated by its endogenous agonist 7α,25-dihydroxycholesterol (7α,25-OHC), leads to the development of cancer, diabetes, multiple sclerosis, and infectious and inflammatory diseases. A recent study unveiled the cryo-EM structure of the 7α,25-OHC-bound GPR183 complex, presenting an untapped opportunity for computational exploration of potential GPR183 inhibitors, which inspired the current work. A predictive and validated two-dimensional QSAR model was developed on experimental GPR183 inhibition data using a genetic algorithm (GA) and multiple linear regression (MLR). The QSAR study highlighted that structural features such as dissimilar electronegative atoms, quaternary carbon atoms, and the CH2RX fragment (X: heteroatoms) influence GPR183 inhibitory activity positively, whereas oxygen atoms with a topological separation of 3 affect it negatively. After its prediction capability was assessed on a true external set, the MLR model was deployed to screen 12,449 DrugBank compounds, followed by a screening pipeline involving molecular docking, drug-likeness, ADMET, protein-ligand stability assessment using a deep learning algorithm, molecular dynamics, and molecular mechanics. The current findings strongly support DB05790 as a potential lead for interfering with oxysterol-mediated GPR183 overexpression, warranting further in vitro and in vivo validation.
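An MLR QSAR model expresses activity as an intercept plus a weighted sum of descriptors, with the weights fitted by ordinary least squares. The dependency-free sketch below solves the normal equations directly; the GA descriptor-selection step is not shown, and the descriptor values are hypothetical. For real data, a QR- or SVD-based least-squares solver is numerically safer than forming the normal equations.

```python
def fit_mlr(X, y):
    """Ordinary least-squares MLR. Returns [intercept, b_1, ..., b_p].

    X: list of descriptor rows; y: list of activities (same length).
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        if abs(pivot_value) < 1e-12:
            raise ValueError("descriptor matrix is rank-deficient")
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(beta, x):
    """Predicted activity for one descriptor row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


if __name__ == "__main__":
    # Hypothetical descriptors (two per compound), activities generated from y = 1 + 2*d1 - 0.5*d2
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
    y = [1 + 2 * d1 - 0.5 * d2 for d1, d2 in X]
    print(fit_mlr(X, y))  # approximately [1.0, 2.0, -0.5]
```

The sign of each fitted coefficient is what lets an MLR QSAR model state that a structural feature influences activity positively or negatively.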
Affiliation(s)
- Arnab Bhattacharjee
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Supratik Kar
- Chemometrics and Molecular Modeling Laboratory, Department of Chemistry and Physics, Kean University, 1000 Morris Avenue, Union, NJ, 07083, USA
- Probir Kumar Ojha
- Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
6. Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. Environ Pollut 2024; 342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]
Abstract
The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model built on molecular graphs from showing notable advantages. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) with molecular descriptors was used to build reliable QSAR-ML models for predicting chronic toxicity from small- or medium-sized tabular data. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, which covers most environmental scenarios. Model interpretation identified SlogP and exposure duration as the primary influential factors. SlogP, which represents the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes, and exposure duration played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model. Our study provides a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate wider use of ML applications in environmental fields.
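The two figures of merit quoted above follow standard definitions: R² = 1 − SS_res/SS_tot and RMSE = sqrt(SS_res/n). A minimal sketch of both, with hypothetical observed and predicted values (the abstract does not state the units or scale on which the RMSE was computed):

```python
import math


def r2_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_obs = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)           # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


if __name__ == "__main__":
    # Hypothetical values: R^2 = 0.8 and RMSE = 0.5, up to floating-point rounding
    print(r2_rmse([1, 2, 3, 4], [1, 2, 3, 5]))
```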
Affiliation(s)
- Lang Lei
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Liangmao Zhang
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Zhibang Han
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Qirui Chen
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Pengcheng Liao
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Dong Wu
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Jun Tai
- Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
- Bing Xie
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Yinglong Su
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
7. Wu Y, Li K, Li M, Pu X, Guo Y. Attention Mechanism-Based Graph Neural Network Model for Effective Activity Prediction of SARS-CoV-2 Main Protease Inhibitors: Application to Drug Repurposing as Potential COVID-19 Therapy. J Chem Inf Model 2023; 63:7011-7031. [PMID: 37960886 DOI: 10.1021/acs.jcim.3c01280] [Indexed: 11/15/2023]
Abstract
Compared with de novo drug discovery, drug repurposing provides a time-efficient way to treat coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The SARS-CoV-2 main protease (Mpro) has proven to be an attractive drug target because of its pivotal involvement in viral replication and transcription. Here, we present a graph neural network-based deep learning (DL) strategy to prioritize existing drugs for their potential therapeutic effects against SARS-CoV-2 Mpro. Mpro inhibitors were represented as molecular graphs for graph attention network (GAT) and graph isomorphism network (GIN) modeling to predict inhibitory activity. The results show that the GAT model outperforms the GIN and other competitive models and yields satisfactory predictions for unseen Mpro inhibitors, confirming its robustness and generalization. The attention mechanism of the GAT enables it to capture the dominant substructures and thus makes the model interpretable. Finally, we applied the optimal GAT model in conjunction with molecular docking simulations to screen the Drug Repurposing Hub (DRH) database. As a result, 18 drug hits with the best consensus prediction scores and binding affinity values were identified as potential therapeutics against COVID-19. Both an extensive literature search and evaluations of absorption, distribution, metabolism, excretion, and toxicity (ADMET) indicate favorable drug-likeness and pharmacokinetic properties of the drug candidates. Overall, our work not only provides an effective GAT-based DL tool for predicting the inhibitory activity of SARS-CoV-2 Mpro inhibitors but also provides theoretical guidance for drug discovery in COVID-19 treatment.
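In a GAT, each atom's updated representation is an attention-weighted sum of its neighbours' projected features, with the weights obtained from a softmax over learned pairwise scores. A minimal single-head sketch in plain Python, based on the original GAT formulation, follows. Multi-head attention, the output nonlinearity, and training are omitted, and the toy graph and weights are hypothetical rather than taken from the paper. The returned attention weights are the quantities that attention-based interpretation inspects to find dominant substructures.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU; 0.2 is the negative slope used in the original GAT paper."""
    return x if x > 0 else slope * x


def gat_layer(h, neighbors, W, a):
    """One single-head graph-attention update.

    h:         list of node feature vectors (e.g. atom features)
    neighbors: dict node index -> list of neighbour indices (include i itself for a self-loop)
    W:         projection matrix as a list of rows (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    Returns the updated node vectors and, per node, its attention weights over neighbours.
    """
    z = [[sum(w * v for w, v in zip(row, hi)) for row in W] for hi in h]  # z_i = W h_i
    out, attention = [], []
    for i in range(len(h)):
        nbrs = neighbors[i]
        # e_ij = LeakyReLU(a . [z_i || z_j]); list concatenation plays the role of ||
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        top = max(scores)  # subtract the max so the softmax is numerically stable
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention.append(dict(zip(nbrs, alphas)))
        # h'_i = sum_j alpha_ij z_j (the usual output nonlinearity is omitted)
        out.append([sum(al * z[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(W))])
    return out, attention


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # three toy "atoms"
    neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}  # path graph with self-loops
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.5, -0.3, 0.2, 0.9]
    updated, weights = gat_layer(h, neighbors, W, a)
    print(weights[1])  # attention of atom 1 over atoms 0, 1 and 2
```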
Affiliation(s)
- Yanling Wu
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Kun Li
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu 610064, China
8. Zhong S, Guan X. Count-Based Morgan Fingerprint: A More Efficient and Interpretable Molecular Representation in Developing Machine Learning-Based Predictive Regression Models for Water Contaminants' Activities and Properties. Environ Sci Technol 2023; 57:18193-18202. [PMID: 37406199 DOI: 10.1021/acs.est.3c02198] [Indexed: 07/07/2023]
Abstract
In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent the chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only indicates the presence or absence of an atom group but also quantifies how many times it occurs in a molecule. We employ six ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF, and we compare the two representations in terms of predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in predictive performance on nine of the 10 data sets. The advantage of C-MF over B-MF depends on the ML algorithm, and the performance gain is proportional to the difference in the chemical diversity of the data sets as calculated with B-MF and with C-MF. Model interpretation results show that C-MF-based models can elucidate the effect of atom-group counts on the target and have a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a "ContaminaNET" platform that deploys these C-MF-based models for free use.
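The difference between the two representations comes down to what is written into each folded bit. The sketch below folds integer atom-environment identifiers into a fixed-length vector either as counts or as presence/absence; the identifiers are hypothetical stand-ins for what a Morgan algorithm would emit. In practice one would typically use RDKit's fingerprint generators (for example `rdFingerprintGenerator.GetMorganGenerator`, whose generator objects provide both `GetFingerprint` and `GetCountFingerprint`) rather than this hand-rolled folding.

```python
def fold_fingerprint(environment_ids, n_bits=16, counts=True):
    """Fold integer atom-environment identifiers into a fixed-length vector.

    counts=True  -> count-based vector (how often each folded bit is hit)
    counts=False -> binary vector (whether the bit is hit at all)
    """
    vector = [0] * n_bits
    for env_id in environment_ids:
        index = env_id % n_bits  # folding: different identifiers may collide on one bit
        vector[index] = vector[index] + 1 if counts else 1
    return vector


if __name__ == "__main__":
    # Hypothetical identifiers: 3 occurs twice, and 19 folds onto the same bit as 3.
    ids = [3, 19, 3, 5]
    print(fold_fingerprint(ids, counts=True))   # bit 3 -> 3, bit 5 -> 1
    print(fold_fingerprint(ids, counts=False))  # bit 3 -> 1, bit 5 -> 1
```

The binary vector cannot distinguish one occurrence of a group from three, which is exactly the information the count-based variant retains.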
Affiliation(s)
- Shifa Zhong
- Department of Environmental Science, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Xiaohong Guan
- Department of Environmental Science, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
9. Lo S, Seifrid M, Gaudin T, Aspuru-Guzik A. Augmenting Polymer Datasets by Iterative Rearrangement. J Chem Inf Model 2023. [PMID: 37390494 DOI: 10.1021/acs.jcim.3c00144] [Indexed: 07/02/2023]
Abstract
One of the biggest obstacles to successful polymer property prediction is finding an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the effect of this technique on the performance of machine learning models trained on three polymer datasets and compare it with common molecular representations. Overall, data augmentation does not yield significant improvements in property prediction performance compared with equivalent (non-augmented) representations. However, in datasets where the target property is primarily influenced by the polymer sequence rather than by experimental parameters, this data augmentation technique provides the molecular embedding with more information that improves property prediction accuracy.
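The abstract does not specify how the representation is rearranged, so the following is only a loose illustration of the general idea, not the authors' procedure. If a periodic copolymer is written as a sequence of repeat units (the unit labels here are hypothetical), every cyclic rotation of that sequence describes the same infinite chain, so each rotation is an alternative representation with unchanged connectivity.

```python
def cyclic_rearrangements(repeat_units):
    """Distinct cyclic rotations of a periodic repeat-unit sequence.

    Each rotation starts the repeat at a different unit, so every output
    describes the same infinite periodic chain.
    """
    variants = []
    for shift in range(len(repeat_units)):
        rotated = tuple(repeat_units[shift:] + repeat_units[:shift])
        if rotated not in variants:  # skip duplicates caused by internal symmetry, e.g. ABAB
            variants.append(rotated)
    return variants


if __name__ == "__main__":
    print(cyclic_rearrangements(["A", "B", "C"]))
    # → [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B')]
```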
Affiliation(s)
- Stanley Lo
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Martin Seifrid
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Théophile Gaudin
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
- IBM Research Zürich, Rüschlikon, Zürich 8803, Switzerland
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College St., Toronto, Ontario M5S 3E5, Canada
- Department of Materials Science and Engineering, University of Toronto, 184 College St., Toronto, Ontario M5S 3E4, Canada
- CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, Ontario M5S 1M1, Canada
- Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada
10. Wu Y, Li M, Shen J, Pu X, Guo Y. A consensual machine-learning-assisted QSAR model for effective bioactivity prediction of xanthine oxidase inhibitors using molecular fingerprints. Mol Divers 2023:10.1007/s11030-023-10649-z. [PMID: 37043162 DOI: 10.1007/s11030-023-10649-z] [Received: 02/23/2023] [Accepted: 04/06/2023] [Indexed: 04/13/2023]
Abstract
Xanthine oxidase inhibitors (XOIs) have been widely studied because of their promising potential as safe and effective therapeutics for hyperuricemia and gout. Currently available XOI molecules were developed in different experiments and show wide structural diversity and significantly varying bioactivities. It is therefore of great practical significance to build a consensual QSAR model for effective bioactivity prediction of XOIs based on a systematic compilation of XOIs across different experiments. In this work, 249 XOIs belonging to 16 scaffolds were collected and integrated into a consensual dataset by introducing IC50 values relative to allopurinol (RIC50). Extended-connectivity fingerprints (ECFPs) were employed to represent XOI molecules. Feature selection with a machine-learning method identified 54 crucial fingerprints as valuable for predicting the inhibitory potency (IP) of XOIs. The optimal predictor performs well in different cross-validation tests. In addition, an external validation on 43 XOIs and a case study on febuxostat give satisfactory results, indicating the strong generalization of our predictor. The predictor was interpreted with the SHapley Additive exPlanations (SHAP) method, which revealed several important substructures by mapping the featured fingerprints to molecular structures. Then, 15 new molecules were designed and predicted by our predictor to have higher IP than febuxostat. Finally, molecular docking simulations were performed to gain deeper insight into the binding mode with the xanthine oxidase (XO) enzyme, showing that molecules with a selenazole moiety, a cyano group, and an isopropyl group tended to have higher IP. The absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction results further support the potential of these novel XOIs as drug candidates.
Overall, this work presents a QSAR model for accurate prediction of IP of XOIs, and is expected to provide new insights for further structure-guided design of novel XOIs.
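Pooling IC50 values measured in different assays is only meaningful after normalizing them to a common reference. The sketch below assumes that RIC50 is each compound's IC50 divided by the allopurinol IC50 reported in the same study and in the same units; the abstract does not spell out the formula, and the compound names and values are hypothetical. Under this definition, values below 1 indicate a compound more potent than allopurinol in that assay.

```python
def relative_ic50(records):
    """Return {compound: RIC50}, with RIC50 = IC50(compound) / IC50(allopurinol).

    records: iterable of (name, ic50_compound, ic50_allopurinol) tuples, where both
    IC50 values are assumed to come from the same study and use the same units.
    """
    ric50 = {}
    for name, ic50, ic50_ref in records:
        if ic50_ref <= 0:
            raise ValueError(f"non-positive reference IC50 for {name}")
        ric50[name] = ic50 / ic50_ref
    return ric50


if __name__ == "__main__":
    # Hypothetical compounds from two studies, each with its own allopurinol control
    print(relative_ic50([("c1", 0.5, 5.0), ("c2", 10.0, 5.0)]))  # → {'c1': 0.1, 'c2': 2.0}
```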
Affiliation(s)
- Yanling Wu
- College of Chemistry, Sichuan University, Chengdu, 610064, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, 610064, China
- Jinru Shen
- College of Chemistry, Sichuan University, Chengdu, 610064, China
- Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, 610064, China
- Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, 610064, China
11. Minh Quang N, Tran Thai H, Le Thi H, Duc Cuong N, Hien NQ, Hoang D, Ngoc VTB, Ky Minh V, Van Tat P. Novel Thiosemicarbazone Quantum Dots in the Treatment of Alzheimer's Disease Combining In Silico Models Using Fingerprints and Physicochemical Descriptors. ACS Omega 2023; 8:11076-11099. [PMID: 37008140 PMCID: PMC10061515 DOI: 10.1021/acsomega.2c07934] [Received: 12/13/2022] [Accepted: 03/07/2023] [Indexed: 06/19/2023]
Abstract
Searching for thiosemicarbazone derivatives with the potential to inhibit acetylcholinesterase (AChE) for the treatment of Alzheimer's disease (AD) is an important current goal. The QSARKPLS, QSARANN, and QSARSVR models were constructed using binary fingerprints and physicochemical (PC) descriptors of 129 thiosemicarbazone compounds screened from a database of 3791 derivatives. The R² and Q² values of the QSARKPLS, QSARANN, and QSARSVR models are greater than 0.925 and 0.713 using dendritic fingerprint (DF) and PC descriptors, respectively. The in vitro pIC50 activities of four new design-oriented compounds, N1, N2, N3, and N4, predicted by the QSARKPLS model using DFs, are consistent with the experimental results and with those from the QSARANN and QSARSVR models. The designed compounds N1, N2, N3, and N4 do not violate Lipinski's rule of five or Veber's rules according to the ADME and BOILED-Egg methods. The binding energies (kcal mol⁻¹) of the novel compounds to the 1ACJ-PDB protein receptor of the AChE enzyme were also obtained by molecular docking and dynamics simulations and are consistent with those predicted by the QSARANN and QSARSVR models. The new compounds N1, N2, N3, and N4 were synthesized, and their experimentally determined in vitro pIC50 activities agree with those obtained from the in silico models. The newly synthesized thiosemicarbazones N1, N2, N3, and N4 can inhibit 1ACJ-PDB and are predicted to be able to cross the barrier. The DFT B3LYP/def-SV(P)-ECP quantum calculation method was used to calculate E_HOMO and E_LUMO to account for the activities of compounds N1, N2, N3, and N4. The quantum calculation results are consistent with those obtained from the in silico models. These results may contribute to the search for new drugs for the treatment of AD.
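Lipinski's rule of five and Veber's rules are simple threshold checks on a handful of descriptors. The sketch below encodes the commonly cited thresholds; the descriptor values are assumed to have been computed elsewhere (for example with a cheminformatics toolkit), the example numbers are hypothetical, and Veber's alternative criterion (total hydrogen-bond donors plus acceptors of at most 12 in place of polar surface area) is not shown.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count violations of Lipinski's rule of five."""
    return sum([
        mol_weight > 500,   # molecular weight <= 500 Da
        logp > 5,           # octanol-water logP <= 5
        h_donors > 5,       # hydrogen-bond donors <= 5
        h_acceptors > 10,   # hydrogen-bond acceptors <= 10
    ])


def passes_veber(rotatable_bonds, tpsa):
    """Veber criteria: <= 10 rotatable bonds and polar surface area <= 140 A^2."""
    return rotatable_bonds <= 10 and tpsa <= 140


if __name__ == "__main__":
    # Hypothetical descriptor values for two candidate molecules
    print(lipinski_violations(350.4, 2.1, 2, 5))   # → 0
    print(lipinski_violations(650.0, 6.2, 1, 3))   # → 2
    print(passes_veber(5, 90.0))                   # → True
```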
Affiliation(s)
- Nguyen Minh Quang
- Faculty of Chemical Engineering, Industrial University of Ho Chi Minh City, 12 Nguyen Van Bao, Dist. Go Vap, Ho Chi Minh 700000, Viet Nam
- Hoa Tran Thai
- Faculty of Chemistry, Hue University of Sciences, Hue University, 77 Nguyen Hue, Hue City 530000, Viet Nam
- Hoa Le Thi
- Faculty of Chemistry, Hue University of Sciences, Hue University, 77 Nguyen Hue, Hue City 530000, Viet Nam
- Nguyen Duc Cuong
- Faculty of Chemistry, Hue University of Sciences, Hue University, 77 Nguyen Hue, Hue City 530000, Viet Nam
- School of Hospitality and Tourism, Hue University, 22 Lam Hoang, Hue City 530000, Viet Nam
- Nguyen Quoc Hien
- Vietnam Atomic Energy Institute, 59 Ly Thuong Kiet, Dist. Hoan Kiem, Hanoi City 100000, Viet Nam
- DongQuy Hoang
- Faculty of Materials Science and Technology, University of Science, Vietnam National University, Ho Chi Minh 700000, Viet Nam
- Vietnam National University, Ho Chi Minh City 700000, Viet Nam
- Vu Thi Bao Ngoc
- Faculty of Chemistry and Environment, University of Dalat, 01 Phu Dong Thien Vuong, Dalat City 660000, Viet Nam
- Vo Ky Minh
- Franklin High School, 6400 Whitelock Pkwy, Elk Grove, California 95757, United States
- Pham Van Tat
- Department of Sciences and Journal Management, Hoa Sen University, 08 Nguyen Van Trang, Dist. 01, Ho Chi Minh 700000, Viet Nam
| |
12
Li M, Zeng M, Zhang H, Chen H, Guan L. Biological Activity Predictions of Ligands Based on Hybrid Molecular Fingerprinting and Ensemble Learning. ACS Omega 2023; 8:5561-5570. [PMID: 36816680 PMCID: PMC9933080 DOI: 10.1021/acsomega.2c06944] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/27/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Predicting the biological activity of ligands is an important research direction that can improve the efficiency and success rate of drug screening. However, traditional prediction methods suffer from complex modeling and low screening efficiency, and machine learning is regarded as a promising way to overcome these problems in the near future. This paper proposes a machine learning model with high predictive accuracy and stable prediction ability: the back propagation neural network cross-support vector regression model (BPCSVR). After comparing multiple molecular descriptors, the MACCS and ECFP6 fingerprints were selected as inputs. The stability of the model's predictions was improved by integrating multiple models and correcting similar samples. Leave-one-out cross-validation was applied to 3038 samples from six data sets, with the coefficient of determination, root mean square error, and absolute error as evaluation metrics. Compared with multiple other models, BPCSVR shows stable prediction ability across the different data sets and higher prediction accuracy than the comparison models.
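The evaluation protocol can be sketched independently of BPCSVR itself: leave-one-out cross-validation scored by R², RMSE, and an absolute-error metric. This is an illustrative harness, not the authors' code. It assumes "absolute error" means mean absolute error. `mean_model` is a hypothetical baseline standing in for any regressor with the same call signature.

```python
import math
from typing import Callable, List

# A "model" is any function (X_train, y_train, x_test) -> prediction.
Model = Callable[[List[List[float]], List[float], List[float]], float]


def mean_model(x_train: List[List[float]], y_train: List[float], x_test: List[float]) -> float:
    """Hypothetical baseline: ignores the features and predicts the training mean."""
    return sum(y_train) / len(y_train)


def leave_one_out(X: List[List[float]], y: List[float], model: Model) -> dict:
    """Hold out each sample once, predict it from the others, then score all predictions."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two samples")
    preds = []
    for i in range(n):
        x_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(model(x_train, y_train, X[i]))
    residuals = [yi - pi for yi, pi in zip(y, preds)]
    sse = sum(r * r for r in residuals)
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return {
        "r2": 1 - sse / sst if sst > 0 else float("nan"),
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(r) for r in residuals) / n,  # assumed meaning of "absolute error"
    }
```

On targets [1, 2, 3], the baseline gives R² = -1.25, RMSE ≈ 1.22, and MAE = 1.0. A negative leave-one-out R² means the model does worse than simply predicting the overall mean.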
13
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, Varnek A. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder. J Chem Inf Model 2022; 62:5471-5484. [PMID: 36332178 DOI: 10.1021/acs.jcim.2c01086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
To formalize it better, the notorious inverse-QSAR problem (finding structures with given QSAR-predicted properties) is treated in this paper as a two-step process: (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures that best match the "seed" vectors. The main development effort focused on the latter stage and produced a new attention-based conditional variational autoencoder neural-network architecture, building on recent developments in attention-based methods. The results show that this workflow was capable of generating compounds predicted to display the desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods. They showed that some of the de novo structures, beyond being predicted active by 2D-QSAR models, clearly match binding 3D pharmacophores and bind in the protein pocket.
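The two-step framing can be illustrated with a toy version. This is a minimal sketch under loud assumptions, not the authors' method:

- The QSAR model is linear, y = w·x + b, so step (i) has a closed form: the smallest Euclidean move from a starting descriptor vector that reaches the target output.
- The paper's conditional variational autoencoder for step (ii) is replaced by a nearest-neighbour lookup in a hypothetical library of descriptor vectors.

```python
import math
from typing import Dict, List


def seed_for_target(w: List[float], b: float, x0: List[float], target: float) -> List[float]:
    """Step (i) for a linear QSAR y = w.x + b: project x0 onto the hyperplane w.x + b = target."""
    current = sum(wi * xi for wi, xi in zip(w, x0)) + b
    norm2 = sum(wi * wi for wi in w)
    if norm2 == 0:
        raise ValueError("the model does not depend on any descriptor")
    step = (target - current) / norm2
    return [xi + step * wi for xi, wi in zip(x0, w)]


def closest_structure(seed: List[float], library: Dict[str, List[float]]) -> str:
    """Step (ii) stand-in: name of the library entry whose descriptors are nearest the seed."""
    return min(library, key=lambda name: math.dist(seed, library[name]))
```

With w = [1, 1], b = 0, a start at the origin, and target 2, the seed is [1, 1]. In the library {"A": [0, 0], "B": [1, 1.2], "C": [3, 3]}, entry "B" is closest. A generative decoder, as in the paper, can propose structures absent from any library, which a lookup cannot.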
Affiliation(s)
- William Bort
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Daniyar Mazitov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Dragos Horvath
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Fanny Bonachera
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Arkadii Lin
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Gilles Marcou
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
- Igor Baskin
- Department of Material Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
- Timur Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008 Kazan, Russia
- Alexandre Varnek
- Laboratory of Chemoinformatics, UMR 7140 University of Strasbourg/CNRS, 4 rue Blaise Pascal, 67000 Strasbourg, France
14
Pal R, Patra SG, Chattaraj PK. Quantitative Structure-Toxicity Relationship in Bioactive Molecules from a Conceptual DFT Perspective. Pharmaceuticals (Basel) 2022; 15:1383. [PMID: 36355555 PMCID: PMC9695291 DOI: 10.3390/ph15111383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2022] [Revised: 11/01/2022] [Accepted: 11/07/2022] [Indexed: 10/29/2023]
Abstract
The preclinical drug discovery stage often requires many costly and time-consuming experiments on huge sets of chemical compounds. In the last few decades, this process has been significantly improved by quantitative structure-activity relationship (QSAR) modelling. QSAR uses a certain percentage of experimental data to predict the biological activity/property of compounds with a similar structural skeleton and/or containing particular functional group(s). Combining QSAR with machine learning tools has made the work of pharmaceutical researchers even easier. Here, we discuss the toxicity of certain sets of bioactive compounds towards Pimephales promelas and Tetrahymena pyriformis in terms of the electrophilicity index (ω), a global descriptor from conceptual density functional theory (CDFT). We compare the results with those obtained using the commonly used hydrophobicity parameter logP (where P is the n-octanol/water partition coefficient), considering that the ω descriptor is easier to compute. The human African trypanosomiasis (HAT) curing activity of 32 pyridyl benzamide derivatives against Trypanosoma brucei is also studied. In this review article, we summarize these multiple linear regression (MLR)-based QSAR studies in terms of electrophilicity (ω, ω²) and hydrophobicity (logP, (logP)²) parameters.
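The MLR models summarized here regress toxicity on ω and ω² (or on logP and (logP)²). The sketch below fits such a model by ordinary least squares, assuming a quadratic form toxicity = a0 + a1·ω + a2·ω² and solving the normal equations in plain Python. The function name is hypothetical. The data in the check are synthetic, generated from known coefficients, and not taken from the review.

```python
from typing import List


def fit_mlr(columns: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, solving the normal equations (X^T X) a = X^T y.

    columns: one list per descriptor (e.g. omega and omega**2), each of length len(y).
    Returns [a0, a1, ..., ap], where a0 is the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("descriptors are collinear or there are too few samples")
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]
```

Passing the ω² column separately keeps the model linear in its coefficients, which is why plain MLR can capture a curved dependence on ω. In practice a library solver such as a QR-based least-squares routine is numerically safer than forming X^T X.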
Affiliation(s)
- Ranita Pal
- Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Shanti Gopal Patra
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Pratim Kumar Chattaraj
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
15
Muegge I, Hu Y. How do we further enhance 2D fingerprint similarity searching for novel drug discovery? Expert Opin Drug Discov 2022; 17:1173-1176. [PMID: 36150044 DOI: 10.1080/17460441.2022.2128332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Affiliation(s)
- Yuan Hu
- Alkermes, Inc, Waltham, Massachusetts, USA
16
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints represent chemical (structural, physicochemical, etc.) properties of large chemical sets at low computational cost. They play a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight types: dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi. We also highlight applications of fingerprints in early drug research and development (R&D). This review thus provides a guide for selecting appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
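Many of the fingerprint types listed are ultimately bit strings, and a common way to compare two bit-string fingerprints is the Tanimoto coefficient. The coefficient is the number of bits set in both fingerprints divided by the number set in either. In the minimal sketch below, a Python int serves as the bit string. The set-bit indices in the check are made up for illustration and are not produced by any real fingerprinting scheme. The value returned for two empty fingerprints is a convention assumed here.

```python
from typing import Iterable


def fingerprint(on_bits: Iterable[int]) -> int:
    """Pack the indices of set bits into a Python int used as a bit string."""
    fp = 0
    for i in on_bits:
        fp |= 1 << i
    return fp


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits set in both divided by bits set in either."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return bin(a & b).count("1") / union
```

For on-bits {0, 3, 5, 9} and {0, 3, 7}, two bits are shared out of five set in total, giving a similarity of 0.4.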
Affiliation(s)
- Jingbo Yang
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Yiyang Cai
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Kairui Zhao
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Hongbo Xie
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Xiujie Chen
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
17
Devillers J, Sartor V, Devillers H. Predicting mosquito repellents for clothing application from molecular fingerprint-based artificial neural network SAR models. SAR QSAR Environ Res 2022; 33:729-751. [PMID: 36106833 DOI: 10.1080/1062936x.2022.2124014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 09/06/2022] [Indexed: 06/15/2023]
Abstract
Spraying repellents on clothing limits the toxicity and allergy problems that can occur when repellents are applied directly to skin. It also allows higher doses to be used, giving longer-lasting effects. Because the number of repellents available on the market is limited, new ones need to be proposed, particularly with in silico methods that reduce costs and time. In this context, SAR models were built from a dataset of 2027 chemicals for which repellent activity on clothing was measured against Aedes aegypti. The value of using either ECFP or MACCS fingerprints as input neurons of a three-layer perceptron was evaluated. Transforming MACCS bit strings into disjunctive tables led to interesting results. Models obtained with both types of fingerprints were compared to a model including physicochemical and topological descriptors.
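The abstract does not describe how the disjunctive tables were built. The sketch assumes the usual "complete disjunctive table" of correspondence analysis: each MACCS bit is treated as a two-level categorical variable and expanded into two indicator columns. The absence of a substructure then becomes an explicit input of its own rather than a zero. The function name is hypothetical, and the 2-bit rows in the check are toy values, not real MACCS keys.

```python
from typing import List, Sequence


def disjunctive_table(bit_strings: Sequence[Sequence[int]]) -> List[List[int]]:
    """Complete disjunctive coding: each binary bit becomes two indicator columns,
    (1, 0) when the bit is 0 and (0, 1) when it is 1."""
    table = []
    for bits in bit_strings:
        row: List[int] = []
        for bit in bits:
            if bit not in (0, 1):
                raise ValueError("expected binary fingerprint bits")
            row.extend((1, 0) if bit == 0 else (0, 1))
        table.append(row)
    return table
```

The encoding doubles the number of input neurons. In exchange, a network receives a nonzero signal for "key absent" as well as for "key present".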
Affiliation(s)
- V Sartor
- Laboratoire des IMRCP, Université de Toulouse, CNRS UMR 5623, Université Toulouse III - Paul Sabatier, Toulouse, France
- H Devillers
- SPO, Univ Montpellier, INRAE, Institut Agro, Montpellier, France
18
Sreenivasan AP, Harrison PJ, Schaal W, Matuszewski DJ, Kultima K, Spjuth O. Predicting protein network topology clusters from chemical structure using deep learning. J Cheminform 2022; 14:47. [PMID: 35841114 PMCID: PMC9284831 DOI: 10.1186/s13321-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/06/2022] [Indexed: 11/10/2022]
Abstract
Comparing chemical structures to infer protein targets and functions is a common approach, but comparisons based on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures, including convolutional and recurrent neural networks. The best-performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis of a set of eleven well-studied compounds with known functions showed that the predictions were justifiable for all but one of them. Four of these compounds, structurally similar but functionally dissimilar, revealed the advantages of our method over using chemical similarity.
Affiliation(s)
- Akshai P Sreenivasan
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden; Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Philip J Harrison
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
- Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
- Damian J Matuszewski
- Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Kim Kultima
- Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
19
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, relatively few available drugs, and low satisfaction among patients and caregivers. Discovering CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, hardly any aspect of life has been left untouched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception. The emergence of big data from genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many innovative applications of AI and ML techniques in CNS drug discovery have been reported. These developments have given new direction and thrust to research on therapeutics for diseases such as schizophrenia, Alzheimer's disease, and Parkinsonism. AI and ML have been applied to both ligand-based and structure-based discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery. We then give comprehensive coverage of recent developments in the application of AI/ML techniques to CNS drug discovery.
20
Karim MB, Kanaya S, Altaf-Ul-Amin M. Antibacterial Activity Prediction of Plant Secondary Metabolites Based on a Combined Approach of Graph Clustering and Deep Neural Network. Mol Inform 2022; 41:e2100247. [PMID: 35014190 PMCID: PMC9400908 DOI: 10.1002/minf.202100247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2021] [Accepted: 01/09/2022] [Indexed: 11/20/2022]
Abstract
Plants produce numerous secondary metabolites of pharmacological importance in drug development for different diseases. Computational methods widely use metabolite fingerprints to understand the properties of metabolites and the similarities among them, to predict chemical reactions, and so on. In this work, we developed three deep neural network (DNN) models to predict the antibacterial property of plant metabolites:

- The first DNN model uses the fingerprint set of the metabolites as features.
- In the second, we searched for similarities among fingerprints using correlation and used one representative feature from each group of highly correlated fingerprints.
- In the third, the fingerprints of the metabolites were used to find clusters of structurally similar compounds. From each cluster, a representative metabolite was selected and made part of the training dataset.

For both the second and third models, we applied a simple graph clustering method to cluster the corresponding network. The correlation-based second model reduced the number of features while retaining almost the same performance as the first model. The third model improved classification results on test data by capturing wider variance within the training data. This third model is a somewhat novel approach and can be applied to build DNN models for other purposes.
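The correlation-based feature reduction can be sketched as a graph problem. Each fingerprint bit (one column across all metabolites) is a node, and two nodes are linked when the absolute Pearson correlation reaches a threshold. The lowest-index member of each group is kept as its representative. The paper does not specify its clustering algorithm, so connected components (found with a union-find) stand in for it here, and the 0.9 threshold is an assumed default. All function names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlated_groups(columns: List[List[float]], threshold: float = 0.9) -> List[List[int]]:
    """Link features with |r| >= threshold and return the connected components of that graph."""
    parent = list(range(len(columns)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def representative_features(columns: List[List[float]], threshold: float = 0.9) -> List[int]:
    """Keep the lowest-index feature of each correlated group."""
    return [group[0] for group in correlated_groups(columns, threshold)]
```

Taking the absolute correlation groups a bit with its exact complement, since both carry the same information. Connected components can chain weakly related features through intermediates, which a stricter clustering method would avoid.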
21
Devillers J, Sartor V, Doucet JP, Doucet-Panaye A, Devillers H. In silico prediction of mosquito repellents for clothing application. SAR QSAR Environ Res 2022; 33:239-257. [PMID: 35532305 DOI: 10.1080/1062936x.2022.2062871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Accepted: 03/30/2022] [Indexed: 06/14/2023]
Abstract
Protective clothing is a simple and efficient way to reduce contact with mosquitoes and, consequently, the probability of transmission of the diseases they spread. This mechanical barrier can be enhanced by applying repellents. Unfortunately, the number of available repellents is limited, so there is a crucial need to find new, active, and safer molecules that repel mosquitoes. In this context, a structure-activity relationship (SAR) model was proposed for designing repellents active on clothing. It was computed from a dataset of 2027 chemicals for which repellent activity on clothing was measured against Aedes aegypti. Molecules were described by 20 molecular descriptors encoding physicochemical properties, topological information, and structural features, and a three-layer perceptron was used as the statistical tool. An accuracy of 87% was obtained for both the training and test sets. Most of the wrong predictions can be explained, and avenues for improving the performance of the model have been proposed.
Affiliation(s)
- V Sartor
- Laboratoire des IMRCP, Université de Toulouse, Toulouse, France
- J P Doucet
- Université de Paris, ITODYS, CNRS, Paris, France
- H Devillers
- SPO, Univ Montpellier, INRAE, Institut Agro, Montpellier, France
22
Prediction of second-order rate constants between carbonate radical and organics by deep neural network combined with molecular fingerprints. Chin Chem Lett 2022. [DOI: 10.1016/j.cclet.2021.06.061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
23
Hermansyah O, Bustamam A, Yanuar A. Virtual screening of dipeptidyl peptidase-4 inhibitors using quantitative structure-activity relationship-based artificial intelligence and molecular docking of hit compounds. Comput Biol Chem 2021; 95:107597. [PMID: 34800858 DOI: 10.1016/j.compbiolchem.2021.107597] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/17/2021] [Revised: 10/25/2021] [Accepted: 10/26/2021] [Indexed: 12/31/2022]
Abstract
Dipeptidyl peptidase-4 (DPP-4) inhibitors have become essential drugs for treating type 2 diabetes mellitus. However, some classes of these drugs cause side effects, including joint pain and pancreatitis. Studies suggest that these side effects might be related to secondary inhibition of DPP-8 and DPP-9. In this study, we identified DPP-4-inhibitor hit compounds that are selective over DPP-8 and DPP-9. We built a virtual screening workflow using a quantitative structure-activity relationship (QSAR) strategy based on artificial intelligence, which allows millions of molecules to be screened against the DPP-4 target faster than with other screening methods. Five regression and four classification machine learning algorithms were applied to build the virtual screening workflows. The regression QSAR model used support vector regression (R²pred = 0.78), and the classification QSAR model used the random forest algorithm (92.2% accuracy). Virtual screening of more than 10 million molecules yielded 2716 hit compounds with pIC50 > 7.5. Additionally, molecular docking of several potential hit compounds against DPP-4, DPP-8, and DPP-9 identified CH0002 as having high inhibitory potential against DPP-4 and low inhibitory potential against DPP-8 and DPP-9. These results demonstrate the effectiveness of this technique for identifying DPP-4-inhibitor hit compounds that are selective for DPP-4 over DPP-8 and DPP-9. They also suggest its potential for discovering hit compounds for other targets.
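The hit threshold is expressed as pIC50, the negative base-10 logarithm of the IC50 in mol/L, so pIC50 > 7.5 corresponds to an IC50 below about 31.6 nM. The sketch shows the conversion and a threshold filter over predicted values. The function names and compound labels are hypothetical, and the strict inequality follows the abstract's "> 7.5".

```python
import math
from typing import Dict, List


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); the input is given in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def select_hits(predicted_pic50: Dict[str, float], threshold: float = 7.5) -> List[str]:
    """Names of compounds whose predicted pIC50 is strictly above the threshold, sorted."""
    return sorted(name for name, p in predicted_pic50.items() if p > threshold)
```

An IC50 of 100 nM gives pIC50 = 7.0, which would not pass this filter. Because the scale is logarithmic, each unit of pIC50 is a tenfold change in potency.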
Affiliation(s)
- Oky Hermansyah
- Laboratory of Biomedical Computation and Drug Design, Faculty of Pharmacy, Universitas Indonesia, Depok 16424, Indonesia
- Alhadi Bustamam
- Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Indonesia, Depok 16424, Indonesia
- Arry Yanuar
- Laboratory of Biomedical Computation and Drug Design, Faculty of Pharmacy, Universitas Indonesia, Depok 16424, Indonesia
24
Easy preparation of novel 3,3-dimethyl-3,4-dihydro-2H-1,2,4-benzothiadiazine 1,1-dioxide: Molecular structure, Hirshfeld surface, NCI analyses and molecular docking on AMPA receptors. J Mol Struct 2021. [DOI: 10.1016/j.molstruc.2021.130435] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]
25
Gao P, Zhang J, Qiu H, Zhao S. A general QSPR protocol for the prediction of atomic/inter-atomic properties: a fragment based graph convolutional neural network (F-GCN). Phys Chem Chem Phys 2021; 23:13242-13249. [PMID: 34086015 DOI: 10.1039/d1cp00677k] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
In this study, a general quantitative structure-property relationship (QSPR) protocol, the fragment based graph convolutional neural network (F-GCN), was developed for predicting atomic/inter-atomic properties. We applied this novel artificial intelligence (AI) tool to predict NMR chemical shifts and bond dissociation energies (BDEs). The results were comparable to experimental measurements, while the computational cost was substantially lower than that of pure density functional theory (DFT) calculations. F-GCN has two important features. First, it can use different levels of molecular fragments to extract atomic/inter-atomic information. Second, the architecture is open to additional descriptors for a more accurate description of the local atomic environment, making it more efficient for structure solution. In our tests, the average prediction error for ¹H NMR chemical shifts was as small as 0.32 ppm, and the error of C-H BDE estimation was 2.7 kcal mol⁻¹. We further demonstrated the applicability of the F-GCN model through several challenging structural assignments. The success of F-GCN in atomic and inter-atomic predictions also indicates an essential improvement of computational chemistry with the assistance of AI tools.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
- Jie Zhang
- Centre of Chemistry and Chemical Biology, Bioland Laboratory (Guangzhou Regenerative Medicine and Health-Guangdong Laboratory), Guangzhou 53000, China; School of Chemical Engineering, East China University of Science and Technology, Shanghai 200237, China
- Hongbo Qiu
- Department of Chemical Engineering, Monash University, Clayton, VIC 3800, Australia
- Shuaifei Zhao
- Institute for Frontier Materials (IFM), Deakin University, Perth, WA, Australia
26
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, with long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools for drawing meaningful insights and improving decision making in drug discovery. Thanks to advances in AI and ML algorithms, AI/ML-driven solutions now have unprecedented potential to accelerate CNS drug discovery with a better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementation in the CNS area. We first introduce the AI/ML models, their conceptualization, and data preparation. We then outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and its implementation in therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may resolve these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming-Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
27
Ding Y, Chen M, Guo C, Zhang P, Wang J. Molecular fingerprint-based machine learning assisted QSAR model development for prediction of ionic liquid properties. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2020.115212] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/26/2022]
28
Drug design of new 5-HT6R antagonists aided by artificial neural networks. J Mol Graph Model 2021; 104:107844. [PMID: 33529936 DOI: 10.1016/j.jmgm.2021.107844] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/13/2020] [Revised: 01/06/2021] [Accepted: 01/08/2021] [Indexed: 11/23/2022]
Abstract
Alzheimer's disease (AD) is the most frequent of the age-related neurodegenerative disorders and the most frequent cause of death among them. Alzheimer's Disease International (ADI) reported in 2019 that over 50 million people were living with dementia worldwide and that this number could reach around 152 million by 2050. The 5-hydroxytryptamine subtype 6 receptor (5-HT6R) has been identified as a potential anti-amnesic drug target. Administering 5-HT6R antagonists can therefore likely mitigate the memory loss and intellectual deterioration associated with AD. Herein, computational tools were applied to design new 5-HT6R antagonists, and their biological activities were predicted by our QSAR model built with artificial neural networks (ANN). The compounds proposed with the QSAR-ANN model showed significant predicted biological activity, and some achieved pKi values above 9.00. Furthermore, our results suggest that halogen atoms (especially bromine) attached to the aromatic ring at the para position (HYD) contribute considerably to increasing biological activity. Bulky groups in the PI position, by contrast, do not increase the antagonist activity of the compounds analyzed. Finally, the ADME/Tox profiles and synthetic accessibility of the newly proposed compounds qualify them for further experimental work, through which their antagonist effects can be confirmed.
29
Molecular fingerprints based on Jacobi expansions of electron densities. Theor Chem Acc 2021. [DOI: 10.1007/s00214-020-02708-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
30
31
Xie L, Xu L, Kong R, Chang S, Xu X. Improvement of Prediction Performance With Conjoint Molecular Fingerprint in Deep Learning. Front Pharmacol 2021; 11:606668. [PMID: 33488387 PMCID: PMC7819282 DOI: 10.3389/fphar.2020.606668] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/21/2020] [Accepted: 11/23/2020] [Indexed: 12/27/2022]
Abstract
Accurate prediction of the physical properties and bioactivity of drug molecules with deep learning depends on how the molecules are represented. Many types of molecular descriptors have been developed for quantitative structure-activity/property relationships (QSAR/QSPR). However, each molecular descriptor is optimized for a specific application and carries its own encoding preferences. Considering that a standalone featurization method may capture only part of the information about a molecule, we proposed to build a conjoint fingerprint by combining two complementary fingerprints. The impact of the conjoint fingerprint and of each standalone fingerprint on predictive performance was systematically evaluated in predicting the logarithm of the partition coefficient (logP) and protein-ligand binding affinity, using machine learning/deep learning (ML/DL) methods including random forest (RF), support vector regression (SVR), extreme gradient boosting (XGBoost), long short-term memory network (LSTM), and deep neural network (DNN). The results demonstrated that the conjoint fingerprint yielded improved predictive performance, even outperforming the consensus model built from the two standalone fingerprints with four of the five examined methods. Given that the conjoint fingerprint scheme is easily extensible and widely applicable, we expect that it will create new opportunities for continuously improving the predictive performance of deep learning by harnessing the complementarity of various types of fingerprints.
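The abstract does not spell out how the two fingerprints are combined. A minimal sketch, assuming the conjoint fingerprint is a plain concatenation of two standalone bit vectors and that the consensus baseline averages the predictions of two single-fingerprint models; the function names and bit values below are hypothetical placeholders, not outputs of a real cheminformatics toolkit:

```python
from typing import List, Sequence


def conjoint_fingerprint(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Concatenate two standalone fingerprints (assumed combination rule).

    Every bit of both inputs is kept, so one downstream model sees
    both encodings of the molecule at the same time.
    """
    return list(fp_a) + list(fp_b)


def consensus_prediction(pred_a: float, pred_b: float) -> float:
    """Average two predictions, each from a model trained on one fingerprint."""
    return (pred_a + pred_b) / 2.0


# Hypothetical fingerprints for a single molecule (values are illustrative).
fp_first = [1, 0, 0, 1, 0, 1, 0, 0]  # e.g. an 8-bit substructure-style encoding
fp_second = [0, 1, 1, 0]             # e.g. a 4-bit encoding of a different kind

features = conjoint_fingerprint(fp_first, fp_second)
print(len(features))  # 12 input features for a single model
```

The design difference is where the two representations meet: the conjoint scheme merges them at the input of one model, whereas the consensus model merges the outputs of two separately trained models.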
Affiliation(s)
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China; Jiangsu Sino-Israel Industrial Technology Research Institute, Changzhou, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
32
Amangeldiuly N, Karlov D, Fedorov MV. Baseline Model for Predicting Protein–Ligand Unbinding Kinetics through Machine Learning. J Chem Inf Model 2020; 60:5946-5956. [DOI: 10.1021/acs.jcim.0c00450] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Affiliation(s)
- Nurlybek Amangeldiuly
- Center for Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Dmitry Karlov
- Center for Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Maxim V. Fedorov
- Center for Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 121205, Russia
- Department of Physics, Scottish Universities Physics Alliance (SUPA), University of Strathclyde, Glasgow G4 0NG, U.K.
33
Li G, Bi S. Substituent-controlled C-N coupling involved in Rh(III)-catalyzed oxidative [3+2] annulation of 2-acetyl-1-arylhydrazines with maleimides: A DFT study. J Organomet Chem 2020. [DOI: 10.1016/j.jorganchem.2020.121539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
34
Jiang Z, Hu J, Zhang X, Zhao Y, Fan X, Zhong S, Zhang H, Yu X. A generalized predictive model for TiO2-Catalyzed photo-degradation rate constants of water contaminants through artificial neural network. Environmental Research 2020; 187:109697. [PMID: 32474313 DOI: 10.1016/j.envres.2020.109697] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/19/2019] [Revised: 05/11/2020] [Accepted: 05/16/2020] [Indexed: 06/11/2023]
Abstract
Titanium dioxide (TiO2) is a well-known photocatalyst for water contaminant treatment. Traditionally, photo-degradation rate constants are obtained from experiments, which require considerable labor and experimental investment. Here, a generalized predictive model was developed to predict the photo-degradation rate constants of organic contaminants in the presence of TiO2 nanoparticles and ultraviolet irradiation in aqueous solution. This model combines an artificial neural network (ANN) with a variety of factors that affect photo-degradation performance, i.e., ultraviolet intensity, TiO2 dosage, organic contaminant type and initial concentration in water, and initial pH of the solution. Molecular fingerprints (MF) were used to encode the organic contaminants as binary vectors, a format that is machine-readable in computational linguistics. A dataset of 446 data points for training and testing was collected from the literature. The predictive model shows good accuracy, with a root mean square error (RMSE) of 0.173.
Affiliation(s)
- Zhuoying Jiang
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Jiajie Hu
- Departments of Computer and Data Sciences, and Electrical, Computer, and Systems Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Xijin Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Yihang Zhao
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Xudong Fan
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Shifa Zhong
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Huichun Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA
- Xiong Yu
- Department of Civil and Environmental Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA; Departments of Computer and Data Sciences, and Electrical, Computer, and Systems Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH, 44106, USA.
35
Irwin BWJ, Levell JR, Whitehead TM, Segall MD, Conduit GJ. Practical Applications of Deep Learning To Impute Heterogeneous Drug Discovery Data. J Chem Inf Model 2020; 60:2848-2857. [PMID: 32478517 DOI: 10.1021/acs.jcim.0c00443] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 01/20/2023]
Abstract
Contemporary deep learning approaches still struggle to deliver useful improvements in drug discovery because of the sparse, noisy, and heterogeneous data typically encountered in this context. We use a state-of-the-art deep learning method, Alchemite, to impute data from drug discovery projects, including multitarget biochemical activities, phenotypic activities in cell-based assays, and a variety of absorption, distribution, metabolism, and excretion (ADME) endpoints. The resulting model gives excellent predictions for activity and ADME endpoints, offering an average increase in R2 of 0.22 versus quantitative structure-activity relationship methods. The model accuracy is robust to combining data across uncorrelated endpoints and projects with different chemical spaces, enabling a single model to be trained for all compounds and endpoints. We demonstrate improvements in accuracy on the latest chemistry and data when updating models with new data as an ongoing medicinal chemistry project progresses.
Affiliation(s)
- Benedict W J Irwin
- Optibrium Limited, Cambridge Innovation Park, Denny End Rd, Cambridge CB25 9PB, U.K.; Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
- Julian R Levell
- Constellation Pharmaceuticals Inc., 215 First St Suite 200, Cambridge, Massachusetts 02142, United States
- Thomas M Whitehead
- Intellegens Limited, Eagle Labs, 28 Chesterton Road, Cambridge CB4 3AZ, U.K.
- Matthew D Segall
- Optibrium Limited, Cambridge Innovation Park, Denny End Rd, Cambridge CB25 9PB, U.K.
- Gareth J Conduit
- Intellegens Limited, Eagle Labs, 28 Chesterton Road, Cambridge CB4 3AZ, U.K.; Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge CB3 0HE, U.K.
36
Chen JH, Tseng YJ. Different molecular enumeration influences in deep learning: an example using aqueous solubility. Brief Bioinform 2020; 22:5851267. [PMID: 32501508 DOI: 10.1093/bib/bbaa092] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/11/2020] [Revised: 04/27/2020] [Accepted: 04/27/2020] [Indexed: 12/24/2022]
Abstract
Aqueous solubility is a key property driving many chemical and biological phenomena, and it affects experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential yet challenging, even with modern computational algorithms. Fingerprint-based, feature-based, and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction, and different molecular representations have been clearly shown to affect model prediction and explainability. In this work, we reviewed different representations and focused on graph and line notations for modeling. In general, one canonical chemical structure is used to represent a molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry system (SMILES) notation for representing a single molecule and proposed using the full enumeration of SMILES strings to achieve better accuracy. A convolutional neural network (CNN) was used. Full SMILES enumeration improves the representation of a molecule and describes it from all possible angles. This CNN model can be very robust on large datasets, since no additional explicit chemistry knowledge is needed to predict solubility. Also, it has traditionally been hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the parts of a molecule that are relevant to solubility, which can be used to explain the contributions learned by the CNN.
37
Choi KE, Balupuri A, Kang NS. The Study on the hERG Blocker Prediction Using Chemical Fingerprint Analysis. Molecules 2020; 25:E2615. [PMID: 32512802 PMCID: PMC7321128 DOI: 10.3390/molecules25112615] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/06/2020] [Revised: 06/01/2020] [Accepted: 06/02/2020] [Indexed: 01/31/2023]
Abstract
Human ether-a-go-go-related gene (hERG) potassium channel blockage by small molecules may cause severe cardiac side effects. Thus, it is crucial to screen compounds for activity on the hERG channels early in the drug discovery process. In this study, we collected 5299 hERG inhibitors with diverse chemical structures from a number of sources. Based on this dataset, we evaluated different machine learning (ML) and deep learning (DL) algorithms using various integer and binary type fingerprints. A training set of 3991 compounds was used to develop quantitative structure-activity relationship (QSAR) models. The performance of the developed models was evaluated using a test set of 998 compounds. Models were further validated using external set 1 (263 compounds) and external set 2 (47 compounds). Overall, models with integer type fingerprints showed better performance than models with no fingerprints, converted binary type fingerprints or original binary type fingerprints. Comparison of ML and DL algorithms revealed that integer type fingerprints are suitable for ML, whereas binary type fingerprints are suitable for DL. The outcomes of this study indicate that the rational selection of fingerprints is important for hERG blocker prediction.
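The abstract compares integer (count) fingerprints with "converted binary type fingerprints" but does not say how the conversion was done. A minimal sketch, assuming the common presence/absence rule in which any nonzero count becomes 1; the function name and count vectors are hypothetical:

```python
from typing import List, Sequence


def to_binary(counts: Sequence[int]) -> List[int]:
    """Convert a count (integer-type) fingerprint to presence/absence bits."""
    return [1 if c > 0 else 0 for c in counts]


# Two hypothetical molecules whose substructure counts differ ...
mol_a = [2, 1, 0, 3]
mol_b = [1, 1, 0, 1]

# ... but whose binary fingerprints become identical after conversion.
print(to_binary(mol_a))  # [1, 1, 0, 1]
print(to_binary(mol_b))  # [1, 1, 0, 1]
```

The example only shows that binarization discards how many times each feature occurs; it is not offered as the study's explanation of why integer-type fingerprints suited ML better.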
Affiliation(s)
- Nam Sook Kang
- Graduate School of New Drug Discovery and Development, Chungnam National University, 99 Daehak-ro, Yuseong-gu, Daejeon 34134, Korea; (K.-E.C.); (A.B.)
38
Sandfort F, Strieth-Kalthoff F, Kühnemund M, Beecks C, Glorius F. A Structure-Based Platform for Predicting Chemical Reactivity. Chem 2020. [DOI: 10.1016/j.chempr.2020.02.017] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 12/11/2022]
39
Terayama K, Sumita M, Tamura R, Payne DT, Chahal MK, Ishihara S, Tsuda K. Pushing property limits in materials discovery via boundless objective-free exploration. Chem Sci 2020; 11:5959-5968. [PMID: 32832058 PMCID: PMC7409358 DOI: 10.1039/d0sc00982b] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/19/2020] [Accepted: 05/04/2020] [Indexed: 01/08/2023]
Abstract
Our developed algorithm, BLOX (BoundLess Objective-free eXploration), successfully found “out-of-trend” molecules potentially useful for photofunctional materials from a drug database.
Materials chemists develop chemical compounds to meet often conflicting demands of industrial applications. This process may not be properly modeled by black-box optimization because the target property is not well defined in some cases. Herein, we propose a new algorithm for automated materials discovery called BoundLess Objective-free eXploration (BLOX) that uses a novel criterion based on kernel-based Stein discrepancy in the property space. Unlike other objective-free exploration methods, a boundary for the materials properties is not needed; hence, BLOX is suitable for open-ended scientific endeavors. We demonstrate the effectiveness of BLOX by finding light-absorbing molecules from a drug database. Our goal is to minimize the number of density functional theory calculations required to discover out-of-trend compounds in the intensity–wavelength property space. Using absorption spectroscopy, we experimentally verified that eight compounds identified as outstanding exhibit the expected optical properties. Our results show that BLOX is useful for chemical repurposing, and we expect this search method to have numerous applications in various scientific disciplines.
Affiliation(s)
- Kei Terayama
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; Medical Sciences Innovation Hub Program, RIKEN Cluster for Science, Technology and Innovation Hub, Tsurumi-ku, Kanagawa 230-0045, Japan; Graduate School of Medicine, Kyoto University, Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Masato Sumita
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Ryo Tamura
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan; Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan; Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan
- Daniel T Payne
- International Center for Young Scientists (ICYS), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Mandeep K Chahal
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Shinsuke Ishihara
- International Center for Materials Nanoarchitectonics (WPI-MANA), National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- Koji Tsuda
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan; Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan
40
Hao M, Bryant SH, Wang Y. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions. Brief Bioinform 2020; 20:1465-1474. [PMID: 29420684 DOI: 10.1093/bib/bby010] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/26/2017] [Revised: 01/18/2018] [Indexed: 12/25/2022]
Abstract
Although novel technologies such as high-throughput screening have advanced alongside significant investment by pharmaceutical companies over the past decades, the success rate of drug development has not improved, prompting researchers to look for new drug discovery strategies. Drug repositioning is a potential approach to this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, and they have proved successful in drug discovery. Undoubtedly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial for developing effective in silico drug repositioning methods that identify potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and by the evolutionary relationships among models. We re-implemented five representative algorithms in the R programming language and compared them using mean percentile ranking, a new recall-based evaluation metric in the DTI prediction field. We anticipate that this review will be objective and helpful to researchers who wish to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI prediction is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred.
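Mean percentile ranking is named but not defined in the abstract. A minimal sketch, assuming a common definition: for each known drug-target pair, rank all candidate targets of that drug by predicted score, record the true target's position as a fraction (0.0 = ranked first, 1.0 = ranked last), and average over all known pairs, so lower is better and a random ranking gives roughly 0.5. The function names, drug and target labels, and scores are hypothetical, and the normalization and tie handling in the reviewed R code may differ:

```python
from typing import Dict, List, Tuple


def percentile_rank(scores: Dict[str, float], target: str) -> float:
    """Position of `target` among candidates sorted by descending score.

    Returns 0.0 for the top-ranked candidate and 1.0 for the last one.
    Ties keep their dictionary insertion order (a simplification here).
    """
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    if len(ranked) < 2:
        return 0.0
    return ranked.index(target) / (len(ranked) - 1)


def mean_percentile_ranking(
    predictions: Dict[str, Dict[str, float]],
    known_pairs: List[Tuple[str, str]],
) -> float:
    """Average percentile rank of the known (drug, target) pairs; lower is better."""
    ranks = [percentile_rank(predictions[d], t) for d, t in known_pairs]
    return sum(ranks) / len(ranks)


# Hypothetical predicted interaction scores for two drugs and three targets.
predictions = {
    "drugA": {"T1": 0.9, "T2": 0.4, "T3": 0.1},
    "drugB": {"T1": 0.2, "T2": 0.7, "T3": 0.5},
}
known_pairs = [("drugA", "T1"), ("drugB", "T3")]

print(mean_percentile_ranking(predictions, known_pairs))  # 0.25
```

Here drugA's known target is ranked first (0.0) and drugB's is ranked second of three (0.5), giving a mean of 0.25.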
41
Zhong S, Hu J, Fan X, Yu X, Zhang H. A deep neural network combined with molecular fingerprints (DNN-MF) to develop predictive models for hydroxyl radical rate constants of water contaminants. Journal of Hazardous Materials 2020; 383:121141. [PMID: 31610411 DOI: 10.1016/j.jhazmat.2019.121141] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 07/31/2019] [Revised: 08/29/2019] [Accepted: 09/02/2019] [Indexed: 05/24/2023]
Abstract
This work combined a deep neural network (DNN) with molecular fingerprints (MF) to develop models that predict the OH radical rate constants of 593 organic contaminants. Molecular descriptors, most often used to establish quantitative structure-activity relationships (QSARs), were not used here because their generation is complicated and relies on advanced physicochemical and computational knowledge. Instead, we fed only the most basic information about the contaminant structures to the DNN, i.e., MF encoding the types of atoms and how they are connected, and the DNN then developed predictive models automatically. A dataset of 457 contaminants and their OH rate constants was first used to develop predictive models with DNN-MF. The resulting models showed accuracy comparable to traditional QSARs, with test-set root mean square error (RMSE) values of 0.358-0.384. An MF length of 2048 bits and 3 hidden layers (each with 1024 neurons) were found to be the optimal DNN parameters. The model, with an additional 89 micropollutants in the training set, was then successfully applied to predict the OH rate constants of 17 organophosphorus flame retardants and 29 additional micropollutants, with accuracy comparable to the reported molecular descriptor-based QSARs.
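The reported optimum (2048-bit fingerprints, three hidden layers of 1024 neurons) fixes the size of the network. A worked count, assuming fully connected layers with one bias per neuron and a single output neuron for the rate constant; the abstract states neither the output layer nor whether biases were used, and `dense_param_count` is a hypothetical helper:

```python
from typing import Sequence


def dense_param_count(layer_sizes: Sequence[int]) -> int:
    """Count weights plus biases in a fully connected feed-forward network.

    Each consecutive pair (n_in, n_out) contributes n_in * n_out weights
    and n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 2048-bit fingerprint input -> 3 x 1024 hidden units -> 1 regression output (assumed).
layers = [2048, 1024, 1024, 1024, 1]
print(dense_param_count(layers))  # 4198401
```

Roughly half of the approximately 4.2 million parameters sit in the first layer (2,098,176), so under these assumptions the fingerprint length has a large effect on model size.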
Affiliation(s)
- Shifa Zhong
- Department of Civil Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH 44106-7201, USA
- Jiajie Hu
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH 44106-7201, USA
- Xudong Fan
- Department of Civil Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH 44106-7201, USA
- Xiong Yu
- Department of Civil Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH 44106-7201, USA; Department of Electrical Engineering and Computer Science, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH 44106-7201, USA
- Huichun Zhang
- Department of Civil Engineering, Case Western Reserve University, 2104 Adelbert Road, Cleveland, OH 44106-7201, USA.
42
Jiang J, Wang R, Wang M, Gao K, Nguyen DD, Wei GW. Boosting Tree-Assisted Multitask Deep Learning for Small Scientific Datasets. J Chem Inf Model 2020; 60:1235-1244. [PMID: 31977216 DOI: 10.1021/acs.jcim.9b01184] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 11/28/2022]
Abstract
Machine learning approaches have had tremendous success in various disciplines. However, such success highly depends on the size and quality of datasets. Scientific datasets are often small and difficult to collect. Currently, improving machine learning performance for small scientific datasets remains a major challenge in many academic fields, such as bioinformatics or medical science. Gradient boosting decision tree (GBDT) is typically optimal for small datasets, while deep learning often performs better for large datasets. This work reports a boosting tree-assisted multitask deep learning (BTAMDL) architecture that integrates GBDT and multitask deep learning (MDL) to achieve near-optimal predictions for small datasets when there exists a large dataset that is well correlated to the small datasets. Two BTAMDL models are constructed, one utilizing purely MDL output as GBDT input while the other admitting additional features in GBDT input. The proposed BTAMDL models are validated on four categories of datasets, including toxicity, partition coefficient, solubility, and solvation. It is found that the proposed BTAMDL models outperform the current state-of-the-art methods in various applications involving small datasets.
Affiliation(s)
- Jian Jiang
- Research Center of Nonlinear Science, College of Mathematics and Computer Science, Engineering Research Center of Hubei Province for Clothing Information, Wuhan Textile University, Wuhan, 430200, P. R. China; Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Menglun Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Duc Duy Nguyen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
43
Jing Y, Hu Z, Fan P, Xue Y, Wang L, Tarter RE, Kirisci L, Wang J, Tarter MV, Xie XQ. Analysis of substance use and its outcomes by machine learning I. Childhood evaluation of liability to substance use disorder. Drug Alcohol Depend 2020; 206:107605. [PMID: 31839402 PMCID: PMC6980708 DOI: 10.1016/j.drugalcdep.2019.107605] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/08/2019] [Revised: 07/13/2019] [Accepted: 08/23/2019] [Indexed: 12/25/2022]
Abstract
BACKGROUND Substance use disorder (SUD) exacts enormous societal costs in the United States, and it is important to detect high-risk youths for prevention. Machine learning (ML) is a method for finding patterns in data and making predictions from them. We hypothesized that ML can identify the health, psychological, psychiatric, and contextual features that predict SUD, and that the identified features can predict which individuals are at high risk of developing SUD. METHOD Male (N = 494) and female (N = 206) participants and their informant parents were administered a battery of questionnaires across five waves of assessment conducted at 10-12, 12-14, 16, 19, and 22 years of age. Characteristics most strongly associated with SUD were identified using the random forest (RF) algorithm from approximately 1000 variables measured at each assessment. Next, the set of features was validated, and the best models for predicting SUD were selected using seven ML algorithms. Lastly, the area under the receiver operating characteristic curve (AUROC) was used to evaluate the accuracy of classifying individuals as SUD+ or SUD- up to thirty years of age. RESULTS Approximately thirty variables strongly predict SUD. The predictors shift from psychological dysregulation and poor health behavior in late childhood to non-normative socialization in mid to late adolescence. In 10-12-year-old youths, the features predict SUD+/- with 74% accuracy, increasing to 86% at 22 years of age. Compared with the other ML algorithms, the RF algorithm optimally detects individuals between 10 and 22 years of age who develop SUD. CONCLUSION These findings inform the items required for inclusion in instruments to accurately identify high-risk youths and young adults requiring SUD prevention.
Affiliation(s)
- Yankang Jing
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy; NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Ziheng Hu
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy; NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Peihao Fan
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy; NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Ying Xue
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy; NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Lirong Wang
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy; NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Ralph E Tarter
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Levent Kirisci
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213
- Junmei Wang
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA.
- Michael Vanyukov Tarter
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, USA, 15213; Corresponding authors: Xiang-Qun Xie, Junmei Wang, Michael Vanyukov
- Xiang-Qun Xie
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA.
44
Xing S, Guo J, Wang Y, Wang C, Wang K, Zhu B. General and efficient synthesis of 1,2-dihydropyrrolo[3,4-b]indol-3-ones via a formal [3 + 2] cycloaddition initiated by C–H activation. Org Chem Front 2020. [DOI: 10.1039/d0qo00922a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
A [Cp*RhCl₂]₂-catalyzed formal [3 + 2] cycloaddition involving a sequential coupling reaction initiated by C–H activation and aza-Michael addition has been developed for the general and efficient synthesis of 1,2-dihydropyrrolo[3,4-b]indol-3-ones.
Affiliation(s)
- Siyang Xing
- Tianjin Key Laboratory of Structure and Performance for Functional Molecules, College of Chemistry, Tianjin Normal University, Tianjin 300387, People's Republic of China
- Junsuo Guo
- Tianjin Key Laboratory of Structure and Performance for Functional Molecules, College of Chemistry, Tianjin Normal University, Tianjin 300387, People's Republic of China
- Yuhan Wang
- Tianjin Key Laboratory of Structure and Performance for Functional Molecules, College of Chemistry, Tianjin Normal University, Tianjin 300387, People's Republic of China
- Chenyu Wang
- Tianjin Key Laboratory of Structure and Performance for Functional Molecules, College of Chemistry, Tianjin Normal University, Tianjin 300387, People's Republic of China
- Kui Wang
- Tianjin Key Laboratory of Structure and Performance for Functional Molecules, College of Chemistry, Tianjin Normal University, Tianjin 300387, People's Republic of China
- Bolin Zhu
- Tianjin Key Laboratory of Structure and Performance for Functional Molecules, College of Chemistry, Tianjin Normal University, Tianjin 300387, People's Republic of China
45
Tosstorff A, Menzen T, Winter G. Exploring Chemical Space for New Substances to Stabilize a Therapeutic Monoclonal Antibody. J Pharm Sci 2020; 109:301-307. [DOI: 10.1016/j.xphs.2019.10.057] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/12/2019] [Revised: 10/22/2019] [Accepted: 10/28/2019] [Indexed: 01/10/2023]
46
Li H, Zhang S, Feng X, Yu X, Yamamoto Y, Bao M. Rhodium(III)-Catalyzed Oxidative [3 + 2] Annulation of 2-Acetyl-1-arylhydrazines with Maleimides: Synthesis of Pyrrolo[3,4-b]indole-1,3-diones. Org Lett 2019; 21:8563-8567. [PMID: 31617727 DOI: 10.1021/acs.orglett.9b03107] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
A rhodium-catalyzed oxidative [3 + 2] annulation of 2-acetyl-1-phenylhydrazines with maleimides was accomplished using AgNTf₂ and Ag₂CO₃ as the additive and oxidant, respectively. Reactions of a variety of 2-acetyl-1-phenylhydrazines with maleimides gave pyrrolo[3,4-b]indole-1,3-diones in satisfactory to excellent yields. Synthetically useful functional groups, such as halogen atoms (F, Cl, Br, and I), ester, cyano, and nitro, remained intact during the tandem C-H activation and annulation reactions.
Affiliation(s)
- He Li
- State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian 116023, China
- Sheng Zhang
- State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian 116023, China
- Xiujuan Feng
- State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian 116023, China
- Xiaoqiang Yu
- State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian 116023, China
- Yoshinori Yamamoto
- State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian 116023, China; Department of Chemistry, Graduate School of Science, Tohoku University, Sendai 980-8578, Japan; Research Organization of Science and Technology, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
- Ming Bao
- State Key Laboratory of Fine Chemicals, Dalian University of Technology, Dalian 116023, China
47
Erdas-Cicek O, Atac AO, Gurkan-Alp AS, Buyukbingol E, Alpaslan FN. Three-Dimensional Analysis of Binding Sites for Predicting Binding Affinities in Drug Design. J Chem Inf Model 2019; 59:4654-4662. [PMID: 31596082 DOI: 10.1021/acs.jcim.9b00206] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Understanding the interaction between drug molecules and proteins is one of the main challenges in drug design. Several tools have recently been developed to reduce the complexity of this process. Artificial intelligence and machine learning methods have shown promising results in predicting binding affinities, and known protein-ligand interactions make accurate predictions possible. In this study, electrostatic potential values extracted from three-dimensional grid cubes of drug-protein binding sites are used to predict the binding affinities of the corresponding complexes. A new algorithm with a dynamic feature selection method, derived from the Compressed Images For Affinity Prediction (CIFAP) study, was implemented to predict the binding affinities of Checkpoint Kinase 1 and Caspase 3 inhibitors.
Affiliation(s)
- Ozlem Erdas-Cicek
- Department of Computer Engineering, Faculty of Engineering, Alanya Alaaddin Keykubat University, Alanya, 07425 Antalya, Turkey
- Ali Osman Atac
- Department of Computer Engineering, Middle East Technical University, Cankaya, 06800 Ankara, Turkey
- A Selen Gurkan-Alp
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Ankara University, Yenimahalle, 06560 Ankara, Turkey
- Erdem Buyukbingol
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Ankara University, Yenimahalle, 06560 Ankara, Turkey; Pharmaceutical Chemistry, Faculty of Pharmacy, Afyonkarahisar Health Sciences University, 03200 Afyonkarahisar, Turkey
- Ferda Nur Alpaslan
- Department of Computer Engineering, Middle East Technical University, Cankaya, 06800 Ankara, Turkey
48
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 346] [Impact Index Per Article: 69.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
- ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
49
Bian Y, Jing Y, Wang L, Ma S, Jun JJ, Xie XQ. Prediction of Orthosteric and Allosteric Regulations on Cannabinoid Receptors Using Supervised Machine Learning Classifiers. Mol Pharm 2019; 16:2605-2615. [PMID: 31013097 DOI: 10.1021/acs.molpharmaceut.9b00182] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/22/2022]
Abstract
Designing compounds that are highly selective for protein subtypes and developing allosteric modulators that target them are critical considerations for both drug discovery and mechanism studies of cannabinoid receptors. Classifiers that can identify active ligands among inactive or random compounds and distinguish allosteric modulators from orthosteric ligands are in demand but challenging to build. In this study, supervised machine learning classifiers were built for two subtypes of cannabinoid receptors, CB1 and CB2. Three types of features, namely molecular descriptors, MACCS fingerprints, and ECFP6 fingerprints, were calculated to evaluate the compound sets from diverse aspects. Deep neural networks, as well as conventional machine learning algorithms including support vector machine, naïve Bayes, logistic regression, and ensemble learning, were applied. Their classification performance with different types of features was compared and discussed. Based on the receiver operating characteristic curves and the calculated metrics, the advantages and drawbacks of each algorithm were investigated. Feature ranking was then performed to help extract useful knowledge about critical molecular properties, substructural keys, and circular fingerprints. The extracted features will facilitate research on cannabinoid receptors by providing guidance on preferred properties for compound modification and novel scaffold design. Besides conventional molecular docking for compound virtual screening, machine-learning-based decision-making models provide alternative options. This study can be of value to the application of machine learning in drug discovery and compound development.
50
Ye Q, Li Q, Gao A, Ying H, Cheng G, Chen J, Che J, Li J, Dong X, Zhou Y. Discovery of novel indoleaminopyrimidine NIK inhibitors based on molecular docking-based support vector regression (SVR) model. Chem Phys Lett 2019. [DOI: 10.1016/j.cplett.2019.01.031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]