1
Thakur A, Sharma B, Parashar A, Sharma V, Kumar A, Mehta V. 2D-QSAR, molecular docking and MD simulation based virtual screening of the herbal molecules against Alzheimer's disorder: an approach to predict CNS activity. J Biomol Struct Dyn 2024; 42:148-162. [PMID: 36970779 DOI: 10.1080/07391102.2023.2192805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 03/10/2023] [Indexed: 03/29/2023]
Abstract
Acetylcholinesterase (AChE) is one of the key enzyme targets used clinically for the management of Alzheimer's disorder (AD). Numerous reports in the literature predict and demonstrate in vitro and in silico anticholinergic activity of herbal molecules; however, the majority of them have failed to find clinical application. To address this issue, we developed a 2D-QSAR model that could efficiently predict the AChE inhibitory activity of herbal molecules along with their potential to cross the blood-brain barrier (BBB) to exert their beneficial effects during AD. Virtual screening of the herbal molecules was performed, and amentoflavone, asiaticoside, astaxanthin, bahouside, biapigenin, glycyrrhizin, hyperforin, hypericin, and tocopherol were predicted as the most promising herbal molecules for inhibiting AChE. Results were validated through molecular docking, atomistic molecular dynamics simulations, and molecular mechanics-Poisson-Boltzmann surface area (MM-PBSA) studies against human AChE (PDB ID: 4EY7). To determine whether these molecules can cross the BBB and inhibit AChE within the central nervous system (CNS), and thus be beneficial for the management of AD, we determined a CNS multi-parameter optimization (MPO) score, which ranged from 1 to 3.76. Overall, the best results were observed for amentoflavone, with a pIC50 value of 7.377 nM, a molecular docking score of -11.5 kcal/mol, and a CNS MPO score of 3.76. In conclusion, we successfully developed a reliable and efficient 2D-QSAR model and predicted amentoflavone to be the most promising molecule that could inhibit the human AChE enzyme within the CNS and could prove beneficial for the management of AD. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Aman Thakur
  - DCO, Govt. of Rajasthan, Bharatpur, Rajasthan, India
- Bhanu Sharma
  - Structural Bioinformatics Lab, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh, India
  - Biotechnology Division, CSIR-IHBT, Palampur, Himachal Pradesh, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Arun Parashar
  - School of Pharmaceutical Sciences, Shoolini University of Biotechnology and Management Sciences, Solan, Himachal Pradesh, India
- Vivek Sharma
  - Department of Pharmacology, Govt. College of Pharmacy, Shimla, Himachal Pradesh, India
- Ajay Kumar
  - Institute of Pharmaceutical Sciences, Kurukshetra University, Kurukshetra, Haryana, India
- Vineet Mehta
  - Department of Pharmacology, Govt. College of Pharmacy, Shimla, Himachal Pradesh, India
2
Guo W, Liu J, Dong F, Song M, Li Z, Khan MKH, Patterson TA, Hong H. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med (Maywood) 2023; 248:1952-1973. [PMID: 38057999 PMCID: PMC10798180 DOI: 10.1177/15353702231209421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/08/2023] Open
Abstract
The ever-increasing number of chemicals has raised public concerns due to their adverse effects on human health and the environment. To protect public health and the environment, it is critical to assess the toxicity of these chemicals. Traditional in vitro and in vivo toxicity assays are complicated, costly, and time-consuming and may face ethical issues. These constraints raise the need for alternative methods for assessing the toxicity of chemicals. Recently, due to the advancement of machine learning algorithms and the increase in computational power, many toxicity prediction models have been developed using various machine learning and deep learning algorithms such as support vector machine, random forest, k-nearest neighbors, ensemble learning, and deep neural network. This review summarizes the machine learning- and deep learning-based toxicity prediction models developed in recent years. Support vector machine and random forest are the most popular machine learning algorithms, and hepatotoxicity, cardiotoxicity, and carcinogenicity are the most frequently modeled toxicity endpoints in predictive toxicology. Datasets are known to impact model performance, and the quality of the datasets used to develop machine learning and deep learning toxicity prediction models is vital to the performance of those models. Different toxicity assignments for the same chemicals have been observed among different datasets of the same type of toxicity, indicating that benchmark datasets are needed for developing reliable toxicity prediction models using machine learning and deep learning algorithms. This review provides insights into current machine learning models in predictive toxicology, which are expected to promote the development and application of toxicity prediction models in the future.
Affiliation(s)
- Wenjing Guo
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Jie Liu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Fan Dong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Meng Song
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Zoe Li
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Md Kamrul Hasan Khan
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Tucker A Patterson
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Huixiao Hong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
3
Riedl M, Mukherjee S, Gauthier M. Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma. Mol Pharm 2023; 20:4984-4993. [PMID: 37656906 DOI: 10.1021/acs.molpharmaceut.3c00129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Chemical-specific parameters are either measured in vitro or estimated using quantitative structure-activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a set of descriptors or fingerprints, subset selection, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers, which allowed us to circumvent the need for calculation of these chemical descriptors. In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings based upon the relationships between the chemical tokens in the SMILES strings derived from the "in-stock" portion of the ZINC 15 dataset, a large dataset of commercially available chemicals. The fine-tuning task then perturbed the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability to reuse the pre-trained model for multiple different fine-tuning tasks, reducing the computational burden of developing multiple models for different endpoints. We used our framework to develop a predictive model for fraction unbound in human plasma (fu,p). This approach is flexible, requires minimum domain expertise, and can be generalized for other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity.
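To make the "masked SMILES token task" mentioned in this abstract concrete, the following is a minimal sketch of how training examples for such a task could be prepared. It is not the authors' pipeline: the tokenizer regex, the `[MASK]` symbol, the 15% masking rate (borrowed from common BERT practice), and the function names `tokenize` and `mask_tokens` are all assumptions made for illustration.

```python
import random
import re

# Simplified SMILES tokenizer (an illustrative assumption, not the paper's):
# bracket atoms, two-letter halogens, two-digit ring closures, then single
# letters, digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\@.:~*$]")
MASK = "[MASK]"


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is skipped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize all of {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_fraction=0.15, seed=0):
    """Hide a random subset of a non-empty token list.

    Returns the masked sequence and a {position: original token} dict,
    which is what a model would be trained to predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_fraction * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, labels


# Example: aspirin. The model input is `masked`; the targets are `labels`.
aspirin_tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")
masked, labels = mask_tokens(aspirin_tokens)
```

In a real pre-training setup the tokens would additionally be mapped to integer ids and batched; the sketch stops at the masking step because that is the only part the abstract describes.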
4
Ningthoujam SS, Nath R, Kityania S, Mazumder PB, Dutta Choudhury M, Talukdar AD, Nahar L, Sarker SD. R software for QSAR analysis in phytopharmacological studies. Phytochemical Analysis: PCA 2023; 34:709-728. [PMID: 37392081 DOI: 10.1002/pca.3239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 07/02/2023]
Abstract
INTRODUCTION: In recent decades, quantitative structure-activity relationship (QSAR) analysis has become an important method for drug design and natural product research. With the availability of bioinformatic and cheminformatic tools, a vast number of descriptors have been generated, making it challenging to select potential independent variables that are accurately related to the dependent response variable. OBJECTIVE: The objective of this study is to demonstrate various descriptor selection procedures, such as the Boruta approach, all subsets regression, the ANOVA approach, the AIC method, stepwise regression, and genetic algorithm, that can be used in QSAR studies. Additionally, we performed regression diagnostics using R software to test parameters such as normality, linearity, residual histograms, PP plots, multicollinearity, and homoscedasticity. RESULTS: The workflow designed in this study highlights the different descriptor selection procedures and regression diagnostics that can be used in QSAR studies. The results showed that the Boruta approach and genetic algorithm performed better than other methods in selecting potential independent variables. The regression diagnostics parameters tested using R software, such as normality, linearity, residual histograms, PP plots, multicollinearity, and homoscedasticity, helped in identifying and diagnosing model errors, ensuring the reliability of the QSAR model. CONCLUSION: QSAR analysis is vital in drug design and natural product research. To develop a reliable QSAR model, it is essential to choose suitable descriptors and perform regression diagnostics. This study offers an accessible, customizable approach for researchers to select appropriate descriptors and diagnose errors in QSAR studies.
Affiliation(s)
- Rajat Nath
  - Department of Life Science and Bioinformatics, Assam University, Silchar, Assam, India
- Sibashish Kityania
  - Department of Life Science and Bioinformatics, Assam University, Silchar, Assam, India
- Anupam Das Talukdar
  - Department of Life Science and Bioinformatics, Assam University, Silchar, Assam, India
- Lutfun Nahar
  - Laboratory of Growth Regulators, Institute of Experimental Botany, The Czech Academy of Sciences and Palacký University, Olomouc, Czech Republic
- Satyajit D Sarker
  - Centre for Natural Products Discovery (CNPD), School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
5
Ghafoor N, Yildiz A. Targeting MDM2-p53 Axis through Drug Repurposing for Cancer Therapy: A Multidisciplinary Approach. ACS Omega 2023; 8:34583-34596. [PMID: 37779953 PMCID: PMC10536845 DOI: 10.1021/acsomega.3c03471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/01/2023] [Indexed: 10/03/2023]
Abstract
Cancer remains a major cause of morbidity and mortality worldwide, and while current therapies, such as chemotherapy, immunotherapy, and cell therapy, have been effective in many patients, the development of novel therapeutic options remains an urgent priority. Mouse double minute 2 (MDM2) is a key regulator of the tumor suppressor protein p53, which plays a critical role in regulating cellular growth, apoptosis, and DNA repair. Consequently, MDM2 has been the subject of extensive research aimed at developing novel cancer therapies. In this study, we employed a machine learning-based approach to establish a quantitative structure-activity relationship model capable of predicting the potential in vitro efficacy of small molecules as MDM2 inhibitors. Our model was used to screen 5883 FDA-approved drugs, resulting in the identification of promising hits that were subsequently evaluated using molecular docking and molecular dynamics simulations. Two antihistamine drugs, cetirizine (CZ) and rupatadine (RP), exhibited particularly favorable results in the initial in silico analyses. To further assess their potential use as activators of the p53 pathway, we investigated the antiproliferative capability of these drugs on human glioblastoma and neuroblastoma cell lines. Both compounds exhibited significant antiproliferative effects on these cell lines in a dose-dependent manner. The half-maximal inhibitory concentration (IC50) of CZ was found to be 697.87 and 941.37 μM on U87 and SH-SY5Y cell lines, respectively, while the IC50 of RP was found to be 524.28 and 617.07 μM on the same cell lines, respectively. Further investigation by quantitative reverse transcriptase polymerase chain reaction analysis revealed that the CZ-treated cell lines upregulate the expression of the p53-regulated genes involved in cell cycle arrest, apoptosis, and DNA damage response compared to their respective vehicle controls. These findings suggest that CZ activates the p53 pathway by inhibiting MDM2. Our results provide compelling preclinical evidence supporting the potential use of CZ as a modulator of the MDM2-p53 axis and its plausible repurposing for cancer treatment.
Affiliation(s)
- Naeem Abdul Ghafoor
  - Department of Molecular Biology and Genetics, Graduate School of Natural and Applied Sciences, Mugla Sitki Kocman University, 48000 Mugla, Turkey
- Aysegul Yildiz
  - Department of Molecular Biology and Genetics, Graduate School of Natural and Applied Sciences, Mugla Sitki Kocman University, 48000 Mugla, Turkey
  - Department of Molecular Biology and Genetics, Faculty of Science, Mugla Sitki Kocman University, 48000 Mugla, Turkey
6
Pusparini RT, Krisnadhi AA, Firdayani. MATH: A Deep Learning Approach in QSAR for Estrogen Receptor Alpha Inhibitors. Molecules 2023; 28:5843. [PMID: 37570812 PMCID: PMC10421274 DOI: 10.3390/molecules28155843] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/31/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
Breast cancer ranks as the second leading cause of death among women, but early screening and self-awareness can help prevent it. Hormone therapy drugs that target estrogen levels offer potential treatments. However, conventional drug discovery entails extensive, costly processes. This study presents a framework for analyzing the quantitative structure-activity relationship (QSAR) of estrogen receptor alpha inhibitors. Our approach utilizes supervised learning, integrating a self-attention Transformer with molecular graph information, to predict estrogen receptor alpha inhibitors. We established five classification models for predicting these inhibitors in breast cancer. Among these models, our proposed MATH model achieved remarkable precision, recall, F1 score, and specificity, with values of 0.952, 0.972, 0.960, and 0.922, respectively, alongside an ROC AUC of 0.977. MATH exhibited robust performance, suggesting its potential to assist pharmaceutical and health researchers in identifying candidate compounds for estrogen receptor alpha inhibitors and guiding drug discovery pathways.
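For readers less familiar with the four classification metrics quoted in this abstract, the sketch below shows how each is conventionally derived from a binary confusion matrix. The function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the paper; the abstract does not say how its values were aggregated (for example, across folds), so this does not attempt to reproduce them.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives, and
    true negatives. No guard against empty classes: a zero denominator
    raises ZeroDivisionError.
    """
    precision = tp / (tp + fp)        # of predicted inhibitors, how many are real
    recall = tp / (tp + fn)           # of real inhibitors, how many were found
    specificity = tn / (tn + fp)      # of real non-inhibitors, how many were rejected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```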
Affiliation(s)
- Rizki Triyani Pusparini
  - Tokopedia-UI AI Center of Excellence, Faculty of Computer Science, Universitas Indonesia, Depok 16424, Indonesia
  - Research Center for Vaccine and Drugs, Research Organization for Health, National Research and Innovation Agency (BRIN), Jakarta 10340, Indonesia
- Adila Alfa Krisnadhi
  - Tokopedia-UI AI Center of Excellence, Faculty of Computer Science, Universitas Indonesia, Depok 16424, Indonesia
- Firdayani
  - Research Center for Vaccine and Drugs, Research Organization for Health, National Research and Innovation Agency (BRIN), Jakarta 10340, Indonesia
7
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] Open
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling the underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with structure-activity relationships (SARs), are pivotal in unlocking the pathway to small-molecule drug discovery. The technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed herein. Various ML algorithms, such as regression analysis and support vector machines, are showcased for their ability to predict and explain the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. By embracing these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
  - College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
  - School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
8
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
9
Chakravarti S. Augmenting Expert Knowledge-Based Toxicity Alerts by Statistically Mined Molecular Fragments. Chem Res Toxicol 2023. [PMID: 37207298 DOI: 10.1021/acs.chemrestox.2c00368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Structural alerts are molecular substructures assumed to be associated with molecular initiating events in various toxic effects and an integral part of in silico toxicology. However, alerts derived using the knowledge of human experts often suffer from a lack of predictivity, specificity, and satisfactory coverage. In this work, we present a method to build hybrid QSAR models by combining expert knowledge-based alerts and statistically mined molecular fragments. Our objective was to find out if the combination is better than the individual systems. Lasso regularization-based variable selection was applied on combined sets of knowledge-based alerts and molecular fragments, but the variable elimination was only allowed to happen on the molecular fragments. We tested the concept on three toxicity end points, i.e., skin sensitization, acute Daphnia toxicity, and Ames mutagenicity, which covered both classification and regression problems. Results showed the predictive performance of such hybrid models is, indeed, better than the models based solely on expert alerts or statistically mined fragments alone. The method also enables the discovery of activating and mitigating/deactivating features for toxicity alerts and the identification of new alerts, thereby reducing false positive and false negative outcomes commonly associated with generic alerts and alerts with poor coverage, respectively.
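The selection scheme described in this abstract, where Lasso may eliminate mined fragments but never expert alerts, can be illustrated with a per-feature penalty. The sketch below is not the author's implementation: it is a plain coordinate-descent Lasso for a regression endpoint in which the L1 penalty is switched off for protected columns. The function names, the penalty strength `lam=0.05`, and the synthetic data are all assumptions made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def selective_lasso(rows, y, penalized, lam=0.05, n_sweeps=200):
    """Coordinate-descent Lasso where only columns with penalized[j] == True
    carry an L1 penalty, so unpenalized columns (the expert alerts here)
    are never eliminated by the penalty."""
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centered columns, so the intercept can be recovered at the end.
    cols = [[r[j] - col_means[j] for r in rows] for j in range(p)]
    scale = [sum(v * v for v in col) / n for col in cols]
    coef = [0.0] * p
    residual = [v - y_mean for v in y]
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(x * r for x, r in zip(cols[j], residual)) / n + scale[j] * coef[j]
            new = soft_threshold(rho, lam if penalized[j] else 0.0) / scale[j]
            delta = new - coef[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(cols[j], residual)]
            coef[j] = new
    intercept = y_mean - sum(m * c for m, c in zip(col_means, coef))
    return coef, intercept


# Synthetic data (hypothetical): columns 0-1 are expert alerts, 2-6 are
# mined fragments. Only alert 0 and fragment 2 truly drive the response.
rng = random.Random(0)
rows = [[float(rng.randint(0, 1)) for _ in range(7)] for _ in range(200)]
y = [2.0 * r[0] + 1.5 * r[2] + rng.gauss(0.0, 0.1) for r in rows]
penalized = [False, False] + [True] * 5
coef, intercept = selective_lasso(rows, y, penalized)
```

With this setup the uninformative fragment columns are driven to exactly zero, the informative fragment is kept with a shrunken coefficient, and both alert columns retain unshrunken coefficients. A classification endpoint would need a logistic loss in place of the squared error used here.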
Affiliation(s)
- Suman Chakravarti
  - MultiCASE Inc., 23811 Chagrin Blvd, Suite 305, Beachwood, Ohio 44122, United States
10
Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature 2023; 616:673-685. [PMID: 37100941 DOI: 10.1038/s41586-023-05905-z] [Citation(s) in RCA: 151] [Impact Index Per Article: 151.0] [Received: 04/21/2022] [Accepted: 03/01/2023] [Indexed: 04/28/2023]
Abstract
Computer-aided drug discovery has been around for decades, although the past few years have seen a tectonic shift towards embracing computational technologies in both academia and pharma. This shift is largely defined by the flood of data on ligand properties and binding to therapeutic targets and their 3D structures, abundant computing capacities and the advent of on-demand virtual libraries of drug-like small molecules in their billions. Taking full advantage of these resources requires fast computational methods for effective ligand screening. This includes structure-based virtual screening of gigascale chemical spaces, further facilitated by fast iterative screening approaches. Highly synergistic are developments in deep learning predictions of ligand properties and target activities in lieu of receptor structure. Here we review recent advances in ligand discovery technologies, their potential for reshaping the whole process of drug discovery and development, as well as the challenges they encounter. We also discuss how the rapid identification of highly diverse, potent, target-selective and drug-like ligands to protein targets can democratize the drug discovery process, presenting new opportunities for the cost-effective development of safer and more effective small-molecule treatments.
Affiliation(s)
- Anastasiia V Sadybekov
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
- Vsevolod Katritch
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA, USA
11
Joel IY, Sulaimon LA, Idris MO, Adigun TO, Adisa RA, Ademoye TA, Ogunleye MO, Olaniyi TO. Descriptor-free QSAR: effectiveness in screening for putative inhibitors of FGFR1. J Biomol Struct Dyn 2023; 41:2016-2032. [PMID: 35073829 DOI: 10.1080/07391102.2022.2026248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
The long short-term memory (LSTM) algorithm has provided solutions to the limitations of descriptor-based QSAR models in drug design. However, the direct application of LSTM remains scarce. This study investigated the effectiveness of a descriptor-free QSAR (LSTM-SM) in modeling an FGFR1 inhibitors dataset, using two conventional descriptor-based QSAR models (a 126-bit Morgan fingerprint and 2D descriptors, respectively) as baselines. The validated descriptor-free QSAR model was thereafter used to screen for active FGFR1 inhibitors in the ChemDiv database, and the hits were subjected to molecular docking, induced-fit docking, QM-MM optimization, and molecular dynamics simulations to filter for compounds with high binding affinity and suggest the putative mechanism of inhibition and specificity. The LSTM-SM model performed better than conventional QSAR, with accuracy, specificity, and sensitivity of 0.92, model loss of 0.025, and AUC of 0.95. Fifteen thousand compounds were predicted as actives from the ChemDiv database, and four compounds were finally selected. Of the four, two showed putatively effective binding interactions with key active site residues. Molecular dynamics simulations on these compounds in complex with the receptor further give insight into the conformational dynamics of each compound bound to the receptor. The complexes formed are stable and exhibit a similar degree of compactness. Our findings predicted the advent of self-feature extracting machine learning algorithms of compounds, and have provided the possibility of better predictive model quality that is not necessarily limited by compound descriptors. The putative FGFR1 inhibitors, with their mechanism of inhibition and specificity, were elucidated using this approach. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- I Y Joel
  - University of Ilorin Molecular Diagnostic and Research Laboratory, Ilorin, Kwara State, Nigeria
- L A Sulaimon
  - Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
- M O Idris
  - School of Life Sciences, University of Science and Technology of China, Hefei, China
- T O Adigun
  - University of Ilorin Molecular Diagnostic and Research Laboratory, Ilorin, Kwara State, Nigeria
- R A Adisa
  - Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
- T A Ademoye
  - Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
- M O Ogunleye
  - Department of Biochemistry, Faculty of Basic Medical Sciences, College of Medicine University of Lagos, Idi-araba, Lagos, Nigeria
- T O Olaniyi
  - Department of Science Laboratory Technology, Faculty of Science, Oyo State College of Agriculture and Technology, Igbo-ora, Oyo, Nigeria
12
Zaslavsky J, Bannigan P, Allen C. Re-envisioning the design of nanomedicines: harnessing automation and artificial intelligence. Expert Opin Drug Deliv 2023; 20:241-257. [PMID: 36644850 DOI: 10.1080/17425247.2023.2167978] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/17/2023]
Abstract
INTRODUCTION: Interest in nanomedicines has surged in recent years due to the critical role they have played in the COVID-19 pandemic. Nanoformulations can turn promising therapeutic cargo into viable products through improvements in drug safety and efficacy profiles. However, the developmental pathway for such formulations is non-trivial and largely reliant on trial and error. Beyond the costly demands on time and resources, this traditional approach may stunt innovation. The emergence of automation, artificial intelligence (AI) and machine learning (ML) tools, which are currently underutilized in pharmaceutical formulation development, offers a promising direction for an improved path in the design of nanomedicines. AREAS COVERED: This review covers the potential of harnessing experimental automation and AI/ML to drive innovation in nanomedicine development. The discussion centers on the current challenges in drug formulation research and development, and the major advantages afforded through the application of data-driven methods. EXPERT OPINION: The development of integrated workflows based on automated experimentation and AI/ML may accelerate nanomedicine development. A crucial step in achieving this is the generation of high-quality, accessible datasets. Future efforts to make full use of these tools can ultimately contribute to the development of more innovative nanomedicines and improved clinical translation of formulations that rely on advanced drug delivery systems.
Affiliation(s)
- Jonathan Zaslavsky
  - Leslie Dan Faculty of Pharmacy, University of Toronto, M5S 3M2, Toronto, ON, Canada
- Pauric Bannigan
  - Leslie Dan Faculty of Pharmacy, University of Toronto, M5S 3M2, Toronto, ON, Canada
- Christine Allen
  - Leslie Dan Faculty of Pharmacy, University of Toronto, M5S 3M2, Toronto, ON, Canada
13
Pal R, Patra SG, Chattaraj PK. Quantitative Structure-Toxicity Relationship in Bioactive Molecules from a Conceptual DFT Perspective. Pharmaceuticals (Basel) 2022; 15:1383. [PMID: 36355555 PMCID: PMC9695291 DOI: 10.3390/ph15111383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2022] [Revised: 11/01/2022] [Accepted: 11/07/2022] [Indexed: 10/29/2023] [Open access]
Abstract
The preclinical drug discovery stage often requires a large number of costly and time-consuming experiments using huge sets of chemical compounds. In the last few decades, this process has undergone significant improvements through the introduction of quantitative structure-activity relationship (QSAR) modelling, which uses a certain percentage of experimental data to predict the biological activity/property of compounds with a similar structural skeleton and/or containing a particular functional group(s). The use of machine learning tools along with it has made life even easier for pharmaceutical researchers. Here, we discuss the toxicity of certain sets of bioactive compounds towards Pimephales promelas and Tetrahymena pyriformis in terms of the global conceptual density functional theory (CDFT)-based descriptor, electrophilicity index (ω). We have compared the results with those obtained by using the commonly used hydrophobicity parameter, logP (where P is the n-octanol/water partition coefficient), considering the greater ease of computing the ω descriptor. The Human African trypanosomiasis (HAT) curing activity of 32 pyridyl benzamide derivatives against Trypanosoma brucei is also studied. In this review article, we summarize these multiple linear regression (MLR)-based QSAR studies in terms of electrophilicity (ω, ω²) and hydrophobicity (logP, (logP)²) parameters.
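As an illustration of the kind of MLR-based QSAR fit summarized above, the following is a minimal sketch that regresses a toxicity endpoint on ω and ω² by ordinary least squares. It is not the authors' code: the function names (`solve_linear_system`, `fit_mlr`, `r_squared`), the data, and the coefficient values are all hypothetical, and the endpoint is synthetic rather than taken from the reviewed studies.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augment the matrix so row operations update both sides together.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_mlr(features, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    width = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(width)]
           for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(width)]
    return solve_linear_system(xtx, xty)


def r_squared(features, targets, beta):
    """Coefficient of determination of the fitted model on the given data."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], row))
             for row in features]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Synthetic, made-up data (NOT from the paper): the endpoint is generated
# exactly from 0.5 + 1.2*omega - 0.3*omega**2, so the fit should recover it.
omegas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
toxicity = [0.5 + 1.2 * w - 0.3 * w ** 2 for w in omegas]
descriptors = [[w, w ** 2] for w in omegas]

coefficients = fit_mlr(descriptors, toxicity)  # [intercept, b_omega, b_omega2]
fit_quality = r_squared(descriptors, toxicity, coefficients)
```

Swapping the descriptor columns for logP and (logP)² gives the hydrophobicity-based counterpart, so the two descriptor families can be compared through their R² values on the same endpoint.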
Affiliation(s)
- Ranita Pal
  - Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Shanti Gopal Patra
  - Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Pratim Kumar Chattaraj
  - Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
14
Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022; 23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Several new viral infections have emerged in the human population and are establishing themselves as global pandemics. With advancements in translational research, the scientific community has developed potential therapeutics to eradicate or control certain viral infections, such as smallpox and polio, responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), still prevail due to a lack of specific therapeutics, while new pathogenic viral strains or variants are emerging because of high genetic recombination or cross-species transmission. Consequently, to combat emerging viral infections, bioinformatics-based strategies have been developed for characterizing viruses and for developing new, effective therapeutics for their eradication or management. This review attempts to provide a single platform for the wide range of available bioinformatics-based approaches, including bioinformatics methods for the identification and management of emerging or evolved viral strains, genome analysis concerning pathogenicity and epidemiological analysis, computational methods for designing viral therapeutics, and consolidated information in the form of databases on the known pathogenic viruses. This enriched review of the generally applicable viral informatics approaches aims to provide an overview of available resources capable of carrying out the desired task and may be utilized to expand additional strategies to improve the quality of translational viral informatics research.
Affiliation(s)
- Sanjay Kumar
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Geethu S Kumar
  - Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, Uttar Pradesh, India
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Petr Malý
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Shiv Bharadwaj
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Pradeep Sharma
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Vivek Dhar Dwivedi
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
  - Institute of Advanced Materials, IAAM, 59053 Ulrika, Sweden
15
Kumar V, Lee G, Yoo J, Ro HS, Lee KW. An attention mechanism-based LSTM network for cancer kinase activity prediction. SAR and QSAR in Environmental Research 2022; 33:631-647. [PMID: 36062308 DOI: 10.1080/1062936x.2022.2109062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 07/30/2022] [Indexed: 06/15/2023]
Abstract
Despite the endeavours and achievements made in treating cancers during the past decades, resistance to available kinase drugs continues to be a major problem in cancer therapies. Thus, it is highly desirable to develop computational models that can predict the bioactivity of a compound against cancer kinases. Here, we present a Long Short-Term Memory (LSTM) framework for predicting the activities of lead molecules against seven different kinases. A total of 14,907 compounds from the ChEMBL database were selected for model building. Two different molecular representations, namely, 2D descriptors and MACCS fingerprints were subjected to the LSTM method for the training process. We also successfully integrated an attention mechanism into our model, which helped us to interpret the contribution of chemical features on kinase activity. The attention mechanism extracted the significant chemical moieties more effectively by taking them into consideration during the activity prediction. The recorded accuracies in the test sets for both 2D descriptors and MACCS fingerprints-based models were 0.81 and 0.78, respectively. The receiver operating characteristic curve (ROC)-area under the curve (AUC) score for both models was in the range of 0.8-0.99. The proposed framework can be a good starting point for the development of new cancer kinase drugs.
Affiliation(s)
- V Kumar
  - Department of Bio & Medical Big Data (BK21 Four Program), Division of Life Sciences, Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
- G Lee
  - Division of Applied Life Science (BK21 Program), ABC-RLRC, PMBBRC, Gyeongsang National University, Jinju, Korea
- J Yoo
  - Division of Applied Life Science (BK21 Program), Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
- H S Ro
  - Department of Bio & Medical Big Data (BK21 Four Program), Division of Life Sciences, Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
- K W Lee
  - Department of Bio & Medical Big Data (BK21 Four Program), Division of Life Sciences, Research Institute of Life Sciences, Gyeongsang National University, Jinju, Korea
  - ANGEL i-Drug Design (AiDD), Jinju, Korea
16
Abbasi M, Santos BP, Pereira TC, Sofia R, Monteiro NRC, Simões CJV, Brito R, Ribeiro B, Oliveira JL, Arrais JP. Designing optimized drug candidates with Generative Adversarial Network. J Cheminform 2022; 14:40. [PMID: 35754029 PMCID: PMC9233801 DOI: 10.1186/s13321-022-00623-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/01/2022] [Accepted: 06/13/2022] [Indexed: 12/03/2022] [Open access]
Abstract
Drug design is an important area of study for pharmaceutical businesses. However, low efficacy, off-target delivery, time consumption, and high cost are challenges that can create barriers to this process. Deep Learning models are emerging as a promising solution to perform de novo drug design, i.e., to generate drug-like molecules tailored to specific needs. However, stereochemistry has not been explicitly considered in the generated molecules, although it is unavoidable in target-oriented molecules. This paper proposes a framework based on a Feedback Generative Adversarial Network (GAN) that includes an optimization strategy by incorporating Encoder-Decoder, GAN, and Predictor deep models interconnected with a feedback loop. The Encoder-Decoder converts the string notations of molecules into latent space vectors, effectively creating a new type of molecular representation. At the same time, the GAN can learn and replicate the training data distribution and, therefore, generate new compounds. The feedback loop is designed to incorporate and evaluate the generated molecules according to the multiobjective desired property at every epoch of training to ensure a steady shift of the generated distribution towards the space of the targeted properties. Moreover, to develop a more precise set of molecules, we also incorporate a multiobjective optimization selection technique based on a non-dominated sorting genetic algorithm. The results demonstrate that the proposed framework can generate realistic, novel molecules that span the chemical space. The proposed Encoder-Decoder model correctly reconstructs 99% of the datasets, including stereochemical information. The model's ability to find uncharted regions of the chemical space was successfully shown by optimizing the unbiased GAN to generate molecules with a high binding affinity to the Kappa Opioid and Adenosine [Formula: see text] receptor. Furthermore, the generated compounds exhibit high internal and external diversity (0.88 and 0.94, respectively) and uniqueness.
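The abstract does not define its diversity scores. One common convention, assumed here, takes internal diversity as one minus the mean pairwise Tanimoto similarity within the generated set, and external diversity as the same quantity computed between generated and reference molecules. The sketch below follows that assumption; the function names and the fingerprints (sets of "on" bit indices) are hypothetical and purely illustrative.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def internal_diversity(fingerprints):
    """1 - mean pairwise Tanimoto similarity within one set of molecules."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def external_diversity(generated, reference):
    """1 - mean Tanimoto similarity between generated and reference molecules."""
    if not generated or not reference:
        raise ValueError("both sets must be non-empty")
    total = sum(tanimoto(g, r) for g in generated for r in reference)
    return 1.0 - total / (len(generated) * len(reference))


# Hypothetical fingerprints given as sets of bit indices (illustrative only).
generated_fps = [{1, 2, 3}, {3, 4, 5}, {6, 7, 8}]
training_fps = [{1, 2, 3}, {9, 10}]

int_div = internal_diversity(generated_fps)
ext_div = external_diversity(generated_fps, training_fps)
```

Under this convention, values close to 1 mean the molecules share few fingerprint bits, which is the sense in which scores such as 0.88 and 0.94 would be read as high diversity.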
Affiliation(s)
- Maryam Abbasi
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Beatriz P. Santos
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Tiago C. Pereira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Raul Sofia
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Nelson R. C. Monteiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- Rui Brito
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
- Bernardete Ribeiro
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
- José L. Oliveira
  - IEETA, Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Joel P. Arrais
  - Univ Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, Coimbra, Portugal
17
Zięba A, Stępnicki P, Matosiuk D, Kaczor AA. What are the challenges with multi-targeted drug design for complex diseases? Expert Opin Drug Discov 2022; 17:673-683. [PMID: 35549603 DOI: 10.1080/17460441.2022.2072827] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/30/2022]
Abstract
INTRODUCTION: Current findings on multifactorial diseases with a complex pathomechanism confirm that multi-target drugs are more effective in treating them than single-target drugs. However, to design multi-target ligands, a number of factors and challenges must be taken into account. AREAS COVERED: In this perspective, we summarize the concept of applying multi-target drugs to the treatment of complex diseases such as neurodegenerative diseases, schizophrenia, diabetes, and cancer. We discuss the aspects of target selection for multifunctional ligands and the application of in silico methods in their design and optimization. Furthermore, we highlight other challenges such as balancing affinities to different targets and the drug-likeness of the obtained compounds. Finally, we present success stories in the design of multi-target ligands for the treatment of common complex diseases. EXPERT OPINION: Despite the numerous challenges in the design of multi-target ligands, these efforts are worth making. Appropriate target selection, activity balancing, and ligand drug-likeness are key aspects in the design of ligands acting on multiple targets. It should be emphasized that in silico methods, in particular inverse docking, pharmacophore modeling, machine learning methods and approaches derived from network pharmacology, are valuable tools for the design of multi-target drugs.
Affiliation(s)
- Agata Zięba
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
- Piotr Stępnicki
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
- Dariusz Matosiuk
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
- Agnieszka A Kaczor
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
  - School of Pharmacy, University of Eastern Finland, Kuopio, Finland
18
Mengucci C, Ferranti P, Romano A, Masi P, Picone G, Capozzi F. Food structure, function and artificial intelligence. Trends Food Sci Technol 2022. [DOI: 10.1016/j.tifs.2022.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
19
Dey V, Machiraju R, Ning X. Improving Compound Activity Classification via Deep Transfer and Representation Learning. ACS Omega 2022; 7:9465-9483. [PMID: 35350358 PMCID: PMC8945064 DOI: 10.1021/acsomega.1c06805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 02/23/2022] [Indexed: 06/14/2023]
Abstract
Recent advances in molecular machine learning, especially deep neural networks such as graph neural networks (GNNs), for predicting structure-activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks is limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to the popular parameter-based transfer learning such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other compared to their inactive compounds. Overall, TAc achieves the best performance with an average ROC-AUC of 0.801; it significantly improves the ROC-AUC of 83% of target tasks with an average task-wise performance improvement of 7.102%, compared to the best baseline dmpna. Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods. In summary, TAc-fc is also found to be a strong model with competitive or even better performance than TAc on a notable number of target tasks.
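To make the reported metric concrete, here is a minimal sketch of how a ROC-AUC and its average over several target tasks can be computed. It uses the pairwise-ranking interpretation of ROC-AUC (the fraction of active/inactive pairs in which the active compound receives the higher score). The function names and the prediction data are made up for illustration and are not from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_task_auc(tasks):
    """Average ROC-AUC over several target tasks, each a (labels, scores) pair."""
    return sum(roc_auc(labels, scores) for labels, scores in tasks) / len(tasks)


# Made-up predictions for two hypothetical bioassay tasks (1 = active).
task_a = ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
task_b = ([1, 0, 0, 1], [0.8, 0.2, 0.3, 0.7])

auc_a = roc_auc(*task_a)
auc_b = roc_auc(*task_b)
average_auc = mean_task_auc([task_a, task_b])
```

The quadratic pair loop is chosen for readability; a rank-based formulation gives the same value in O(n log n) and is preferable for large assays.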
Affiliation(s)
- Vishal Dey
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Raghu Machiraju
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
20
Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022; 62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for making a decision of which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
Affiliation(s)
- José Jiménez-Luna
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093 Zurich, Switzerland
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Miha Skalic
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Nils Weskamp
  - Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
21
Abstract
Quantitative structure-activity relationship (QSAR) models are routinely applied computational tools in the drug discovery process. QSAR models are regression or classification models that predict the biological activities of molecules based on the features derived from their molecular structures. These models are usually used to prioritize a list of candidate molecules for future laboratory experiments and to help chemists gain better insights into how structural changes affect a molecule's biological activities. Developing accurate and interpretable QSAR models is therefore of the utmost importance in the drug discovery process. Deep neural networks, which are powerful supervised learning algorithms, have shown great promise for addressing regression and classification problems in various research fields, including the pharmaceutical industry. In this chapter, we briefly review the applications of deep neural networks in QSAR modeling and describe commonly used techniques to improve model performance.
22
Gini G. QSAR Methods. Methods Mol Biol 2022; 2425:1-26. [PMID: 35188626 DOI: 10.1007/978-1-0716-1960-5_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
This chapter introduces the basis of computational chemistry and discusses how computational methods have been extended from modeling physical properties to modeling biological properties, toxicology in particular. For about three decades, chemical experimentation has increasingly been replaced by modeling and virtual experimentation, using a large core of mathematics, chemistry, physics, and algorithms. Animal and wet experiments, aimed at providing a standardized result about a biological property, can be mimicked by modeling methods, collectively called in silico methods, all characterized by deducing properties starting from the chemical structures. Two main streams of such models are available: models that consider the whole molecular structure to predict a value, namely QSAR (quantitative structure-activity relationships), and models that check relevant substructures to predict a class, namely SAR. The term in silico discovery is applied to chemical design, to computational toxicology, and to drug discovery. Virtual experiments confirm hypotheses, provide data for regulation, and help in designing new chemicals.
23
Abstract
Predictive and computational toxicology, a highly scientific and research-based field, is rapidly progressing with wider acceptance by regulatory agencies around the world. Almost every aspect of the field has seen fundamental changes during the last decade due to the availability of more data, usage, and acceptance of a variety of predictive tools and an increase in the overall awareness. Also, the influence from the recent explosive developments in the field of artificial intelligence has been significant. However, the need for sophisticated, easy to use and well-maintained software platforms for in silico toxicological assessments remains very high. The MultiCASE suite of software is one such platform that consists of an integrated collection of software programs, tools, and databases. While providing easy-to-use and highly useful tools that are relevant at present, it has always remained at the forefront of research and development by inventing new technologies and discovering new insights in the area of QSAR, artificial intelligence, and machine learning. This chapter gives the background, an overview of the software and databases involved, and a brief description of the usage methodology with the aid of examples.
24
Gallego V, Naveiro R, Roca C, Ríos Insua D, Campillo NE. AI in drug development: a multidisciplinary perspective. Mol Divers 2021; 25:1461-1479. [PMID: 34251580 PMCID: PMC8342381 DOI: 10.1007/s11030-021-10266-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/21/2021] [Accepted: 06/29/2021] [Indexed: 01/09/2023]
Abstract
The introduction of a new drug to the commercial market follows a long and complex process that typically spans several years and entails large monetary costs due to a high attrition rate. Because of this, there is an urgent need to improve this process using innovative technologies such as artificial intelligence (AI). Different AI tools are being applied to support all four steps of the drug development process (basic research for drug discovery; pre-clinical phase; clinical phase; and postmarketing). Some of the main tasks where AI has proven useful include identifying molecular targets, searching for hit and lead compounds, synthesising drug-like compounds and predicting ADME-Tox. This review, on the one hand, brings a mathematical vision of some of the key AI methods used in drug development closer to medicinal chemists and, on the other hand, brings the drug development process and the use of different models closer to mathematicians. Emphasis is placed on two aspects not mentioned in similar surveys, namely, Bayesian approaches and their applications to molecular modelling, and the eventual final use of the methods to actually support decisions. Promoting a perfect synergy.
Affiliation(s)
- Víctor Gallego
  - Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Roi Naveiro
  - Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Carlos Roca
  - AItenea Biotech S.L. Parque Científico de Madrid, Faraday, 7, 28049, Madrid, Spain
- David Ríos Insua
  - ICMAT-CSIC and Dept. of Statistics and OR, U. Compl. Madrid, Madrid, Spain
- Nuria E Campillo
  - CIB-Margarita Salas (CSIC), Ramiro de Maeztu, 9, 28040, Madrid, Spain
25
Pereira T, Abbasi M, Oliveira JL, Ribeiro B, Arrais J. Optimizing blood-brain barrier permeation through deep reinforcement learning for de novo drug design. Bioinformatics 2021; 37:i84-i92. [PMID: 34252946 PMCID: PMC8336597 DOI: 10.1093/bioinformatics/btab301] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] [Open access]
Abstract
MOTIVATION: The process of placing new drugs into the market is time-consuming, expensive and complex. The application of computational methods for designing molecules with bespoke properties can contribute to saving resources throughout this process. However, the fundamental properties to be optimized are often not considered or conflict with each other. In this work, we propose a novel approach that considers both the biological property and the bioavailability of compounds through a deep reinforcement learning framework for the targeted generation of compounds. We aim to obtain a promising set of compounds that are selective for the adenosine A2A receptor and, simultaneously, have the necessary solubility and permeability across the blood-brain barrier to reach the site of action. The cornerstone of the framework is a recurrent neural network architecture, the Generator. It seeks to learn the building rules of valid molecules in order to sample new compounds. Also, two Predictors are trained to estimate the properties of interest of the new molecules. Finally, the Generator was fine-tuned with reinforcement learning, integrated with multi-objective optimization and exploratory techniques to ensure that the Generator is adequately biased. RESULTS: The biased Generator can generate an interesting set of molecules, with approximately 85% having the two fundamental properties biased as desired. Thus, this approach has transformed a general molecule generator into a model focused on optimizing specific objectives. Furthermore, the molecules' synthesizability and drug-likeness demonstrate the potential applicability of de novo drug design in medicinal chemistry. AVAILABILITY AND IMPLEMENTATION: All code is publicly available at https://github.com/larngroup/De-Novo-Drug-Design. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tiago Pereira
  - CSUC/DEI, University of Coimbra, Coimbra 3030-290, Portugal
  - IEETA/DETI, University of Aveiro, Aveiro 3810-193, Portugal
- Maryam Abbasi
  - CSUC/DEI, University of Coimbra, Coimbra 3030-290, Portugal
- Joel Arrais
  - CSUC/DEI, University of Coimbra, Coimbra 3030-290, Portugal
26
Hung C, Gini G. QSAR modeling without descriptors using graph convolutional neural networks: the case of mutagenicity prediction. Mol Divers 2021; 25:1283-1299. [PMID: 34146224 DOI: 10.1007/s11030-021-10250-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/06/2021] [Accepted: 06/08/2021] [Indexed: 11/30/2022]
Abstract
Deep neural networks are effective in learning directly from low-level encoded data without the need of feature extraction. This paper shows how QSAR models can be constructed from 2D molecular graphs without computing chemical descriptors. Two graph convolutional neural network-based models are presented with and without a Bayesian estimation of the prediction uncertainty. The property under investigation is mutagenicity: Models developed here predict the output of the Ames test. These models take the SMILES representation of the molecules as input to produce molecular graphs in terms of adjacency matrices and subsequently use attention mechanisms to weight the role of their subgraphs in producing the output. The results positively compare with current state-of-the-art models. Furthermore, our proposed model interpretation can be enhanced by the automatic extraction of the substructures most important in driving the prediction, as well as by uncertainty estimations.
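The conversion from a SMILES string to an adjacency matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes a deliberately restricted SMILES subset (single-letter organic atoms, branches, single-digit ring closures; no aromatic lowercase atoms, two-letter elements, charges, or bracket atoms), and the helper name `smiles_to_adjacency` is hypothetical. It is not the authors' implementation; a real pipeline would rely on a cheminformatics toolkit.

```python
def smiles_to_adjacency(smiles):
    """Parse a restricted SMILES subset into (atoms, adjacency matrix).

    Illustrative only: bond orders are ignored and unsupported
    characters raise ValueError.
    """
    atoms = []       # element symbol for each atom index
    bonds = []       # undirected (i, j) index pairs
    prev = None      # atom that the next atom will bond to
    stack = []       # branch points opened by "("
    ring_open = {}   # ring-closure digit -> atom index awaiting closure

    for ch in smiles:
        if ch in "BCNOSPFI":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx))
            prev = idx
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring closure before any atom")
            if ch in ring_open:
                bonds.append((ring_open.pop(ch), prev))
            else:
                ring_open[ch] = prev
        elif ch in "-=#":
            continue  # bond order is not represented in a 0/1 matrix
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")

    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for i, j in bonds:
        adjacency[i][j] = adjacency[j][i] = 1
    return atoms, adjacency


if __name__ == "__main__":
    atoms, adjacency = smiles_to_adjacency("CC(C)O")  # isopropanol heavy atoms
    print(atoms)
    for row in adjacency:
        print(row)
```

The resulting symmetric 0/1 matrix is the kind of input a graph convolution layer aggregates over; atom features and attention weights would be added on top of it.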
27
Artificial intelligence in drug design: algorithms, applications, challenges and ethics. FUTURE DRUG DISCOVERY 2021. [DOI: 10.4155/fdd-2020-0028] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
The discovery paradigm of drugs is rapidly growing due to advances in machine learning (ML) and artificial intelligence (AI). This review covers myriad faces of AI and ML in drug design. There is a plethora of AI algorithms, the most common of which are summarized in this review. In addition, AI is fraught with challenges that are highlighted along with plausible solutions to them. Examples are provided to illustrate the use of AI and ML in drug discovery and in predicting drug properties such as binding affinities and interactions, solubility, toxicology, blood–brain barrier permeability and chemical properties. The review also includes examples depicting the implementation of AI and ML in tackling intractable diseases such as COVID-19, cancer and Alzheimer’s disease. Ethical considerations and future perspectives of AI are also covered in this review.
28
Abbasi K, Razzaghi P, Poso A, Ghanbari-Ara S, Masoudi-Nejad A. Deep Learning in Drug Target Interaction Prediction: Current and Future Perspectives. Curr Med Chem 2021; 28:2100-2113. [PMID: 32895036 DOI: 10.2174/0929867327666200907141016] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/14/2020] [Revised: 07/30/2020] [Accepted: 07/30/2020] [Indexed: 11/22/2022]
Abstract
Drug-target Interactions (DTIs) prediction plays a central role in drug discovery. Computational methods in DTIs prediction have gained more attention because carrying out in vitro and in vivo experiments on a large scale is costly and time-consuming. Machine learning methods, especially deep learning, are widely applied to DTIs prediction. In this study, the main goal is to provide a comprehensive overview of deep learning-based DTIs prediction approaches. Here, we investigate the existing approaches from multiple perspectives. We explore these approaches to find out which deep network architectures are utilized to extract features from drug compound and protein sequences. Also, the advantages and limitations of each architecture are analyzed and compared. Moreover, we explore the process of how to combine descriptors for drug and protein features. Likewise, a list of datasets that are commonly used in DTIs prediction is investigated. Finally, current challenges are discussed and a short future outlook of deep learning in DTI prediction is given.
Affiliation(s)
- Karim Abbasi: Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Parvin Razzaghi: Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Antti Poso: School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland
- Saber Ghanbari-Ara: Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Ali Masoudi-Nejad: Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
29
Nayarisseri A, Khandelwal R, Tanwar P, Madhavi M, Sharma D, Thakur G, Speck-Planche A, Singh SK. Artificial Intelligence, Big Data and Machine Learning Approaches in Precision Medicine & Drug Discovery. Curr Drug Targets 2021; 22:631-655. [PMID: 33397265 DOI: 10.2174/1389450122999210104205732] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/23/2020] [Revised: 08/21/2020] [Accepted: 09/14/2020] [Indexed: 11/22/2022]
Abstract
Artificial Intelligence revolutionizes the drug development process that can quickly identify potential biologically active compounds from millions of candidates within a short period. The present review is an overview based on some applications of Machine Learning based tools, such as GOLD, Deep PVP, LIB SVM, etc. and the algorithms involved such as support vector machine (SVM), random forest (RF), decision tree and Artificial Neural Network (ANN), etc. at various stages of drug designing and development. These techniques can be employed in SNP discoveries, drug repurposing, ligand-based drug design (LBDD), Ligand-based Virtual Screening (LBVS) and Structure-based Virtual Screening (SBVS), Lead identification, quantitative structure-activity relationship (QSAR) modeling, and ADMET analysis. It is demonstrated that SVM exhibited better performance in indicating that the classification model will have great applications on human intestinal absorption (HIA) predictions. Successful cases have been reported which demonstrate the efficiency of SVM and RF models in identifying JFD00950 as a novel compound targeting against a colon cancer cell line, DLD-1, by inhibition of FEN1 cytotoxic and cleavage activity. Furthermore, a QSAR model was also used to predict flavonoid inhibitory effects on AR activity as a potent treatment for diabetes mellitus (DM), using ANN. Hence, in the era of big data, ML approaches have evolved as a powerful and efficient way to deal with the huge amounts of generated data from modern drug discovery to model small-molecule drugs, gene biomarkers and identifying the novel drug targets for various diseases.
Affiliation(s)
- Anuraj Nayarisseri: In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Ravina Khandelwal: In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Poonam Tanwar: In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Maddala Madhavi: Department of Zoology, Nizam College, Osmania University, Hyderabad - 500001, Telangana State, India
- Diksha Sharma: In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Garima Thakur: In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
- Alejandro Speck-Planche: Programa Institucional de Fomento a la Investigacion, Desarrollo e Innovacion, Universidad Tecnologica Metropolitana, Ignacio Valdivieso 2409, P.O. 8940577, San Joaquin, Santiago, Chile
- Sanjeev Kumar Singh: Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi-630003, Tamil Nadu, India
30
Huang DZ, Baber JC, Bahmanyar SS. The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction. Expert Opin Drug Discov 2021; 16:1045-1056. [PMID: 33739897 DOI: 10.1080/17460441.2021.1901685] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) has seen a massive resurgence in recent years with wide successes in computer vision, natural language processing, and games. The similar creation of robust and accurate AI models for ADME/Tox endpoint and activity prediction would be revolutionary to drug discovery pipelines. There have been numerous demonstrations of successful applications, but a key challenge remains: how generalizable are these predictive models? AREAS COVERED The authors present a summary of current promising components of AI models in the context of early drug discovery where ADME/Tox endpoint and activity prediction is the main driver of the iterative drug design process. Following that is a review of applicability domains and dataset construction considerations which determine generalizability bottlenecks for AI deployment. Further reviewed is the role of promising learning frameworks - multitask, transfer, and meta learning - which leverage auxiliary data to overcome issues of generalizability. EXPERT OPINION The authors conclude that the most promising direction toward integrating reliable and informative AI models into the drug discovery pipeline is a conjunction of learned feature representations, deep learning, and novel learning frameworks. Such a solution would address the sparse and incomplete datasets that are available for key endpoints related to drug discovery.
Affiliation(s)
- J Christian Baber: Scientific Informatics, Global Head of Scientific Informatics, Scientific Informatics, Takeda Pharmaceuticals, Cambridge, MA, USA
- Sogole Sami Bahmanyar: Computational Chemistry, Director of Computational Sciences, Computational Chemistry, Takeda Pharmaceuticals, San Diego, USA
31
Pereira T, Abbasi M, Ribeiro B, Arrais JP. Diversity oriented Deep Reinforcement Learning for targeted molecule generation. J Cheminform 2021; 13:21. [PMID: 33750461 PMCID: PMC7944916 DOI: 10.1186/s13321-021-00498-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/13/2020] [Accepted: 02/22/2021] [Indexed: 11/10/2022]
Abstract
In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES strings notation, and the Predictor which evaluates the newly generated compounds by predicting their affinity for the desired target. Then, the Generator is optimized through Reinforcement Learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process that seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model that will remain fixed and a copy of the previous one that will be updated during the training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules, with the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized coefficient of partition and also high inhibitory power against the Adenosine [Formula: see text] and [Formula: see text] opioid receptors. The results reveal that the model can effectively adjust the newly generated molecules towards the wanted direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
Affiliation(s)
- Tiago Pereira: Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Maryam Abbasi: Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Bernardete Ribeiro: Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
- Joel P. Arrais: Department of Informatics Engineering, Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, Pinhal de Marrocos, Coimbra, Portugal
32
Wu L, Huang R, Tetko IV, Xia Z, Xu J, Tong W. Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets. Chem Res Toxicol 2021; 34:541-549. [PMID: 33513003 DOI: 10.1021/acs.chemrestox.0c00373] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
Abstract
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice the model performance to gain explainability or vice versa. Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We conducted over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach including deep learning and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance often also is easy to interpret for the Tox21 data set, it clearly was the preferred choice due to its better explainability. Given that each data set had its own error structure both for dependent and independent variables, we strongly recommend that it is important to conduct a systematic study with a broad range of model complexity and feature explainability to identify model balancing its predictivity and explainability.
Affiliation(s)
- Leihong Wu: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Ruili Huang: Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Igor V Tetko: Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystraße 49, DE-85716 Unterschleißheim, Germany
- Zhonghua Xia: Institute of Structural Biology, Helmholtz Zentrum München-Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Joshua Xu: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Weida Tong: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
33
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. BIOTECHNOL BIOPROC E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered as a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of the recent research trends in AI-guided drug discovery process including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, an in-depth analysis of the remaining challenges and limitations will be provided, and proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam: School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
34
Siramshetty VB, Nguyen DT, Martinez NJ, Southall NT, Simeonov A, Zakharov AV. Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the "Big Data" Era. J Chem Inf Model 2020; 60:6007-6019. [PMID: 33259212 DOI: 10.1021/acs.jcim.0c00884] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/18/2022]
Abstract
The rise of novel artificial intelligence (AI) methods necessitates their benchmarking against classical machine learning for a typical drug-discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by the human ether-à-go-go-related gene (hERG), leads to a prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for the assessment of hERG liabilities of small molecules including recent work using deep learning methods. Here, we perform a comprehensive comparison of hERG effect prediction models based on classical approaches (random forests and gradient boosting) and modern AI methods [deep neural networks (DNNs) and recurrent neural networks (RNNs)]. The training set (∼9000 compounds) was compiled by integrating the hERG bioactivity data from the ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors including the latent descriptors, which are real-value continuous vectors derived from chemical autoencoders trained on a large chemical space (>1.5 million compounds). The models were prospectively validated on ∼840 in-house compounds screened in the same thallium flux assay. The best results were obtained with the XGBoost method and RDKit descriptors. The comparison of models based only on latent descriptors revealed that the DNNs performed significantly better than the classical methods. The RNNs that operate on SMILES provided the highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. 
Furthermore, we shed light on the potential of AI methods to exploit the big data in chemistry and generate novel chemical representations useful in predictive modeling and tailoring a new chemical space.
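The abstract states that the best models were merged into a consensus model but does not say how. One common way to do this is to average the per-compound probabilities of the individual models and threshold the mean. The sketch below shows that approach under stated assumptions: the function names, the example probabilities, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def consensus_probability(model_probs):
    """Average predicted probabilities across models, compound by compound.

    model_probs: list of per-model lists, one probability per compound.
    """
    n_compounds = len(model_probs[0])
    if any(len(probs) != n_compounds for probs in model_probs):
        raise ValueError("every model must score the same compounds")
    return [mean(column) for column in zip(*model_probs)]


def consensus_labels(model_probs, threshold=0.5):
    """Label a compound as an inhibitor (1) when the mean probability
    reaches the threshold, otherwise 0."""
    return [int(p >= threshold) for p in consensus_probability(model_probs)]


if __name__ == "__main__":
    # Hypothetical outputs of three models for three compounds.
    scores = [
        [0.9, 0.2, 0.60],
        [0.7, 0.4, 0.30],
        [0.8, 0.1, 0.45],
    ]
    print(consensus_probability(scores))
    print(consensus_labels(scores))
```

Simple averaging treats every model equally; weighting by validation performance or majority voting are alternative designs.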
Affiliation(s)
- Vishal B Siramshetty: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Dac-Trung Nguyen: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Natalia J Martinez: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Noel T Southall: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Anton Simeonov: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Alexey V Zakharov: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, Maryland 20850, United States
35
Siramshetty VB, Shah P, Kerns E, Nguyen K, Yu KR, Kabir M, Williams J, Neyra J, Southall N, Nguyễn ÐT, Xu X. Retrospective assessment of rat liver microsomal stability at NCATS: data and QSAR models. Sci Rep 2020; 10:20713. [PMID: 33244000 PMCID: PMC7693334 DOI: 10.1038/s41598-020-77327-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/06/2020] [Accepted: 11/04/2020] [Indexed: 11/09/2022]
Abstract
Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. Metabolic stability is usually assessed in microsomal fractions and only the best compounds progress in the drug discovery process. A high-throughput single time point substrate depletion assay in rat liver microsomes (RLM) is employed at the National Center for Advancing Translational Sciences. Between 2012 and 2020, RLM stability data was generated for ~ 24,000 compounds from more than 250 projects that cover a wide range of pharmacological targets and cellular pathways. Although a crucial endpoint, little or no data exists in the public domain. In this study, computational models were developed for predicting RLM stability using different machine learning methods. In addition, a retrospective time-split validation was performed, and local models were built for projects that performed poorly with global models. Further analysis revealed inherent medicinal chemistry knowledge potentially useful to chemists in the pursuit of synthesizing metabolically stable compounds. In addition, we deposited experimental data for ~ 2500 compounds in the PubChem bioassay database (AID: 1508591). The global prediction models are made publicly accessible ( https://opendata.ncats.nih.gov/adme ). This is to the best of our knowledge, the first publicly available RLM prediction model built using high-quality data generated at a single laboratory.
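A retrospective time-split validation, as mentioned in this abstract, trains on compounds measured before a cutoff date and tests on those measured afterwards, which mimics prospective use. The sketch below is only an illustration: the record fields (`compound_id`, `assayed`, `stable`), the example data, and the cutoff date are hypothetical, since the abstract does not give the split actually used.

```python
from datetime import date


def time_split(records, cutoff):
    """Split records into (train, test) by assay date.

    Records assayed strictly before `cutoff` go to training; records
    assayed on or after it go to the test set.
    """
    train = [r for r in records if r["assayed"] < cutoff]
    test = [r for r in records if r["assayed"] >= cutoff]
    return train, test


if __name__ == "__main__":
    # Hypothetical stability records, not data from the study.
    records = [
        {"compound_id": "A", "assayed": date(2013, 5, 1), "stable": True},
        {"compound_id": "B", "assayed": date(2016, 9, 12), "stable": False},
        {"compound_id": "C", "assayed": date(2019, 2, 3), "stable": True},
        {"compound_id": "D", "assayed": date(2020, 1, 20), "stable": False},
    ]
    train, test = time_split(records, cutoff=date(2019, 1, 1))
    print([r["compound_id"] for r in train])
    print([r["compound_id"] for r in test])
```

Unlike a random split, this keeps later chemistry out of the training set, so the measured performance is usually lower but closer to what a new project would see.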
Affiliation(s)
- Vishal B Siramshetty: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Pranav Shah: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Edward Kerns: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Kimloan Nguyen: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA; NY State Public Health, DOHMH 42-09 28th St, Long Island City, NY, 11101, USA
- Kyeong Ri Yu: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA; School of Medicine, Virginia Commonwealth University, 1201 E Marshall St, Richmond, VA, 23298, USA
- Md Kabir: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA; The Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, 10029, USA
- Jordan Williams: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Jorge Neyra: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Noel Southall: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Ðắc-Trung Nguyễn: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
- Xin Xu: National Center for Advancing Translational Sciences (NCATS), 9800 Medical Center Drive, Rockville, MD, 20850, USA
36
Chakravarti SK. Reason Vectors: Abstract Representation of Chemistry–Biology Interaction Outcomes, for Reasoning and Prediction. J Chem Inf Model 2020; 60:4614-4628. [DOI: 10.1021/acs.jcim.0c00601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Suman K. Chakravarti: MultiCASE Inc., 23811 Chagrin Blvd., Suite 305, Beachwood, Ohio 44122, United States
37
Alsenan S, Al-Turaiki I, Hafez A. A Recurrent Neural Network model to predict blood-brain barrier permeability. Comput Biol Chem 2020; 89:107377. [PMID: 33010784 DOI: 10.1016/j.compbiolchem.2020.107377] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/29/2020] [Revised: 09/09/2020] [Accepted: 09/12/2020] [Indexed: 12/14/2022]
Abstract
The rapid development of computational methods and the increasing volume of chemical and biological data have contributed to an immense growth in chemical research. This field of study is known as "chemoinformatics," which is a discipline that uses machine-learning techniques to extract, process, and extrapolate data from chemical structures. One of the significant lines of research in chemoinformatics is the study of blood-brain barrier (BBB) permeability, which aims to identify drug penetration into the central nervous system (CNS). In this research, we attempt to solve the problem of BBB permeability by predicting compounds penetration to the CNS. To accomplish this goal: (i) First, an overview is provided to the field of chemoinformatics, its definition, applications, and challenges, (ii) Second, a broad view is taken to investigate previous machine-learning and deep-learning computational models to solve BBB permeability. Based on the analysis of previous models, three main challenges that collectively affect the classifier performance are identified, which we define as "the triple constraints"; subsequently, we map each constraint to a proposed solution, (iii) Finally, we conclude this endeavor by proposing a deep learning based Recurrent Neural Network model, to predict BBB permeability (RNN-BBB model). Our model outperformed other studies from the literature by scoring an overall accuracy of 96.53%, and a specificity score of 98.08%. The obtained results confirm that addressing the triple constraints substantially improves the classification model capability specifically when predicting compounds with low penetration.
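The accuracy and specificity figures reported in this abstract are standard confusion-matrix metrics for a binary classifier. The sketch below computes them from true and predicted labels, assuming 1 means BBB-permeable and 0 means non-permeable; the function name and the example labels are hypothetical and do not reproduce the paper's data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels.

    Labels are 1 (positive, e.g. BBB-permeable) or 0 (negative).
    A metric whose denominator is zero is returned as NaN.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


if __name__ == "__main__":
    # Hypothetical labels for eight compounds.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

Specificity is the metric that reflects performance on low-penetration compounds, which is why the abstract highlights it alongside overall accuracy.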
Affiliation(s)
- Shrooq Alsenan: Research Center, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia; Research Chair in Healthcare Innovation, Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Isra Al-Turaiki: College of Computer and Information Sciences, Information Technology Department, King Saud University, Riyadh, Saudi Arabia
- Alaaeldin Hafez: College of Computer and Information Sciences, Information Systems Department, King Saud University, Riyadh, Saudi Arabia
38
Keshavarzi Arshadi A, Webb J, Salem M, Cruz E, Calad-Thomson S, Ghadirian N, Collins J, Diez-Cecilia E, Kelly B, Goodarzi H, Yuan JS. Artificial Intelligence for COVID-19 Drug Discovery and Vaccine Development. Front Artif Intell 2020; 3:65. [PMID: 33733182 PMCID: PMC7861281 DOI: 10.3389/frai.2020.00065] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 05/09/2020] [Accepted: 07/17/2020] [Indexed: 12/31/2022]
Abstract
SARS-COV-2 has roused the scientific community with a call to action to combat the growing pandemic. At the time of this writing, there are as yet no novel antiviral agents or approved vaccines available for deployment as a frontline defense. Understanding the pathobiology of COVID-19 could aid scientists in their discovery of potent antivirals by elucidating unexplored viral pathways. One method for accomplishing this is the leveraging of computational methods to discover new candidate drugs and vaccines in silico. In the last decade, machine learning-based models, trained on specific biomolecules, have offered inexpensive and rapid implementation methods for the discovery of effective viral therapies. Given a target biomolecule, these models are capable of predicting inhibitor candidates in a structural-based manner. If enough data are presented to a model, it can aid the search for a drug or vaccine candidate by identifying patterns within the data. In this review, we focus on the recent advances of COVID-19 drug and vaccine development using artificial intelligence and the potential of intelligent training for the discovery of COVID-19 therapeutics. To facilitate applications of deep learning for SARS-COV-2, we highlight multiple molecular targets of COVID-19, inhibition of which may increase patient survival. Moreover, we present CoronaDB-AI, a dataset of compounds, peptides, and epitopes discovered either in silico or in vitro that can be potentially used for training models in order to extract COVID-19 treatment. The information and datasets provided in this review can be used to train deep learning-based models and accelerate the discovery of effective viral therapies.
Affiliation(s)
- Arash Keshavarzi Arshadi
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, United States
- Julia Webb
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, United States
- Milad Salem
  - Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, United States
- Niloofar Ghadirian
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ, United States
- Jennifer Collins
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, United States
- Hani Goodarzi
  - Department of Biochemistry and Biophysics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, United States
- Jiann Shiun Yuan
  - Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, United States
|
39
|
Chan K, Leung HCM, Tsoi JKH. Predictive QSAR model confirms flavonoids in Chinese medicine can activate voltage-gated calcium (CaV) channel in osteogenesis. Chin Med 2020; 15:31. [PMID: 32256687 PMCID: PMC7106815 DOI: 10.1186/s13020-020-00313-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2020] [Accepted: 03/19/2020] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Animal studies have shown that flavonoids in Chinese medicine can aid osteogenesis and bone formation. However, there is no consensus on the mechanism by which these phytochemicals act on bone-forming osteoblasts, and hence no prediction model for chemical screening of this specific biochemical function has been established. The purpose of this study was to develop a novel and effective approach for selecting flavonoids by predicting their bone-forming ability via osteoblastic voltage-gated calcium (CaV) channel activation and inhibition, using molecular modelling techniques.
METHOD A quantitative structure-activity relationship (QSAR) model, built with a supervised machine-learning approach, was applied to predict the behavioral manifestations of flavonoids in the CaV channels and to develop a statistical correlation between the biochemical features and the behavioral manifestations of 24 compounds (training set: Kaempferol, Taxifolin, Daidzein, Morin, Scutellarein, Quercetin, Apigenin, Myricetin, Tamarixetin, Rutin, Genistein, 5,7,2'-Trihydroxyflavone, Baicalein, Luteolin, Galangin, Chrysin, Isorhamnetin, Naringin, 3-Methyl galangin, Resokaempferol; test set: 5-Hydroxyflavone, 3,6,4'-Trihydroxyflavone, 3,4'-Dihydroxyflavone and Naringenin). Using statistical algorithms, QSAR provides a reasonable basis for establishing a predictive correlation model from a variety of molecular descriptors that can identify and analyse the biochemical features of flavonoids engaged in activating or inhibiting the CaV channels of osteoblasts.
RESULTS The model showed that these flavonoids have strong activating effects on the CaV channel for osteogenesis. In addition, scutellarein ranked highest among the screened flavonoids, and lower-ranked compounds such as daidzein, quercetin, genistein and naringin followed the same descending order as in previous animal studies.
CONCLUSION This predictive modelling study has confirmed and validated the biochemical activity of the flavonoids in osteoblastic CaV activation.
Affiliation(s)
- Ki Chan
  - Dental Materials Science, Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Pokfulam, Hong Kong SAR PRC
- Henry Chi Ming Leung
  - Department of Computer Science, Faculty of Engineering, University of Hong Kong, Pokfulam, Hong Kong SAR PRC
- James Kit-Hon Tsoi
  - Dental Materials Science, Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Pokfulam, Hong Kong SAR PRC
|
40
|
Hu S, Chen P, Gu P, Wang B. A Deep Learning-Based Chemical System for QSAR Prediction. IEEE J Biomed Health Inform 2020; 24:3020-3028. [PMID: 32142459 DOI: 10.1109/jbhi.2020.2977009] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 11/06/2022]
Abstract
Research on quantitative structure-activity relationships (QSAR) provides an effective approach to identifying new hits and promising lead compounds during drug discovery. In the past decades, various works have achieved good QSAR performance with the development of machine learning. The rise of deep learning, along with massive accessible chemical databases, has further improved QSAR performance. This article proposes a novel deep-learning-based method that implements QSAR prediction by concatenating an end-to-end encoder-decoder model with a convolutional neural network (CNN) architecture. The encoder-decoder model is mainly used to generate fixed-size latent features that represent chemical molecules; these features are then fed into the CNN framework to train a robust and stable model and, finally, to predict active chemicals. Two models with different schemes are investigated on the same data sets to evaluate the validity of the proposed model. Experimental results showed that the proposed method outperforms other state-of-the-art methods in identifying whether a chemical molecule is active.
|