1
Zhu W, Wang Y, Niu Y, Zhang L, Liu Z. Current Trends and Challenges in Drug-Likeness Prediction: Are They Generalizable and Interpretable? HEALTH DATA SCIENCE 2023; 3:0098. [PMID: 38487200 PMCID: PMC10880170 DOI: 10.34133/hds.0098] [Received: 06/06/2023] [Accepted: 10/20/2023]
Abstract
Importance: The drug-likeness of a compound is an overall assessment of its potential to succeed in clinical trials, and it is essential for economizing research expenditure by filtering out compounds with unfavorable properties and poor development potential. A robust drug-likeness prediction method is therefore indispensable. Various approaches, including discriminative rules, statistical models, and machine learning models, have been developed to predict drug-likeness from physicochemical properties and structural features. Notably, recent advances in deep learning have significantly improved drug-likeness prediction, especially its classification performance. Highlights: In this review, we addressed the evolving landscape of drug-likeness prediction, with emphasis on methods employing novel deep learning techniques, and highlighted the current challenges in the field, specifically generalization and interpretability. We also explored potential remedies and outlined promising avenues for future research. Conclusion: Despite the hurdles of generalization and interpretability, novel deep learning techniques have great potential in drug-likeness prediction and merit further research.
Affiliation(s)
- Wenyu Zhu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yanxing Wang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yan Niu
  - Department of Medicinal Chemistry, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Liangren Zhang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Zhenming Liu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China

2
Dulsat J, López-Nieto B, Estrada-Tejedor R, Borrell JI. Evaluation of Free Online ADMET Tools for Academic or Small Biotech Environments. MOLECULES (BASEL, SWITZERLAND) 2023; 28:molecules28020776. [PMID: 36677832 PMCID: PMC9864198 DOI: 10.3390/molecules28020776] [Received: 11/30/2022] [Revised: 12/27/2022] [Accepted: 01/10/2023]
Abstract
For a new molecular entity (NME) to become a drug, it must not only have the right biological activity and be safe and effective, but also have a favorable pharmacokinetic and toxicity profile (absorption, distribution, metabolism, excretion, and toxicity; ADMET). Consequently, ADMET properties need to be predicted during the early stages of development to increase the success rate of compounds reaching lead optimization. Since Lipinski's rule of five, the prediction of pharmacokinetic parameters has evolved into the current in silico tools based on empirical approaches or molecular modeling. Specialized commercial software for such predictions is usually costly and, in many cases, beyond the reach of research laboratories in academia or at small biotech companies. In recent years, however, many free online tools have become available that predict the most relevant pharmacokinetic parameters with varying accuracy. This paper studies 18 free web servers capable of predicting ADMET properties and analyzes their advantages and disadvantages, their model-based calculations, and their accuracy, using the experimental data reported for a set of 24 FDA-approved tyrosine kinase inhibitors (TKIs) as a model research project.
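
Lipinski's rule of five, named above as the starting point of pharmacokinetic prediction, fits in a few lines of code. The sketch below is a minimal illustration, assuming the four descriptors (molecular weight, calculated logP, hydrogen-bond donors, hydrogen-bond acceptors) were already computed by some other tool. The `Descriptors` class, the function names, and the example values are hypothetical, and the one-violation tolerance follows the common reading of the rule rather than anything specific to the servers evaluated by Dulsat et al.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four thresholds the compound exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common reading of the rule: tolerate at most one violation."""
    return rule_of_five_violations(d) <= max_violations


# Hypothetical descriptor values, for illustration only.
small = Descriptors(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5)
greasy = Descriptors(mol_weight=720.9, logp=6.3, h_donors=1, h_acceptors=8)
print(rule_of_five_violations(small), passes_rule_of_five(small))    # 0 True
print(rule_of_five_violations(greasy), passes_rule_of_five(greasy))  # 2 False
```

Setting `max_violations=0` gives a stricter filter that rejects any compound exceeding a single threshold.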

3
Billones LT, Morales NB, Billones JB. Logistic regression and random forest unveil key molecular descriptors of druglikeness. CHEM-BIO INFORMATICS JOURNAL 2021. [DOI: 10.1273/cbij.21.39]
Affiliation(s)
- Liza T. Billones*
  - Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila
- Nadia B. Morales
  - Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila
- Junie B. Billones
  - Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila

4
Ye Z, Yang W, Yang Y, Ouyang D. Interpretable machine learning methods for in vitro pharmaceutical formulation development. FOOD FRONTIERS 2021. [DOI: 10.1002/fft2.78]
Affiliation(s)
- Zhuyifan Ye
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China
- Wenmian Yang
  - State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China
- Yilong Yang
  - School of Software, Beihang University, Beijing, China
- Defang Ouyang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences (ICMS), University of Macau, Macau, China

5
Wang A, Wang M. Drug-Target Interaction Prediction via Dual Laplacian Graph Regularized Logistic Matrix Factorization. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5599263. [PMID: 33855072 PMCID: PMC8019634 DOI: 10.1155/2021/5599263] [Received: 01/15/2021] [Revised: 03/06/2021] [Accepted: 03/13/2021]
Abstract
Drug-target interactions provide useful information for drug discovery and development. However, identifying drug-target interactions experimentally is costly and time-consuming, so computational approaches to this task are necessary and of practical significance. In this study, we establish a novel dual Laplacian graph regularized logistic matrix factorization model for drug-target interaction prediction, abbreviated DLGrLMF. DLGrLMF treats drug-target interaction prediction as a weighted logistic matrix factorization problem in which experimentally validated interactions are assigned larger weights. Because drugs with similar chemical structures should interact with similar targets, and targets with similar genomic sequences should in turn interact with similar drugs, the pairwise chemical structure similarities of drugs and the pairwise genomic sequence similarities of targets are incorporated into the matrix factorization through a dual Laplacian graph regularization term. We also design a gradient descent algorithm to solve the resulting optimization problem. Finally, DLGrLMF is validated on various benchmark datasets, and the experimental results demonstrate that it performs better than other state-of-the-art methods. Case studies further show that DLGrLMF can successfully predict most of the experimentally validated drug-target interactions.
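
A graph Laplacian regularization term rests on a standard identity. For a symmetric similarity matrix S with degree matrix D and Laplacian L = D - S, the penalty tr(UᵀLU) equals ½ Σᵢⱼ Sᵢⱼ‖uᵢ - uⱼ‖². Minimizing it therefore pulls the latent vectors of similar drugs (or similar targets) toward each other. In a "dual" form, one such penalty uses the drug similarity graph on the drug factors and another uses the target similarity graph on the target factors. The sketch below only checks the identity on toy values. It does not reproduce the weighted logistic loss or the gradient descent of DLGrLMF, and the matrices `S` and `U` are hypothetical.

```python
import math


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix S."""
    n = len(S)
    degree = [sum(row) for row in S]
    return [[(degree[i] if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def trace_penalty(L, U):
    """tr(U^T L U), where row i of U is the latent vector of drug (or target) i."""
    n = len(U)
    return sum(L[i][j] * sum(a * b for a, b in zip(U[i], U[j]))
               for i in range(n) for j in range(n))


def pairwise_penalty(S, U):
    """The same penalty written pairwise: 0.5 * sum_ij S_ij * ||u_i - u_j||^2."""
    n = len(U)
    return 0.5 * sum(S[i][j] * sum((a - b) ** 2 for a, b in zip(U[i], U[j]))
                     for i in range(n) for j in range(n))


# Toy drug-drug similarities and 2-dimensional latent vectors (hypothetical values).
S = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
U = [[0.5, 1.0],
     [0.4, 0.9],
     [-1.0, 0.3]]

print(trace_penalty(laplacian(S), U), pairwise_penalty(S, U))  # both about 0.754
print(math.isclose(trace_penalty(laplacian(S), U), pairwise_penalty(S, U)))  # True
```

Most of the penalty here comes from drug 2, whose latent vector sits far from the others even though its similarities to them are small.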
Affiliation(s)
- Aizhen Wang
  - Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, China
- Minhui Wang
  - Department of Pharmacy, Lianshui People's Hospital Affiliated to Kangda College, Nanjing Medical University, Huai'an 223300, China

6
Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1513]

7
Abstract
Aim: The explosion of data-based technology has accelerated pattern mining. However, the quality and bias of data clearly affect all machine learning and modeling. Results & methodology: A technique is presented that uses the distribution of the first significant digits of medicinal chemistry features (logP, logS, and pKa, both experimental and predicted) to assess whether they follow Benford's law, as seen in many natural phenomena. Conclusion: Data quality depends on dataset size, diversity, and the magnitudes of the values. Profiling based on drugs alone may be too small or narrow; using larger sets of experimentally determined or predicted values recovers the distribution seen in other natural phenomena. This technique may be used to improve profiling, machine learning, large-dataset assessment, and other data-based methods for better (automated) data generation and compound design.
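
Benford's law predicts that the first significant digit d (1 to 9) occurs with probability P(d) = log10(1 + 1/d), so about 30% of values start with 1 and under 5% start with 9. The sketch below extracts first significant digits, compares their frequencies with that expectation, and summarizes the gap as a mean absolute deviation. The deviation measure is an assumption chosen for simplicity, not necessarily the statistic used in the paper. The short `narrow` list stands in for hypothetical logP values, and powers of 2 serve as a wide-ranging set known to follow Benford's law closely.

```python
import math
from collections import Counter


def first_significant_digit(x):
    """Leading nonzero digit of |x|, or None for zero or non-finite values."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation (rounded to 7 significant digits) puts the first
    # significant digit first, e.g. 0.00345 -> '3.450000e-03'.
    return int(f"{abs(x):e}"[0])


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def observed_distribution(values):
    """Fraction of values whose first significant digit is d, for d = 1..9."""
    digits = [d for d in map(first_significant_digit, values) if d is not None]
    if not digits:
        raise ValueError("no nonzero finite values")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def mean_absolute_deviation(values):
    """Average |observed - expected| over the nine digits (a simple conformity measure)."""
    expected = benford_expected()
    observed = observed_distribution(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


# A small, narrow set (hypothetical logP values) versus a wide-ranging one (powers of 2).
narrow = [1.2, 2.5, 3.1, 2.8, 1.9, 3.6, 2.2, 4.0]
wide = [2.0 ** k for k in range(1, 1001)]
print(round(mean_absolute_deviation(narrow), 3), round(mean_absolute_deviation(wide), 3))
```

The narrow set deviates far more than the wide one, which mirrors the abstract's point that small or narrow drug sets may not recover the Benford distribution.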

8
Distinguishing drug/non-drug-like small molecules in drug discovery using deep belief network. Mol Divers 2020; 25:827-838. [PMID: 32193758 DOI: 10.1007/s11030-020-10065-7] [Received: 01/04/2020] [Accepted: 02/26/2020]
Abstract
The advent of computational methods for efficiently predicting the druglikeness of small molecules, and their ever-growing applications in medicinal chemistry and the drug industry, have been a profound scientific development, since only a small fraction of small-molecule libraries has been identified as approvable drugs. In this study, a deep belief network was used to construct a druglikeness classification model. For this purpose, small molecules and approved drugs from the ZINC database were selected for the unsupervised pre-training step and the supervised training step. Various binary fingerprints, such as MACCS (166 bits), PubChem (881 bits), and Morgan (2048 bits), were investigated as features. The results revealed that an unsupervised pre-training phase can lead to a well-performing model with good generalizability. The accuracy, precision, and recall of the model with MACCS features were 97%, 96%, and 99%, respectively. To further assess the generalizability of the model, external data were created using experimental and investigational drugs from DrugBank as drugs and randomly selected compounds from the ZINC database as non-drugs. The results confirmed the good performance and generalizability of the model. The outcomes also showed that a large proportion of the misclassified non-drug small molecules satisfy the bioavailability conditions and could be investigated as drugs in the future. Furthermore, we explored the model's potential as a drug filter in drug discovery.
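
The accuracy, precision, and recall quoted above follow from a confusion matrix with "drug" as the positive class. The sketch below spells out those standard definitions; the toy labels are hypothetical and unrelated to the deep belief network or its data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` (here 1 = drug) as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall as usually defined for a binary classifier."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels: 1 = drug, 0 = non-drug (hypothetical, not the paper's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
# accuracy 0.7, precision 4/6 (about 0.667), recall 0.8
```

Recall exceeds precision exactly when false negatives are fewer than false positives, so the reported 99% recall versus 96% precision suggests the model more often calls a non-drug a drug than the reverse.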

9
Saddala MS, Lennikov A, Huang H. Discovery of Small-Molecule Activators for Glucose-6-Phosphate Dehydrogenase (G6PD) Using Machine Learning Approaches. Int J Mol Sci 2020; 21:ijms21041523. [PMID: 32102234 PMCID: PMC7073180 DOI: 10.3390/ijms21041523] [Received: 01/09/2020] [Revised: 02/18/2020] [Accepted: 02/21/2020]
Abstract
Glucose-6-Phosphate Dehydrogenase (G6PD) is a ubiquitous cytoplasmic enzyme that converts glucose-6-phosphate into 6-phosphogluconate in the pentose phosphate pathway (PPP). G6PD deficiency leads to an inability to regenerate glutathione owing to a lack of nicotinamide adenine dinucleotide phosphate (NADPH), producing stress conditions that can cause oxidative injury to photoreceptors, retinal cells, and blood barrier function. In this study, we constructed pharmacophore-based models from the complex of G6PD with compound AG1 (a G6PD activator), followed by virtual screening. Fifty-three hit molecules mapped to the core pharmacophore features. We performed molecular descriptor calculation, clustering, and principal component analysis (PCA) on the pharmacophore hit molecules and further applied statistical machine learning methods. With optimal performance, the pharmacophore modeling and machine learning approaches classified the 53 hits as drug-like (18) or nondrug-like (35) compounds. The drug-like compounds were further evaluated with our established cheminformatics pipeline (molecular docking and in silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) analysis). Finally, five lead molecules with different scaffolds were selected based on binding energies and in silico ADMET properties. This study proposes that combining machine learning methods with traditional structure-based virtual screening can effectively strengthen the ability to find potential G6PD activators for G6PD deficiency diseases. Moreover, these compounds can be considered safe agents for further validation studies at the cell level, in animal models, and even in clinical settings.

10
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI and have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles of the various machine learning algorithms, alongside some application notes, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China

11
Wang M, Tang C, Chen J. Drug-Target Interaction Prediction via Dual Laplacian Graph Regularized Matrix Completion. BIOMED RESEARCH INTERNATIONAL 2018; 2018:1425608. [PMID: 30627536 PMCID: PMC6304580 DOI: 10.1155/2018/1425608] [Received: 06/25/2018] [Revised: 09/03/2018] [Accepted: 10/24/2018]
Abstract
Drug-target interactions play an important role in drug discovery and development. However, determining them experimentally is expensive and time-consuming, so developing computational techniques for drug-target interaction prediction is urgent and of practical significance. In this work, we propose an effective computational model of dual Laplacian graph regularized matrix completion, abbreviated DLGRMC, to infer unknown drug-target interactions. DLGRMC transforms drug-target interaction prediction into a matrix completion problem, in which potential interactions between drugs and targets are obtained from the prediction scores after matrix completion. In DLGRMC, the pairwise chemical structure similarities of drugs and the pairwise genomic sequence similarities of targets are incorporated into the matrix completion through a dual Laplacian graph regularization term; that is, drugs with similar chemical structures are more likely to interact with similar targets, and targets with similar genomic sequences are more likely to interact with similar drugs. In addition, during matrix completion, a binary indicator matrix marking the indices of the observed drug-target interactions is used to preserve the experimentally confirmed interactions. Furthermore, we develop an alternating iterative strategy based on the augmented Lagrange multiplier algorithm to solve the constrained matrix completion problem. We evaluate DLGRMC on five benchmark datasets, and the results show that it outperforms several state-of-the-art approaches in terms of AUPR values and PR curves under 10-fold cross-validation. Case studies also demonstrate that DLGRMC can successfully predict most of the experimentally validated drug-target interactions.
Affiliation(s)
- Minhui Wang
  - Department of Pharmacy, People's Hospital of Lian'shui County, Huai'an 223300, China
- Chang Tang
  - School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Jiajia Chen
  - Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huai'an 223002, China

12
Eurtivong C, Reynisson J. The Development of a Weighted Index to Optimise Compound Libraries for High Throughput Screening. Mol Inform 2018; 38:e1800068. [PMID: 30345657 DOI: 10.1002/minf.201800068] [Received: 05/16/2018] [Accepted: 09/27/2018]
Abstract
A total of 1880 known drugs were collected and analysed for their mainstream molecular descriptors: molecular weight (MW), log P, hydrogen-bond acceptors (HA), hydrogen-bond donors (HD), rotatable bonds (RB), and polar surface area (PSA). The statistical distribution of each descriptor was fitted to a Gaussian function, which provided a mathematical tool for calculating a weighted score, or Index, for each descriptor. Known Drug Indexes (KDIs) were derived by either summing or multiplying the Indexes, giving one number for each molecule. The KDI summation and multiplication methods give theoretical maxima of 6 and 1, respectively. According to both methods, methysergide (5.89/0.90), amsacrine (5.89/0.89), and fluorometholone (5.88/0.88) score as the most well-balanced pharmaceuticals. The KDIs are advantageous tools for identifying the most well-balanced screening compounds based on the properties of known drugs: the screening collection can be optimised to include only quality compounds, which in turn yield tractable hit and lead compounds from the screening campaign.
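
The KDI construction can be sketched as follows, under two stated assumptions. First, each descriptor's Index is its fitted Gaussian rescaled to a peak height of 1, which is consistent with the quoted maxima of 6 (sum of six Indexes) and 1 (product of six Indexes). Second, the means and standard deviations in `GAUSSIAN_PARAMS` are placeholders chosen for illustration, not the values fitted by Eurtivong and Reynisson; real use would substitute the published fits.

```python
import math

# Hypothetical (mean, standard deviation) of a Gaussian for each descriptor.
# Placeholders for illustration only, NOT the values fitted in the paper.
GAUSSIAN_PARAMS = {
    "MW": (300.0, 120.0),
    "logP": (2.5, 1.8),
    "HA": (4.0, 2.5),
    "HD": (1.5, 1.3),
    "RB": (4.0, 3.0),
    "PSA": (70.0, 40.0),
}


def descriptor_index(value, mean, sd):
    """Gaussian rescaled to a peak height of 1: a value at the mean scores exactly 1."""
    return math.exp(-((value - mean) ** 2) / (2 * sd ** 2))


def known_drug_indexes(descriptors, params=GAUSSIAN_PARAMS):
    """Return (summation KDI, multiplication KDI); with six descriptors the maxima are 6 and 1."""
    indexes = [descriptor_index(descriptors[name], *params[name]) for name in params]
    return sum(indexes), math.prod(indexes)


ideal = {name: mean for name, (mean, _) in GAUSSIAN_PARAMS.items()}
compound = {"MW": 480.0, "logP": 4.2, "HA": 7, "HD": 3, "RB": 9, "PSA": 110.0}
print(known_drug_indexes(ideal))     # (6.0, 1.0)
print(known_drug_indexes(compound))  # both values lower than for `ideal`
```

One way to read the two variants: the product is dragged toward zero by a single badly out-of-range descriptor, whereas the sum only loses at most one point for it.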
Affiliation(s)
- Chatchakorn Eurtivong
  - Chemical Biology, Chulabhorn Graduate Institute, 54 Kamphaeng Phet 6 Road, Talat Bang Khen sub-district, Lak Si district, Bangkok, 10210, Thailand
- Jóhannes Reynisson
  - School of Chemical Sciences, University of Auckland, Private Bag 92019, Auckland, 1142, New Zealand

13
Yosipof A, Guedes RC, García-Sosa AT. Data Mining and Machine Learning Models for Predicting Drug Likeness and Their Disease or Organ Category. Front Chem 2018; 6:162. [PMID: 29868564 PMCID: PMC5954128 DOI: 10.3389/fchem.2018.00162] [Received: 02/28/2018] [Accepted: 04/20/2018]
Abstract
Data mining approaches can uncover underlying patterns in chemical and pharmacological property space that are decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques to reduce multidimensional data to 2D or 3D representations with minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both approaches take advantage of the different and deep relationships that can exist between the features of compounds: they classify compounds based on such features or, in the case of visualization methods, uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for visualizing and representing chemical space, and we use different machine learning methods both separately and together to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forest (RF), support vector machine (SVM), artificial neural network (ANN), k-nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves predictive power but also decreases bias. This resulted in a correct classification rate of over 0.81, as well as higher sensitivity and specificity for the models. In addition, good separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others.
Such models can be used to guide decisions on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease categories or organs of action of a molecule.
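
The abstract does not say how AL Boost combines its six base learners, so the sketch below uses soft voting (averaging each model's predicted probability that a compound is a drug, then thresholding) purely as a stand-in to show how an ensemble can merge heterogeneous classifiers. It is not the authors' method, and the three probability lists are hypothetical outputs of unspecified base models.

```python
def soft_vote(probabilities_per_model, threshold=0.5):
    """Average each compound's predicted P(drug) across models, then apply a threshold.

    probabilities_per_model: one list per base model, each holding P(drug) per compound.
    Returns (labels, averaged_probabilities).
    """
    n_models = len(probabilities_per_model)
    averaged = [sum(column) / n_models for column in zip(*probabilities_per_model)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return labels, averaged


# Hypothetical outputs of three base models for four compounds.
model_a = [0.9, 0.4, 0.2, 0.6]
model_b = [0.8, 0.6, 0.1, 0.4]
model_c = [0.7, 0.3, 0.3, 0.7]
labels, averaged = soft_vote([model_a, model_b, model_c])
print(labels)  # [1, 0, 0, 1]
```

Averaging probabilities rather than counting hard votes avoids ties when the number of base models is even, as with the six models named above.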
Affiliation(s)
- Abraham Yosipof
  - Department of Information Systems and Department of Business Administration, College of Law & Business, Ramat-Gan, Israel
- Rita C Guedes
  - Department of Medicinal Chemistry, Faculty of Pharmacy, Research Institute for Medicines (iMed.ULisboa), Universidade de Lisboa, Lisbon, Portugal
- Alfonso T García-Sosa
  - Department of Molecular Technology, Institute of Chemistry, University of Tartu, Tartu, Estonia

14
Zhang W, Chen Y, Li D. Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information. Molecules 2017; 22:molecules22122056. [PMID: 29186828 PMCID: PMC6149680 DOI: 10.3390/molecules22122056] [Received: 10/12/2017] [Revised: 11/19/2017] [Accepted: 11/20/2017]
Abstract
Interactions between drugs and target proteins provide important information for drug discovery. Currently, experiments have identified only a small number of drug-target interactions. Therefore, developing computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. First, we calculate drug-drug linear neighborhood similarity in the feature spaces by considering how to reconstruct data points from their neighbors. Then, we take the similarities as the manifold of drugs and assume that the manifold is unchanged in the interaction space. Finally, we predict unobserved interactions between known drugs and targets using the drug-drug linear neighborhood similarity and the known drug-target interactions. The experiments show that LPLNI can make high-accuracy predictions on four benchmark datasets using only known drug-target interactions. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) achieves improved performance, better than other state-of-the-art methods. Known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross-validation and a case study.
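
The propagation step can be illustrated with the generic label propagation update F ← αWF + (1 - α)Y, which spreads each drug's known interactions to similar drugs until the scores stop changing. This is a minimal sketch under stated assumptions: the matrix `W` is a hypothetical row-normalized drug-drug similarity standing in for LPLNI's linear neighborhood weights (which the paper obtains by reconstructing each drug from its neighbors and which are not computed here), and `alpha = 0.5` is an arbitrary choice.

```python
def row_normalize(W):
    """Scale each row of a nonnegative similarity matrix to sum to 1."""
    normalized = []
    for row in W:
        total = sum(row)
        normalized.append([w / total for w in row] if total > 0 else list(row))
    return normalized


def label_propagation(W, Y, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * Y until F stops changing.

    W: n x n row-normalized drug-drug similarity matrix (zero diagonal).
    Y: n x m known drug-target interaction matrix (1 = known interaction, 0 = unknown).
    Returns F, whose entries score every drug-target pair.
    """
    n, m = len(Y), len(Y[0])
    F = [list(row) for row in Y]
    for _ in range(max_iter):
        new_F = [[alpha * sum(W[i][k] * F[k][j] for k in range(n)) + (1 - alpha) * Y[i][j]
                  for j in range(m)]
                 for i in range(n)]
        change = max(abs(new_F[i][j] - F[i][j]) for i in range(n) for j in range(m))
        F = new_F
        if change < tol:
            break
    return F


# Hypothetical example: 3 drugs, 2 targets. Drug 1 has no known interactions
# and is much more similar to drug 0 than to drug 2.
W = row_normalize([[0.0, 0.9, 0.1],
                   [0.9, 0.0, 0.2],
                   [0.1, 0.2, 0.0]])
Y = [[1, 0],
     [0, 0],
     [0, 1]]
F = label_propagation(W, Y)
print([round(score, 3) for score in F[1]])  # drug 1 scores target 0 above target 1
```

Because every row of `W` sums to 1 and `alpha` is below 1, each update shrinks the difference between successive iterates, so the loop converges to a unique fixed point.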
Affiliation(s)
- Wen Zhang
  - School of Computer, Wuhan University, Wuhan 430072, China
- Yanlin Chen
  - School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
- Dingfang Li
  - School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China

15
Zeidan M, Rayan M, Zeidan N, Falah M, Rayan A. Indexing Natural Products for Their Potential Anti-Diabetic Activity: Filtering and Mapping Discriminative Physicochemical Properties. Molecules 2017; 22:molecules22091563. [PMID: 28926980 PMCID: PMC6151781 DOI: 10.3390/molecules22091563] [Received: 08/13/2017] [Revised: 09/14/2017] [Accepted: 09/14/2017]
Abstract
Diabetes mellitus (DM) poses a major health problem, for which there is an unmet need to develop novel drugs. The application of in silico techniques and optimization algorithms is instrumental to achieving this goal. A set of 97 approved anti-diabetic drugs, representing the active domain, and a set of 2892 natural products, representing the inactive domain, were used to construct predictive models and to index anti-diabetic bioactivity. Our recently developed approach of ‘iterative stochastic elimination’ was utilized. This article describes a highly discriminative and robust model, with an area under the curve above 0.96. Using the indexing model and a mix ratio of 1:1000 (active/inactive), 65% of the anti-diabetic drugs in the sample were captured in the top 1% of the screened compounds, compared to 1% in the random model. Some of the natural products that scored highly as potential anti-diabetic drug candidates are disclosed. One of those natural products is caffeine, which is noted in the scientific literature as being able to decrease blood glucose levels. The other nine phytochemicals await evaluation in a wet lab for their anti-diabetic activity. The indexing model proposed herein is useful for the virtual screening of large chemical databases and for the construction of anti-diabetes focused libraries.
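
The screening result above corresponds to an enrichment factor (EF): capturing 65% of the actives in the top 1% of the ranked list gives EF = 0.65 / 0.01 = 65, versus 1 for a random ranking. The sketch below computes EF from model scores using that common definition; the paper does not spell out its exact calculation, and the 1000-compound library with 10 actives is hypothetical.

```python
def enrichment_factor(scores, labels, top_fraction=0.01):
    """EF = (share of all actives found in the top-ranked subset) / (share of the library screened).

    scores: higher means more likely active; labels: 1 = active, 0 = inactive.
    """
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("labels contain no actives")
    n = len(scores)
    n_top = max(1, round(top_fraction * n))  # size of the top-ranked subset
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / total_actives) / (n_top / n)


# Hypothetical library of 1000 compounds with 10 actives, 6 of which rank in the top 1%.
active_scores = [1.0, 0.99, 0.98, 0.97, 0.96, 0.95, 0.1, 0.1, 0.1, 0.1]
inactive_scores = [0.9] * 4 + [0.5] * 986
scores = active_scores + inactive_scores
labels = [1] * len(active_scores) + [0] * len(inactive_scores)
print(round(enrichment_factor(scores, labels), 6))  # 60.0
```

The maximum EF at a given cutoff is reached when the top subset is filled entirely with actives; here that would be 100, since all 10 actives fit in the top 10 compounds.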
Affiliation(s)
- Mouhammad Zeidan
  - Molecular Genetics and Virology Laboratory, QRC-Qasemi Research Center, Al-Qasemi Academic College, P.O. Box 124, Baka EL-Garbiah 30100, Israel
- Mahmoud Rayan
  - Institute of Applied Research-Galilee Society, P.O. Box 437, Shefa-Amr 20200, Israel
- Nuha Zeidan
  - Clalit Health Service, Diet and Nutrition Unit, P.O. Box 789, Arara 30026, Israel
- Mizied Falah
  - Eliachar Research Laboratory, Galilee Medical Center, P.O. Box 21, Nahariya 22100, Israel
  - Faculty of Medicine in the Galilee, Bar-Ilan University, Ramat Gan 52900, Israel
- Anwar Rayan
  - Institute of Applied Research-Galilee Society, P.O. Box 437, Shefa-Amr 20200, Israel
  - Drug Discovery Informatics Laboratory, QRC-Qasemi Research Center, Al-Qasemi Academic College, P.O. Box 124, Baka EL-Garbiah 30100, Israel

16
Bordoloi M, Saikia S, Bordoloi PK, Kolita B, Dutta PP, Bhuyan PD, Dutta SC, Rao PG. Isolation, characterization and antifungal activity of very long chain alkane derivatives from Cinnamomum obtusifolium, Elaeocarpus lanceifolius and Baccaurea sapida. J Mol Struct 2017. [DOI: 10.1016/j.molstruc.2017.04.027]

17
Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models. Comput Biol Chem 2017; 69:110-119. [DOI: 10.1016/j.compbiolchem.2017.05.007] [Received: 07/14/2016] [Revised: 12/18/2016] [Accepted: 05/26/2017]

18
Abstract
Rapid determination of whether a candidate compound will bind to a particular target receptor remains a stumbling block in drug discovery. We use an approach inspired by random matrix theory to decompose the known ligand set of a target into orthogonal "signals" of salient chemical features, and to distinguish these from the much larger set of ligand chemical features that are not relevant for binding to that particular target receptor. After removing the noise caused by finite sampling, we show that the similarity of an unknown ligand to the remaining, cleaned chemical features is a robust predictor of ligand-target affinity, performing as well as or better than any algorithm in the published literature. We interpret our algorithm as deriving a model for the binding energy between a target receptor and the set of known ligands, where the underlying binding energy model is related to the classic Ising model in statistical physics.

19
Viira B, Selyutina A, García-Sosa AT, Karonen M, Sinkkonen J, Merits A, Maran U. Design, discovery, modelling, synthesis, and biological evaluation of novel and small, low toxicity s-triazine derivatives as HIV-1 non-nucleoside reverse transcriptase inhibitors. Bioorg Med Chem 2016; 24:2519-2529. [PMID: 27108399 DOI: 10.1016/j.bmc.2016.04.018] [Received: 12/28/2015] [Revised: 03/10/2016] [Accepted: 04/08/2016]
Abstract
A set of top-ranked compounds from a multi-objective in silico screen was experimentally tested for toxicity and for the ability to inhibit the activity of HIV-1 reverse transcriptase (RT) in a cell-free assay and in a cell-based assay using HIV-1-based virus-like particles. Detailed analysis of a commercial sample that indicated specific inhibition of HIV-1 reverse transcription revealed that a minor component, structurally similar to the main compound, was responsible for the strongest inhibition. As a result, novel s-triazine derivatives were proposed, modelled, discovered, and synthesised, and their antiviral activity and cellular toxicity were tested. Compounds 18a and 18b were found to be efficient HIV-1 RT inhibitors, with IC50 values of 5.6 ± 1.1 μM and 0.16 ± 0.05 μM, respectively, in a cell-based assay using infectious HIV-1. Compound 18b also had no detectable toxicity in different human cell lines. Their binding mode and interactions with RT suggest strong and adaptable binding in the tight non-nucleoside RT inhibitor (NNRTI) hydrophobic pocket. In summary, this iterative study produced structural clues and led to a group of non-toxic, novel compounds that inhibit HIV-1 RT with up to nanomolar potency.
Affiliation(s)
- Birgit Viira
- Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
- Maarit Karonen
- Department of Chemistry, University of Turku, FI-20014 Turku, Finland
- Jari Sinkkonen
- Department of Chemistry, University of Turku, FI-20014 Turku, Finland
- Andres Merits
- Institute of Technology, University of Tartu, Tartu 50411, Estonia.
- Uko Maran
- Institute of Chemistry, University of Tartu, Tartu 50411, Estonia.
20
Lee H, Kang S, Kim W. Drug Repositioning for Cancer Therapy Based on Large-Scale Drug-Induced Transcriptional Signatures. PLoS One 2016; 11:e0150460. [PMID: 26954019 PMCID: PMC4783079 DOI: 10.1371/journal.pone.0150460] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 11/10/2015] [Accepted: 02/15/2016] [Indexed: 11/18/2022]
Abstract
An in silico chemical genomics approach is developed to predict drug repositioning (DR) candidates for three types of cancer: glioblastoma, lung cancer, and breast cancer. It is based on a recent large-scale dataset of ~20,000 drug-induced expression profiles in multiple cancer cell lines, which provides (i) the global transcriptional impact of perturbing both known targets and unknown off-targets, and (ii) rich information on a drug's mode of action. First, using multiple HTS datasets as unbiased benchmarks, the drug-induced expression profile is shown to be more effective than other information, such as drug structure or known targets. In particular, the utility of our method was robustly demonstrated in identifying novel DR candidates. Second, we predicted 14 high-scoring DR candidates based solely on expression signatures. Eight of the fourteen drugs showed significant anti-proliferative activity against glioblastoma: ivermectin, trifluridine, astemizole, amlodipine, maprotiline, apomorphine, mometasone, and nortriptyline. Our DR score correlated strongly with the cell-based experimental results; the top seven DR candidates were positive, corresponding to an approximately 20-fold enrichment compared with conventional HTS. Despite diverse original indications and known targets, the perturbed pathways of the active DR candidates show five distinct patterns that form tight clusters together with one or more known cancer drugs, suggesting common transcriptome-level mechanisms of anti-proliferative activity.
Affiliation(s)
- Haeseung Lee
- Ewha Research Center for Systems Biology, Division of Molecular & Life Sciences, Ewha Womans University, Seoul, Korea
- Seungmin Kang
- Ewha Research Center for Systems Biology, Division of Molecular & Life Sciences, Ewha Womans University, Seoul, Korea
- Wankyu Kim
- Ewha Research Center for Systems Biology, Division of Molecular & Life Sciences, Ewha Womans University, Seoul, Korea
21
Korkmaz S, Zararsiz G, Goksuluk D. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development. PLoS One 2015; 10:e0124600. [PMID: 25928885 PMCID: PMC4415797 DOI: 10.1371/journal.pone.0124600] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/29/2015] [Accepted: 03/03/2015] [Indexed: 12/18/2022]
Abstract
Virtual screening is an important step in the early phase of the drug discovery process. Since thousands of compounds are involved, this step must be both fast and effective at distinguishing drug-like from nondrug-like molecules. Statistical machine learning methods are widely used for classification in drug discovery studies. Here, we aim to develop a new tool that classifies molecules as drug-like or nondrug-like using various machine learning methods, including discriminant, tree-based, kernel-based, ensemble, and other algorithms. To construct this tool, we first compared the performance of twenty-three different machine learning algorithms using ten different measures, and then selected the ten best-performing algorithms based on principal component and hierarchical cluster analysis results. Besides classification, the application can also create a heat map and a dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
Affiliation(s)
- Selcuk Korkmaz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
- Gokmen Zararsiz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
22
Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
With advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since thousands of compounds are involved in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules is useful for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After the correlation filter was applied, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, the Wrapper Method, and Subset Selection. All feature selection methods generally performed better than a single SVM, and Subset Selection outperformed the other feature selection methods. We tested SVM as a classification tool on a real-life drug discovery problem, and our results revealed that it could be a useful classification method in the early phase of drug discovery.
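As an illustration of the pre-processing step described above, a Pearson-correlation filter can be written as a greedy pass over the descriptor columns. This is a minimal sketch, not the authors' code: the 0.9 cutoff, the left-to-right keep-first order, and the function name `correlation_filter` are assumptions, and the subsequent SVM training is omitted.

```python
import numpy as np

def correlation_filter(X, threshold=0.9):
    """Greedy Pearson-correlation filter over the columns of X.

    Columns are visited left to right. Constant columns are skipped because
    their correlation is undefined. Any other column is kept only if its
    absolute Pearson correlation with every already-kept column is <= threshold.
    Returns the indices of the kept columns.
    """
    sd = X.std(axis=0)
    R = np.abs(np.corrcoef(X, rowvar=False))   # feature-by-feature |r| (NaN for constant columns)
    kept = []
    for j in range(X.shape[1]):
        if sd[j] == 0:                          # constant descriptor carries no information
            continue
        if all(R[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(1)
a, b = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=100), b])  # column 1 is nearly a copy of column 0
print(correlation_filter(X))  # [0, 2]
```

Keeping the first of each correlated pair is the simplest rule; a real pipeline might instead keep whichever descriptor correlates more strongly with the class label.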
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey.
- Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
- Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
23
García-Sosa AT, Maran U. Improving the use of ranking in virtual screening against HIV-1 integrase with triangular numbers and including ligand profiling with antitargets. J Chem Inf Model 2014; 54:3172-85. [PMID: 25303089 DOI: 10.1021/ci500300u] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
A delicate balance exists between a drug molecule's toxicity and its activity. Indeed, efficacy, toxicity, and side-effect problems are common causes for the termination of drug candidates and development projects. To address this, an antitarget interaction profile is built and combined with virtual screening and cross-docking for new inhibitors of HIV-1 integrase, in order to consider possible off-target interactions as early as possible in a drug or hit discovery program. New ranking techniques using triangular numbers improve the ranking information on the compounds and the recovery of known inhibitors among the top compounds across different docking programs. This improved ranking arises from using a consensus of ranks between docking programs and ligand efficiencies to derive a new rank, instead of using absolute score values or averages of ranks. Triangular-number reranking also allows the objective combination of results from several protein targets or screening conditions and several programs, and it conserves more information than other reranking methods, such as averages of scores or averages of ranks. In addition, the use of triangular numbers for reranking makes it possible to set thresholds with a justified leeway based on the number of available known inhibitors, so that the majority of the compounds ranked above the threshold compare with the compounds of known, experimentally determined biological activity. The battery of anti- or off-targets can be tailored to specific molecular or drug design challenges. In silico filters can thus be deployed in successive stages: for prefiltering, for activity profiling, and for further analysis and triaging of compound libraries.
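The abstract does not spell out the reranking formula, so the following is only one plausible reading, labeled here as an assumption: each 1-based rank r that a compound receives from a docking program (or from a ligand-efficiency ranking) is mapped to the triangular number T(r) = r(r + 1)/2, the values are summed per compound, and compounds are reranked by ascending total. Because T(r) grows quadratically, a compound ranked moderately well by every program beats one ranked first by one program and last by another, a distinction that an average of ranks cannot make. The helper names are hypothetical.

```python
def triangular(r):
    """r-th triangular number T(r) = r(r + 1) / 2, for a 1-based rank r."""
    return r * (r + 1) // 2

def triangular_rerank(rank_lists):
    """Consensus rerank by summed triangular numbers of ranks (assumed scheme).

    rank_lists: one dict per docking program or scoring criterion, mapping
    compound id -> 1-based rank; every dict must cover the same compounds.
    Returns compound ids sorted by ascending total (ties broken by id).
    """
    compounds = set(rank_lists[0])
    totals = {c: sum(triangular(ranks[c]) for ranks in rank_lists) for c in compounds}
    return sorted(compounds, key=lambda c: (totals[c], c))

program_a = {"cpd1": 1, "cpd2": 2, "cpd3": 3}
program_b = {"cpd1": 3, "cpd2": 2, "cpd3": 1}
# Average ranks tie (2, 2, 2); triangular totals are 7, 6 and 7.
print(triangular_rerank([program_a, program_b]))  # ['cpd2', 'cpd1', 'cpd3']
```

In this toy example the consistently mid-ranked compound moves to the top, while the two compounds with split verdicts still tie with each other and are ordered only by id.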
24
Liu Z, Li Y, Han L, Li J, Liu J, Zhao Z, Nie W, Liu Y, Wang R. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2014; 31:405-12. [DOI: 10.1093/bioinformatics/btu626] [Citation(s) in RCA: 264] [Impact Index Per Article: 26.4] [Indexed: 01/06/2023]
25
García-Sosa AT, Tulp I, Langel K, Langel Ü. Peptide-ligand binding modeling of siRNA with cell-penetrating peptides. BIOMED RESEARCH INTERNATIONAL 2014; 2014:257040. [PMID: 25147791 PMCID: PMC4131515 DOI: 10.1155/2014/257040] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/27/2014] [Accepted: 05/15/2014] [Indexed: 12/04/2022]
Abstract
The binding affinity of a series of cell-penetrating peptides (CPPs) for siRNA was modeled through docking, making use of the number of intermolecular hydrogen bonds, the number of lipophilic contacts, and the number of sp3-hybridized carbon atoms. The new ranking of the peptides is consistent with the experimentally determined efficiency of luciferase downregulation, which encompasses the peptides' ability to bind siRNA and deliver it into the cell. The predicted structures of the peptide-siRNA complexes were stable throughout 10-ns explicit-water molecular dynamics simulations. The stability and binding affinity of the peptide-siRNA complexes were related to the side chains and modifications of the CPPs, with the stearyl and quinoline groups improving affinity and stability. Reranking the peptides docked to siRNA, together with explicit-water molecular dynamics simulations, appears to be well suited to describing and predicting the interaction of CPPs with siRNA.
Affiliation(s)
- Indrek Tulp
- Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
- Kent Langel
- Institute of Technology, University of Tartu, Nooruse 1, 50411 Tartu, Estonia
- Ülo Langel
- Institute of Technology, University of Tartu, Nooruse 1, 50411 Tartu, Estonia
- Department of Neurochemistry, Stockholm University, 106 91 Stockholm, Sweden
26
García-Sosa AT. Hydration Properties of Ligands and Drugs in Protein Binding Sites: Tightly-Bound, Bridging Water Molecules and Their Effects and Consequences on Molecular Design Strategies. J Chem Inf Model 2013; 53:1388-405. [DOI: 10.1021/ci3005786] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 12/23/2022]
27
García-Sosa AT, Maran U. Drugs, non-drugs, and disease category specificity: organ effects by ligand pharmacology. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2013; 24:319-331. [PMID: 23534612 DOI: 10.1080/1062936x.2013.773373] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 06/02/2023]
Abstract
Important understanding can be gained from using molecular biology-based and chemistry-based techniques together. Bayesian classifiers have thus been developed in the present work using several statistically significant molecular properties of compiled datasets of drugs and non-drugs, including their disease category or organ. The results show that the classifiers provide useful classification while retaining the simplicity of several different ligand efficiencies and molecular properties. Early recall of drugs among non-drugs, using the classifiers as a ranking tool, is also reported. Because the chemical space of compounds is addressed together with their anatomical characterization, chemical libraries can be improved to select for a specific organ or disease. Eventually, by including even finer detail, the method may help in designing libraries aimed at a specific pharmacological or toxicological chemical space. Alternatively, a lack of statistically significant differences in property density distributions may help to further describe compounds that could act on several organs or disease groups and that, given their very similar or considerably overlapping chemical space, may therefore have wanted or unwanted side effects. The overlaps between the densities of several properties for different organs or disease categories were calculated by integrating the area under the curves where they intersect. The naïve Bayesian classifiers are readily built, fast to score, and easily interpretable.
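The overlap calculation mentioned above, integrating the area under two property densities where they intersect, amounts to integrating min(f, g) over the property axis. The sketch below assumes Gaussian kernel density estimates with a fixed bandwidth of 0.5 property units and trapezoidal integration on a uniform grid; the bandwidth, grid size, function names, and synthetic data are illustrative choices, not the authors' settings. An overlap near 1 indicates indistinguishable distributions, and an overlap near 0 indicates well-separated ones.

```python
import numpy as np

def kde_on_grid(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at each grid point."""
    u = (grid[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (len(samples) * bandwidth)

def density_overlap(x, y, bandwidth=0.5, n_grid=2001):
    """Area under min(f_x, f_y): about 1 for identical densities, about 0 for disjoint ones."""
    lo = min(x.min(), y.min()) - 5 * bandwidth     # pad so the kernel tails are covered
    hi = max(x.max(), y.max()) + 5 * bandwidth
    grid = np.linspace(lo, hi, n_grid)
    h = np.minimum(kde_on_grid(x, grid, bandwidth), kde_on_grid(y, grid, bandwidth))
    return float(np.sum((h[1:] + h[:-1]) * np.diff(grid)) / 2.0)   # trapezoidal rule

# Synthetic property values for two hypothetical disease categories.
rng = np.random.default_rng(2)
prop_a = rng.normal(0.0, 1.0, 500)
prop_b = rng.normal(1.0, 1.0, 500)
print(density_overlap(prop_a, prop_b))
```

The same overlap value could be computed per property and per pair of categories; properties with high overlap are the ones that fail to separate the categories.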
Affiliation(s)
- A T García-Sosa
- Institute of Chemistry, University of Tartu, Tartu, Estonia.