1
Lorenc A, Badura A, Karolak M, Pałkowski Ł, Kubik Ł, Buciński A. Antimicrobial Activity Classification of Imidazolium Derivatives Predicted by Artificial Neural Networks. Pharm Res 2024; 41:891-898. [PMID: 38632156 PMCID: PMC11116175 DOI: 10.1007/s11095-024-03699-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/09/2024] [Indexed: 04/19/2024]
Abstract
PURPOSE: This study assesses the Multilayer Perceptron (MLP) neural network, complemented by other Machine Learning techniques (CART, PCA), in predicting the antimicrobial activity of 140 newly designed imidazolium chlorides against Klebsiella pneumoniae before synthesis. Emphasis is on leveraging molecular properties for predictive analysis.
METHODS: Classification and regression decision trees (CART) identified the top 200 predictive molecular descriptors. Principal Component Analysis (PCA) reduced these descriptors to 5 components, retaining 99.57% of raw data information. Antimicrobial activity, categorized as high or low, was based on experimentally proven minimal inhibitory concentration (MIC), with a cut-point at MIC = 0.856 mol/L. A 12-fold cross-validation trained the MLP (architecture 5-12-2 with 5 Principal Components).
RESULTS: The MLP exhibited commendable performance, achieving almost 90% correct classifications across learning, validation, and test sets, outperforming models without PCA dimension reduction. Key metrics, including accuracy (0.907), sensitivity (0.905), specificity (0.909), and precision (0.891), were notably high. These results highlight the MLP model's efficacy with PCA as a high-quality classifier for determining antimicrobial activity.
CONCLUSIONS: The study concludes that the MLP neural network, along with CART and PCA, is a robust tool for predicting the antimicrobial activity class of imidazolium chlorides against Klebsiella pneumoniae. CART and PCA, used in this study, allowed input variable reduction without significant information loss. High classification accuracy and associated metrics affirm the method's potential utility in pre-synthesis assessments, offering valuable insights for antimicrobial compound design.
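The four metrics quoted in this abstract all derive from a binary confusion matrix. The following is a minimal Python sketch of those definitions, not the authors' code. The function name `classification_metrics` is illustrative, and the counts are an assumption: they are not reported in the abstract, but are one hypothetical split of 140 compounds that happens to reproduce the quoted values after rounding to three decimals.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and precision
    for a binary classifier, given confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # all correct / all cases
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # positive predictive value
    }


# Hypothetical counts (assumption, not taken from the paper).
metrics = classification_metrics(tp=57, fn=6, tn=70, fp=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because precision depends on the class balance while sensitivity and specificity do not, it is useful to report all four together, as the abstract does.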
Affiliation(s)
- Andżelika Lorenc
  - Department of Biopharmacy, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, dr A. Jurasza 2, 85-089, Bydgoszcz, Poland
- Anna Badura
  - Department of Biopharmacy, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, dr A. Jurasza 2, 85-089, Bydgoszcz, Poland
- Maciej Karolak
  - Department of Pharmaceutical Technology, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, dr A. Jurasza 2, 85-089, Bydgoszcz, Poland
- Łukasz Pałkowski
  - Department of Pharmaceutical Technology, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, dr A. Jurasza 2, 85-089, Bydgoszcz, Poland
- Łukasz Kubik
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Adam Buciński
  - Department of Biopharmacy, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, dr A. Jurasza 2, 85-089, Bydgoszcz, Poland
2
Parwez S, Chaurasia A, Mahapatra PP, Ahmed S, Siddiqi MI. Integrated machine learning-based virtual screening and biological evaluation for identification of potential inhibitors against cathepsin K. Mol Divers 2024:10.1007/s11030-024-10845-5. [PMID: 38662177 DOI: 10.1007/s11030-024-10845-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 03/11/2024] [Indexed: 04/26/2024]
Abstract
Cathepsin K is a cysteine proteinase that is primarily expressed in osteoclasts and has a key role in the breakdown of bone matrix protein during bone resorption. Many studies suggest that cathepsin K deficiency is accompanied by suppressed osteoclast function, which makes the resorptive activity of cathepsin K a prominent target for osteoporosis therapy. This work identified novel candidate anti-osteoporotic agents against cathepsin K by comparing machine learning- and deep learning-based virtual screening, followed by biological evaluation of the shortlisted compounds. Out of ten shortlisted compounds, five (JFD02945, JFD02944, RJC01981, KM08968 and SB01934) exhibit more than 50% inhibition of cathepsin K activity at 0.1 μM concentration and are considered to have a promising inhibitory effect against cathepsin K. The comprehensive docking, MD simulation, and MM/PBSA investigations affirm the stable and effective interaction of these compounds with cathepsin K to inhibit its function. Furthermore, the compounds RJC01981, KM08968 and SB01934 are suggested to have promising anti-osteoporotic properties for the management of osteoporosis owing to their well-predicted ADMET properties.
Affiliation(s)
- Shahid Parwez
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Animesh Chaurasia
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Pinaki Parsad Mahapatra
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Shakil Ahmed
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Mohammad Imran Siddiqi
  - Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
3
Li H, Zhang R, Min Y, Ma D, Zhao D, Zeng J. A knowledge-guided pre-training framework for improving molecular representation learning. Nat Commun 2023; 14:7568. [PMID: 37989998 PMCID: PMC10663446 DOI: 10.1038/s41467-023-43214-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 11/03/2023] [Indexed: 11/23/2023] Open
Abstract
Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph Transformer (KPGT), a self-supervised learning framework to alleviate the aforementioned issues and provide generalizable and robust molecular representations. The KPGT framework integrates a graph transformer specifically designed for molecular graphs and a knowledge-guided pre-training strategy, to fully capture both structural and semantic knowledge of molecules. Through extensive computational tests on 63 datasets, KPGT exhibits superior performance in predicting molecular properties across various domains. Moreover, the practical applicability of KPGT in drug discovery has been validated by identifying potential inhibitors of two antitumor targets: hematopoietic progenitor kinase 1 (HPK1) and fibroblast growth factor receptor 1 (FGFR1). Overall, KPGT can provide a powerful and useful tool for advancing the artificial intelligence (AI)-aided drug discovery process.
Affiliation(s)
- Han Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Ruotian Zhang
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Yaosen Min
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Dacheng Ma
  - Research Center for Biological Computation, Zhejiang Province, Zhejiang Laboratory, 311100, Hangzhou, China
- Dan Zhao
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, 100084, Beijing, China
  - School of Engineering, Westlake University, Zhejiang Province, 310030, Hangzhou, China
4
Ghaemi Z, Asadollahi-Baboli M. Developing reliable classification of dual IDO1/TDO inhibitors using data fusion and majority voting. J Biomol Struct Dyn 2023:1-9. [PMID: 37921776 DOI: 10.1080/07391102.2023.2278079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 10/25/2023] [Indexed: 11/04/2023]
Abstract
Indoleamine 2,3-dioxygenase 1 (IDO1) and tryptophan 2,3-dioxygenase (TDO) are promising targets for dual inhibition in the treatment of cancer and neurodegenerative diseases. Data fusion of receptor-based and ligand-based information on dual IDO1/TDO inhibitors was employed to improve active/inactive classification performance. A reliable decision-making procedure was used to identify active/inactive dual IDO1/TDO inhibitors, applying the majority voting method to pools of individual classifications instead of relying on individual models. All classification models were validated using a prediction set, cross-validation and y-scrambling tests. The classification outcomes indicate that the sensitivity, specificity, precision, accuracy, G-mean and F1 score values increase up to ∼90% using data fusion and the majority voting method. Compared to individual classification models with a single prediction point, the majority voting method gives more reliable results due to the integration of the pool of individual classification models. This classification strategy may lead to more reliable identification of active/inactive dual-targeting inhibitors in cancer immunotherapy. Communicated by Ramaswamy H. Sarma.
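The majority voting step described in this abstract can be illustrated with a short Python sketch. It assumes each model in the pool emits a label of "active" or "inactive" for a compound; the function name `majority_vote` and the example votes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that most classifiers in the pool predicted."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    # most_common(1) yields [(label, count)] for the most frequent label.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical pool of five individual model outputs for one compound.
votes = ["active", "inactive", "active", "active", "inactive"]
consensus = majority_vote(votes)
print(consensus)
```

Using an odd number of binary voters, as in this example, avoids ties, so no tie-breaking rule is needed.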
Affiliation(s)
- Zahra Ghaemi
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Mazandaran, Iran
- M Asadollahi-Baboli
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Mazandaran, Iran
5
Wankhade N, Dayasagar U, Sharma A, Kamble P, Varma T, Garg P. DeepADRA2A: predicting adrenergic α2a inhibitors using deep learning. J Biomol Struct Dyn 2023:1-12. [PMID: 37837428 DOI: 10.1080/07391102.2023.2270056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 10/07/2023] [Indexed: 10/16/2023]
Abstract
Adrenergic α2a (ADRA2A) receptors play a crucial role in modulating various physiological actions, thereby influencing the proper functioning of different systems in the body. ADRA2A regulation is associated with a wide range of effects, including alterations in blood pressure, hypertension, heightened heart rate, etc. Inhibition of these receptors results in the release of noradrenaline, leading to heightened physiological activity, improved alertness, reduced blood pressure, and alleviation of hypertension. Conventional approaches for identifying ADRA2A inhibitors are burdened with high costs, labor-intensive procedures, and time-consuming processes. In light of these challenges, leveraging the power of artificial intelligence offers a promising solution for drug discovery and development. This study endeavors to harness the potential of artificial intelligence to develop robust models capable of accurately predicting ADRA2A inhibitors and non-inhibitors. By doing so, we aim to streamline and expedite the identification of potential drug candidates in this domain. In this study, we employed four different machine learning (ML) and deep learning (DL) algorithms to develop prediction models based on various molecular descriptors (1D, 2D, and molecular fingerprints). Among these models, the DL-based prediction model demonstrated superior performance, achieving accuracies of 98.25% and 97.23% on the training and test datasets, respectively. These results underscore the efficacy of the DL-based model as a highly effective tool for predicting ADRA2A inhibitors. The model is made available at https://github.com/PGlab-NIPER/DeepADRA2A.git. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Nitin Wankhade
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Sahibzada Ajit Singh Nagar, Punjab, India
- Ummireddy Dayasagar
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Sahibzada Ajit Singh Nagar, Punjab, India
- Anju Sharma
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Sahibzada Ajit Singh Nagar, Punjab, India
- Pradnya Kamble
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Sahibzada Ajit Singh Nagar, Punjab, India
- Tanmaykumar Varma
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Sahibzada Ajit Singh Nagar, Punjab, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Sahibzada Ajit Singh Nagar, Punjab, India
6
Shaker B, Lee J, Lee Y, Yu MS, Lee HM, Lee E, Kang HC, Oh KS, Kim HW, Na D. A machine learning-based quantitative model (LogBB_Pred) to predict the blood-brain barrier permeability (logBB value) of drug compounds. Bioinformatics 2023; 39:btad577. [PMID: 37713469 PMCID: PMC10560102 DOI: 10.1093/bioinformatics/btad577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 08/30/2023] [Accepted: 09/14/2023] [Indexed: 09/17/2023] Open
Abstract
MOTIVATION: Efficient assessment of the blood-brain barrier (BBB) penetration ability of a drug compound is one of the major hurdles in central nervous system drug discovery since experimental methods are costly and time-consuming. To advance and elevate the success rate of neurotherapeutic drug discovery, it is essential to develop an accurate computational quantitative model to determine the absolute logBB value (a logarithmic ratio of the concentration of a drug in the brain to its concentration in the blood) of a drug candidate.
RESULTS: Here, we developed a quantitative model (LogBB_Pred) capable of predicting the logBB value of a query compound. The model achieved an R2 of 0.61 on an independent test dataset and outperformed other publicly available quantitative models. When compared with the available qualitative (classification) models that only classify whether a compound is BBB-permeable or not, our model achieved the same accuracy (0.85) as the best qualitative model and far outperformed the other qualitative models (accuracies between 0.64 and 0.70). For further evaluation, our model, the other quantitative models, and the qualitative models were evaluated on a real-world central nervous system drug screening library. Our model showed an accuracy of 0.97 while the other models showed accuracies in the range of 0.29-0.83. Consequently, our model can accurately classify BBB-permeable compounds as well as predict the absolute logBB values of drug candidates.
AVAILABILITY AND IMPLEMENTATION: The web server is freely available at http://ssbio.cau.ac.kr/software/logbb_pred/. The data used in this study are available to download at http://ssbio.cau.ac.kr/software/logbb_pred/dataset.zip.
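The logBB quantity defined in this abstract is the base-10 logarithm of the brain-to-blood concentration ratio. A minimal Python sketch of that definition follows; it computes logBB from measured concentrations and is not the LogBB_Pred model itself. The function name `log_bb` and the example concentrations are hypothetical, and both concentrations are assumed to be in the same units.

```python
import math


def log_bb(conc_brain, conc_blood):
    """logBB = log10(C_brain / C_blood); both concentrations in the same units."""
    if conc_brain <= 0 or conc_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_brain / conc_blood)


# Hypothetical concentrations: a positive logBB means the compound is
# more concentrated in the brain than in the blood, a negative one the reverse.
enriched_in_brain = log_bb(conc_brain=10.0, conc_blood=1.0)
enriched_in_blood = log_bb(conc_brain=1.0, conc_blood=10.0)
print(enriched_in_brain, enriched_in_blood)
```

Equal concentrations give a logBB of zero, which is why the sign alone already indicates on which side of the barrier the compound accumulates.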
Affiliation(s)
- Bilal Shaker
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Jingyu Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Yunhyeok Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Myeong-Sang Yu
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Hyang-Mi Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Eunee Lee
  - Division of Pediatric Neurology, Department of Pediatrics, Severance Children’s Hospital, Yonsei University College of Medicine, Epilepsy Research Institute, Seoul 03722, Republic of Korea
- Hoon-Chul Kang
  - Department of Anatomy College of Medicine, Yonsei University, Seoul 03722, Republic of Korea
- Kwang-Seok Oh
  - Convergence Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
- Hyung Wook Kim
  - Department of Bio-integrated Science and Technology, College of Life Sciences, Sejong University, Seoul 05006, Republic of Korea
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
7
de Oliveira LHD, Cruz JN, Dos Santos CBR, de Melo EB. Multivariate QSAR, similarity search and ADMET studies based in a set of methylamine derivatives described as dopamine transporter inhibitors. Mol Divers 2023:10.1007/s11030-023-10724-5. [PMID: 37670118 DOI: 10.1007/s11030-023-10724-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 08/27/2023] [Indexed: 09/07/2023]
Abstract
The dopamine transporter (DAT), responsible for the regulation of dopaminergic neurotransmission, is implicated in the etiology of several neuropsychiatric disorders which, in turn, have contributed to high rates of disability and numerous deaths in recent years, significantly impacting the global health system. Although research into new drugs for the treatment of neuropsychiatric disorders has evolved in recent years, DAT-selective drugs that do not generate the psychostimulant effects observed in drugs of abuse remain scarce. Therefore, we performed a QSAR study based on a dataset of 36 methylamine derivatives described as DAT inhibitors. The model was built using only descriptors derived from 2D structures, and validation gave satisfactory results on the metrics used for internal and external validation. Subsequently, a virtual screening step, also based on 2D similarity, identified a total of 1157 compounds. After a series of reductions of the set using toxicity filters, applicability domain evaluation, and in silico assessment of pharmacokinetic properties, seven hit compounds were selected as the most promising for use in future studies as new scaffolds for the development of DAT inhibitors.
Affiliation(s)
- Luiz Henrique Dias de Oliveira
  - Theorical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (UNIOESTE), 2069 Universitária St., Cascavel, PR, 85819-110, Brazil
- Jorddy Neves Cruz
  - Laboratory of Modeling and Computational Chemistry, Department of Biological and Health Sciences, Federal University of Amapá, Macapá, AP, 68902-280, Brazil
- Cleydson Breno Rodrigues Dos Santos
  - Laboratory of Modeling and Computational Chemistry, Department of Biological and Health Sciences, Federal University of Amapá, Macapá, AP, 68902-280, Brazil
- Eduardo Borges de Melo
  - Theorical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (UNIOESTE), 2069 Universitária St., Cascavel, PR, 85819-110, Brazil
8
Varikoti RA, Schultz KJ, Kombala CJ, Kruel A, Brandvold KR, Zhou M, Kumar N. Integrated data-driven and experimental approaches to accelerate lead optimization targeting SARS-CoV-2 main protease. J Comput Aided Mol Des 2023:10.1007/s10822-023-00509-1. [PMID: 37314632 DOI: 10.1007/s10822-023-00509-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 05/23/2023] [Indexed: 06/15/2023]
Abstract
Identification of potential therapeutic candidates can be expedited by integrating computational modeling with domain-aware machine learning (ML) models followed by experimental validation in an iterative manner. Generative deep learning models can generate thousands of new candidates; however, their physicochemical and biochemical properties are typically not fully optimized. Using our recently developed deep learning models and a scaffold as a starting point, we generated tens of thousands of compounds for SARS-CoV-2 Mpro that preserve the core scaffold. We applied several computational tools, such as structural alert and toxicity analysis, high-throughput virtual screening, ML-based 3D quantitative structure-activity relationships, multi-parameter optimization, and graph neural networks, to the generated candidates to predict biological activity and binding affinity in advance. As a result of these combined computational efforts, eight promising candidates were singled out and put through experimental testing using Native Mass Spectrometry and FRET-based functional assays. Two of the tested compounds, with quinazoline-2-thiol and acetylpiperidine core moieties, showed IC50 values in the low micromolar range: [Formula: see text] μM and 3.41±0.0015 μM, respectively. Molecular dynamics simulations further highlight that binding of these compounds results in allosteric modulations within chain B and the interface domains of Mpro. Our integrated approach provides a platform for data-driven lead optimization with rapid characterization and experimental validation in a closed loop that could be applied to other potential protein targets.
Affiliation(s)
- Rohith Anand Varikoti
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Katherine J Schultz
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Chathuri J Kombala
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Agustin Kruel
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Kristoffer R Brandvold
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Mowei Zhou
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
- Neeraj Kumar
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, 99352, USA
9
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report an estimator that describes how certain a model is in a prediction, in addition to the prediction alone. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost the predictive performance by reducing errors arising from variance. Despite the development of novel methods, ensembles are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a broad and diversified study, ensembles are evaluated for 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as an uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.
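The ensemble idea in this abstract, where the mean of the member predictions is the prediction and their disagreement is the uncertainty, can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function name `ensemble_prediction` and the example values are hypothetical, and using the sample standard deviation (n-1 denominator) is an assumption, since the abstract only says "standard deviation".

```python
from statistics import mean, stdev


def ensemble_prediction(member_predictions):
    """Combine the predictions of all ensemble members for one input.

    Returns (prediction, uncertainty): the mean of the member predictions
    and their sample standard deviation as a measure of disagreement.
    """
    if len(member_predictions) < 2:
        raise ValueError("need at least two ensemble members")
    return mean(member_predictions), stdev(member_predictions)


# Hypothetical outputs of three ensemble members for a single compound.
prediction, uncertainty = ensemble_prediction([1.0, 2.0, 3.0])
print(prediction, uncertainty)
```

Members that agree closely give a small standard deviation, so the same call doubles as a per-compound confidence indicator.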
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
  - Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
10
DE-INTERACT: A machine-learning-based predictive tool for the drug-excipient interaction study during product development-Validation through Paracetamol and Vanillin as a case study. Int J Pharm 2023; 637:122839. [PMID: 36931538 DOI: 10.1016/j.ijpharm.2023.122839] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/15/2022] [Revised: 03/04/2023] [Accepted: 03/10/2023] [Indexed: 03/17/2023]
Abstract
The compatibility of drugs with excipients plays a crucial role in the prospective stability of pharmaceutical formulations. Apart from real-time stability studies, conventional analytical tools like DSC, FTIR, NMR, and chromatography help identify possible drug-excipient interactions. Machine learning can assist in developing a predictive tool for drug-excipient incompatibility. In the present work, the PubChem Fingerprint is employed as the descriptor that represents the chemistry of the drug and the excipient. Each instance in the dataset consists of 1762 inputs, formed from the 881-bit binary fingerprints of the drug and the excipient, and one categorical output. A dataset of more than 3500 drug-excipient instances was carefully selected from peer-reviewed research papers. Rigorous training of the Artificial Neural Network (ANN) model was performed with maximum validation accuracy, minimum validation loss, and maximum validation precision as the checkpoints. The machine learning model (DE-Interact) achieved training and validation accuracies of 0.9930 and 0.9161, respectively. The performance of the DE-Interact model was evaluated by confirming three incompatibility predictions with conventional analytical tools. The three instances predicted as incompatible by DE-Interact were paracetamol with vanillin, paracetamol with methylparaben, and brinzolamide with polyethylene glycol. DSC, FTIR, HPTLC, and HPLC analyses confirm the predictions. The present work offers a reliable DE-Interact tool for quick reference when selecting excipients in formulation design.
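The input layout described in this abstract, two 881-bit fingerprints joined into 1762 binary inputs, can be sketched in Python as follows. This is a hedged illustration rather than the DE-Interact code: the function name `build_input`, the drug-then-excipient ordering, and the placeholder bit vectors are all assumptions. Real inputs would be actual PubChem fingerprints, which are not reproduced here.

```python
FP_BITS = 881  # length of one PubChem fingerprint, as stated in the abstract


def build_input(drug_fp, excipient_fp):
    """Concatenate a drug and an excipient fingerprint into one input vector.

    The drug-first ordering is an assumption made for this sketch.
    """
    for name, fp in (("drug", drug_fp), ("excipient", excipient_fp)):
        if len(fp) != FP_BITS:
            raise ValueError(f"{name} fingerprint must have {FP_BITS} bits")
        if any(bit not in (0, 1) for bit in fp):
            raise ValueError(f"{name} fingerprint must be binary")
    return list(drug_fp) + list(excipient_fp)


# Placeholder bit vectors (not real fingerprints): one bit set in each.
drug_fp = [1] + [0] * (FP_BITS - 1)
excipient_fp = [1] + [0] * (FP_BITS - 1)
features = build_input(drug_fp, excipient_fp)
print(len(features))
```

Keeping the two fingerprints in fixed positions means each input index always refers to the same substructure bit of the same partner, which is what lets a network learn pairwise patterns.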
11
Mohd Yusof N, Muda AK, Pratama SF, Abraham A. A novel nonlinear time-varying sigmoid transfer function in binary whale optimization algorithm for descriptors selection in drug classification. Mol Divers 2023; 27:71-80. [PMID: 35254585 DOI: 10.1007/s11030-022-10410-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/28/2021] [Accepted: 02/15/2022] [Indexed: 02/08/2023]
Abstract
In computational chemistry, the high-dimensional molecular descriptors contribute to the curse of dimensionality issue. Binary whale optimization algorithm (BWOA) is a recently proposed metaheuristic optimization algorithm that has been efficiently applied in feature selection. The main contribution of this paper is a new version of the nonlinear time-varying Sigmoid transfer function to improve the exploitation and exploration activities in the standard whale optimization algorithm (WOA). A new BWOA algorithm, namely BWOA-3, is introduced to solve the descriptors selection problem, which becomes the second contribution. To validate BWOA-3 performance, a high-dimensional drug dataset is employed. The proficiency of the proposed BWOA-3 and the comparative optimization algorithms are measured based on convergence speed, the length of the selected feature subset, and classification performance (accuracy, specificity, sensitivity, and f-measure). In addition, statistical significance tests are also conducted using the Friedman test and Wilcoxon signed-rank test. The comparative optimization algorithms include two BWOA variants, binary bat algorithm (BBA), binary gray wolf algorithm (BGWOA), and binary manta-ray foraging algorithm (BMRFO). As the final contribution, from all experiments, this study has successfully revealed the superiority of BWOA-3 in solving the descriptors selection problem and improving the Amphetamine-type Stimulants (ATS) drug classification performance.
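In binary metaheuristics, a transfer function maps each continuous position value to the probability of selecting the corresponding feature. The Python sketch below uses the standard S-shaped sigmoid for that mapping. It is explicitly not the nonlinear time-varying function proposed in this paper, whose form the abstract does not give. The function names `sigmoid_transfer` and `binarize`, and the example position vector, are hypothetical.

```python
import math
import random


def sigmoid_transfer(x):
    """Standard S-shaped transfer function: maps a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 descriptor mask.

    Each descriptor is selected (1) with probability sigmoid_transfer(x).
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


# Hypothetical continuous position of one search agent over four descriptors.
rng = random.Random(0)  # seeded so the sketch is repeatable
mask = binarize([50.0, -50.0, 0.3, -1.2], rng)
print(mask)
```

Large positive values are almost always selected and large negative values almost never, while values near zero are close to a coin flip. That behaviour is how the transfer function steers the balance between exploration and exploitation.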
Affiliation(s)
- Norfadzlia Mohd Yusof
  - Fakulti Teknologi Kejuruteraan Elektrik dan Elektronik, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100, Durian Tunggal, Melaka, Malaysia
- Azah Kamilah Muda
  - Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100, Durian Tunggal, Melaka, Malaysia
- Satrya Fajri Pratama
  - Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100, Durian Tunggal, Melaka, Malaysia
- Ajith Abraham
  - Machine Intelligence Research Labs (MIR Labs) Scientific Network for Innovation and Research Excellence, Auburn, WA, USA
12
|
Luo Y, Wang P, Mou M, Zheng H, Hong J, Tao L, Zhu F. A novel strategy for designing the magic shotguns for distantly related target pairs. Brief Bioinform 2023; 24:6984790. [PMID: 36631399 DOI: 10.1093/bib/bbac621] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/02/2022] [Revised: 11/09/2022] [Accepted: 12/17/2022] [Indexed: 01/13/2023] Open
Abstract
Because of its promise for improving drug efficacy, polypharmacology has emerged as a new theme in drug discovery for complex diseases. In the discovery of novel multi-target drugs (MTDs), in silico strategies have become essential because of their high throughput and low cost. However, current research mostly addresses typical, closely related target pairs. Because of the intricate pathogenesis networks of complex diseases, many distantly related targets are found to play crucial roles in synergistic treatment. An innovative method for developing drugs that simultaneously act on distantly related target pairs is therefore of utmost importance. At the same time, reducing the false discovery rate in the design of MTDs remains a daunting technical difficulty. In this research, effective small-molecule clustering in the positive dataset, together with a putative negative dataset generation strategy, was adopted during model construction. In a comprehensive assessment on 10 target pairs with hierarchical similarity levels, the proposed strategy successfully reduced the false discovery rate. Model types constructed with much smaller numbers of inhibitor molecules gained considerable yields and showed better false-hit controllability than before. To further evaluate generalization ability, an in-depth assessment of high-throughput virtual screening on the ChEMBL database was conducted. The results show that this strategy could hierarchically improve the enrichment factors for each target pair (especially for distantly related or unrelated target pairs), in line with the target-pair similarity levels.
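The two evaluation quantities named in this abstract, false discovery rate and enrichment factor, have standard definitions that can be sketched in a few lines. The code below uses those textbook definitions and toy data; the function names and the example scores are illustrative assumptions and are not taken from the paper.

```python
def false_discovery_rate(true_positives, false_positives):
    """FDR = FP / (FP + TP): the share of predicted hits that are wrong."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("no predicted hits")
    return false_positives / predicted


def enrichment_factor(scores, labels, fraction):
    """EF at a given top fraction of a ranked screening library.

    EF = (actives in selection / selection size) / (all actives / library size).
    labels are 1 for active and 0 for inactive compounds.
    """
    n_total = len(scores)
    n_selected = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_selected = sum(label for _, label in ranked[:n_selected])
    hits_total = sum(labels)
    if hits_total == 0:
        raise ValueError("library contains no actives")
    return (hits_selected / n_selected) / (hits_total / n_total)


# Toy library of 10 compounds with 2 actives, both ranked at the top.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, fraction=0.2))  # 5.0
print(false_discovery_rate(true_positives=8, false_positives=2))  # 0.2
```

An enrichment factor of 1 corresponds to random selection, so values well above 1 in the top-ranked fraction indicate that the model concentrates actives where they are screened first.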
Affiliation(s)
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Panpan Wang
  - College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Hanqi Zheng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jiajun Hong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Lin Tao
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China

13
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets in a low computational cost way. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
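To make the "bit string" input format concrete, the sketch below compares two fingerprints with the Tanimoto coefficient, a widely used similarity measure for binary fingerprints. The 8-bit strings are made up for illustration; real fingerprints are much longer and would come from a cheminformatics toolkit, which is outside the scope of this example.

```python
def on_bits(bit_string):
    """Return the set of positions that are set to '1' in a fingerprint string."""
    return {i for i, bit in enumerate(bit_string) if bit == "1"}


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a, b = on_bits(fp_a), on_bits(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Two toy 8-bit fingerprints (hypothetical, not derived from real molecules).
query = "11010010"
candidate = "10010011"
print(round(tanimoto(query, candidate), 3))  # 0.6
```

Here the fingerprints share three on-bits out of five distinct on-bits overall, giving 3/5 = 0.6; identical fingerprints score 1.0 and fingerprints with no common on-bits score 0.0.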
Affiliation(s)
- Jingbo Yang
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Yiyang Cai
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Kairui Zhao
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Hongbo Xie
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Xiujie Chen
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China

14
Rezaie H, Asadollahi-Baboli M, Hassaninejad-Darzi SK. Hybrid consensus and k-nearest neighbours (kNN) strategies to classify dual BRD4/PLK1 inhibitors. SAR and QSAR in Environmental Research 2022; 33:779-792. [PMID: 36330747 DOI: 10.1080/1062936x.2022.2139292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 10/17/2022] [Indexed: 06/16/2023]
Abstract
A novel decision-making procedure is proposed here for the first time to identify active/inactive and selective/non-selective dual inhibitors using consensus approaches and pools of k-nearest neighbours (kNN) classifications instead of individual models. Dual BRD4/PLK1 inhibition with adequate selectivity is a potential therapeutic strategy for targeting tumour cells in high-risk patients. We report the unique way to identify both active and selective dual BRD4/PLK1 inhibitors using consensus and kNN strategies together with two sources of receptor-based and ligand-based information which are the ranked binding energies of residues and important molecular features, respectively. The results of consensus approaches were compared with the results of individual kNN models. The chemical space similarity was measured using three different distance functions to increase the reliability. All activity and selectivity classification models were validated using cross-validation and y-randomization tests. The outcomes show that consensus approaches can increase the reliability and accuracy of active/inactive or selective/non-selective detections up to 90%. Consensus approaches also reached more balanced values of sensitivity and specificity compared to the individual kNN models because of the compensation in the integration of diverse sources of information.
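The idea of pooling kNN classifiers built with different distance functions can be sketched as a simple majority vote. The abstract does not name the three distance functions, so Euclidean, Manhattan and Chebyshev distances are assumed here, and the two-dimensional toy data stand in for the receptor-based and ligand-based features; none of this reproduces the authors' actual procedure.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k, dist):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def consensus_predict(train_x, train_y, query, k=3):
    """Majority vote over kNN models that differ only in their distance function."""
    votes = [knn_predict(train_x, train_y, query, k, dist)
             for dist in (euclidean, manhattan, chebyshev)]
    return Counter(votes).most_common(1)[0][0]


# Toy training set: two well-separated classes in a 2D feature space.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["inactive"] * 3 + ["active"] * 3
print(consensus_predict(train_x, train_y, (5.5, 5.2)))  # active
```

With an odd number of voters and two classes the vote cannot tie, which is one reason three models are a convenient pool size in a sketch like this.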
Affiliation(s)
- H Rezaie
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran
- M Asadollahi-Baboli
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran
- S K Hassaninejad-Darzi
  - Department of Chemistry, Faculty of Science, Babol Noshirvani University of Technology, Babol, Iran

15
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany

16
Saldívar-González FI, Aldas-Bulos VD, Medina-Franco JL, Plisson F. Natural product drug discovery in the artificial intelligence era. Chem Sci 2022; 13:1526-1546. [PMID: 35282622 PMCID: PMC8827052 DOI: 10.1039/d1sc04471k] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/13/2021] [Accepted: 12/10/2021] [Indexed: 12/19/2022] Open
Abstract
Natural products (NPs) are primarily recognized as privileged structures to interact with protein drug targets. Their unique characteristics and structural diversity continue to marvel scientists for developing NP-inspired medicines, even though the pharmaceutical industry has largely given up. High-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of artificial intelligence (AI) in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of AI, to tackle NP drug discovery challenges and open up opportunities. In this article, we review and discuss the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity.
Affiliation(s)
- F I Saldívar-González
  - DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México Avenida Universidad 3000 04510 Mexico Mexico
- V D Aldas-Bulos
  - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN Irapuato Guanajuato Mexico
- J L Medina-Franco
  - DIFACQUIM Research Group, School of Chemistry, Department of Pharmacy, Universidad Nacional Autónoma de México Avenida Universidad 3000 04510 Mexico Mexico
- F Plisson
  - CONACYT - Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN Irapuato Guanajuato Mexico

17
Schmidt S, Schindler M, Eriksson L. Block-wise exploration of molecular descriptors with Multi-block Orthogonal Component Analysis (MOCA). Mol Inform 2021; 41:e2100165. [PMID: 34878230 PMCID: PMC9285065 DOI: 10.1002/minf.202100165] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2021] [Accepted: 11/24/2021] [Indexed: 11/13/2022]
Abstract
Data tables for machine learning and structure‐activity relationship modelling (QSAR) are often naturally organized in blocks of data, where multiple molecular representations or sets of descriptors form the blocks. Multi‐block Orthogonal Component Analysis (MOCA), a new analytical tool, can be used to explore such data structures in a single model, identifying principal components that are unique to a single block or joint over multiple blocks. We applied MOCA to two sets of 550 and 300 molecules and up to 9213 molecular descriptors organized in 11 blocks. The MOCA models reveal relationships between the blocks and overarching trends across the whole dataset. Based on the MOCA joint components, we propose a quantitative metric for the redundancy of blocks, useful for a priori block‐wise feature selection or evaluation of new molecular representations. The second data set includes 7 ecotoxicological study endpoints for crop protection chemicals, for which we (re‐)discovered some general trends and linked them to molecular properties. Using a single MOCA model we estimated the predictive potential of each block and the model‐ability of the target block.
Affiliation(s)
- Sebastian Schmidt
  - Bayer AG, Crop Science Division, Environmental Safety, Alfred-Nobel-Str. 50, 40789, Monheim, Germany
- Michael Schindler
  - Bayer AG, Crop Science Division, Environmental Safety, Alfred-Nobel-Str. 50, 40789, Monheim, Germany
- Lennart Eriksson
  - Sartorius Stedim Data Analytics AB, Östra Strandgatan 24, SE-903 33, Umeå, Sweden

18
Zuorro A. Water Activity Prediction in Sugar and Polyol Systems Using Theoretical Molecular Descriptors. Int J Mol Sci 2021; 22:11044. [PMID: 34681700 PMCID: PMC8540113 DOI: 10.3390/ijms222011044] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/18/2021] [Revised: 10/08/2021] [Accepted: 10/09/2021] [Indexed: 12/01/2022] Open
Abstract
Water activity is a key factor in the development of pharmaceutical, cosmetic, and food products. In aqueous solutions of nonelectrolytes, the Norrish model provides a simple and effective way to evaluate this quantity. However, it contains a parameter, known as the Norrish constant, that must be estimated from experimental data. In this study, a new strategy is proposed for the prediction of water activity in the absence of experimental information, based on the use of theoretical molecular descriptors for characterizing the effects of a solute. This approach was applied to the evaluation of water activity in the presence of sugars (glucose, fructose, xylose, sucrose) and polyols (sorbitol, xylitol, glycerol, erythritol). The use of two descriptors related to the constitutional and connectivity properties of the solutes was first investigated. Subsequently, a new theoretical descriptor, named the global information index (G), was developed. By using this index, the water activity curves in the binary systems were reconstructed. The positive results obtained support the proposed strategy, as well as the possibility of including, in a single information index, the main molecular features of a solute that determine its effects on water activity.
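For readers unfamiliar with the Norrish model mentioned above, a minimal sketch for a binary water-solute system follows. It assumes the commonly used form a_w = x_w · exp(−K · x_s²), with x_w and x_s the mole fractions of water and solute and K taken as positive under this sign convention (some sources write the exponent with the opposite sign and a negative constant). The value of K used below is a placeholder for illustration only and is not a result from the paper.

```python
import math


def norrish_water_activity(x_solute, k):
    """Water activity of a binary aqueous solution under the assumed Norrish form.

    a_w = x_w * exp(-k * x_s**2), with x_w = 1 - x_s.
    """
    if not 0.0 <= x_solute <= 1.0:
        raise ValueError("mole fraction must lie in [0, 1]")
    x_water = 1.0 - x_solute
    return x_water * math.exp(-k * x_solute ** 2)


# Placeholder Norrish constant (hypothetical value, chosen only for the demo).
K_DEMO = 2.0
for x_s in (0.0, 0.05, 0.10, 0.20):
    print(x_s, round(norrish_water_activity(x_s, K_DEMO), 4))
```

With K = 0 the expression reduces to the ideal-solution result a_w = x_w, so the exponential term can be read as the correction for non-ideal solute-water interactions that the descriptor-based approach aims to predict without experimental data.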
Affiliation(s)
- Antonio Zuorro
  - Department of Chemical Engineering, Materials and Environment, Sapienza University, 00185 Rome, Italy

19
Abstract
Molecular descriptors encode a variety of molecular representations for computer-assisted drug discovery. Here, we focus on the Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors, which were originally designed for scaffold hopping from natural products to synthetic molecules. WHALES descriptors capture molecular shape and partial charges simultaneously. We introduce the key aspects of the WHALES concept and provide a step-by-step guide on how to use these descriptors for virtual compound screening and scaffold hopping. The results presented can be reproduced by using the code freely available at github.com/ETHmodlab/scaffold_hopping_whales.
Affiliation(s)
- Francesca Grisoni
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Zurich, Switzerland
- Gisbert Schneider
  - Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, Zurich, Switzerland

20
Chuang KV, Gunsalus LM, Keiser MJ. Learning Molecular Representations for Medicinal Chemistry. J Med Chem 2020; 63:8705-8722. [PMID: 32366098 DOI: 10.1021/acs.jmedchem.0c00385] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Indexed: 12/21/2022]
Abstract
The accurate modeling and prediction of small molecule properties and bioactivities depend on the critical choice of molecular representation. Decades of informatics-driven research have relied on expert-designed molecular descriptors to establish quantitative structure-activity and structure-property relationships for drug discovery. Now, advances in deep learning make it possible to efficiently and compactly learn molecular representations directly from data. In this review, we discuss how active research in molecular deep learning can address limitations of current descriptors and fingerprints while creating new opportunities in cheminformatics and virtual screening. We provide a concise overview of the role of representations in cheminformatics, key concepts in deep learning, and argue that learning representations provides a way forward to improve the predictive modeling of small molecule bioactivities and properties.
Affiliation(s)
- Kangway V Chuang
  - Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
- Laura M Gunsalus
  - Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States
- Michael J Keiser
  - Department of Pharmaceutical Chemistry, Department of Bioengineering & Therapeutic Sciences, Institute for Neurodegenerative Diseases, Kavli Institute for Fundamental Neuroscience, Bakar Computational Health Sciences Institute, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94143, United States

21
García-Jacas CR, Marrero-Ponce Y, Vivas-Reyes R, Suárez-Lezcano J, Martinez-Rios F, Terán JE, Aguilera-Mendoza L. Distributed and multicore QuBiLS-MIDAS software v2.0: Computing chiral, fuzzy, weighted and truncated geometrical molecular descriptors based on tensor algebra. J Comput Chem 2020; 41:1209-1227. [PMID: 32058625 DOI: 10.1002/jcc.26167] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/03/2019] [Revised: 01/22/2020] [Accepted: 01/26/2020] [Indexed: 12/12/2022]
Abstract
Advances to the distributed, multi-core and fully cross-platform QuBiLS-MIDAS software v2.0 (http://tomocomd.com/qubils-midas) are reported in this article since the v1.0 release. The QuBiLS-MIDAS software is the only one that computes atom-pair and alignment-free geometrical MDs (3D-MDs) from several distance metrics other than the Euclidean distance, as well as alignment-free 3D-MDs that codify structural information regarding the relations among three and four atoms of a molecule. The most recent features added to the QuBiLS-MIDAS software v2.0 are related (a) to the calculation of atomic weightings from indices based on the vertex-degree invariant (e.g., Alikhanidi index); (b) to consider central chirality during the molecular encoding; (c) to use measures based on clustering methods and statistical functions to codify structural information among more than two atoms; (d) to the use of a novel method based on fuzzy membership functions to spherically truncate inter-atomic relations; and (e) to the use of weighted and fuzzy aggregation operators to compute global 3D-MDs according to the importance and/or interrelation of the atoms of a molecule during the molecular encoding. Moreover, a novel module to compute QuBiLS-MIDAS 3D-MDs from their headings was also developed. This module can be used either by the graphical user interface or by means of the software library. By using the library, both the predictive models built with the QuBiLS-MIDAS 3D-MDs and the QuBiLS-MIDAS 3D-MDs calculation can be embedded in other tools. A set of predefined QuBiLS-MIDAS 3D-MDs with high information content and low redundancy on a set comprised of 20,469 compounds is also provided to be employed in further cheminformatics tasks. 
This set of predefined 3D-MDs performed better than the full universe of Dragon (v5.5) and PaDEL 0D-to-3D MDs in variability studies, whereas a linear independence study showed that these QuBiLS-MIDAS 3D-MDs codify chemical information orthogonal to the Dragon 0D-to-3D MDs. This set of predefined 3D-MDs will be updated periodically as new results are achieved. In general, this report highlights our continued efforts to provide a better tool for a more suitable characterization of compounds and, in this way, to contribute to better outcomes in future applications.
Affiliation(s)
- César R García-Jacas
  - Cátedras Conacyt - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Yovani Marrero-Ponce
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
  - Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito, Pichincha, Ecuador
  - Grupo GINUMED, Corporacion Universitaria Rafael Nuñez, Facultad de Salud, Programa de Medicina, Cartagena, Colombia
  - Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain
- Ricardo Vivas-Reyes
  - Grupo de Química Cuántica y Teórica de la Universidad de Cartagena - Facultad de Ciencias Exactas y Naturales. Programa de Química. Campus de San Pablo, Cartagena, Colombia
  - Grupo CipTec, Facultad de Ingenierias. Fundacion Universitaria Tecnologico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
- José Suárez-Lezcano
  - Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
- Julio E Terán
  - Department of Textile Engineering, Chemistry and Science, College of Textiles, North Carolina State University, Raleigh, NC, USA
- Longendri Aguilera-Mendoza
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico

22
Nantasenamat C. Best Practices for Constructing Reproducible QSAR Models. Methods in Pharmacology and Toxicology 2020. [DOI: 10.1007/978-1-0716-0150-1_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/21/2023]

23
Wei Y, Li W, Du T, Hong Z, Lin J. Targeting HIV/HCV Coinfection Using a Machine Learning-Based Multiple Quantitative Structure-Activity Relationships (Multiple QSAR) Method. Int J Mol Sci 2019; 20:3572. [PMID: 31336592 PMCID: PMC6678913 DOI: 10.3390/ijms20143572] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/02/2019] [Revised: 07/13/2019] [Accepted: 07/21/2019] [Indexed: 12/11/2022] Open
Abstract
Human immunodeficiency virus type-1 and hepatitis C virus (HIV/HCV) coinfection occurs when a patient is simultaneously infected with both human immunodeficiency virus type-1 (HIV-1) and hepatitis C virus (HCV), which is common today in certain populations. However, the treatment of coinfection is a challenge because of the special considerations needed to ensure hepatic safety and avoid drug–drug interactions. Multitarget inhibitors with less toxicity may provide a promising therapeutic strategy for HIV/HCV coinfection. However, the identification of one molecule that acts on multiple targets simultaneously by experimental evaluation is costly and time-consuming. In silico target prediction tools provide more opportunities for the development of multitarget inhibitors. In this study, by combining Naïve Bayes (NB) and support vector machine (SVM) algorithms with two types of molecular fingerprints, MACCS and extended connectivity fingerprints 6 (ECFP6), 60 classification models were constructed to predict compounds that were active against 11 HIV-1 targets and four HCV targets based on a multiple quantitative structure–activity relationships (multiple QSAR) method. Five-fold cross-validation and test set validation were performed to measure the performance of the 60 classification models. Our results show that the 60 multiple QSAR models appeared to have high classification accuracy in terms of the area under the ROC curve (AUC) values, which ranged from 0.83 to 1 with a mean value of 0.97 for the HIV-1 models and from 0.84 to 1 with a mean value of 0.96 for the HCV models. Furthermore, the 60 models were used to comprehensively predict the potential targets of an additional 46 compounds, including 27 approved HIV-1 drugs, 10 approved HCV drugs and nine selected compounds known to be active against one or more targets of HIV-1 or HCV. 
Finally, 20 hits, including seven approved HIV-1 drugs, four approved HCV drugs, and nine other compounds, were predicted to be HIV/HCV coinfection multitarget inhibitors. The reported bioactivity data confirmed that seven out of nine compounds actually interacted with HIV-1 and HCV targets simultaneously with diverse binding affinities. The remaining predicted hits and chemical-protein interaction pairs with the potential ability to suppress HIV/HCV coinfection are worthy of further experimental investigation. This investigation shows that the multiple QSAR method is useful in predicting chemical-protein interactions for the discovery of multitarget inhibitors and provides a unique strategy for the treatment of HIV/HCV coinfection.
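The figure of 60 classification models follows directly from the design described in the abstract: two algorithms, two fingerprint types, and 15 targets (11 for HIV-1 and four for HCV). The short enumeration below only illustrates that arithmetic; the target labels are hypothetical placeholders, since the abstract does not list the target names.

```python
from itertools import product

algorithms = ["NB", "SVM"]          # Naive Bayes and support vector machine
fingerprints = ["MACCS", "ECFP6"]   # the two fingerprint types named in the abstract

# Placeholder labels: the actual target names are not given in the abstract.
hiv_targets = [f"HIV1_target_{i}" for i in range(1, 12)]  # 11 targets
hcv_targets = [f"HCV_target_{i}" for i in range(1, 5)]    # 4 targets
targets = hiv_targets + hcv_targets

# One model per (algorithm, fingerprint, target) combination.
model_grid = list(product(algorithms, fingerprints, targets))
print(len(model_grid))  # 60
```

Each tuple in `model_grid` corresponds to one classifier that would be trained and evaluated separately, which is why the abstract reports AUC ranges across models rather than a single score.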
Affiliation(s)
- Yu Wei
  - State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
- Wei Li
  - State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
  - Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300000, China
- Tengfei Du
  - State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
- Zhangyong Hong
  - State Key Laboratory of Medicinal Chemical Biology, College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin 300071, China
- Jianping Lin
  - State Key Laboratory of Medicinal Chemical Biology, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Haihe Education Park, 38 Tongyan Road, Tianjin 300353, China
  - Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300000, China
  - Biodesign Center, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China