1
Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. Environ Sci Technol 2024; 58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
In recent years, alternatives to animal testing, such as computational and machine learning approaches, have become increasingly important for toxicity testing. However, the complexity and scarcity of available biomedical data make predictive models difficult to develop. Combining nonlinear machine learning with multicondition descriptors offers a way to use data from various assays to create a robust model. This work applies multicondition descriptors (MCDs) to develop a QSTR (Quantitative Structure-Toxicity Relationship) model based on a large toxicity data set comprising more than 80,000 compounds and 59 different endpoints (122,572 data points). The predictive capabilities of the developed single-task multi-endpoint machine learning models, as well as a novel data analysis approach using Convolutional Neural Networks (CNN), are discussed. The results show that using MCDs significantly improves the model and that combining them with a one-dimensional CNN (CNN-1D) yields the best result (R²train = 0.93, R²ext = 0.70). Several structural features contributed strongly to toxicity, including van der Waals surface area (VSA), number of nitrogen-containing fragments (nN+), presence of S-P fragments, ionization potential, and presence of C-N fragments. The developed models can be useful tools for predicting the toxicity of various compounds under different conditions, enabling quick toxicity assessment of new compounds.
Affiliation(s)
- Amirreza Daghighi
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Gerardo M Casanola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Kweeni Iduoku
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Hrvoje Kusic
- Faculty of Chemical Engineering and Technology, University of Zagreb, Marulicev Trg 19, Zagreb 10000, Croatia
- Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa 48940, Spain
- BIOFISIKA, Basque Center for Biophysics CSIC-UPVEH, Leioa 48940, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Biscay 48011, Spain
- Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
2
Bhat-Ambure J, Ambure P, Serrano-Candelas E, Galiana-Roselló C, Gil-Martínez A, Guerrero M, Martin M, González-García J, García-España E, Gozalbes R. G4-QuadScreen: A Computational Tool for Identifying Multi-Target-Directed Anticancer Leads against G-Quadruplex DNA. Cancers (Basel) 2023; 15:3817. [PMID: 37568632 PMCID: PMC10416877 DOI: 10.3390/cancers15153817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/16/2023] [Accepted: 07/21/2023] [Indexed: 08/13/2023]
Abstract
The study presents 'G4-QuadScreen', a user-friendly computational tool for identifying multi-target-directed ligands (MTDLs) against G-quadruplexes (G4s). It also reports several hit MTDLs identified through in silico and in vitro approaches. Multi-tasking QSAR models were developed using linear discriminant analysis and random forest machine learning techniques to predict the responses of interest (G4 interaction, G4 stabilization, G4 selectivity, and cytotoxicity) while accounting for variations in the experimental conditions (e.g., G4 sequences, endpoints, cell lines, buffers, and assays). A virtual screening with G4-QuadScreen and molecular docking using YASARA (AutoDock-Vina) were performed. G4 activities were confirmed via FRET melting, FID, and cell viability assays. Validation metrics demonstrated the high discriminatory power and robustness of the models (all models reach accuracies of roughly 90% or more on the training sets and roughly 80% or more on the external sets). The experimental evaluations showed that ten screened MTDLs can selectively stabilize multiple G4s. Three screened MTDLs induced a strong inhibitory effect on various human cancer cell lines. This pioneering computational study serves as a tool to accelerate the search for new leads against G4s, reducing false positive outcomes in the early stages of drug discovery. The G4-QuadScreen tool is accessible on the ChemoPredictionSuite website.
Affiliation(s)
- Pravin Ambure
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, 46980 Valencia, Spain
- Eva Serrano-Candelas
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, 46980 Valencia, Spain
- Cristina Galiana-Roselló
- Department of Inorganic Chemistry, Institute of Molecular Science, University of Valencia, 46980 Valencia, Spain
- Ariadna Gil-Martínez
- Department of Inorganic Chemistry, Institute of Molecular Science, University of Valencia, 46980 Valencia, Spain
- Mario Guerrero
- Biochemistry and Molecular Biology Unit, Biomedicine Department, Faculty of Medicine and Health Sciences, University of Barcelona, 08036 Barcelona, Spain
- Margarita Martin
- Biochemistry and Molecular Biology Unit, Biomedicine Department, Faculty of Medicine and Health Sciences, University of Barcelona, 08036 Barcelona, Spain
- Clinical and Experimental Respiratory Immunoallergy (IRCE), Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
- Jorge González-García
- Department of Inorganic Chemistry, Institute of Molecular Science, University of Valencia, 46980 Valencia, Spain
- Enrique García-España
- Department of Inorganic Chemistry, Institute of Molecular Science, University of Valencia, 46980 Valencia, Spain
- Rafael Gozalbes
- MolDrug AI Systems SL, c/Olimpia Arozena Torres, 46018 Valencia, Spain
- ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, 46980 Valencia, Spain
3
Kleandrova VV, Cordeiro MNDS, Speck-Planche A. Optimizing drug discovery using multitasking models for quantitative structure-biological effect relationships: an update of the literature. Expert Opin Drug Discov 2023; 18:1231-1243. [PMID: 37639708 DOI: 10.1080/17460441.2023.2251385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/21/2023] [Indexed: 08/31/2023]
Abstract
INTRODUCTION: Drug discovery has provided modern societies with the means to fight many diseases, and computational methods have been at the forefront, playing an important role in rationalizing the search for novel drugs. Yet phenomena such as the multi-genic nature of diseases and drug resistance remain difficult to tackle with current computational methods. Multi-tasking models for quantitative structure-biological effect relationships (mtk-QSBER) have emerged to overcome such limitations. AREAS COVERED: This review provides an update on the fundamentals and applications of mtk-QSBER models as tools to accelerate multiple stages and substages of the drug discovery process. EXPERT OPINION: Computational approaches are extremely important for rationalizing the search for novel and efficacious therapeutic agents. However, they need to focus more on the multi-target drug discovery paradigm. mtk-QSBER models are particularly suited to multi-target drug discovery, offering encouraging opportunities across multiple therapeutic areas and scientific disciplines associated with drug discovery.
Affiliation(s)
- Valeria V Kleandrova
- Laboratory of Fundamental and Applied Research of Quality and Technology of Food Production, Russian Biotechnological University, Moscow, Russian Federation
- M Natália D S Cordeiro
- LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal
- Alejandro Speck-Planche
- LAQV@REQUIMTE/Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Porto, Portugal
4
Martínez-López Y, Castillo-Garit JA, Casanola-Martin GM, Rasulev B, Rodríguez-Gonzalez AY, Martínez-Santiago O, Barigye SJ. Exploring proteasome inhibition using atomic weighted vector indices and machine learning approaches. Mol Divers 2023:10.1007/s11030-023-10638-2. [PMID: 37017875 DOI: 10.1007/s11030-023-10638-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 03/17/2023] [Indexed: 04/06/2023]
Abstract
The ubiquitin-proteasome system (UPS) is a highly regulated mechanism of intracellular protein degradation and turnover. The UPS is involved in different biological activities, such as the regulation of gene transcription and the cell cycle. Several researchers have applied cheminformatics and artificial intelligence methods to study proteasome inhibition, including the prediction of ubiquitin-proteasome pathway (UPP) inhibitors. Following this idea, we applied a new tool for computing molecular descriptors (MDs) to model proteasome inhibition in terms of EC50 (µmol/L), using a set of new MDs called atomic weighted vectors (AWV) together with several prediction algorithms. In this work, AWV-based descriptors are used as datasets for training different machine learning techniques, such as linear regression, multiple linear regression (MLR), random forest (RF), k-nearest neighbors (IBk), multi-layer perceptron, best-first search, and genetic algorithm. The results suggest that these atomic descriptors, combined with artificial intelligence techniques, allow adequate modeling of proteasome inhibitors and offer an alternative way to build efficient models for predicting inhibitory activity.
Affiliation(s)
- Yoan Martínez-López
- Department of Computer Sciences, Faculty of Informatics, Camagüey University, 74650, Camagüey City, Cuba.
- Gerardo M Casanola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58102, USA
- Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND, 58102, USA
- Ansel Y Rodríguez-Gonzalez
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE-UT3), Unidad de Transferencia Tecnológica de Tepic, Tepic, México
- Oscar Martínez-Santiago
- Alfa Vitamins Laboratories, Miami, FL, 33166, USA
- Laboratorio de Bioinformática y Química Computacional, Universidad Católica del Maule, Talca, Chile
- Stephen J Barigye
- Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), 28049, Madrid, Spain
5
MOZART, a QSAR Multi-Target Web-Based Tool to Predict Multiple Drug-Enzyme Interactions. Molecules 2023; 28:molecules28031182. [PMID: 36770857 PMCID: PMC9921108 DOI: 10.3390/molecules28031182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 01/09/2023] [Accepted: 01/16/2023] [Indexed: 01/27/2023]
Abstract
Developing models able to predict interactions between drugs and enzymes is a primary goal in computational biology, since such models can be used to predict both new active drugs and the interactions of known drugs with untested targets. Having compiled a large dataset of drug-enzyme pairs (62,524), we saw a unique opportunity to build a novel multi-target machine learning (MTML) quantitative structure-activity relationship (QSAR) model for probing interactions between different drugs and enzyme targets. To this end, this paper presents an MTML-QSAR model that combines topological drug features with an artificial neural network (ANN) multi-layer perceptron (MLP). The final model was validated using internal cross-validation statistics and other relevant diagnostic statistical parameters. The overall accuracy of the derived model was higher than 96%. Finally, to make the model widely available, a public web-based tool has been developed that lets users perform their own predictions; it is publicly accessible and can be downloaded as free open-source software.
6
Daghighi A, Casanola-Martin GM, Timmerman T, Milenković D, Lučić B, Rasulev B. In Silico Prediction of the Toxicity of Nitroaromatic Compounds: Application of Ensemble Learning QSAR Approach. Toxics 2022; 10:toxics10120746. [PMID: 36548579 PMCID: PMC9786026 DOI: 10.3390/toxics10120746] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/14/2022] [Revised: 11/10/2022] [Accepted: 11/28/2022] [Indexed: 06/02/2023]
Abstract
In this work, a dataset of more than 200 nitroaromatic compounds is used to develop Quantitative Structure-Activity Relationship (QSAR) models for estimating in vivo toxicity based on the 50% lethal dose in rats (LD50). An initial set of 4885 molecular descriptors was generated and used to build Support Vector Regression (SVR) models. The two best SVR models, SVR_A and SVR_B, were selected to build an Ensemble Model by means of Multiple Linear Regression (MLR). The resulting Ensemble Model outperformed the base SVR models in the training set (R² = 0.88), validation set (R² = 0.95), and true external test set (R² = 0.92). The models were also internally validated by 5-fold cross-validation and Y-scrambling experiments, which showed high levels of goodness-of-fit, robustness, and predictivity. The contribution of each descriptor to toxicity was assessed using the Accumulated Local Effect (ALE) technique. By combining the ensemble QSAR model with an analysis of how the involved descriptors contribute to toxicity, the proposed approach provides a useful tool for assessing the toxicity of nitroaromatic compounds.
Affiliation(s)
- Amirreza Daghighi
- Biomedical Engineering Program, North Dakota State University, Fargo, ND 58105, USA
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND 58102, USA
- Troy Timmerman
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND 58102, USA
- Department of Computer Science, North Dakota State University, Fargo, ND 58105, USA
- Dejan Milenković
- Department of Science, Institute for Information Technologies, University of Kragujevac, 34000 Kragujevac, Serbia
- Bono Lučić
- NMR Centre, Ruđer Bošković Institute, 10000 Zagreb, Croatia
- Bakhtiyor Rasulev
- Biomedical Engineering Program, North Dakota State University, Fargo, ND 58105, USA
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, ND 58102, USA
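The ensemble step described in this abstract, in which the outputs of two base SVR models are combined by Multiple Linear Regression, can be illustrated with a minimal sketch. It is not the authors' code: the base-model predictions and observed values are synthetic numbers standing in for SVR_A and SVR_B outputs, the helper names (fit_mlr, ensemble_predict, r_squared) are hypothetical, and the regression is solved with plain normal equations in pure Python rather than any particular library.

```python
def fit_mlr(columns, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + ... + bk*xk.

    `columns` is a list of predictor vectors (here: base-model predictions).
    Returns [b0, b1, ..., bk] by solving the normal equations (X^T X) b = X^T y
    with Gaussian elimination and partial pivoting.
    """
    n = len(y)
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    r = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # Forward elimination.
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        r[k], r[piv] = r[piv], r[k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
            r[i] -= f * r[k]
    # Back substitution.
    b = [0.0] * p
    for k in range(p - 1, -1, -1):
        b[k] = (r[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return b


def ensemble_predict(coefs, base_preds):
    """Combine base-model predictions with the fitted MLR coefficients."""
    n = len(base_preds[0])
    return [coefs[0] + sum(c * p[i] for c, p in zip(coefs[1:], base_preds)) for i in range(n)]


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic predictions from two hypothetical base regressors, plus observed values.
pred_a = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_b = [2.0, 1.0, 4.0, 3.0, 6.0]
observed = [1.7, 2.0, 3.5, 3.8, 5.3]

coefs = fit_mlr([pred_a, pred_b], observed)
fitted = ensemble_predict(coefs, [pred_a, pred_b])
print([round(c, 3) for c in coefs], round(r_squared(observed, fitted), 3))
```

Because the synthetic observations were built as 0.5 + 0.6·pred_a + 0.3·pred_b, the fit recovers those weights exactly. In a real stacking setup the combining regression would usually be fitted on out-of-fold or validation-set predictions of the base models, so that it does not simply reward base models that overfit the training data.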
7
Li J, Wang C, Yue L, Chen F, Cao X, Wang Z. Nano-QSAR modeling for predicting the cytotoxicity of metallic and metal oxide nanoparticles: A review. Ecotoxicol Environ Saf 2022; 243:113955. [PMID: 35961199 DOI: 10.1016/j.ecoenv.2022.113955] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 04/19/2022] [Revised: 07/11/2022] [Accepted: 08/03/2022] [Indexed: 06/15/2023]
Abstract
Given the rapid development of nanotechnology, it is crucial to understand the effects of nanoparticles on living organisms. However, performing toxicological tests case by case is laborious. Quantitative structure-activity relationship (QSAR) modeling is an effective computational technique because it saves time and costs and reduces animal sacrifice. This review therefore presents general procedures for constructing and applying nano-QSAR models of metal-based and metal-oxide nanoparticles (MBNPs and MONPs). We also provide an overview of available databases and common algorithms. The molecular descriptors and their roles in the toxicological interpretation of MBNPs and MONPs are systematically reviewed, and the future of nano-QSAR is discussed. Finally, we address the growing demand for novel nano-specific descriptors, new computational strategies to address the data shortage, in situ data for regulatory purposes, a better understanding of how the physicochemical properties of NPs relate to bioactivity, and, most importantly, the design of nano-QSAR models for real-life environmental predictions rather than laboratory simulations.
Affiliation(s)
- Jing Li
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Chuanxi Wang
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Le Yue
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Feiran Chen
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Xuesong Cao
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Zhenyu Wang
- Institute of Environmental Processes and Pollution Control, and School of Environment and Civil Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Engineering Laboratory for Biomass Energy and Carbon Reduction Technology, Jiangnan University, Wuxi, Jiangsu 214122, China; Jiangsu Key Laboratory of Anaerobic Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
8
Halder AK, Moura AS, Cordeiro MNDS. Moving Average-Based Multitasking In Silico Classification Modeling: Where Do We Stand and What Is Next? Int J Mol Sci 2022; 23:ijms23094937. [PMID: 35563327 PMCID: PMC9099502 DOI: 10.3390/ijms23094937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 04/24/2022] [Accepted: 04/28/2022] [Indexed: 01/27/2023]
Abstract
Conventional in silico modeling is often viewed as 'one-target' or 'single-task' computer-aided modeling, since it mainly relies on forecasting an endpoint of interest from similar input data. Multitasking or multitarget in silico modeling, in contrast, embraces a set of computational techniques that efficiently integrate multiple types of input data to build single in silico models able to predict outcomes under various experimental and/or theoretical conditions. Multitasking modeling based on the Box-Jenkins moving average approach, in particular, has been applied over the last decade in several research fields, including drug and materials design, environmental sciences, and nanotechnology. The present review discusses the current status of multitasking computer-aided modeling efforts while describing both the existing challenges and the future opportunities of its underlying techniques. Important applications are also discussed to illustrate the ability of multitasking modeling to derive holistic and reliable in silico classification models and to design new chemical entities, either through fragment-based design or virtual screening. Software recently developed to automate and accelerate this type of modeling is also covered. Overall, this review may serve as a guideline for researchers seeking to grasp the scope of multitasking computer-aided modeling as a promising in silico tool.
Affiliation(s)
- Amit Kumar Halder
- LAQV@REQUIMTE, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Dr. Meghnad Saha Sarani, Bidhannagar, Durgapur 713212, West Bengal, India
- Ana S. Moura
- LAQV@REQUIMTE, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Maria Natália D. S. Cordeiro
- LAQV@REQUIMTE, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Correspondence: ; Tel.: +35-12-2040-2502
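The Box-Jenkins moving average approach mentioned in this abstract is commonly implemented in the multitasking QSAR literature by replacing each molecular descriptor D with its deviation from the average of that descriptor over training compounds sharing the same experimental condition c, that is ΔD(c) = D − avg_c(D). The sketch below assumes this simple formulation, averaging over all training samples measured under a condition; published variants differ (for example, by averaging only over compounds labeled active), so it is an illustration rather than the reviewed authors' exact procedure. The function name, condition labels, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def moving_average_deviations(values, conditions, is_train):
    """Condition-specific deviation of one descriptor, D - avg_c(D).

    values:     descriptor value D for each sample
    conditions: experimental condition label c for each sample (e.g. assay, organism)
    is_train:   True for samples whose values may be used to compute the averages

    Averages are taken over training samples only, so test samples are
    transformed with statistics they did not contribute to. A condition that
    never occurs in the training set raises KeyError.
    """
    groups = defaultdict(list)
    for value, cond, train in zip(values, conditions, is_train):
        if train:
            groups[cond].append(value)
    cond_means = {cond: mean(vals) for cond, vals in groups.items()}
    return [value - cond_means[cond] for value, cond in zip(values, conditions)]


# Hypothetical example: one descriptor for compounds tested under two conditions.
descriptor = [1.0, 3.0, 2.0, 6.0, 4.0]
condition = ["rat_oral", "rat_oral", "mouse_ip", "mouse_ip", "rat_oral"]
train_mask = [True, True, True, True, False]

print(moving_average_deviations(descriptor, condition, train_mask))
```

Under this formulation, the deviations serve as model inputs in place of (or alongside) the raw descriptors, which is what lets a single classifier learn from records measured under different conditions: the same raw value can map to different inputs depending on the condition it was measured under.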
9
PTML Modeling for Pancreatic Cancer Research: In Silico Design of Simultaneous Multi-Protein and Multi-Cell Inhibitors. Biomedicines 2022; 10:biomedicines10020491. [PMID: 35203699 PMCID: PMC8962338 DOI: 10.3390/biomedicines10020491] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/29/2022] [Revised: 02/10/2022] [Accepted: 02/15/2022] [Indexed: 02/07/2023]
Abstract
Pancreatic cancer (PANC) is a dangerous type of cancer, a major cause of mortality worldwide, and has a remarkably poor prognosis. To date, discovering anti-PANC agents remains a very complex and expensive process. Computational approaches can accelerate the search for anti-PANC agents. We report for the first time two models that combine perturbation theory with machine learning via a multilayer perceptron network (PTML-MLP) to perform the virtual design and prediction of molecules that can simultaneously inhibit multiple PANC cell lines and PANC-related proteins, such as caspase-1, tumor necrosis factor-alpha (TNF-alpha), and the insulin-like growth factor 1 receptor (IGF1R). Both PTML-MLP models exhibited accuracies higher than 78%. Using the interpretation of one of the PTML-MLP models as a guideline, we extracted molecular fragments desirable for inhibiting the PANC cell lines and the aforementioned PANC-related proteins, and then assembled some of those fragments into three new molecules. The two PTML-MLP models predicted the designed molecules to be potentially versatile anti-PANC agents, acting through inhibition of the three PANC-related proteins and multiple PANC cell lines. This work opens new horizons for applying the PTML modeling methodology to anticancer research.
10
Prediction of Anti-Glioblastoma Drug-Decorated Nanoparticle Delivery Systems Using Molecular Descriptors and Machine Learning. Int J Mol Sci 2021; 22:ijms222111519. [PMID: 34768951 PMCID: PMC8584266 DOI: 10.3390/ijms222111519] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/08/2021] [Accepted: 10/22/2021] [Indexed: 12/22/2022]
Abstract
The theoretical prediction of drug-decorated nanoparticles (DDNPs) has become an important task in medical applications. In this paper, Perturbation Theory Machine Learning (PTML) models were built to predict the probability that different pairs of drugs and nanoparticles form DDNP complexes with anti-glioblastoma activity. PTML models take as inputs the perturbations of the molecular descriptors of drugs and nanoparticles under specific experimental conditions. The raw dataset was obtained by mixing nanoparticle experimental data with drug assays from the ChEMBL database. Ten types of machine learning methods were tested, and only 41 features were selected for 855,129 drug-nanoparticle complexes. The best model was obtained with the Bagging classifier, an ensemble meta-estimator based on 20 decision trees, with an area under the receiver operating characteristic curve (AUROC) of 0.96 and an accuracy of 87% (test subset). This model could be useful for the virtual screening of nanoparticle-drug complexes in glioblastoma. All calculations can be reproduced with the datasets and Python scripts, which the authors provide in a freely available GitHub repository.
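The AUROC reported in this abstract can be read as the probability that a randomly chosen positive example (here, an active drug-nanoparticle complex) receives a higher score than a randomly chosen negative one, with ties counted as one half. The pure-Python sketch below computes it directly from that pairwise definition. It is an illustration, not the authors' evaluation script; the function name and example scores are hypothetical, and its pair loop grows as n_pos × n_neg, so library routines such as scikit-learn's sklearn.metrics.roc_auc_score are preferable for datasets of the size described here.

```python
def auroc(labels, scores):
    """Area under the ROC curve via its pairwise (Mann-Whitney) definition.

    labels: 1 for positive (e.g. active complex), 0 for negative
    scores: predicted probability or score for the positive class
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four drug-nanoparticle pairs.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In this toy example three of the four positive/negative pairs are ordered correctly, giving an AUROC of 0.75; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.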
11
Kleandrova VV, Speck-Planche A. The QSAR Paradigm in Fragment-Based Drug Discovery: From the Virtual Generation of Target Inhibitors to Multi-Scale Modeling. Mini Rev Med Chem 2021; 20:1357-1374. [PMID: 32013845 DOI: 10.2174/1389557520666200204123156] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/10/2019] [Revised: 10/21/2019] [Accepted: 10/28/2019] [Indexed: 12/24/2022]
Abstract
Fragment-Based Drug Design (FBDD) has established itself as a promising approach in modern drug discovery, accelerating and improving lead optimization while playing a crucial role in reducing the high attrition rates at all stages of the drug development process. FBDD has also benefited from computational methodologies, among which models derived from Quantitative Structure-Activity Relationships (QSAR) have become consolidated tools. This mini-review focuses on the evolution and main applications of the QSAR paradigm in the context of FBDD over the last five years. It places particular emphasis on QSAR models derived from fragment-based topological approaches that extract physicochemical and/or structural information, enabling the design of potentially novel mono- or multi-target inhibitors from relatively large and heterogeneous databases. We also discuss the need for multi-scale modeling, illustrating how different datasets based on target inhibition can be integrated and predicted simultaneously together with other relevant endpoints, such as biological activity against non-biomolecular targets, in vitro and in vivo toxicity, and pharmacokinetic properties. Seminal papers in this context are briefly analyzed. As huge amounts of data continue to accumulate in the chemical, biological, and biomedical sciences, it has become clear that drug discovery must be viewed as a multi-scale optimization process. An ideal multi-scale approach should integrate diverse chemical and biological data and also serve as a knowledge generator, enabling the design of potentially optimal chemicals that may become therapeutic agents.
Affiliation(s)
- Valeria V Kleandrova
- Laboratory of Fundamental and Applied Research of Quality and Technology of Food Production, Moscow State University of Food Production, Volokolamskoe Shosse 11, 125080, Moscow, Russian Federation
- Alejandro Speck-Planche
- Department of Chemistry, Institute of Pharmacy, I.M. Sechenov First Moscow State Medical University, Trubetskaya Str., 8, b. 2, 119992, Moscow, Russian Federation
12
Urista DV, Carrué DB, Otero I, Arrasate S, Quevedo-Tumailli VF, Gestal M, González-Díaz H, Munteanu CR. Prediction of Antimalarial Drug-Decorated Nanoparticle Delivery Systems with Random Forest Models. Biology (Basel) 2020; 9:biology9080198. [PMID: 32751710 PMCID: PMC7465777 DOI: 10.3390/biology9080198] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/24/2020] [Revised: 07/22/2020] [Accepted: 07/27/2020] [Indexed: 12/13/2022]
Abstract
Drug-decorated nanoparticles (DDNPs) have important medical applications. This work combined Perturbation Theory with Machine Learning and Information Fusion (PTMLIF). PTMLIF models were proposed to predict the probability that nanoparticle–compound/drug complexes have antimalarial activity (against Plasmodium). The aim is to save experimental resources and time by virtually screening DDNPs. The raw data were obtained by fusing experimental data for nanoparticles with compound chemical assays from the ChEMBL database. The inputs for the eight Machine Learning classifiers were transformed features of drugs/compounds and nanoparticles, expressed as perturbations of molecular descriptors under specific experimental conditions (experiment-centered features). The resulting dataset contains 107 input features and 249,992 examples. The best classification model was provided by Random Forest, with 27 selected features of drugs/compounds and nanoparticles across all experimental conditions considered. The model's high performance was demonstrated by a mean area under the receiver operating characteristic curve (AUC) on the test subset of 0.9921 ± 0.000244 (10-fold cross-validation). The results demonstrate the power of fusing the experiment-centered features of drugs/compounds and nanoparticles to predict nanoparticle–compound antimalarial activity. The scripts and dataset for this project are available in an open GitHub repository.
Affiliation(s)
- Diana V. Urista
- Department of Organic Chemistry II, University of Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain; (D.V.U.); (S.A.); (H.G.-D.)
| | - Diego B. Carrué
- RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain; (D.B.C.); (I.O.); (V.F.Q.-T.); (M.G.)
- Iago Otero
- RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain; (D.B.C.); (I.O.); (V.F.Q.-T.); (M.G.)
- Sonia Arrasate
- Department of Organic Chemistry II, University of Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain; (D.V.U.); (S.A.); (H.G.-D.)
- Viviana F. Quevedo-Tumailli
- RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain; (D.B.C.); (I.O.); (V.F.Q.-T.); (M.G.)
- Universidad Estatal Amazónica UEA, Km. 2 1/2 vía Puyo a Tena (paso lateral), Puyo 160150, Pastaza, Ecuador
- Marcos Gestal
- RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain; (D.B.C.); (I.O.); (V.F.Q.-T.); (M.G.)
- Biomedical Research Institute of A Coruña (INIBIC), Hospital Teresa Herrera, Xubias de Arriba 84, 15006 A Coruña, Spain
- Humbert González-Díaz
- Department of Organic Chemistry II, University of Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain; (D.V.U.); (S.A.); (H.G.-D.)
- IKERBASQUE, Basque Foundation for Science, Alameda Urquijo 36, 48011 Bilbao, Spain
- Basque Centre for Biophysics CSIC-UPVEHU, University of Basque Country UPV/EHU, Barrio Sarriena, 48940 Leioa, Spain
- Cristian R. Munteanu
- RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruna, Campus Elviña s/n, 15071 A Coruña, Spain; (D.B.C.); (I.O.); (V.F.Q.-T.); (M.G.)
- Biomedical Research Institute of A Coruña (INIBIC), Hospital Teresa Herrera, Xubias de Arriba 84, 15006 A Coruña, Spain
13
Diez-Alarcia R, Yáñez-Pérez V, Muneta-Arrate I, Arrasate S, Lete E, Meana JJ, González-Díaz H. Big Data Challenges Targeting Proteins in GPCR Signaling Pathways; Combining PTML-ChEMBL Models and [35S]GTPγS Binding Assays. ACS Chem Neurosci 2019; 10:4476-4491. [PMID: 31618004 DOI: 10.1021/acschemneuro.9b00302] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022]
Abstract
G-protein-coupled receptors (GPCRs), also known as 7-transmembrane receptors, are the single largest class of drug targets. Consequently, a large number of preclinical assays with GPCRs as molecular targets have been released to public sources such as the Chemical European Molecular Biology Laboratory (ChEMBL) database. These data are also very complex, covering changes in drug chemical structure and in assay conditions such as c0 = activity parameter (Ki, IC50, etc.), c1 = target protein, c2 = cell line, c3 = assay organism, etc., which makes these databases, bordering on a Big Data challenge, difficult to analyze. One aim of this work is to develop a computational model able to predict new GPCR-targeting drugs while taking multiple assay conditions into consideration. Another objective is to perform new predictive and experimental studies of selective 5-HT2A receptor agonists, antagonists, or inverse agonists in humans, comparing the results with those from the literature. In this work, we combined Perturbation Theory (PT) and Machine Learning (ML) to seek a general PTML model for this data set. We analyzed 343 738 unique compounds with 812 072 end points (assay outcomes), with 185 different experimental parameters, 592 protein targets, 51 cell lines, and/or 55 organisms (species). The best PTML linear model found has only three input variables and predicted 56 202/58 653 positive outcomes (sensitivity = 95.8%) and 470 230/550 401 control cases (specificity = 85.4%) in the training series. The model also correctly predicted 18 732/19 549 (95.8%) of positive outcomes and 156 739/183 469 (85.4%) of control cases in the external validation series. To illustrate its practical use, we used the model to predict the outcomes of six different 5-HT2A receptor drugs, namely TCB-2, DOI, DOB, altanserin, pimavanserin, and nelotanserin, in a very large number of different pharmacological assays.
5-HT2A receptors are altered in schizophrenia and represent a drug target for antipsychotic therapeutic activity. The model correctly predicted 93.83% (76 of 86) experimental results for these compounds reported in ChEMBL. Moreover, we performed [35S]GTPγS binding assays with the same six drugs to determine their potency and efficacy in modulating G-proteins in human brain tissue. The antagonist ketanserin was included as an inactive drug with demonstrated affinity for 5-HT2A/C receptors. Our results demonstrate that some of these drugs, previously described as serotonin 5-HT2A receptor agonists, antagonists, or inverse agonists, are not so specific and show intrinsic activity different from that previously reported. Overall, this work opens a new gate for the prediction of GPCR-targeting compounds.
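Sensitivity and specificity here are simple ratios of correctly classified cases to all cases of that class. A minimal Python check, using only the counts quoted in this abstract (the dictionary keys and the helper name are illustrative), reproduces the reported percentages for both series.

```python
# Counts quoted in the abstract: (correctly predicted, total) for each series.
reported_counts = {
    "training sensitivity": (56_202, 58_653),
    "training specificity": (470_230, 550_401),
    "validation sensitivity": (18_732, 19_549),
    "validation specificity": (156_739, 183_469),
}


def percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


for name, (correct, total) in reported_counts.items():
    print(f"{name}: {percent(correct, total)}%")
```

Both sensitivities come out at 95.8% and both specificities at 85.4%, matching the values stated for the training and external validation series.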
Affiliation(s)
- Rebeca Diez-Alarcia
- Centro de Investigación Biomédica en Red en Salud Mental, 48940 Leioa, Spain
- J. Javier Meana
- Centro de Investigación Biomédica en Red en Salud Mental, 48940 Leioa, Spain
- Humbert González-Díaz
- Biophysics Institute, CSIC-UPV/EHU, University of the Basque Country UPV/EHU, Leioa, 48940, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
14
Speck-Planche A. Multiple Perspectives in Anti-cancer Drug Discovery: From old Targets and Natural Products to Innovative Computational Approaches. Anticancer Agents Med Chem 2019; 19:146-147. [PMID: 31298144 DOI: 10.2174/187152061902190418105054] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Affiliation(s)
- Alejandro Speck-Planche
- Research Program on Biomedical Informatics (GRIB) Hospital del Mar Medical Research Institute (IMIM) Barcelona, Spain
15
Vásquez-Domínguez E, Armijos-Jaramillo VD, Tejera E, González-Díaz H. Multioutput Perturbation-Theory Machine Learning (PTML) Model of ChEMBL Data for Antiretroviral Compounds. Mol Pharm 2019; 16:4200-4212. [PMID: 31426639 DOI: 10.1021/acs.molpharmaceut.9b00538] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/22/2022]
Abstract
Retroviral infections, such as HIV infection, remain incurable to date. Defining the target proteins of new antiretroviral compounds is therefore a major goal of medicine and pharmaceutical chemistry. ChEMBL manages a complex data set with Big Data features that is hard to organize, and the large number of characteristics recorded makes the information difficult to analyze when trying to predict new drug candidates for retroviral infections. For this reason, we propose a new predictive model that combines perturbation theory (PT) and machine learning (ML) modeling to create a tool that can take advantage of all the available information. The PTML model proposed in this work for the ChEMBL data set of preclinical experimental assays for antiretroviral compounds is a linear equation with four variables. The PT operators used are based on multicondition moving averages, which combine different features and simplify the handling of all the data. More than 140 000 preclinical assays for 56 105 compounds with different characteristics or experimental conditions have been carried out and can be found in the ChEMBL database, covering combinations of 359 biological activity parameters (c0), 55 protein accessions (c1), 83 cell lines (c2), 64 organisms of assay (c3), and 773 subtypes or strains. We included 150 148 preclinical experimental assays for HIV, 1188 for HTLV, 84 for simian immunodeficiency virus, 370 for murine leukemia virus, 119 for Rous sarcoma virus, 1581 for MMTV, etc. We also included 5277 assays for hepatitis B virus. The developed PTML model achieved a sensitivity of 73.05% for training and 73.10% for validation, a specificity of 86.61% for training and 87.17% for validation, and an accuracy of 75.84% for training and 75.98% for validation. We also compared alternative PTML models with different PT operators such as covariance, moments, and exponential terms.
Finally, we compared our PTML model with ML models from the literature and with nonlinear artificial neural network (ANN) models. We conclude that this is the first PTML model to consider multiple characteristics of preclinical antiretroviral assays in combination, yielding a simple, useful, and adaptable tool that could reduce time and costs in antiretroviral drug research.
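The multicondition moving-average PT operators mentioned in this abstract are commonly written as the deviation of a compound's descriptor from its average over all assays sharing one condition value, Δx_ij = x_i - <x>_cj. The Python sketch below computes one such operator per condition column and stacks them into feature rows. It is an illustration under stated assumptions, not the authors' implementation: it assumes the average is taken over every record in the supplied (training) set, whereas some PTML and Box-Jenkins-style variants average only over active compounds, and the function names, column names, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def moving_average_operator(records, descriptor, condition):
    """Return x_i - <x>_cj for every record.

    <x>_cj is the mean of `descriptor` over all records that share the
    record's value of `condition` (assumption: all records, not actives only).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[condition]].append(rec[descriptor])
    averages = {value: mean(xs) for value, xs in groups.items()}
    return [rec[descriptor] - averages[rec[condition]] for rec in records]


def ptml_features(records, descriptor, conditions):
    """Stack one moving-average operator per condition column into feature rows."""
    columns = [moving_average_operator(records, descriptor, c) for c in conditions]
    return [list(row) for row in zip(*columns)]


# Hypothetical assay records: x is a molecular descriptor, c0 the activity
# parameter and c2 the cell line (names and values are illustrative only).
records = [
    {"x": 2.0, "c0": "IC50", "c2": "cell_A"},
    {"x": 4.0, "c0": "IC50", "c2": "cell_B"},
    {"x": 6.0, "c0": "Ki", "c2": "cell_A"},
]
print(ptml_features(records, "x", ["c0", "c2"]))
```

In a full PTML workflow, operators like these, computed for several descriptors and conditions, would serve as inputs to the linear or ANN classifier. Computing the condition averages on the training series only keeps validation information out of the features.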
Affiliation(s)
- Emilia Vásquez-Domínguez
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Vinicio Danilo Armijos-Jaramillo
- Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Bio-chemioinformatics group, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Eduardo Tejera
- Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Bio-chemioinformatics group, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Humbert González-Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
16
Ambure P, Halder AK, González Díaz H, Cordeiro MNDS. QSAR-Co: An Open Source Software for Developing Robust Multitasking or Multitarget Classification-Based QSAR Models. J Chem Inf Model 2019; 59:2538-2544. [PMID: 31083984 DOI: 10.1021/acs.jcim.9b00295] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Indexed: 12/19/2022]
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a well-known computational technique with wide applications in fields such as drug design, toxicity prediction, and nanomaterials. However, QSAR researchers still face problems in developing robust classification-based QSAR models, especially when handling response data pertaining to diverse experimental and/or theoretical conditions. In the present work, we have developed an open-source standalone software, "QSAR-Co" (available to download at https://sites.google.com/view/qsar-co), to set up classification-based QSAR models that allow mining response data coming from multiple conditions. The software comprises two modules: (1) the Model development module and (2) the Screen/Predict module. This user-friendly software provides several functionalities required for developing a robust multitasking or multitarget classification-based QSAR model using linear discriminant analysis or random forest techniques, with appropriate validation, following the principles set by the Organisation for Economic Co-operation and Development (OECD) for applying QSAR models in regulatory assessments.
Affiliation(s)
- Pravin Ambure
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
- Amit Kumar Halder
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
- Humbert González Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- M Natália D S Cordeiro
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
17
Bediaga H, Arrasate S, González-Díaz H. PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer. ACS Comb Sci 2018; 20:621-632. [PMID: 30240186 DOI: 10.1021/acscombsci.8b00090] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
Determining the target proteins of new anticancer compounds is a very important task in Medicinal Chemistry. To this end, chemists carry out preclinical assays under a high number of combinations of experimental conditions (cj). In fact, the ChEMBL database contains outcomes of 65 534 different anticancer activity preclinical assays for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of cj formed from >70 different biological activity parameters (c0), >300 different drug targets (c1), >230 cell lines (c2), and 5 organisms of assay (c3) or organisms of the target (c4). They include a total of 45 833 assays in leukemia, 6227 in breast cancer, 2499 in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very complex data set with multiple Big Data features, which makes it hard for researchers to rationalize the data, extract useful relationships, and predict new compounds. In this context, we propose to combine perturbation theory (PT) ideas and machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of anticancer compounds. This is a simple linear model with only three variables. In the training series, the model presented an area under the receiver operating characteristic curve (AUROC) of 0.872, specificity Sp(%) = 90.2, sensitivity Sn(%) = 70.6, and overall accuracy Ac(%) = 87.7. In the external validation series, the model has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8. The model uses PT operators based on multicondition moving averages to capture the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models and obtained similar results.
This confirms the hypothesis of a linear relationship between the PT operators and the classification as anticancer compounds in different combinations of assay conditions. Lastly, we compared the model with other PTML models reported in the literature, concluding that this is the only PTML model able to predict activity against multiple types of cancer. This model is a simple but versatile tool for predicting the targets of anticancer compounds while taking into consideration multiple combinations of experimental conditions in preclinical assays.
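Overall accuracy is a prevalence-weighted mix of sensitivity and specificity, which explains why Ac(%) sits much closer to Sp(%) than to Sn(%) in this abstract. Writing P and N for the numbers of positive and negative cases, TP = Sn·P, TN = Sp·N, and π = P/(P + N), a short derivation gives:

```latex
\mathrm{Ac} = \frac{TP + TN}{P + N}
            = \pi\,\mathrm{Sn} + (1 - \pi)\,\mathrm{Sp},
\qquad
\pi = \frac{\mathrm{Sp} - \mathrm{Ac}}{\mathrm{Sp} - \mathrm{Sn}}
    = \frac{90.2 - 87.7}{90.2 - 70.6} \approx 0.13
```

With the rounded training-series values, this suggests that only about one record in eight is a positive case, so accuracy is dominated by specificity. The fraction is an estimate derived from rounded percentages, not a count reported by the authors.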
Affiliation(s)
- Harbil Bediaga
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
- Sonia Arrasate
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
- Humbert González-Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Spain
18
Saldívar-González FI, Naveja JJ, Palomino-Hernández O, Medina-Franco JL. Getting SMARt in drug discovery: chemoinformatics approaches for mining structure–multiple activity relationships. RSC Adv 2017. [DOI: 10.1039/c6ra26230a] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022]
Abstract
In light of the high relevance of polypharmacology, multi-target screening is a major trend in drug discovery.
Affiliation(s)
- Fernanda I. Saldívar-González
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510
- J. Jesús Naveja
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510
- Oscar Palomino-Hernández
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510
- José L. Medina-Franco
- Facultad de Química, Departamento de Farmacia, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510
19
Halder A, Goodarzi M. Recent Advances in Multi-Task QSAR Modeling for Drug Design. Pharmaceutical Sciences 2015. [DOI: 10.15171/ps.2015.33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
20
Zanni R, Galvez-Llompart M, García-Domenech R, Galvez J. Latest advances in molecular topology applications for drug discovery. Expert Opin Drug Discov 2015; 10:945-57. [DOI: 10.1517/17460441.2015.1062751] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 11/05/2022]