Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Blay V, Yokoi T, González-Díaz H. Perturbation Theory–Machine Learning Study of Zeolite Materials Desilication. J Chem Inf Model 2018;58:2414-2419. [DOI: 10.1021/acs.jcim.8b00383] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/09/2023]

Cited by Other Article(s)

Baltasar-Marchueta M, Llona L, M-Alicante S, Barbolla I, Ibarluzea MG, Ramis R, Salomon AM, Fundora B, Araujo A, Muguruza-Montero A, Nuñez E, Pérez-Olea S, Villanueva C, Leonardo A, Arrasate S, Sotomayor N, Villarroel A, Bergara A, Lete E, González-Díaz H. Identification of Riluzole derivatives as novel calmodulin inhibitors with neuroprotective activity by a joint synthesis, biosensor, and computational guided strategy. Biomed Pharmacother 2024;174:116602. [PMID: 38636396 DOI: 10.1016/j.biopha.2024.116602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/20/2024] Open

Affiliation(s)

Maider Baltasar-Marchueta: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Leire Llona: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Sara M-Alicante: Biofisika Institute, CSIC-UPV/EHU, Leioa 48940, Spain
Iratxe Barbolla: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Markel Garcia Ibarluzea: Donostia International Physics Center, Donostia, Spain; Department of Physics, University of the Basque Country UPV/EHU, Leioa, Spain
Rafael Ramis: Donostia International Physics Center, Donostia, Spain; Department of Physics, University of the Basque Country UPV/EHU, Leioa, Spain
Ane Miren Salomon: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Brenda Fundora: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Ariane Araujo: Biofisika Institute, CSIC-UPV/EHU, Leioa 48940, Spain
Arantza Muguruza-Montero: Biofisika Institute, CSIC-UPV/EHU, Leioa 48940, Spain
Eider Nuñez: Biofisika Institute, CSIC-UPV/EHU, Leioa 48940, Spain
Scarlett Pérez-Olea: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Christian Villanueva: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Aritz Leonardo: Donostia International Physics Center, Donostia, Spain; Department of Physics, University of the Basque Country UPV/EHU, Leioa, Spain
Sonia Arrasate: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Nuria Sotomayor: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Alvaro Villarroel: Biofisika Institute, CSIC-UPV/EHU, Leioa 48940, Spain
Aitor Bergara: Donostia International Physics Center, Donostia, Spain; Department of Physics, University of the Basque Country UPV/EHU, Leioa, Spain
Esther Lete: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain
Humberto González-Díaz: Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, Leioa 48940, Spain; Biofisika Institute, CSIC-UPV/EHU, Leioa 48940, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao 48011, Spain

Zhuang J, Midgley AC, Wei Y, Liu Q, Kong D, Huang X. Machine-Learning-Assisted Nanozyme Design: Lessons from Materials and Engineered Enzymes. Adv Mater 2024;36:e2210848. [PMID: 36701424 DOI: 10.1002/adma.202210848] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 11/21/2022] [Revised: 01/03/2023] [Indexed: 05/11/2023]

Mubashir M, Ahmad T, Liu X, Rehman LM, de Levay JPBB, Al Nuaimi R, Thankamony R, Lai Z. Artificial intelligence and structural design of inorganic hollow fiber membranes: Materials chemistry. Chemosphere 2023;338:139525. [PMID: 37467860 DOI: 10.1016/j.chemosphere.2023.139525] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/29/2023] [Revised: 07/02/2023] [Accepted: 07/14/2023] [Indexed: 07/21/2023]

Diéguez-Santana K, Casañola-Martin GM, Torres R, Rasulev B, Green JR, González-Díaz H. Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds. Mol Pharm 2022;19:2151-2163. [PMID: 35671399 PMCID: PMC9986951 DOI: 10.1021/acs.molpharmaceut.2c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

Ortega-Tenezaca B, Quevedo-Tumailli V, Bediaga H, Collados J, Arrasate S, Madariaga G, Munteanu CR, Cordeiro MND, González-Díaz H. PTML Multi-Label Algorithms: Models, Software, and Applications. Curr Top Med Chem 2020;20:2326-2337. [DOI: 10.2174/1568026620666200916122616] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/18/2020] [Revised: 07/19/2020] [Accepted: 07/20/2020] [Indexed: 12/17/2022]

Jafari K, Fatemi MH, Toropova AP, Toropov AA. Correlation Intensity Index (CII) as a criterion of predictive potential: Applying to model thermal conductivity of metal oxide-based ethylene glycol nanofluids. Chem Phys Lett 2020. [DOI: 10.1016/j.cplett.2020.137614] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/05/2023]

Santana R, Zuluaga R, Gañán P, Arrasate S, Onieva E, Montemore MM, González-Díaz H. PTML Model for Selection of Nanoparticles, Anticancer Drugs, and Vitamins in the Design of Drug-Vitamin Nanoparticle Release Systems for Cancer Cotherapy. Mol Pharm 2020;17:2612-2627. [PMID: 32459098 DOI: 10.1021/acs.molpharmaceut.0c00308] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/08/2023]

Abstract

Nanosystems are gaining momentum in pharmaceutical sciences because of the wide variety of possibilities for designing these systems to have specific functions. Specifically, studies of new cancer cotherapy drug-vitamin release nanosystems (DVRNs) including anticancer compounds and vitamins or vitamin derivatives have revealed encouraging results. However, the number of possible combinations of design and synthesis conditions is remarkably high. In addition, a large number of anticancer compounds and vitamin derivatives have already been assayed, but notably fewer DVRNs have been assayed as a whole (with the anticancer compound and the vitamin linked to them). Our approach combines perturbation theory and machine learning (PTML) to predict the probability of obtaining an interesting DVRN by changing the anticancer compound and/or the vitamin present in an already tested DVRN for other anticancer compounds or vitamins that have not yet been tested as part of a DVRN. In a previous work, we built a linear PTML model useful for the design of these nanosystems. In doing so, we used information fusion (IF) techniques to carry out data enrichment of DVRN data compiled from the literature with the data for preclinical assays of vitamins from the ChEMBL database. The design features of DVRNs and the assay conditions of nanoparticles (NPs) and vitamins were included as multiplicative PT operators (PTOs) to the system, which indicates the importance of these variables. However, the previous work omitted experiments with nonlinear ML techniques and different types of PTOs such as metric-based PTOs. More importantly, the previous work did not consider the structure of the anticancer drug to be included in the new DVRNs. In this work, we address three main objectives (tasks). In the first task, we found a new model, alternative to the one published before, for the rational design of DVRNs using metric-based PTOs. 
The most accurate PTML model was the artificial neural network model, which showed values of specificity, sensitivity, and accuracy in the range of 90-95% in training and external validation series for more than 130,000 cases (DVRNs vs ChEMBL assays). Furthermore, in the second task, we used IF techniques to carry out data enrichment of our previous data set. In doing so, we constructed a new working data set of >970,000 cases with the data of preclinical assays of DVRNs, vitamins, and anticancer compounds from the ChEMBL database. All these assays have multiple continuous variables or descriptors d_k and categorical variables c_j (conditions of the assays) for drugs (d_ack, c_acj), vitamins (d_vk, c_vj), and NPs (d_nk, c_nj). These data include >20,000 potential anticancer compounds with >270 protein targets (c_ac1), >580 assay cell organisms (c_ac2), and so forth. Furthermore, we include >36,000 assays of vitamin derivatives in >6200 types of cells (c_2vit), >120 assay organisms (c_3vit), >60 assay strains (c_4vit), and so forth. The enriched data set also contains >20 types of DVRNs (c_5n) with 9 NP core materials (c_4n), 8 synthesis methods (c_7n), and so forth. We expressed all this information with PTOs and developed a qualitatively new PTML model that incorporates information on the anticancer drugs. This new model achieves 96-97% accuracy in the training and external validation subsets. In the last task, we carried out a comparative study of published ML and/or PTML models and described how the models presented here fill the knowledge gap in drug delivery. In conclusion, we present here for the first time a multipurpose PTML model that is able to select NPs, anticancer compounds, and vitamins and their conditions of assay for DVRN design.

Santana R, Zuluaga R, Gañán P, Arrasate S, Onieva Caracuel E, González-Díaz H. PTML Model of ChEMBL Compounds Assays for Vitamin Derivatives. ACS Comb Sci 2020;22:129-141. [PMID: 32011854 DOI: 10.1021/acscombsci.9b00166] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]

Abstract

Determining the biological activity of vitamin derivatives is needed given that organic synthesis of analogs of vitamins is an active field of interest for medicinal chemistry, pharmaceuticals, and food additives. Accordingly, scientists from different disciplines perform preclinical assays (n_ij) with a considerable combination of assay conditions (c_j). Indeed, the ChEMBL platform contains a database that includes results from 36 220 different biological activity bioassays of 21 240 different vitamins and vitamin derivatives. These assays are heterogeneous in terms of their combinations of assay conditions c_j. They are focused on >500 different biological activity parameters (c₀), >340 different targets (c₁), >6200 types of cell (c₂), >120 organisms of assay (c₃), and >60 assay strains (c₄). The data set includes a total of >1850 niacin assays, >1580 tretinoin assays, >1580 retinol assays, 857 ascorbic acid assays, etc. Given how difficult these combinatorial data are for researchers to assimilate, we propose to build a model by combining perturbation theory (PT) and machine learning (ML). Through this study, we propose a PTML (PT + ML) combinatorial model for ChEMBL results on the biological activity of vitamins and vitamin derivatives. The linear discriminant analysis (LDA) model presented the following results for training subset a: specificity (%) = 90.38, sensitivity (%) = 87.51, and accuracy (%) = 89.89. The model showed the following results for the external validation subset: specificity (%) = 90.58, sensitivity (%) = 87.72, and accuracy (%) = 90.09. Different types of linear and nonlinear PTML models, such as logistic regression (LR), classification tree (CT), naïve Bayes (NB), and random forest (RF), were applied to compare predictive capacity. The PTML-LDA model predicts with more accuracy by applying combinatorial descriptors. 
In addition, a PCA experiment with chemical structure descriptors allowed us to characterize the high structural diversity of the chemical space studied. In any case, PTML models using chemical structure descriptors do not improve the performance of the PTML-LDA model based on ALOGP and PSA. We can conclude that the three-variable PTML-LDA model is a simplified and adaptable tool for predicting, across different combinations of experimental conditions, the biological activity of vitamin derivatives.

Montes-Bageneta I, Akesolo U, López S, Merino M, Anakabe E, Arrasate S. Pollutants in Organic Chemistry and Medicinal Chemistry Education Laboratory. Experimental and Machine Learning Studies. Curr Top Med Chem 2020;20:720-730. [PMID: 32066360 DOI: 10.2174/1568026620666200211110043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/03/2019] [Revised: 12/27/2019] [Accepted: 12/27/2019] [Indexed: 11/22/2022]

Toyao T, Maeno Z, Takakusagi S, Kamachi T, Takigawa I, Shimizu KI. Machine Learning for Catalysis Informatics: Recent Applications and Prospects. ACS Catal 2019. [DOI: 10.1021/acscatal.9b04186] [Citation(s) in RCA: 189] [Impact Index Per Article: 37.8] [Indexed: 12/16/2022]

Santana R, Zuluaga R, Gañán P, Arrasate S, Onieva E, González-Díaz H. Designing nanoparticle release systems for drug-vitamin cancer co-therapy with multiplicative perturbation-theory machine learning (PTML) models. Nanoscale 2019;11:21811-21823. [PMID: 31691701 DOI: 10.1039/c9nr05070a] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/10/2023]

Abstract

Nano-systems for cancer co-therapy including vitamins or vitamin derivatives have shown results that justify further research to better understand them. However, the number of different combinations of drugs, vitamins, nanoparticle types, coating agents, synthesis conditions, and system types (nanocapsules, micelles, etc.) to be tested is very large, which makes experimentation costly. In this context, there are reports of large datasets of preclinical assays of compounds (as in the ChEMBL database) and a growing but still limited number of reports of experimental measurements of nano-systems per se. On the other hand, Machine Learning is gaining momentum in Nanotechnology and Pharmaceutical Sciences as a tool for rational design of new drugs and drug-release nano-systems. In this work, we propose to combine Perturbation Theory principles and Machine Learning to develop a PTML model for rational selection of the components of cancer co-therapy drug-vitamin release nano-systems (DVRNs). In doing so, we apply information fusion techniques with 2 data sets: (1) a large ChEMBL dataset of >36 000 preclinical assays of vitamin derivatives and (2) a new dataset of >1000 outcomes of DVRNs, collected herein from the literature for the first time. The ChEMBL dataset used covers a considerable number of assay conditions (c_jvit), each with multiple levels. These conditions included >504 biological activity parameters (c_0vit), >340 types of proteins (c_1vit), >650 types of cells (c_2vit), >120 assay organisms (c_3vit), and >60 assay strains (c_4vit). Regarding the DVRNs, there are 25 different types of nano-systems (n_jn), with up to 16 conditions (c_jn), each also with different levels, such as 8 biological activity parameters (c_0n), 9 raw nanomaterials (c_4n), 15 assay cells (c_11n), etc. In the first stage, we used Moving Average operators to quantify the perturbations (deviations) in all input variables with respect to the conditions. 
After that, we used multiplicative PT operators to carry out data fusion and dimension reduction, and Linear Discriminant Analysis (LDA) to seek the PTML model. The best PTML model found showed values of specificity, sensitivity, and accuracy in the range of 83-88% in training and external validation series for >130 000 cases (DVRNs vs. ChEMBL data pairs) formed after data fusion. To the best of our knowledge, this is the first general purpose model for the rational design of DVRNs for cancer co-therapy.
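The two stages named in this abstract (Moving Average operators, then multiplicative PT operators) can be illustrated with a minimal sketch. The abstract does not give the operator formulas, so the forms below are assumptions: the deviation is taken as a value minus the mean of that value over all cases sharing the same condition level, and the multiplicative operator as the product of one deviation from each data set. The toy records, the field names (`logp`, `organism`, `size_nm`, `core`), and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def moving_average_deviations(cases, value_key, condition_key):
    """Return, for each case, value - mean(value over cases with the same condition).

    Assumed form of a moving-average deviation; not taken from the paper.
    """
    groups = defaultdict(list)
    for case in cases:
        groups[case[condition_key]].append(case[value_key])
    group_mean = {cond: mean(vals) for cond, vals in groups.items()}
    return [case[value_key] - group_mean[case[condition_key]] for case in cases]


# Hypothetical toy data standing in for ChEMBL vitamin assays and DVRN records.
vitamin_assays = [
    {"logp": 1.0, "organism": "human"},
    {"logp": 3.0, "organism": "human"},
    {"logp": 2.0, "organism": "mouse"},
]
nano_systems = [
    {"size_nm": 100.0, "core": "PLGA"},
    {"size_nm": 140.0, "core": "PLGA"},
]

dev_vit = moving_average_deviations(vitamin_assays, "logp", "organism")
dev_nano = moving_average_deviations(nano_systems, "size_nm", "core")

# Assumed multiplicative operator: one product per (nano-system, vitamin assay) pair.
# Pairing every record of one set with every record of the other is what makes
# the fused data set much larger than either input.
fused = [dn * dv for dn in dev_nano for dv in dev_vit]

print(dev_vit)
print(dev_nano)
print(len(fused))
```

The fused values would then serve as input variables for a classifier such as LDA; that step is omitted here.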

Diez-Alarcia R, Yáñez-Pérez V, Muneta-Arrate I, Arrasate S, Lete E, Meana JJ, González-Díaz H. Big Data Challenges Targeting Proteins in GPCR Signaling Pathways; Combining PTML-ChEMBL Models and [³⁵S]GTPγS Binding Assays. ACS Chem Neurosci 2019;10:4476-4491. [PMID: 31618004 DOI: 10.1021/acschemneuro.9b00302] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022] Open

Abstract

G-protein-coupled receptors (GPCRs), also known as 7-transmembrane receptors, are the single largest class of drug targets. Consequently, a large number of preclinical assays having GPCRs as molecular targets have been released to public sources like the Chemical European Molecular Biology Laboratory (ChEMBL) database. These data are also very complex, covering changes in drug chemical structure and assay conditions like c₀ = activity parameter (K_i, IC₅₀, etc.), c₁ = target protein, c₂ = cell line, c₃ = assay organism, etc., which makes these databases difficult to analyze and places them on the border of a Big Data challenge. One of the aims of this work is to develop a computational model able to predict new GPCR-targeting drugs taking into consideration multiple conditions of assay. Another objective is to perform new predictive and experimental studies of selective 5-HT2A receptor agonists, antagonists, or inverse agonists in humans, comparing the results with those from the literature. In this work, we combined Perturbation Theory (PT) and Machine Learning (ML) to seek a general PTML model for this data set. We analyzed 343 738 unique compounds with 812 072 end points (assay outcomes), with 185 different experimental parameters, 592 protein targets, 51 cell lines, and/or 55 organisms (species). The best PTML linear model found has only three input variables and correctly predicted 56 202/58 653 positive outcomes (sensitivity = 95.8%) and 470 230/550 401 control cases (specificity = 85.4%) in training series. The model also correctly predicted 18 732/19 549 (95.8%) of positive outcomes and 156 739/183 469 (85.4%) of control cases in external validation series. To illustrate its practical use, we used the model to predict the outcomes of six different 5-HT2A receptor drugs, namely, TCB-2, DOI, DOB, altanserin, pimavanserin, and nelotanserin, in a very large number of different pharmacological assays. 
5-HT2A receptors are altered in schizophrenia and represent a drug target for antipsychotic therapeutic activity. The model correctly predicted 93.83% (76 of 81) of the experimental results for these compounds reported in ChEMBL. Moreover, [³⁵S]GTPγS binding assays were performed experimentally with the same six drugs with the aim of determining their potency and efficacy in the modulation of G-proteins in human brain tissue. The antagonist ketanserin was included as an inactive drug with demonstrated affinity for 5-HT2A/C receptors. Our results demonstrate that some of these drugs, previously described as serotonin 5-HT2A receptor agonists, antagonists, or inverse agonists, are not so specific and show intrinsic activity different from that previously reported. Overall, this work opens a new avenue for the prediction of GPCR-targeting compounds.
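The sensitivity and specificity percentages quoted in this abstract follow directly from the reported counts. The short check below recomputes them as correct predictions divided by total cases; only the counts come from the abstract, while the helper name `rate` and the dictionary layout are illustrative choices.

```python
def rate(correct: int, total: int) -> float:
    """Return the percentage of correctly classified cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# (correctly predicted, total) pairs as quoted in the abstract.
reported = {
    "training sensitivity": (56_202, 58_653),
    "training specificity": (470_230, 550_401),
    "validation sensitivity": (18_732, 19_549),
    "validation specificity": (156_739, 183_469),
}

rates = {name: rate(correct, total) for name, (correct, total) in reported.items()}

for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity is computed over positive outcomes and specificity over control cases, so the two figures use different denominators and cannot be averaged into an overall accuracy without weighting by class size.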

Senderowitz H, Tropsha A. Materials Informatics. J Chem Inf Model 2018;58:2377-2379. [PMID: 30990041 DOI: 10.1021/acs.jcim.8b00927] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]

Pérez-Parras Toledano J, García-Pedrajas N, Cerruela-García G. Multilabel and Missing Label Methods for Binary Quantitative Structure-Activity Relationship Models: An Application for the Prediction of Adverse Drug Reactions. J Chem Inf Model 2019;59:4120-4130. [PMID: 31514503 DOI: 10.1021/acs.jcim.9b00611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]

Ma R, Liu Z, Zhang Q, Liu Z, Luo T. Evaluating Polymer Representations via Quantifying Structure-Property Relationships. J Chem Inf Model 2019;59:3110-3119. [PMID: 31268306 DOI: 10.1021/acs.jcim.9b00358] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/15/2022]

Abstract

Machine learning techniques are being applied to quantify structure-property relationships for a wide variety of materials, where proper material representations play a key role. Although algorithms for representation learning are extensively studied, their applications to domain-specific areas, such as polymers, are limited, largely due to the lack of benchmark databases. In this work, we investigate different types of polymer representations, including Morgan fingerprint (MF), molecular embedding (ME), and molecular graph (MG), based on a benchmark database drawn from a subset of the well-known web-based polymer database PolyInfo. We evaluate the quality of different polymer representations via quantifying the relationships between the representations and polymer properties, including density, melting temperature, and glass transition temperature. Different representation learning schemes for MEs, such as supervised learning, semisupervised learning, and transfer learning, are investigated. In supervised learning, only labeled molecules in our benchmark database are used for representation learning; in semisupervised learning, both labeled and unlabeled molecules in our benchmark database are used; and in transfer learning, molecules from an external database that is different from the benchmark database are used. It is found that ME (with an R² of 0.724 in the density case, 0.684 in the melting temperature case, and 0.865 in the glass transition temperature case) outperforms the other representations for structure-property relationship quantification in all cases studied, and MG (with an R² of 0.260 in the density case, -0.149 in the melting temperature case, and 0.711 in the glass transition case) is shown to be much inferior to ME and MF (with an R² of 0.562 in the density case, 0.645 in the melting temperature case, and 0.849 in the glass transition case), likely due to the relatively small volumes of training data available. 
For MEs, it is found that the similarities of substructure MEs under different learning schemes (e.g., SL, SSL, and TL) are estimated differently, thus leading to different performance scores in structure-property relation quantification. Combinations of MEs show little effect on predictive performance compared with the single MEs in the corresponding regression tasks, indicating no information gain from mixing MEs.

Speck-Planche A. Combining Ensemble Learning with a Fragment-Based Topological Approach To Generate New Molecular Diversity in Drug Discovery: In Silico Design of Hsp90 Inhibitors. ACS Omega 2018;3:14704-14716. [PMID: 30555986 PMCID: PMC6289491 DOI: 10.1021/acsomega.8b02419] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 09/18/2018] [Accepted: 10/23/2018] [Indexed: 05/05/2023]

Abstract

Machine learning methods have revolutionized modern science, providing fast and accurate solutions to multiple problems. However, they are commonly treated as "black boxes". Therefore, in important scientific fields such as medicinal chemistry and drug discovery, machine learning methods are restricted almost exclusively to the task of making predictions on large and heterogeneous data sets of chemicals. The lack of interpretability prevents the full exploitation of machine learning models as generators of new chemical knowledge. This work focuses on the development of an ensemble learning model for the prediction and design of potent dual heat shock protein 90 (Hsp90) inhibitors. The model displays accuracy higher than 80% in both training and test sets. To use the ensemble model as a generator of new chemical knowledge, three steps were followed. First, a physicochemical and/or structural interpretation was provided for each molecular descriptor present in the ensemble learning model. Second, the term "pseudolinear equation" was introduced within the context of machine learning to calculate the relative quantitative contributions of different molecular fragments to the inhibitory activity against the two Hsp90 isoforms studied here. Finally, by assembling the fragments with positive contributions, new molecules were designed and predicted to be potent Hsp90 inhibitors. According to Lipinski's rule of five, the designed molecules were found to exhibit potentially good oral bioavailability, an essential property that chemicals must have to pass the early stages of drug discovery. The present approach, based on the combination of ensemble learning and fragment-based topological design, holds great promise in drug discovery, and it can be adapted and applied to many different scientific disciplines.
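The Lipinski rule-of-five screen mentioned in this abstract can be sketched as a simple threshold check. The thresholds used are the commonly cited ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors); allowing one violation is a common convention and an assumption here, since the abstract does not state how the rule was applied. The descriptor values and the class and function names are hypothetical, and in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: a molecule passes with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor values, not taken from the paper.
small_molecule = Descriptors(mol_weight=350.4, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
large_molecule = Descriptors(mol_weight=720.9, logp=6.3, h_bond_donors=6, h_bond_acceptors=12)

print(lipinski_violations(small_molecule), passes_rule_of_five(small_molecule))
print(lipinski_violations(large_molecule), passes_rule_of_five(large_molecule))
```

Passing this screen only suggests potentially good oral bioavailability, which matches the hedged wording of the abstract; it is a filter, not a prediction of activity.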
