1
|
He S, Segura Abarrategi J, Bediaga H, Arrasate S, González-Díaz H. On the additive artificial intelligence-based discovery of nanoparticle neurodegenerative disease drug delivery systems. BEILSTEIN JOURNAL OF NANOTECHNOLOGY 2024; 15:535-555. [PMID: 38774585 PMCID: PMC11106676 DOI: 10.3762/bjnano.15.47] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Accepted: 04/23/2024] [Indexed: 05/24/2024]
Abstract
Neurodegenerative diseases are characterized by slowly progressing neuronal cell death. Conventional drug treatment strategies often fail because of poor solubility, low bioavailability, and the inability of the drugs to effectively cross the blood-brain barrier. Therefore, the development of new neurodegenerative disease drugs (NDDs) requires immediate attention. Nanoparticle (NP) systems are of increasing interest for transporting NDDs to the central nervous system. However, discovering effective nanoparticle neuronal disease drug delivery systems (N2D3Ss) is challenging because of the vast number of combinations of NP and NDD compounds, as well as the various assays involved. Artificial intelligence/machine learning (AI/ML) algorithms have the potential to accelerate this process by predicting the most promising NDD and NP candidates for assaying. Nevertheless, the relatively limited amount of reported data on N2D3S activity compared to assayed NDDs makes AI/ML analysis challenging. In this work, the IFPTML technique, which combines information fusion (IF), perturbation theory (PT), and machine learning (ML), was employed to address this challenge. Initially, we conducted the fusion into a unified dataset comprising 4403 NDD assays from ChEMBL and 260 NP cytotoxicity assays from journal articles. Through a resampling process, three new working datasets were generated, each containing 500,000 cases. We utilized linear discriminant analysis (LDA) along with artificial neural network (ANN) algorithms, such as multilayer perceptron (MLP) and deep learning networks (DLN), to construct linear and non-linear IFPTML models. The IFPTML-LDA models exhibited sensitivity (Sn) and specificity (Sp) values in the range of 70% to 73% (>375,000 training cases) and 70% to 80% (>125,000 validation cases), respectively. In contrast, the IFPTML-MLP and IFPTML-DLN achieved Sn and Sp values in the range of 85% to 86% for both training and validation series. Additionally, IFPTML-ANN models showed an area under the receiver operating curve (AUROC) of approximately 0.93 to 0.95. These results indicate that the IFPTML models could serve as valuable tools in the design of drug delivery systems for neurosciences.
Collapse
Affiliation(s)
- Shan He
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- IKERDATA S.L., ZITEK, UPV/EHU, Rectorate Building, nº6, 48940 Leioa, Greater Bilbao, Basque Country, Spain
| | - Julen Segura Abarrategi
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain
| | - Harbil Bediaga
- IKERDATA S.L., ZITEK, UPV/EHU, Rectorate Building, nº6, 48940 Leioa, Greater Bilbao, Basque Country, Spain
- Painting Department, Fine Arts Faculty, University of the Basque Country UPV/EHU, 48940, Leioa, Biscay, Basque Country, Spain
| | - Sonia Arrasate
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- Instituto Biofisika (UPV/EHU-CSIC), 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain
| |
Collapse
|
2
|
Son J, Kim D. Applying network link prediction in drug discovery: an overview of the literature. Expert Opin Drug Discov 2024; 19:43-56. [PMID: 37794688 DOI: 10.1080/17460441.2023.2267020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 10/02/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION Network representation can give a holistic view of relationships for biomedical entities through network topology. Link prediction estimates the probability of link formation between the pair of unconnected nodes. In the drug discovery process, the link prediction method not only enables the detection of connectivity patterns but also predicts the effects of one biomedical entity to multiple entities simultaneously and vice versa, which is useful for many applications. AREAS COVERED The authors provide a comprehensive overview of network link prediction in drug discovery. Link prediction methodologies such as similarity-based approaches, embedding-based approaches, probabilistic model-based approaches, and preprocessing methods are summarized with examples. In addition to describing their properties and limitations, the authors discuss the applications of link prediction in drug discovery based on the relationship between biomedical concepts. EXPERT OPINION Link prediction is a powerful method to infer the existence of novel relationships in drug discovery. However, link prediction has been hampered by the sparsity of data and the lack of negative links in biomedical networks. With preprocessing to balance positive and negative samples and the collection of more data, the authors believe it is possible to develop more reliable link prediction methods that can become invaluable tools for successful drug discovery.
Collapse
Affiliation(s)
- Jeongtae Son
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| | - Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
| |
Collapse
|
3
|
Diéguez-Santana K, Casañola-Martin GM, Torres R, Rasulev B, Green JR, González-Díaz H. Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds. Mol Pharm 2022; 19:2151-2163. [PMID: 35671399 PMCID: PMC9986951 DOI: 10.1021/acs.molpharmaceut.2c00029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Antibacterial drugs (AD) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of AD vs MN. In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL database, which contains >155,000 AD assays vs >40 MNs of multiple bacteria species. We built a linear discriminant analysis (LDA) and 17 ML models centered on the linear index and based on atoms to predict antibacterial compounds. The IFPTML-LDA model presented the following results for the training subset: specificity (Sp) = 76% out of 70,000 cases, sensitivity (Sn) = 70%, and Accuracy (Acc) = 73%. The same model also presented the following results for the validation subsets: Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the IFPTML nonlinear models, the k nearest neighbors (KNN) showed the best results with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and Area Under Receiver Operating Characteristic (AUROC) = 0.998 in training sets. In the validation series, the Random Forest had the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The IFPTML linear and nonlinear models regarding the ADs vs MNs have good statistical parameters, and they could contribute toward finding new metabolic mutations in antibiotic resistance and reducing time/costs in antibacterial drug research.
Collapse
Affiliation(s)
- Karel Diéguez-Santana
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain.,Universidad Regional Amazónica IKIAM, Tena, Napo 150150, Ecuador
| | - Gerardo M Casañola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States.,Department of Systems and Computer Engineering, Carleton University, K1S5B6 Ottawa, Ontario, Canada
| | - Roldan Torres
- Universidad Regional Amazónica IKIAM, Tena, Napo 150150, Ecuador
| | - Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
| | - James R Green
- Department of Systems and Computer Engineering, Carleton University, K1S5B6 Ottawa, Ontario, Canada
| | - Humbert González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain.,BIOFISIKA, Basque Center for Biophysics CSIC-UPVEH, 48940 Leioa, Spain.,IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain
| |
Collapse
|
4
|
Diéguez-Santana K, González-Díaz H. Towards machine learning discovery of dual antibacterial drug-nanoparticle systems. NANOSCALE 2021; 13:17854-17870. [PMID: 34671801 DOI: 10.1039/d1nr04178a] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Artificial Intelligence/Machine Learning (AI/ML) algorithms may speed up the design of DADNP systems formed by Antibacterial Drugs (AD) and Nanoparticles (NP). In this work, we used IFPTML = Information Fusion (IF) + Perturbation-Theory (PT) + Machine Learning (ML) algorithm for the first time to study of a large dataset of putative DADNP systems composed by >165 000 ChEMBL AD assays and 300 NP assays vs. multiple bacteria species. We trained alternative models with Linear Discriminant Analysis (LDA), Artificial Neural Networks (ANN), Bayesian Networks (BNN), K-Nearest Neighbour (KNN) and other algorithms. IFPTML-LDA model was simpler with values of Sp ≈ 90% and Sn ≈ 74% in both training (>124 K cases) and validation (>41 K cases) series. IFPTML-ANN and KNN models are notably more complicated even when they are more balanced Sn ≈ Sp ≈ 88.5%-99.0% and AUROC ≈ 0.94-0.99 in both series. We also carried out a simulation (>1900 calculations) of the expected behavior for putative DADNPs in 72 different biological assays. The putative DADNPs studied are formed by 27 different drugs with multiple classes of NP and types of coats. In addition, we tested the validity of our additive model with 80 DADNP complexes experimentally synthetized and biologically tested (reported in >45 papers). All these DADNPs show values of MIC < 50 μg mL-1 (cutoff used) better that MIC of AD and NP alone (synergistic or additive effect). The assays involve DADNP complexes with 10 types of NP, 6 coating materials, NP size range 5-100 nm vs. 15 different antibiotics, and 12 bacteria species. The IFPTML-LDA model classified correctly 100% (80 out of 80) DADNP complexes as biologically active. IFPMTL additive strategy may become a useful tool to assist the design of DADNP systems for antibacterial therapy taking into consideration only information about AD and NP components by separate.
Collapse
Affiliation(s)
- Karel Diéguez-Santana
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- Basque Center for Biophysics CSIC-UPVEH, University of Basque Country UPV/EHU, 48940 Leioa, Spain.
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain
| |
Collapse
|
5
|
Barreiro E, Munteanu CR, Cruz-Monteagudo M, Pazos A, González-Díaz H. Net-Net Auto Machine Learning (AutoML) Prediction of Complex Ecosystems. Sci Rep 2018; 8:12340. [PMID: 30120369 PMCID: PMC6098100 DOI: 10.1038/s41598-018-30637-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2018] [Accepted: 07/24/2018] [Indexed: 11/09/2022] Open
Abstract
Biological Ecosystem Networks (BENs) are webs of biological species (nodes) establishing trophic relationships (links). Experimental confirmation of all possible links is difficult and generates a huge volume of information. Consequently, computational prediction becomes an important goal. Artificial Neural Networks (ANNs) are Machine Learning (ML) algorithms that may be used to predict BENs, using as input Shannon entropy information measures (Shk) of known ecosystems to train them. However, it is difficult to select a priori which ANN topology will have a higher accuracy. Interestingly, Auto Machine Learning (AutoML) methods focus on the automatic selection of the more efficient ML algorithms for specific problems. In this work, a preliminary study of a new approach to AutoML selection of ANNs is proposed for the prediction of BENs. We call it the Net-Net AutoML approach, because it uses for the first time Shk values of both networks involving BENs (networks to be predicted) and ANN topologies (networks to be tested). Twelve types of classifiers have been tested for the Net-Net model including linear, Bayesian, trees-based methods, multilayer perceptrons and deep neuronal networks. The best Net-Net AutoML model for 338,050 outputs of 10 ANN topologies for links of 69 BENs was obtained with a deep fully connected neuronal network, characterized by a test accuracy of 0.866 and a test AUROC of 0.935. This work paves the way for the application of Net-Net AutoML to other systems or ML algorithms.
Collapse
Affiliation(s)
- Enrique Barreiro
- Department of Computation, Computer Science Faculty, University of A Coruna (UDC), 15071, A Coruña, Spain.,Center for Computational Science (CCS), University of Miami (UM), Miami, 33136, FL, USA.,West Coast University, Miami Campus, 33178, FL, USA
| | - Cristian R Munteanu
- Department of Computation, Computer Science Faculty, University of A Coruna (UDC), 15071, A Coruña, Spain
| | - Maykel Cruz-Monteagudo
- Center for Computational Science (CCS), University of Miami (UM), Miami, 33136, FL, USA.,West Coast University, Miami Campus, 33178, FL, USA
| | - Alejandro Pazos
- Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña, 15006, Spain
| | - Humbert González-Díaz
- Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940, Biscay, Spain. .,IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain.
| |
Collapse
|
6
|
Quevedo-Tumailli VF, Ortega-Tenezaca B, González-Díaz H. Chromosome Gene Orientation Inversion Networks (GOINs) of Plasmodium Proteome. J Proteome Res 2018; 17:1258-1268. [DOI: 10.1021/acs.jproteome.7b00861] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Viviana F. Quevedo-Tumailli
- RNASA-IMEDIR,
Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain
- Universidad Estatal Amazónica UEA, Puyo, Pastaza, Ecuador
| | - Bernabé Ortega-Tenezaca
- RNASA-IMEDIR,
Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain
- Universidad Estatal Amazónica UEA, Puyo, Pastaza, Ecuador
- Universidad Regional Autónoma de los Andes UNIANDES-Puyo, Puyo, Pastaza,Ecuador
| | - Humbert González-Díaz
- Dept.
of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Biscay, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Biscay, Spain
| |
Collapse
|
7
|
Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016; 18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022] Open
Abstract
As an extension of the conventional quantitative structure activity relationship models, proteochemometric (PCM) modelling is a computational method that can predict the bioactivity relations between multiple ligands and multiple targets. Traditional PCM modelling includes three essential elements: descriptors (including target descriptors, ligand descriptors and cross-term descriptors), bioactivity data and appropriate learning functions that link the descriptors to the bioactivity data. Since its appearance, PCM modelling has developed rapidly over the past decade by taking advantage of the progress of different descriptors and machine learning techniques, along with the increasing amounts of available bioactivity data. Specifically, the new emerging target descriptors and cross-term descriptors not only significantly increased the performance of PCM modelling but also expanded its application scope from traditional protein-ligand interaction to more abundant interactions, including protein-peptide, protein-DNA and even protein-protein interactions. In this review, target descriptors and cross-term descriptors, as well as the corresponding application scope, are intensively summarized. Additionally, we look forward to seeing PCM modelling extend into new application scopes, such as Target-Catalyst-Ligand systems, with the further development of descriptors, machine learning techniques and increasing amounts of available bioactivity data.
Collapse
|
8
|
Duardo-Sánchez A, Munteanu CR, Riera-Fernández P, López-Díaz A, Pazos A, González-Díaz H. Modeling Complex Metabolic Reactions, Ecological Systems, and Financial and Legal Networks with MIANN Models Based on Markov-Wiener Node Descriptors. J Chem Inf Model 2013; 54:16-29. [DOI: 10.1021/ci400280n] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Affiliation(s)
- Aliuska Duardo-Sánchez
- Department
of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
- Department of Special Public Law, Financial
and Tributary Law Area, Faculty of Law, University of Santiago de Compostela (USC), 15782, Santiago de Compostela, A Coruña, Spain
| | - Cristian R. Munteanu
- Department
of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
| | - Pablo Riera-Fernández
- Department
of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
| | - Antonio López-Díaz
- Department of Special Public Law, Financial
and Tributary Law Area, Faculty of Law, University of Santiago de Compostela (USC), 15782, Santiago de Compostela, A Coruña, Spain
| | - Alejandro Pazos
- Department
of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, Campus de Elviña, 15071, A Coruña, A Coruña, Spain
| | - Humberto González-Díaz
- Department of Organic Chemistry II, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), 48940, Leioa, Bizkaia, Spain
- IKERBASQUE, Basque
Foundation for Science, 48011, Bilbao, Biscay, Spain
| |
Collapse
|