1
Moulaei K, Afshari L, Moulaei R, Sabet B, Mousavi SM, Afrash MR. Explainable artificial intelligence for stroke prediction through comparison of deep learning and machine learning models. Sci Rep 2024; 14:31392. [PMID: 39733046 DOI: 10.1038/s41598-024-82931-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Accepted: 12/10/2024] [Indexed: 12/30/2024] Open
Abstract
Failure to predict stroke promptly may lead to delayed treatment, causing severe consequences like permanent neurological damage or death. Early detection using deep learning (DL) and machine learning (ML) models can enhance patient outcomes and mitigate the long-term effects of strokes. The aim of this study is to compare these models, exploring their efficacy in predicting stroke. This study analyzed a dataset comprising 663 records from patients hospitalized at Hazrat Rasool Akram Hospital in Tehran, Iran, including 401 healthy individuals and 262 stroke patients. A total of eight established ML (SVM, XGB, KNN, RF) and DL (DNN, FNN, LSTM, CNN) models were utilized to predict stroke. Techniques such as 10-fold cross-validation and hyperparameter tuning were implemented to prevent overfitting. The study also focused on interpretability through Shapley Additive Explanations (SHAP). Model performance was evaluated using accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed the highest sensitivity (96.15%), while FNN showed the best specificity (96.0%), accuracy (96.0%), F1-score (95.0%), and ROC (98.0%). Among ML models, RF achieved the highest sensitivity (99.9%), accuracy (99.0%), specificity (100%), F1-score (99.0%), and ROC (99.9%). Overall, RF outperformed all models, while DL models surpassed the ML models other than RF in most metrics. DL models (CNN, LSTM, DNN, FNN) achieved sensitivities from 93.0 to 96.15%, specificities from 80.0 to 96.0%, accuracies from 92.0 to 96.0%, F1-scores from 87.34 to 95.0%, and ROC scores from 95.0 to 98.0%. In contrast, ML models (KNN, XGB, SVM) showed sensitivities between 29.0% and 94.0%, specificities between 89.47% and 96.0%, accuracies between 71.0% and 95.0%, F1-scores between 44.0% and 95.0%, and ROC scores between 64.0% and 95.0%.
This study demonstrates the efficacy of DL and ML models in predicting stroke, with the RF model outperforming all others in key metrics. While DL models generally surpassed ML models, RF's exceptional performance highlights the potential of combining these technologies for early stroke detection, which could improve patient outcomes by helping to prevent severe consequences like permanent neurological damage or death.
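As a rough illustration of the evaluation metrics named in this abstract, the sketch below computes sensitivity, specificity, accuracy, and F1-score from binary predictions. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy and F1 for 0/1 labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # stroke correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy flagged as stroke
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # stroke missed

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy labels invented for illustration only (not the study's records).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In a 10-fold cross-validation setting, a function like this would typically be applied to each held-out fold and the results averaged.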
Affiliation(s)
- Khadijeh Moulaei
  - Health Management and Economics Research Center, Health Management Research Institute, Iran University of Medical Sciences, Tehran, Iran
  - Artificial Intelligence in Medical Sciences Research Center, Smart University of Medical Sciences, Tehran, Iran
- Lida Afshari
  - Department of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Reza Moulaei
  - Department of Orthopedic and Trauma Surgery, Shariati Hospital and School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Babak Sabet
  - Artificial Intelligence in Medical Sciences Research Center, Smart University of Medical Sciences, Tehran, Iran
  - Department of Surgery, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Mousavi
  - Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Mohammad Reza Afrash
  - Artificial Intelligence in Medical Sciences Research Center, Smart University of Medical Sciences, Tehran, Iran.
  - Department of Artificial Intelligence in Medical Sciences Research Center, Smart University of Medical Sciences, Tehran, Iran.
2
Trybus E, Trybus W. H1 Antihistamines-Promising Candidates for Repurposing in the Context of the Development of New Therapeutic Approaches to Cancer Treatment. Cancers (Basel) 2024; 16:4253. [PMID: 39766152 PMCID: PMC11674717 DOI: 10.3390/cancers16244253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2024] [Revised: 12/17/2024] [Accepted: 12/19/2024] [Indexed: 01/05/2025] Open
Abstract
Despite significant progress in the field of clinical oncology in terms of diagnostic and treatment methods, the results of anticancer therapy are still not fully satisfactory, especially due to limited response and high toxicity. This has created a need for further research to find alternative ways to improve success rates in oncological treatment. Repurposing an existing drug is a promising global strategy for rapidly obtaining an effective drug that acts on multiple levels of cancer and is also safe. Research into other applications of an existing drug enables a precise assessment of its possible mechanisms of action and, consequently, the broadening of therapeutic indications. This strategy is also supported by the fact that most non-oncological drugs have pleiotropic effects, and most of the diseases for which they were originally intended are multifactorial, which is desirable given the heterogeneous and multifaceted biology of cancer. In this review, we focus mainly on the anticancer potential of H1 antihistamines, especially new-generation agents that were not originally intended for cancer therapy, highlighting the relevant signaling pathways and discussing the properties of these agents for their judicious use based on the characteristic features of cancer.
Affiliation(s)
- Ewa Trybus
  - Department of Medical Biology, Jan Kochanowski University of Kielce, Uniwersytecka 7, 25-406 Kielce, Poland
- Wojciech Trybus
  - Department of Medical Biology, Jan Kochanowski University of Kielce, Uniwersytecka 7, 25-406 Kielce, Poland
3
Amiri Souri E, Chenoweth A, Karagiannis SN, Tsoka S. Drug repurposing and prediction of multiple interaction types via graph embedding. BMC Bioinformatics 2023; 24:202. [PMID: 37193964 DOI: 10.1186/s12859-023-05317-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 04/30/2023] [Indexed: 05/18/2023] Open
Abstract
BACKGROUND: Finding drugs that can interact with a specific target to induce a desired therapeutic outcome is a key deliverable in drug discovery for targeted treatment. Therefore, both identifying new drug-target links and delineating the type of drug interaction are important in drug repurposing studies.
RESULTS: A computational drug repurposing approach, DT2Vec+, was proposed to predict novel drug-target interactions (DTIs), as well as to predict the type of interaction induced. The methodology is based on mining a heterogeneous graph that integrates drug-drug and protein-protein similarity networks, together with verified drug-disease and protein-disease associations. In order to extract appropriate features, the three-layer heterogeneous graph was mapped to low-dimensional vectors using node embedding principles. The DTI prediction problem was formulated as a multi-label, multi-class classification task, aiming to determine drug modes of action. DTIs were defined by concatenating pairs of drug and target vectors extracted from graph embedding, which were used as input to classification via gradient boosted trees, where a model is trained to predict the type of interaction. After validating the prediction ability of DT2Vec+, a comprehensive analysis of all unknown DTIs was conducted to predict the degree and type of interaction. Finally, the model was applied to propose potential approved drugs to target cancer-specific biomarkers.
CONCLUSION: DT2Vec+ showed promising results in predicting the type of DTI, which was achieved via integrating and mapping triplet drug-target-disease association graphs into low-dimensional dense vectors. To our knowledge, this is the first approach that addresses prediction between drugs and targets across six interaction types.
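The feature-construction step described in this abstract (concatenating a drug vector and a target vector, then labelling the pair with an interaction type) can be sketched as follows. The embeddings, entity names, and interaction-type labels are hypothetical placeholders, and the gradient boosted tree classifier itself is omitted; this is not the DT2Vec+ implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-dimensional embeddings; in practice these would be learned
# from the heterogeneous graph by a node-embedding method.
drug_vecs: Dict[str, List[float]] = {
    "drugA": [0.1, 0.4, -0.2],
    "drugB": [0.7, -0.3, 0.5],
}
target_vecs: Dict[str, List[float]] = {
    "protX": [0.9, 0.0, 0.3],
    "protY": [-0.5, 0.2, 0.8],
}

# Hypothetical labelled pairs; the type names are placeholders, not the
# six interaction types used in the paper.
labelled_pairs: List[Tuple[str, str, str]] = [
    ("drugA", "protX", "inhibitor"),
    ("drugB", "protY", "agonist"),
    ("drugA", "protY", "none"),
]


def pair_features(drug: str, target: str) -> List[float]:
    """Represent a drug-target pair as the concatenation of both embeddings."""
    return drug_vecs[drug] + target_vecs[target]


def build_dataset(pairs: List[Tuple[str, str, str]]):
    """Return feature rows, integer class labels, and the class-name list."""
    classes = sorted({label for _, _, label in pairs})
    class_index = {name: i for i, name in enumerate(classes)}
    X = [pair_features(d, t) for d, t, _ in pairs]
    y = [class_index[label] for _, _, label in pairs]
    return X, y, classes


X, y, classes = build_dataset(labelled_pairs)
```

The resulting `X` and `y` are in the shape a multi-class classifier would expect as training input.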
Affiliation(s)
- E Amiri Souri
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
- A Chenoweth
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, SE1 9RT, UK
  - Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, Guy's Cancer Centre, King's College London, London, SE1 9RT, UK
- S N Karagiannis
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, SE1 9RT, UK
  - Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, Guy's Cancer Centre, King's College London, London, SE1 9RT, UK
- S Tsoka
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK.
4
Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023; 15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023] Open
Abstract
The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized under 3 items: (i) random splitting of datasets into train and test folds leads to near-complete data memorization, produces highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties of proteins (e.g., structures) are not used during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, indicating the requirement for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
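To illustrate why a structure-aware split differs from a random one, the sketch below groups compounds into connected components of a similarity graph and assigns whole components to either the training or the test fold, so that no similar pair straddles the two. This is a simplified stand-in and not the authors' network analysis-based strategy; the compound names, the similarity edges, and the `test_fraction` parameter are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def connected_components(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes into connected components with a small union-find."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups: Dict[str, List[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    # Largest components first; ties broken by first member for determinism.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


def component_split(nodes: List[str], edges: List[Tuple[str, str]],
                    test_fraction: float = 0.3) -> Tuple[List[str], List[str]]:
    """Assign whole components to train or test; smallest components fill the test fold."""
    target = round(test_fraction * len(nodes))
    train: List[str] = []
    test: List[str] = []
    for comp in reversed(connected_components(nodes, edges)):
        if len(test) + len(comp) <= target:
            test.extend(comp)
        else:
            train.extend(comp)
    return train, test


# Hypothetical compounds; an edge means "structurally similar above some threshold".
compounds = [f"c{i}" for i in range(1, 11)]
similar_pairs = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"),
                 ("c5", "c6"), ("c6", "c7"), ("c8", "c9")]
train, test = component_split(compounds, similar_pairs)
```

Because every similarity edge stays inside one fold, a model cannot score well on the test fold simply by memorizing near-duplicates seen during training.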
Affiliation(s)
- Heval Atas Guvenilir
  - Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
  - Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Tunca Doğan
  - Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey.
  - Institute of Informatics, Hacettepe University, Ankara, Turkey.
  - Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey.
5
PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
6
Drug repositioning for cancer in the era of big omics and real-world data. Crit Rev Oncol Hematol 2022; 175:103730. [DOI: 10.1016/j.critrevonc.2022.103730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 05/25/2022] [Accepted: 05/27/2022] [Indexed: 11/15/2022] Open
7
Amiri Souri E, Laddach R, Karagiannis SN, Papageorgiou LG, Tsoka S. Novel drug-target interactions via link prediction and network embedding. BMC Bioinformatics 2022; 23:121. [PMID: 35379165 PMCID: PMC8978405 DOI: 10.1186/s12859-022-04650-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 03/17/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: As many interactions between the chemical and genomic space remain undiscovered, computational methods able to identify potential drug-target interactions (DTIs) are employed to accelerate drug discovery and reduce the required cost. Predicting new DTIs can support drug repurposing by identifying new targets for approved drugs. However, developing an accurate computational framework that can efficiently incorporate chemical and genomic spaces remains extremely demanding. A key issue is that most DTI predictions suffer from the lack of experimentally validated negative interactions or limited availability of target 3D structures.
RESULTS: We report DT2Vec, a pipeline for DTI prediction based on graph embedding and gradient boosted tree classification. It maps drug-drug and protein-protein similarity networks to low-dimensional features, and DTI prediction is formulated as binary classification in which the drug and target embedding vectors are concatenated as input features. DT2Vec was compared with three top-performing graph similarity-based algorithms on a standard benchmark dataset and achieved competitive results. In order to explore credible novel DTIs, the model was applied to data from the ChEMBL repository, which contain experimentally validated positive and negative interactions and yielded a strong predictive model. Then, the developed model was applied to all possible unknown DTIs to predict new interactions. The applicability of DT2Vec as an effective method for drug repurposing is discussed through case studies, and some novel DTI predictions are evaluated using molecular docking.
CONCLUSIONS: The proposed method was able to integrate and map chemical and genomic space into low-dimensional dense vectors and showed promising results in predicting novel DTIs.
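The step of applying a trained model to all possible unknown DTIs amounts to enumerating every drug-target pair that has no experimental label and ranking those pairs by predicted score. The sketch below shows only that bookkeeping; the entity names, the known-interaction table, and the `toy_scores` lookup standing in for a trained classifier are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]

drugs = ["drugA", "drugB", "drugC"]
targets = ["protX", "protY"]

# Hypothetical experimentally validated labels: 1 = interacts, 0 = validated negative.
known: Dict[Pair, int] = {("drugA", "protX"): 1, ("drugB", "protY"): 0}


def unknown_pairs(drugs: List[str], targets: List[str], known: Dict[Pair, int]) -> List[Pair]:
    """All drug-target combinations that have no validated label."""
    return [(d, t) for d, t in product(drugs, targets) if (d, t) not in known]


def rank_candidates(pairs: List[Pair], score: Callable[[Pair], float]) -> List[Pair]:
    """Order candidate pairs from most to least likely to interact."""
    return sorted(pairs, key=score, reverse=True)


# Stand-in for a trained model's predicted probabilities (invented values).
toy_scores: Dict[Pair, float] = {
    ("drugA", "protY"): 0.2,
    ("drugB", "protX"): 0.9,
    ("drugC", "protX"): 0.5,
    ("drugC", "protY"): 0.1,
}

candidates = unknown_pairs(drugs, targets, known)
ranked = rank_candidates(candidates, lambda pair: toy_scores[pair])
```

Top-ranked pairs from such a list would be the natural candidates for follow-up checks such as the molecular docking mentioned in the abstract.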
Affiliation(s)
- E Amiri Souri
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
- R Laddach
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, King's College London, Guy's Hospital, London, SE1 9RT, UK
- S N Karagiannis
  - St. John's Institute of Dermatology, School of Basic and Medical Biosciences, King's College London, Guy's Hospital, London, SE1 9RT, UK
  - Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, King's College London, Guy's Cancer Centre, London, SE1 9RT, UK
- L G Papageorgiou
  - Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London, WC1E 7JE, UK
- S Tsoka
  - Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK.
8
Staszak M, Staszak K, Wieszczycka K, Bajek A, Roszkowski K, Tylkowski B. Machine learning in drug design: Use of artificial intelligence to explore the chemical structure–biological activity relationship. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1568] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Affiliation(s)
- Maciej Staszak
  - Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
- Katarzyna Staszak
  - Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
- Karolina Wieszczycka
  - Institute of Technology and Chemical Engineering, Poznan University of Technology, Poznan, Poland
- Anna Bajek
  - Department of Tissue Engineering, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland
- Krzysztof Roszkowski
  - Department of Oncology, Collegium Medicum, Nicolaus Copernicus University, Bydgoszcz, Poland
- Bartosz Tylkowski
  - Department of Chemical Engineering, University Rovira i Virgili, Tarragona, Spain
  - Eurecat, Centre Tecnològic de Catalunya, Chemical Technologies Unit, Tarragona, Spain
9
Jung S, Potapov I, Chillara S, Del Sol A. Leveraging systems biology for predicting modulators of inflammation in patients with COVID-19. Science Advances 2021; 7:eabe5735. [PMID: 33536217 PMCID: PMC11323279 DOI: 10.1126/sciadv.abe5735] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/31/2020] [Accepted: 12/15/2020] [Indexed: 06/12/2023]
Abstract
Dysregulations in the inflammatory response of the body to pathogens could progress toward a hyperinflammatory condition amplified by positive feedback loops and associated with increased severity and mortality. Hence, there is a need for identifying therapeutic targets to modulate this pathological immune response. Here, we propose a single cell-based computational methodology for predicting proteins to modulate the dysregulated inflammatory response based on the reconstruction and analysis of functional cell-cell communication networks of physiological and pathological conditions. We validated the proposed method in 12 human disease datasets and performed an in-depth study of patients with mild and severe symptomatology of the coronavirus disease 2019 for predicting novel therapeutic targets. As a result, we identified the extracellular matrix protein versican and Toll-like receptor 2 as potential targets for modulating the inflammatory response. In summary, the proposed method can be of great utility in systematically identifying therapeutic targets for modulating pathological immune responses.
Affiliation(s)
- Sascha Jung
  - Computational Biology Group, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Bizkaia Technology Park, Derio 48160, Spain
- Ilya Potapov
  - Computational Biology Group, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Samyukta Chillara
  - Computational Biology Group, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Bizkaia Technology Park, Derio 48160, Spain
- Antonio Del Sol
  - Computational Biology Group, CIC bioGUNE-BRTA (Basque Research and Technology Alliance), Bizkaia Technology Park, Derio 48160, Spain.
  - Computational Biology Group, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
  - IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain