1
Mehrbakhsh Z, Hassanzadeh R, Behnampour N, Tapak L, Zarrin Z, Khazaei S, Dinu I. Machine learning-based evaluation of prognostic factors for mortality and relapse in patients with acute lymphoblastic leukemia: a comparative simulation study. BMC Med Inform Decis Mak 2024; 24:261. [PMID: 39285373; PMCID: PMC11404043; DOI: 10.1186/s12911-024-02645-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 08/21/2024] [Indexed: 09/22/2024] Open
Abstract
BACKGROUND Predicting mortality and relapse in children with acute lymphoblastic leukemia (ALL) is crucial for effective treatment and follow-up management. ALL is a common and deadly childhood cancer that often relapses after remission. In this study, we aimed to apply and evaluate machine learning-based models for predicting mortality and relapse in pediatric ALL patients. METHODS This retrospective cohort study was conducted on 161 children aged less than 16 years with ALL. Survival status (dead/alive) and patient experience of relapse (yes/no) were considered as the outcome variables. Ten machine learning (ML) algorithms were used to predict mortality and relapse. The performance of the algorithms was evaluated by cross-validation and reported as mean sensitivity, specificity, accuracy and area under the curve (AUC). Finally, prognostic factors were identified based on the best algorithms. RESULTS The mean accuracy of the ML algorithms for prediction of patient mortality ranged from 64 to 74% and for prediction of relapse, it varied from 64 to 84% on test data sets. The mean AUC of the ML algorithms for mortality and relapse was above 64%. The most important prognostic factors for predicting both mortality and relapse were identified as age at diagnosis, hemoglobin and platelets. In addition, significant prognostic factors for predicting mortality included clinical side effects such as splenomegaly, hepatomegaly and lymphadenopathy. CONCLUSIONS Our results showed that artificial neural networks and bagging algorithms outperformed other algorithms in predicting mortality, while boosting and random forest algorithms excelled in predicting relapse in ALL patients across all criteria. These results offer significant clinical insights into the prognostic factors for children with ALL, which can inform treatment decisions and improve patient outcomes.
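The abstract reports cross-validated performance as mean sensitivity, specificity, and accuracy. A minimal sketch of that averaging step follows, assuming binary labels (1 = event, 0 = no event) and that every fold contains both classes. The function names `fold_metrics` and `mean_metrics` and the toy folds are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean

def fold_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for one test fold (binary 0/1 labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # assumes the fold has at least one positive
        "specificity": tn / (tn + fp),  # assumes the fold has at least one negative
        "accuracy": (tp + tn) / len(y_true),
    }

def mean_metrics(folds):
    """Average each metric over a list of (y_true, y_pred) test folds."""
    per_fold = [fold_metrics(y_true, y_pred) for y_true, y_pred in folds]
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}

# Toy folds (made-up labels, for illustration only).
folds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 1], [1, 0, 1, 1]),
]
result = mean_metrics(folds)
print(result)
```

Averaging per-fold metrics, rather than pooling all predictions into one confusion matrix, is one common convention; the abstract does not say which one the study used.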
Affiliation(s)
- Zahra Mehrbakhsh
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran
- Roghayyeh Hassanzadeh
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran
- Nasser Behnampour
  - Department of Biostatistics and Epidemiology, School of Health, Golestan University of Medical Sciences, Gorgan, Iran
- Leili Tapak
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Ziba Zarrin
  - Department of Photogrammetry and Remote Sensing, K.N. Toosi University of Technology, Tehran, Iran
- Salman Khazaei
  - Health Sciences Research Center, Health Sciences & Technology Research Institute, Hamadan University of Medical Science, Hamadan, Iran
- Irina Dinu
  - School of Public Health, University of Alberta, Edmonton, Canada
2
Seo S, Lee JW. Applications of Big Data and AI-Driven Technologies in CADD (Computer-Aided Drug Design). Methods Mol Biol 2024; 2714:295-305. [PMID: 37676605; DOI: 10.1007/978-1-0716-3441-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
In the field of computer-aided drug design (CADD), there has been dramatic progress in the development of big data and AI-driven methodologies. The expensive and time-consuming process of drug design is related to biomedical complexity. CADD can be used to apply effective and efficient strategies to overcome obstacles in the field of drug design in order to properly design and develop a new medicine. To prepare the raw data for consistent and repeatable applications of big data and AI methodologies, data pre-processing methods are introduced. Big data and AI technologies can be used to develop drugs in areas including predicting absorption, distribution, metabolism, excretion, and toxicity properties as well as finding binding sites in target proteins and conducting structure-based virtual screenings. The accurate and thorough analysis of large amounts of biomedical data as well as the design of prediction models in the area of drug design is made possible by data pre-processing and applications of big data and AI skills. In the biomedical big data era, knowledge on the biological, chemical, or pharmacological structures of biomedical entities relevant to drug design should be analyzed with significant big data and AI approaches.
Affiliation(s)
- Seongmin Seo
  - Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
- Jai Woo Lee
  - Department of Big Data Science, College of Public Policy, Korea University, Sejong, Republic of Korea
3
Arayeshgari M, Najafi-Ghobadi S, Tarhsaz H, Parami S, Tapak L. Machine Learning-based Classifiers for the Prediction of Low Birth Weight. Healthc Inform Res 2023; 29:54-63. [PMID: 36792101; PMCID: PMC9932310; DOI: 10.4258/hir.2023.29.1.54] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Accepted: 12/29/2022] [Indexed: 02/11/2023] Open
Abstract
OBJECTIVES Low birth weight (LBW) is a global concern associated with fetal and neonatal mortality as well as adverse consequences such as intellectual disability, impaired cognitive development, and chronic diseases in adulthood. Numerous factors contribute to LBW and vary based on the region. The main objectives of this study were to compare four machine learning classifiers in the prediction of LBW and to determine the most important factors related to this phenomenon in Hamadan, Iran. METHODS We carried out a retrospective cross-sectional study on a dataset collected from Fatemieh Hospital in 2017 that included 741 mother-newborn pairs and 13 potential factors. Decision tree, random forest, artificial neural network, support vector machine, and logistic regression (LR) methods were used to predict LBW, with five evaluation criteria utilized to compare performance. RESULTS Our findings revealed a 7% prevalence of LBW. The average accuracy of all models was 87% or higher. The LR method provided a sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and accuracy of 74%, 89%, 7.04, 0.29, and 88%, respectively. Using LR, gestational age, number of abortions, gravida, consanguinity, maternal age at delivery, and neonatal sex were determined to be the six most important variables associated with LBW. CONCLUSIONS Our findings underscore the importance of facilitating timely diagnosis of causes of abortion, providing genetic counseling to consanguineous couples, and strengthening care before and during pregnancy (particularly for young mothers) to reduce LBW.
Affiliation(s)
- Mahya Arayeshgari
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Somayeh Najafi-Ghobadi
  - Department of Industrial Engineering, Faculty of Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran
- Hosein Tarhsaz
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Sharareh Parami
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Leili Tapak
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
4
Khazaei S, Najafi-Ghobadi S, Ramezani-Doroh V. Construction data mining methods in the prediction of death in hemodialysis patients using support vector machine, neural network, logistic regression and decision tree. J Prev Med Hyg 2021; 62:E222-E230. [PMID: 34322640; PMCID: PMC8283642; DOI: 10.15167/2421-4248/jpmh2021.62.1.1837] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2020] [Accepted: 12/21/2021] [Indexed: 11/25/2022]
Abstract
Objectives Chronic kidney disease (CKD) is one of the main causes of morbidity and mortality worldwide. Detecting modifiable survival factors could help prioritize clinical care and support treatment decision-making for hemodialysis patients. The aim of this study was to develop the best predictive model for identifying the predictors of death in hemodialysis patients using data mining techniques. Methods In this study, we used a dataset that included records of 857 dialysis patients. Thirty-one potential risk factors that might be associated with death in dialysis patients were selected. The performance of four classifiers (support vector machine, neural network, logistic regression and decision tree) was compared in terms of sensitivity, specificity, total accuracy, positive likelihood ratio and negative likelihood ratio. Results The average total accuracy of all methods was over 61%; the greatest total accuracy belonged to logistic regression (0.71). Logistic regression also produced the greatest specificity (0.72), sensitivity (0.69) and positive likelihood ratio (2.48), and the lowest negative likelihood ratio (0.43). Conclusions Logistic regression performed best among the compared methods for predicting death among hemodialysis patients. According to this model, female gender, increasing age at diagnosis, addiction, low iron level, positive C-reactive protein and low urea reduction ratio (URR) were the main predictors of death in these patients.
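The likelihood ratios quoted in this abstract follow from sensitivity and specificity by the standard definitions: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A minimal sketch, with the hypothetical helper name `likelihood_ratios`, applied to the reported logistic regression figures:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg

# Reported logistic regression sensitivity (0.69) and specificity (0.72).
lr_pos, lr_neg = likelihood_ratios(0.69, 0.72)
print(round(lr_pos, 2), round(lr_neg, 2))
```

Plugging in the rounded summary values gives roughly 2.46 and 0.43, against the reported 2.48 and 0.43. The small gap in LR+ is presumably due to rounding or to the paper averaging ratios across resamples; that explanation is an assumption, not something the abstract states.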
Affiliation(s)
- Salman Khazaei
  - Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
- Somayeh Najafi-Ghobadi
  - Department of Industrial Engineering, Faculty of Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran
- Vajihe Ramezani-Doroh
  - Department of Health Management and Economics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Modeling of Non-communicable diseases research center, Hamadan University of Medical Sciences, Hamadan, Iran
  - Correspondence: Vajihe Ramezani-Doroh, Hamadan University of Medical Sciences, Shahid Fahmide St., Pazhuhesh Square., Hamadan, Iran - Tel.: +98 9175375707 - E-mail:
5
Ahmadi-Jouybari T, Najafi-Ghobadi S, Karami-Matin R, Najafian-Ghobadi S, Najafi-Ghobadi K. Investigating factors affecting the interval between a burn and the start of treatment using data mining methods and logistic regression. BMC Med Res Methodol 2021; 21:71. [PMID: 33853547; PMCID: PMC8048305; DOI: 10.1186/s12874-021-01270-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2020] [Accepted: 04/06/2021] [Indexed: 11/30/2022] Open
Abstract
Background A burn is a tragic event for the individual, the family, and the community. It can cause irreparable physical, mental, economic, and social injury. Research has well documented that a prompt visit to a healthcare center can greatly reduce burn injuries. Therefore, the aim of this study was to identify the factors affecting the interval between a burn and the start of treatment in burn patients by comparing three classification data mining methods and logistic regression. Methods This cross-sectional study was conducted on 389 patients hospitalized in Imam Khomeini Hospital, Kermanshah, from 2012 to 2015. The data collection instrument was a three-part questionnaire covering demographic information, geographical information, and burn information. Four classification methods (decision tree (DT), random forest (RF), support vector machine (SVM) and logistic regression (LR)) were used to identify the factors affecting the interval between a burn and the start of treatment (less than two hours versus two hours or more). Results The mean total accuracy of all models was higher than 0.8. The DT model had the highest mean total accuracy (0.87), sensitivity (0.44), positive likelihood ratio (14.58), negative predictive value (0.89) and positive predictive value (0.71). However, the specificity of the SVM and RF models (0.99) was higher than that of the other models, and the mean negative likelihood ratio of the SVM model (0.98) was higher than that of the other models. Conclusions The results of this study show that the DT model performed better than the other data mining models in terms of total accuracy, sensitivity, positive likelihood ratio, negative predictive value and positive predictive value. Therefore, this method is a promising classifier for investigating the factors affecting the interval between a burn and the start of treatment in burn patients. Also, the key factors based on the DT model were location of transfer to hospital, place of occurrence, time of accident, religion, history and degree of burn, income, province of residence, burnt limbs and education.
Affiliation(s)
- Touraj Ahmadi-Jouybari
  - Clinical Research Development Center, Imam Khomeini and Mohammad Kermanshahi Hospitals, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Somayeh Najafi-Ghobadi
  - Department of Industrial Engineering, Faculty of Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran
- Reza Karami-Matin
  - Burn Unit of Imam Khomeini Hospital, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Saeid Najafian-Ghobadi
  - Department of Industrial Engineering, Faculty of Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran
- Khadijeh Najafi-Ghobadi
  - Clinical Research Development Center, Imam Khomeini and Mohammad Kermanshahi Hospitals, Kermanshah University of Medical Sciences, Kermanshah, Iran