1
Thavanesan N, Farahi A, Parfitt C, Belkhatir Z, Azim T, Vallejos EP, Walters Z, Ramchurn S, Underwood TJ, Vigneswaran G. Insights from explainable AI in oesophageal cancer team decisions. Comput Biol Med 2024; 180:108978. [PMID: 39106674 DOI: 10.1016/j.compbiomed.2024.108978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 07/31/2024] [Accepted: 07/31/2024] [Indexed: 08/09/2024]
Abstract
BACKGROUND Clinician-led quality control into oncological decision-making is crucial for optimising patient care. Explainable artificial intelligence (XAI) techniques provide data-driven approaches to unravel how clinical variables influence this decision-making. We applied global XAI techniques to examine the impact of key clinical decision-drivers when mapped by a machine learning (ML) model, on the likelihood of receiving different oesophageal cancer (OC) treatment modalities by the multidisciplinary team (MDT). METHODS Retrospective analysis of 893 OC patients managed between 2010 and 2022 at our tertiary unit, used a random forests (RF) classifier to predict four possible treatment pathways as determined by the MDT: neoadjuvant chemotherapy followed by surgery (NACT + S), neoadjuvant chemoradiotherapy followed by surgery (NACRT + S), surgery-alone, and palliative management. Variable importance and partial dependence (PD) analyses then examined the influence of targeted high-ranking clinical variables within the ML model on treatment decisions as a surrogate model of the MDT decision-making dynamic. RESULTS Amongst guideline-variables known to determine treatments, such as Tumour-Node-Metastasis (TNM) staging, age also proved highly important to the RF model (16.1 % of total importance) on variable importance analysis. PD subsequently revealed that predicted probabilities for all treatment modalities change significantly after 75 years (p < 0.001). Likelihood of surgery-alone and palliative therapies increased for patients aged 75-85yrs but lowered for NACT/NACRT. Performance status divided patients into two clusters which influenced all predicted outcomes in conjunction with age. CONCLUSION XAI techniques delineate the relationship between clinical factors and OC treatment decisions. These techniques identify advanced age as heavily influencing decisions based on our model with a greater role in patients with specific tumour characteristics. 
This study methodology provides the means for exploring conscious/subconscious bias and interrogating inconsistencies in team-based decision-making within the era of AI-driven decision support.
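As a rough illustration of the partial dependence (PD) analysis described in the abstract above, the sketch below averages a model's predicted probability over all patients while forcing one variable (here, age) to each value on a grid. The `toy_predict` function, the feature names, and the three example rows are hypothetical stand-ins, not the authors' random forest or data.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average model output when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite one feature in every row, leaving the others untouched.
        forced = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in forced))
    return curve


# Hypothetical stand-in for a fitted classifier: the predicted probability of
# palliative management rises with performance status and jumps after age 75.
def toy_predict(row):
    p = 0.2 + 0.1 * row["performance_status"]
    if row["age"] > 75:
        p += 0.3
    return min(p, 1.0)


rows = [
    {"age": 60, "performance_status": 0},
    {"age": 70, "performance_status": 1},
    {"age": 80, "performance_status": 2},
]
pd_curve = partial_dependence(toy_predict, rows, "age", [65, 75, 85])
print(pd_curve)
```

With a real model, `predict` would wrap the classifier's probability output for one treatment class, and the grid would span the observed range of the variable.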
Affiliation(s)
- Arya Farahi
  - Department of Statistics and Data Science, University of Texas at Austin, United States
- Zehor Belkhatir
  - School of Electronics and Computer Science, University of Southampton, UK
- Tayyaba Azim
  - School of Electronics and Computer Science, University of Southampton, UK
- Elvira Perez Vallejos
  - School of Computer Science, Horizon Digital Economy Research, University of Nottingham, UK
- Zoë Walters
  - School of Cancer Sciences, Faculty of Medicine, University of Southampton, UK
- Sarvapali Ramchurn
  - School of Electronics and Computer Science, University of Southampton, UK
- Timothy J Underwood
  - School of Cancer Sciences, Faculty of Medicine, University of Southampton, UK. https://twitter.com/TimTheSurgeon
- Ganesh Vigneswaran
  - School of Cancer Sciences, Faculty of Medicine, University of Southampton, UK. https://twitter.com/ganesh_vignes
2
Wang X, Wang X, Cheng Y, Luo C, Xia W, Gao Z, Bu W, Jiang Y, Fei Y, Shi W, Tang J, Liu L, Zhu J, Zhao X. Construction of metal interpretable scoring system and identification of tungsten as a novel risk factor in COPD. Ecotoxicology and Environmental Safety 2024; 283:116842. [PMID: 39106568 DOI: 10.1016/j.ecoenv.2024.116842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 07/24/2024] [Accepted: 08/02/2024] [Indexed: 08/09/2024]
Abstract
Numerous studies have highlighted the correlation between metal intake and deteriorated pulmonary function, emphasizing its pivotal role in the progression of Chronic Obstructive Pulmonary Disease (COPD). However, the efficacy of traditional models is often compromised due to overfitting and high bias in datasets with low-level exposure, rendering them ineffective in delineating the contemporary risk trends associated with pulmonary diseases. To address these limitations, we embarked on developing advanced, interpretable models, crucial for elucidating the intricate mechanisms of metal toxicity and enriching the domain knowledge embedded in toxicity models. In this endeavor, we scrutinized extensive, long-term metal exposure datasets from NHANES to explore the interplay between metal and pulmonary functionality. Employing a variety of machine-learning approaches, we opted for the "Mixer of Experts" model for its proficiency in identifying a myriad of toxicological trends and sensitivities. We conceptualized and illustrated the TSAP (Toxicity Score at Population-level), a metal interpretable scoring system offering performance nearly equivalent to the amalgamation of standard interpretable methods addressing the "black box" conundrum. This streamlined, bifurcated procedural analysis proved instrumental in discerning established risk factors, thereby uncovering Tungsten as a novel contributor to COPD risk. SYNOPSIS: TSAP achieved satisfactory performance with transparent interpretability, suggesting that tungsten intake needs further action for COPD prevention.
Affiliation(s)
- Xuehai Wang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Xiangdong Wang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Yulan Cheng
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Chao Luo
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Weiyi Xia
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Zhengnan Gao
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Wenxia Bu
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Yichen Jiang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Yue Fei
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Weiwei Shi
  - Nantong Hospital to Nanjing University of Chinese Medicine, China
- Juan Tang
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
- Lei Liu
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
  - Department of Pathology, Affiliated Hospital of Nantong University, Nantong 226001, China
- Jinfeng Zhu
  - Nantong Hospital to Nanjing University of Chinese Medicine, China
- Xinyuan Zhao
  - Department of Occupational Medicine and Environmental Toxicology, Nantong Key Laboratory of Environmental Toxicology, School of Public Health, Nantong University, Nantong 226019, China
3
Wang M, Li Z, Zeng S, Wang Z, Ying Y, He W, Zhang Z, Wang H, Xu C. Explainable machine learning predicts survival of retroperitoneal liposarcoma: A study based on the SEER database and external validation in China. Cancer Med 2024; 13:e7324. [PMID: 38847519 PMCID: PMC11157677 DOI: 10.1002/cam4.7324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 04/15/2024] [Accepted: 05/12/2024] [Indexed: 06/10/2024] Open
Abstract
OBJECTIVE We have developed explainable machine learning models to predict the overall survival (OS) of retroperitoneal liposarcoma (RLPS) patients. This approach aims to enhance the explainability and transparency of our modeling results. METHODS We collected clinicopathological information of RLPS patients from The Surveillance, Epidemiology, and End Results (SEER) database and allocated them into training and validation sets with a 7:3 ratio. Simultaneously, we obtained an external validation cohort from The First Affiliated Hospital of Naval Medical University (Shanghai, China). We performed LASSO regression and multivariate Cox proportional hazards analysis to identify relevant risk factors, which were then combined to develop six machine learning (ML) models: Cox proportional hazards model (Coxph), random survival forest (RSF), ranger, gradient boosting with component-wise linear models (GBM), decision trees, and boosting trees. The predictive performance of these ML models was evaluated using the concordance index (C-index), the integrated cumulative/dynamic area under the curve (AUC), and the integrated Brier score, as well as the Cox-Snell residual plot. We also used time-dependent variable importance, analysis of partial dependence survival plots, and the generation of aggregated survival SHapley Additive exPlanations (SurvSHAP) plots to provide a global explanation of the optimal model. Additionally, SurvSHAP(t) and survival local interpretable model-agnostic explanations (SurvLIME) plots were used to provide a local explanation of the optimal model. RESULTS The final ML models consist of six factors: patient's age, gender, marital status, surgical history, as well as tumor's histopathological classification, histological grade, and SEER stage. Our prognostic model exhibits significant discriminative ability, with the ranger model performing best.
In the training set, validation set, and external validation set, the AUC for 1, 3, and 5 year OS are all above 0.83, and the integrated Brier scores are consistently below 0.15. The explainability analysis of the ranger model also indicates that histological grade, histopathological classification, and age are the most influential factors in predicting OS. CONCLUSIONS The ranger ML prognostic model exhibits optimal performance and can be utilized to predict the OS of RLPS patients, offering valuable and crucial references for clinical physicians to make informed decisions in advance.
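The concordance index (C-index) used above to compare survival models can be sketched in a few lines. The function below is a minimal version of Harrell's C under simple assumptions (right-censored data, ties in risk counted as half-concordant, tied times ignored); the function name and the toy data are illustrative only and are not taken from the study.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose risk ordering is correct."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event has the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival times, event indicators (1 = death observed, 0 = censored)
# and model risk scores for four hypothetical patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 0.8: four of the five comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.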
Affiliation(s)
- Maoyu Wang
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Zhizhou Li
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Shuxiong Zeng
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Ziwei Wang
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Yidie Ying
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Wei He
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Zhensheng Zhang
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Huiqing Wang
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
- Chuanliang Xu
  - Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China
4
Jahangiri L. Predicting Neuroblastoma Patient Risk Groups, Outcomes, and Treatment Response Using Machine Learning Methods: A Review. Med Sci (Basel) 2024; 12:5. [PMID: 38249081 PMCID: PMC10801560 DOI: 10.3390/medsci12010005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/28/2023] [Accepted: 01/03/2024] [Indexed: 01/23/2024] Open
Abstract
Neuroblastoma, a paediatric malignancy with high rates of cancer-related morbidity and mortality, is of significant interest to the field of paediatric cancers. High-risk NB tumours are usually metastatic and result in survival rates of less than 50%. Machine learning approaches have been applied to various neuroblastoma patient data to retrieve relevant clinical and biological information and develop predictive models. Given this background, this study will catalogue and summarise the literature that has used machine learning and statistical methods to analyse data such as multi-omics, histological sections, and medical images to make clinical predictions. Furthermore, the question will be turned on its head, and the use of machine learning to accurately stratify NB patients by risk groups and to predict outcomes, including survival and treatment response, will be summarised. Overall, this study aims to catalogue and summarise the important work conducted to date on the subject of expression-based predictor models and machine learning in neuroblastoma for risk stratification and patient outcomes including survival, and treatment response which may assist and direct future diagnostic and therapeutic efforts.
Affiliation(s)
- Leila Jahangiri
  - School of Science and Technology, Nottingham Trent University, Clifton Site, Nottingham NG11 8NS, UK
  - Division of Cellular and Molecular Pathology, Addenbrookes Hospital, University of Cambridge, Cambridge CB2 0QQ, UK
5
Xu L, Guo C, Liu M. A weighted distance-based dynamic ensemble regression framework for gastric cancer survival time prediction. Artif Intell Med 2024; 147:102740. [PMID: 38184344 DOI: 10.1016/j.artmed.2023.102740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Revised: 10/28/2023] [Accepted: 11/28/2023] [Indexed: 01/08/2024]
Abstract
Accurate prediction of gastric cancer patient survival time is essential for clinical decision-making. However, unified static models lack specificity and flexibility in predictions owing to the varying survival outcomes among gastric cancer patients. We address these problems by using an ensemble learning approach and adaptively assigning greater weights to similar patients to make more targeted predictions when predicting an individual's survival time. We treat these problems as regression problems and introduce a weighted dynamic ensemble regression framework. To better identify similar patients, we devise a method to measure patient similarity, considering the diverse impacts of features. Subsequently, we use this measure to design both a weighted K-means clustering method and a fuzzy K-means sampling technique to group patients and train corresponding base regressors. To achieve more targeted predictions, we calculate the weight of each base regressor based on the similarity between the patient to be predicted and the patient clusters, culminating in the integration of the results. The model is validated on a dataset of 7791 patients, outperforming other models in terms of three evaluation metrics, namely, the root mean square error, mean absolute error, and the coefficient of determination. The weighted dynamic ensemble regression strategy can improve the baseline model by 1.75%, 2.12%, and 13.45% in terms of the three respective metrics while also mitigating the imbalanced survival time distribution issue. This enhanced performance has been statistically validated, even when tested on six public datasets with different sizes. By considering feature variations, patients with distinct survival profiles can be effectively differentiated, and the model predictive performance can be enhanced. The results generated by our proposed model can be invaluable in guiding decisions related to treatment plans and resource allocation. 
Furthermore, the model has the potential for broader applications in prognosis for other types of cancers or similar regression problems in various domains.
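The core idea of the dynamic ensemble described above, weighting each cluster's base regressor by how similar the new patient is to that cluster, might be sketched as follows. This is only an illustration: the abstract does not specify the similarity measure, so inverse Euclidean distance to the cluster centroid is used here as an assumed stand-in, and the centroids and constant "regressors" are hypothetical.

```python
import math


def similarity_weights(patient, centroids):
    """Inverse-distance weights over cluster centroids, normalised to sum to 1."""
    # The small constant avoids division by zero when a patient sits on a centroid.
    inverse = [1.0 / (math.dist(patient, c) + 1e-9) for c in centroids]
    total = sum(inverse)
    return [w / total for w in inverse]


def ensemble_predict(patient, centroids, regressors):
    """Blend the base regressors' predictions using the similarity weights."""
    weights = similarity_weights(patient, centroids)
    return sum(w * reg(patient) for w, reg in zip(weights, regressors))


# Two hypothetical patient clusters, each with its own base regressor that
# predicts a constant survival time in months.
centroids = [(0.0, 0.0), (4.0, 0.0)]
regressors = [lambda x: 10.0, lambda x: 30.0]

# The new patient is three times closer to the first cluster, so that
# cluster's regressor receives roughly three quarters of the weight.
prediction = ensemble_predict((1.0, 0.0), centroids, regressors)
print(round(prediction, 3))  # 15.0
```

In the framework the abstract describes, the distance itself would also account for differing feature impacts, which this sketch leaves out.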
Affiliation(s)
- Liangchen Xu
  - Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
- Chonghui Guo
  - Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
- Mucan Liu
  - Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
6
Li M, Duan X, Li C, You D, Liu L. A novel clinical tool and risk stratification system for predicting the event-free survival of neuroblastoma patients: A TARGET-based study. Medicine (Baltimore) 2023; 102:e34925. [PMID: 37746942 PMCID: PMC10519501 DOI: 10.1097/md.0000000000034925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 07/11/2023] [Accepted: 08/03/2023] [Indexed: 09/26/2023] Open
Abstract
Neuroblastoma (NB), considered the most common non-intracranial solid tumor in children, accounts for nearly 8% of pediatric malignancies. This study aimed to develop a simple and practical nomogram to predict event-free survival (EFS) in NB patients and establish a new risk stratification system. In this study, 763 patients primarily diagnosed with NB in the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database were included and randomly assigned to a training set (70%) and a validation set (30%). First, the independent prognostic factors of EFS for NB patients were identified through univariate and multivariate Cox regression analyses. Second, a nomogram was created based on these factors and was validated for calibration capability, discriminative ability, and clinical significance by C-curves, receiver operating characteristic (ROC) curves, and decision curve analysis. Finally, a new risk stratification system was established for NB patients based on the nomogram. The univariate Cox analysis demonstrated that NB patients with age at diagnosis >318 days, International Neuroblastoma Staging System (INSS) stage 4, DNA diploidy, MYCN amplification status, and Children's Oncology Group (COG) high-risk group had a relatively poor prognosis. However, according to the multivariate Cox regression analysis, only age, INSS stage, and DNA ploidy were independent predictive factors in NB patients regarding EFS, and a nomogram was created based on these factors. The area under the curve (AUC) values of the ROC curves for the 3-, 5-, and 10-year EFS of this nomogram were 0.681, 0.706, and 0.720, respectively. Additionally, the AUC values of individual independent prognostic factors of EFS were lower than those of the nomogram, suggesting that the developed nomogram had a higher predictive reliability for prognosis.
In addition, a new risk stratification system was developed to better stratify NB patients and provide clinical practitioners with a better reference for clinical decision-making. NB patients' EFS could be predicted more accurately and easily through the constructed nomogram and event-occurrence risk stratification system, allowing clinicians to better differentiate NB patients and establish individualized treatment plans to maximize patient benefits.
Affiliation(s)
- Mingzhen Li
  - Department of Radiation Oncology, China-Japan Union Hospital of Jilin University, Nanguan District, Changchun, Jilin, People’s Republic of China
- Xiaoying Duan
  - Department of Acupuncture and moxibustion, Second Hospital of Jilin University, Nanguan District, Changchun, Jilin, People’s Republic of China
- Chunyan Li
  - Department of Endocrinology, The Affiliated Hospital of Beihua University, Chuanying District, Jilin, People’s Republic of China
- Di You
  - Department of Anesthesiology, China-Japan Union Hospital of Jilin University, Nanguan District, Changchun, Jilin, People’s Republic of China
- Linlin Liu
  - Department of Radiation Oncology, China-Japan Union Hospital of Jilin University, Nanguan District, Changchun, Jilin, People’s Republic of China
7
Ye M, Zhang G, Lu Y, Ren S, Ji Y. Cuproptosis-related risk score based on machine learning algorithm predicts prognosis and characterizes tumor microenvironment in head and neck squamous carcinomas. Sci Rep 2023; 13:11870. [PMID: 37481622 PMCID: PMC10363129 DOI: 10.1038/s41598-023-38060-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 07/02/2023] [Indexed: 07/24/2023] Open
Abstract
Cuproptosis is a recently discovered type of programmed cell death that shows significant potential in the diagnosis and treatment of cancer. It has important significance in the prognosis of head and neck squamous carcinoma (HNSC). This study aims to construct a cuproptosis-related prognostic model and risk score through new data analysis methods such as machine learning algorithms for the prognosis analysis of HNSC. Protein-protein interaction network and machine learning methods were employed to identify hub genes that were used to construct a TreeGradientBoosting model for predicting overall survival. The relationship between the risk scores obtained from the model and features such as tumor microenvironment (TME) and tumor immunity was explored. The C-indexes of the TreeGradientBoosting model in the training and validation cohorts were 0.776 and 0.848, respectively. The nomogram based on risk scores and clinical features showed good performance, and distinguished the TME and immunity between high-risk and low-risk groups. The cuproptosis-associated risk score can be used to predict prognoses, TME, and tumor immunity of HNSC patients.
Affiliation(s)
- Maodong Ye
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, Guangdong, People's Republic of China
- Guangping Zhang
  - Shantou University Medical College, Shantou, 515041, Guangdong, People's Republic of China
- Yongjian Lu
  - Shantou University Medical College, Shantou, 515041, Guangdong, People's Republic of China
- Shuai Ren
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, Guangdong, People's Republic of China
- Yingchang Ji
  - Medical Cosmetic Center, First Affiliated Hospital of Shantou University Medical College, Shantou, 515041, Guangdong, People's Republic of China
8
Sun Q, Liang C, Chen T, Ji B, Liu R, Wang L, Tang M, Chen Y, Wang C. Early detection of myocardial ischemia in 12-lead ECG using deterministic learning and ensemble learning. Computer Methods and Programs in Biomedicine 2022; 226:107124. [PMID: 36156437 DOI: 10.1016/j.cmpb.2022.107124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Revised: 08/18/2022] [Accepted: 09/09/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVE Early detection of myocardial ischemia is a necessary but difficult problem in cardiovascular diseases. Approaches that exclusively rely on classical ST and T wave changes on the standard 12-lead electrocardiogram (ECG) lack sufficient accuracy in detecting myocardial ischemia. This study aims to construct generalizable models for the detection of myocardial ischemia in patients with subtle ECG waveform changes (namely non-diagnostic ECG) using ensemble learning to integrate ECG dynamic features acquired via deterministic learning. METHODS First, cardiodynamicsgram (CDG), a noninvasive spatiotemporal electrocardiographic method, is generated through dynamic modeling of ECG signals using the deterministic learning algorithm. Then, the spectral fitting exponent, Lyapunov exponent, and Lempel-Ziv complexity are extracted from CDG. Subsequently, the bagging-based heterogeneous ensemble algorithm is applied on CDG features to generate diverse base classifiers and aggregate them with weighted voting to obtain an ensemble model for myocardial ischemia detection. Finally, we train and test the proposed heterogeneous ensemble model on a real-world clinical dataset. This dataset consists of 499 non-diagnostic 12-lead ECG records from 499 patients collected from three independent medical centers, including 383 patients with myocardial ischemia and 116 patients without ischemia. RESULTS With 10-times 5-fold cross-validation technology, our proposed method achieves an average accuracy of 89.10%, sensitivity of 91.72%, and specificity of 82.69% using the heterogeneous ensemble algorithm on the real-world clinical dataset. On three independent medical centers, our ensemble model also achieves accuracy performance over 82% for patients with non-diagnostic ECG. 
Furthermore, our ensemble model trained with real-world clinical data yields promising results of 91.11% accuracy, 90.49% sensitivity, and 92.88% specificity on the external test set of the public PTB dataset. CONCLUSION The experimental results demonstrate that the proposed model combining ensemble learning and deterministic learning presents excellent diagnostic accuracy and generalization in clinical practice, and could be implemented as a complement to the standard ECG in the clinical diagnosis of myocardial ischemia.
Affiliation(s)
- Qinghua Sun
  - Center for Intelligent Medical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Chunmiao Liang
  - Center for Intelligent Medical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Tianrui Chen
  - Center for Intelligent Medical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Bing Ji
  - Center for Intelligent Medical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
- Rugang Liu
  - Department of Emergency, Qilu Hospital of Shandong University, Jinan, China
- Lei Wang
  - Department of Cardiology, Shihezi People's Hospital, Shihezi, China
- Min Tang
  - State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yuguo Chen
  - Department of Emergency, Qilu Hospital of Shandong University, Jinan, China
- Cong Wang
  - Center for Intelligent Medical Engineering, School of Control Science and Engineering, Shandong University, Jinan, China
9
Shanbehzadeh M, Afrash MR, Mirani N, Kazemi-Arpanahi H. Comparing machine learning algorithms to predict 5-year survival in patients with chronic myeloid leukemia. BMC Med Inform Decis Mak 2022; 22:236. [PMID: 36068539 PMCID: PMC9450320 DOI: 10.1186/s12911-022-01980-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 08/30/2022] [Indexed: 12/03/2022] Open
Abstract
Introduction Chronic myeloid leukemia (CML) is a myeloproliferative disorder resulting from the translocation of chromosomes 9 and 22. CML accounts for 15–20% of all cases of leukemia. Although bone marrow transplant and, more recently, tyrosine kinase inhibitors (TKIs) as a first-line treatment have significantly prolonged survival in CML patients, accurate prediction using available patient-level factors can be challenging. We intended to predict 5-year survival among CML patients via eight machine learning (ML) algorithms and compare their performance.
Methods The data of 837 CML patients were retrospectively extracted and randomly split into training and test segments (70:30 ratio). The outcome variable was 5-year survival with potential values of alive or deceased. The dataset for the full features and important features selected by minimal redundancy maximal relevance (mRMR) feature selection were fed into eight ML techniques, including eXtreme gradient boosting (XGBoost), multilayer perceptron (MLP), pattern recognition network, k-nearest neighborhood (KNN), probabilistic neural network, support vector machine (SVM) (kernel = linear), SVM (kernel = RBF), and J-48. The scikit-learn library in Python was used to implement the models. Finally, the performance of the developed models was measured using some evaluation criteria with 95% confidence intervals (CI).
Results Spleen palpable, age, and unexplained hemorrhage were identified as the top three effective features affecting CML 5-year survival. The performance of ML models using the selected-features was superior to that of the full-features dataset. Among the eight ML algorithms, SVM (kernel = RBF) had the best performance in tenfold cross-validation with an accuracy of 85.7%, specificity of 85%, sensitivity of 86%, F-measure of 87%, kappa statistic of 86.1%, and area under the curve (AUC) of 85% for the selected-features. Using the full-features dataset yielded an accuracy of 69.7%, specificity of 69.1%, sensitivity of 71.3%, F-measure of 72%, kappa statistic of 75.2%, and AUC of 70.1%.
Conclusions Accurate prediction of the survival likelihood of CML patients can inform caregivers to promote patient prognostication and choose the best possible treatment path. While external validation is required, our developed models will offer customized treatment and may guide the prescription of personalized medicine for CML patients.
Affiliation(s)
- Mostafa Shanbehzadeh
  - Department of Health Information Technology, Faculty of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Mohammad Reza Afrash
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Nader Mirani
  - Department of Treatment, Head of the Medical Truism, Zanjan University of Medical Sciences, Zanjan, Iran
- Hadi Kazemi-Arpanahi
  - Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
  - Department of Student Research Committee, Abadan University of Medical Sciences, Abadan, Iran
10
Liu W, Wang S, Ye Z, Xu P, Xia X, Guo M. Prediction of lung metastases in thyroid cancer using machine learning based on SEER database. Cancer Med 2022; 11:2503-2515. [PMID: 35191613 PMCID: PMC9189456 DOI: 10.1002/cam4.4617] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/03/2021] [Revised: 12/25/2021] [Accepted: 01/03/2022] [Indexed: 12/17/2022] Open
Abstract
PURPOSE Lung metastasis (LM) is one of the most frequent distant metastases of thyroid cancer (TC). This study aimed to develop a machine learning model to predict lung metastasis of thyroid cancer and provide relevant information for clinical decision-making. METHODS Demographic and clinicopathological data of patients with thyroid cancer were extracted from the National Institutes of Health (NIH)'s Surveillance, Epidemiology, and End Results (SEER) database for the period 2010 to 2015 and used to develop six machine learning models: support vector machine (SVM), logistic regression (LR), eXtreme gradient boosting (XGBoost), decision tree (DT), random forest (RF), and k-nearest neighbor (KNN). The models were compared and evaluated using accuracy, precision, recall, F1-score, area under the ROC curve (AUC), and Brier score, and the association between clinicopathological characteristics and the target variable was interpreted using the best model. RESULTS A total of 9950 patients were selected, including 212 patients (2.1%) with lung metastasis and 9738 patients (97.9%) without lung metastasis. Multivariate logistic regression showed that age, T stage, N stage, and histological type were independent factors in TC with LM. The best model, RF, achieved an accuracy of 0.99, recall of 0.88, precision of 0.61, F1-score of 0.72, AUC of 0.99, and Brier score of 0.016. CONCLUSION The RF model performed best and can be applied to forecast lung metastasis of thyroid cancer, offering a valuable reference for clinicians' decision-making in advance.
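The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is an illustrative consistency check only, not part of the study: it recomputes the prevalence and the F1-score from the reported counts, precision, and recall, and adds the accuracy of a trivial all-negative baseline (an assumption introduced here for comparison).

```python
# Figures reported in the abstract.
n_total = 9950
n_positive = 212     # patients with lung metastasis
n_negative = 9738    # patients without lung metastasis
precision = 0.61
recall = 0.88

# The two groups should account for every selected patient.
assert n_positive + n_negative == n_total

prevalence = n_positive / n_total
# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# Accuracy of a hypothetical baseline that predicts "no metastasis" for everyone.
baseline_accuracy = n_negative / n_total

print(f"prevalence: {prevalence:.3f}")
print(f"F1 from precision and recall: {f1:.2f}")
print(f"all-negative baseline accuracy: {baseline_accuracy:.3f}")
```

The recomputed F1 rounds to the reported 0.72. Because only about 2.1% of patients are positive, the all-negative baseline already reaches an accuracy of roughly 0.979, so precision, recall, and F1 say more about such a model than accuracy alone.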
Affiliation(s)
- Wenfei Liu
- Department of Thyroid, Parathyroid, Breast and Hernia Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shoufei Wang
- Department of Thyroid, Parathyroid, Breast and Hernia Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Ziheng Ye
- Department of Thyroid, Parathyroid, Breast and Hernia Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Peipei Xu
- Department of Thyroid, Parathyroid, Breast and Hernia Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Xiaotian Xia
- Department of Thyroid, Parathyroid, Breast and Hernia Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Minggao Guo
- Department of Thyroid, Parathyroid, Breast and Hernia Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
|
11
|
Afrash MR, Shanbehzadeh M, Kazemi-Arpanahi H. Design and Development of an Intelligent System for Predicting 5-Year Survival in Gastric Cancer. Clinical Medicine Insights: Oncology 2022; 16:11795549221116833. [PMID: 36035639 PMCID: PMC9403452 DOI: 10.1177/11795549221116833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Accepted: 07/13/2022] [Indexed: 11/17/2022] Open
Abstract
Background: Gastric cancer remains one of the leading causes of cancer-specific deaths worldwide. Accurately predicting the survival likelihood of gastric cancer patients can help caregivers improve patient prognostication and choose the best possible treatment path. This study aims to develop an intelligent system based on machine learning (ML) algorithms for predicting the 5-year survival status of gastric cancer patients. Methods: A retrospective data set comprising the records of 974 gastric cancer patients was used. First, the most important predictors were identified using the Boruta feature selection algorithm. Five classifiers, including J48 decision tree (DT), support vector machine (SVM) with radial basis function (RBF) kernel, bootstrap aggregating (Bagging), hist gradient boosting (HGB), and adaptive boosting (AdaBoost), were trained to predict gastric cancer survival. The performance of these techniques was evaluated with specificity, sensitivity, likelihood ratio, and total accuracy. Finally, the system was developed according to the best model. Results: The stage, position, and size of the tumor were selected as the top 3 predictors of gastric cancer survival. Among the 6 selected ML algorithms, the HGB classifier achieved the best performance, with a mean accuracy, mean specificity, mean sensitivity, mean area under the curve, and mean F1-score of 88.37%, 86.24%, 89.72%, 88.11%, and 89.91%, respectively. Conclusions: ML models can accurately predict 5-year survival and may act as a customized recommender for decision-making in gastric cancer patients. The system developed in our study can improve the quality of treatment, patient safety, and survival rates; it may also guide the prescription of more personalized medicine.
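The abstract lists likelihood ratio among its evaluation measures but does not report a value or state how it was computed. As a hedged illustration only, the sketch below applies the standard definitions of the positive and negative likelihood ratios to the reported mean sensitivity and mean specificity of the HGB classifier. The function name `likelihood_ratios` is hypothetical, and the resulting numbers are derived approximations, not results from the study.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) using the standard diagnostic-test definitions."""
    lr_pos = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Mean sensitivity and specificity reported for the HGB classifier.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.8972, specificity=0.8624)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

Because the inputs are means across evaluation runs, these ratios need not equal the mean of per-run likelihood ratios.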
Affiliation(s)
- Mohammad Reza Afrash
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Shanbehzadeh
- Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Hadi Kazemi-Arpanahi
- Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
- Student Research Committee, Abadan University of Medical Sciences, Abadan, Iran
|