1. Li S, Yi H, Leng Q, Wu Y, Mao Y. New perspectives on cancer clinical research in the era of big data and machine learning. Surg Oncol 2024; 52:102009. [PMID: 38215544 DOI: 10.1016/j.suronc.2023.102009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 10/16/2023] [Indexed: 01/14/2024]
Abstract
In the 21st century, the development of medical science has entered the era of big data, and machine learning has become an essential tool for mining medical big data. The establishment of the SEER database has provided a wealth of epidemiological data for cancer clinical research, and the number of studies based on SEER and machine learning has been growing in recent years. This article reviews recent research based on SEER and machine learning and finds that the current focus of such studies is primarily on the development and validation of models using machine learning algorithms, with the main directions being lymph node metastasis prediction, distant metastasis prediction, and prognosis-related research. Compared to traditional models, machine learning algorithms have the advantage of stronger adaptability, but also suffer from disadvantages such as overfitting and poor interpretability, which need to be weighed in practical applications. At present, machine learning algorithms, as the foundation of artificial intelligence, have just begun to emerge in the field of cancer clinical research. The future development of oncology will enter a more precise era of cancer research, characterized by larger data, higher dimensions, and more frequent information exchange. Machine learning is bound to shine brightly in this field.
Affiliation(s)
- Shujun Li
  - Department of Hematology, Xiangya Hospital, Central South University, Changsha, 410008, China; National Clinical Research Center for Geriatric Diseases (Xiangya Hospital), China; Hunan Hematology Oncology Clinical Medical Research Center, China
- Hang Yi
  - Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Qihao Leng
  - Xiangya School of Medicine, Central South University, Changsha, 410013, Hunan Province, China
- You Wu
  - Institute for Hospital Management, School of Medicine, Tsinghua University, 30 Shuangqing Rd, Haidian District, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA
- Yousheng Mao
  - Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
2. Babaei Rikan S, Sorayaie Azar A, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Wiil UK. Survival prediction of glioblastoma patients using modern deep learning and machine learning techniques. Sci Rep 2024; 14:2371. [PMID: 38287149 PMCID: PMC10824760 DOI: 10.1038/s41598-024-53006-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 01/25/2024] [Indexed: 01/31/2024] [Open Access]
Abstract
In this study, we used data from the Surveillance, Epidemiology, and End Results (SEER) database to predict survival outcomes of glioblastoma patients. To assess dataset skewness and detect feature importance, we applied Pearson's second coefficient of skewness and the Ordinary Least Squares method, respectively. Using two sampling strategies, holdout and five-fold cross-validation, we developed five machine learning (ML) models alongside a feed-forward deep neural network (DNN) for multiclass classification and regression prediction of glioblastoma patient survival. After balancing the classification and regression datasets, we obtained 46,340 and 28,573 samples, respectively. Shapley additive explanations (SHAP) were then used to explain the decision-making process of the best model. In both classification and regression tasks, and across the holdout and cross-validation sampling strategies, the DNN consistently outperformed the ML models. Notably, the accuracies were 90.25% and 90.22% for holdout and five-fold cross-validation, respectively, while the corresponding R² values were 0.6565 and 0.6622. SHAP analysis identified age at diagnosis as the most influential feature in the DNN's survival predictions. These findings suggest that the DNN holds promise as a practical auxiliary tool for clinicians, aiding them in optimal decision-making concerning the treatment and care trajectories for glioblastoma patients.
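The skewness check mentioned in the abstract can be illustrated with a minimal sketch. Pearson's second coefficient of skewness is 3 × (mean − median) / standard deviation. The function name and the survival times below are hypothetical and are not taken from the SEER data or from the authors' code.

```python
import statistics


def pearson_second_skewness(values):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / stdev."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        raise ValueError("skewness is undefined for constant data")
    return 3 * (mean - median) / stdev


# Hypothetical survival times in months (illustrative only, not SEER data).
survival_months = [2, 3, 4, 5, 6, 8, 11, 15, 24, 48]
skew = pearson_second_skewness(survival_months)
print(f"Pearson's second skewness coefficient: {skew:.3f}")
```

A positive value indicates a right-skewed distribution (a long tail of long survivors), which is one common motivation for balancing or transforming a survival target before modeling.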
Affiliation(s)
- Amin Naemi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Habibollah Pirnejad
  - Erasmus School of Health Policy and Management (ESHPM), Erasmus University Rotterdam, Rotterdam, The Netherlands
  - Patient Safety Research Center, Clinical Research Institute, Urmia University of Medical Sciences, Urmia, Iran
- Uffe Kock Wiil
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
3. Azimi P, Yazdanian T, Zohrevand A, Ahmadiani A. Predicting Survival in Glioblastoma Using Gene Expression Databases: A Neural Network Analysis. International Journal of Molecular and Cellular Medicine 2024; 13:79-90. [PMID: 39156868 PMCID: PMC11329931 DOI: 10.22088/ijmcm.bums.13.1.79] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 06/08/2024] [Accepted: 06/08/2024] [Indexed: 08/20/2024]
Abstract
Glioblastoma (GBM) is the most aggressive and lethal brain tumor. Artificial neural networks (ANNs) have the potential to make accurate predictions and improve decision making. The aim of this study was to create an ANN model to predict 15-month survival in GBM patients using gene expression databases. Genomic data of GBM were downloaded from the CGGA, TCGA, MYO, and CPTAC. Logistic regression (LR) and ANN models were used. Age, gender, IDH status (wild-type/mutant), and the 31 most important genes from our previous study were used as input factors for the ANN model. Survival at 15 months was the outcome used to evaluate the results. The normalized importance score of each covariate was calculated using the selected ANN model. The area under the receiver operating characteristic (ROC) curve (AUC), the Hosmer-Lemeshow (H-L) statistic, and prediction accuracy were measured to evaluate the two models. Analyses were performed in SPSS 26. A total of 551 patients (61% male, mean age 55.5 ± 13.3 years) were divided into training, testing, and validation datasets of 441, 55, and 55 patients, respectively. The main candidate genes were FN1, ICAM1, MYD88, IL10, and CCL2 with the ANN model, and MMP9, MYD88, and CDK4 with the LR model. The AUCs were 0.71 for the LR and 0.81 for the ANN analysis. Compared with the LR model, the ANN model showed better results: accuracy rate, 83.3%; H-L statistic, 6.5%; and AUC, 0.81. The findings show that ANNs can accurately predict 15-month survival in GBM patients and contribute to precise medical treatment.
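The AUC used to compare the two models can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition. It is an illustration only, not the SPSS procedure the authors used, and the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = event observed by 15 months) and model scores.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based formula gives the same value more efficiently on large datasets.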
Affiliation(s)
- Parisa Azimi
  - Neurosurgeon, Neuroscience Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Taravat Yazdanian
  - Research Fellow at the Neurological Clinical Research Institute and Healey and AMG Center for ALS, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Amirhosein Zohrevand
  - Department of Neurosurgery, School of Medicine, Babol University of Medical Sciences, Babol, Iran
- Abolhassan Ahmadiani
  - Neurosurgeon, Neuroscience Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4. Nath G, Coursey A, Ekong J, Rastegari E, Sengupta S, Dag AZ, Delen D. Determining the temporal factors of survival associated with brain and nervous system cancer patients: A hybrid machine learning methodology. International Journal of Healthcare Management 2023. [DOI: 10.1080/20479700.2023.2196101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Affiliation(s)
- Gopal Nath
  - Department of Mathematics and Statistics, Murray State University, Murray, KY, USA
- Austin Coursey
  - Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Joseph Ekong
  - Department of Industrial Engineering, Western New England University, Springfield, MA, USA
- Elham Rastegari
  - Department of Business, Intelligence and Analytics, Creighton University, Omaha, NE, USA
- Saptarshi Sengupta
  - Department of Computer Science, San José State University, San José, CA, USA
- Asli Z. Dag
  - Heider College of Business, Creighton University, Omaha, NE, USA
- Dursun Delen
  - Spears School of Business, Oklahoma State University, Stillwater, OK, USA
  - Faculty of Engineering and Natural Sciences, Istinye University, Istanbul, Turkey
5. Tang J, Wang X, Wan H, Lin C, Shao Z, Chang Y, Wang H, Wu Y, Zhang T, Du Y. Joint modeling strategy for using electronic medical records data to build machine learning models: an example of intracerebral hemorrhage. BMC Med Inform Decis Mak 2022; 22:278. [PMID: 36284327 PMCID: PMC9594939 DOI: 10.1186/s12911-022-02018-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 10/10/2022] [Indexed: 11/28/2022] [Open Access]
Abstract
Background: Outliers and class imbalance in medical data can affect the accuracy of machine learning models. For physicians who want to apply predictive models, how to build a model from the data at hand and which model to choose are thorny problems. It is therefore necessary to consider outliers, imbalanced data, model selection, and parameter tuning when modeling. Methods: This study used a joint modeling strategy consisting of outlier detection and removal, data balancing, model fitting and prediction, and performance evaluation. We collected medical record data for all ICH patients with admissions in 2017–2019 from Sichuan Province. Clinical and radiological variables were used to construct models to predict mortality outcomes 90 days after discharge. We used stacking ensemble learning to combine logistic regression (LR), random forest (RF), artificial neural network (ANN), support vector machine (SVM), and k-nearest neighbors (KNN) models. Accuracy, sensitivity, specificity, AUC, precision, and F1 score were used to evaluate model performance. Finally, we compared all 84 combinations of the joint modeling strategy: training sets with and without the cross-validated committees filter (CVCF); five resampling techniques (random under-sampling (RUS), random over-sampling (ROS), adaptive synthetic sampling (ADASYN), borderline synthetic minority oversampling technique (Borderline SMOTE), and synthetic minority oversampling technique with edited nearest neighbor (SMOTEENN)) plus no resampling; and seven models (LR, RF, ANN, SVM, KNN, Stacking, AdaBoost). Results: Among 4207 patients with ICH, 2909 (69.15%) survived 90 days after discharge, and 1298 (30.85%) died within 90 days after discharge. Removing outliers with CVCF improved the performance of all models on every metric except sensitivity. For data balancing, the training set without resampling outperformed the resampled training sets in terms of accuracy, specificity, and precision, while ROS achieved the best AUC. Across the seven models, RF had the highest average accuracy, specificity, AUC, and precision, and Stacking performed best in F1 score. Among all 84 combinations of the joint modeling strategy, eight combinations performed best in terms of accuracy (0.816). For sensitivity, the best performance was SMOTEENN + Stacking (0.662). For specificity, the best performance was CVCF + KNN (0.987). Stacking and AdaBoost had the best performances in AUC (0.756) and F1 score (0.602), respectively. For precision, the best performance was CVCF + SVM (0.938). Conclusion: This study proposed a joint modeling strategy including outlier detection and removal, data balancing, model fitting and prediction, and performance evaluation, in order to provide a reference for physicians and researchers who want to build their own models. It illustrated the importance of outlier detection and removal for machine learning and showed that ensemble learning might be a good modeling strategy. Because of the low imbalance ratio (IR, the ratio of the majority class to the minority class) in this study, resampling did not improve accuracy, specificity, or precision, although ROS performed best on AUC.
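The count of 84 combinations follows from the design described in the abstract: 2 outlier-filter settings × 6 resampling settings × 7 models. The sketch below enumerates that grid and computes the imbalance ratio from the reported patient counts. The variable names are assumptions made for illustration; this is not the authors' code, and no models are fitted here.

```python
from itertools import product

# Settings as listed in the abstract; "none" marks the no-filter / no-resampling case.
outlier_filters = ["CVCF", "none"]
resamplers = ["RUS", "ROS", "ADASYN", "Borderline SMOTE", "SMOTEENN", "none"]
models = ["LR", "RF", "ANN", "SVM", "KNN", "Stacking", "AdaBoost"]

# Every (filter, resampler, model) triple is one joint modeling strategy.
combinations = list(product(outlier_filters, resamplers, models))
print(len(combinations))  # 2 * 6 * 7

# Imbalance ratio (majority / minority) from the reported outcome counts.
survived, died = 2909, 1298
imbalance_ratio = survived / died
print(f"IR = {imbalance_ratio:.2f}")
```

An IR of roughly 2.2 is mild, which is consistent with the authors' observation that resampling did not improve accuracy, specificity, or precision.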
Affiliation(s)
- Jianxiang Tang
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Xiaoyu Wang
  - Department of Neurosurgery, West China Hospital of Sichuan University, Chengdu, Sichuan, People's Republic of China
- Hongli Wan
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Chunying Lin
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Zilun Shao
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Yang Chang
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Hexuan Wang
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Yi Wu
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Tao Zhang
  - Department of Epidemiology and Health Statistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
- Yu Du
  - Health Emergency Management Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, Sichuan, People's Republic of China
  - Department of Emergency and Critical Care Medicine, West China School of Public Health, West China Fourth Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
6. Liu W, Zhu Y, Lin C, Liu L, Li G. An Online Prognostic Application for Melanoma Based on Machine Learning and Statistics. J Plast Reconstr Aesthet Surg 2022; 75:3853-3858. [DOI: 10.1016/j.bjps.2022.06.069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 05/07/2022] [Accepted: 06/10/2022] [Indexed: 11/16/2022]
7. Pasquini L, Napolitano A, Lucignani M, Tagliente E, Dellepiane F, Rossi-Espagnet MC, Ritrovato M, Vidiri A, Villani V, Ranazzi G, Stoppacciaro A, Romano A, Di Napoli A, Bozzao A. AI and High-Grade Glioma for Diagnosis and Outcome Prediction: Do All Machine Learning Models Perform Equally Well? Front Oncol 2021; 11:601425. [PMID: 34888226 PMCID: PMC8649764 DOI: 10.3389/fonc.2021.601425] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/01/2020] [Accepted: 11/02/2021] [Indexed: 12/30/2022] [Open Access]
Abstract
Radiomic models outperform clinical data for outcome prediction in high-grade gliomas (HGG). However, lack of parameter standardization limits clinical applications. Many machine learning (ML) radiomic models employ single classifiers rather than ensemble learning, which is known to boost performance, and comparative analyses are lacking in the literature. We aimed to compare ML classifiers on clinically relevant prediction tasks for HGG: overall survival (OS), isocitrate dehydrogenase (IDH) mutation, O-6-methylguanine-DNA-methyltransferase (MGMT) promoter methylation, epidermal growth factor receptor vIII (EGFR) amplification, and Ki-67 expression, based on radiomic features from conventional and advanced magnetic resonance imaging (MRI). Our objective was to identify the best algorithm for each task. One hundred fifty-six adult patients with a pathologic diagnosis of HGG were included. Three tumoral regions were manually segmented: contrast-enhancing tumor, necrosis, and non-enhancing tumor. Radiomic features were extracted with a custom version of Pyradiomics and selected through the Boruta algorithm. A Grid Search algorithm was applied with K-fold cross-validation (K=10) repeated ten times to obtain the highest mean and lowest spread of accuracy. Model performance was assessed as AUC-ROC curve mean values with 95% confidence intervals (CI). Extreme Gradient Boosting (xGB) obtained the highest accuracy for OS (74.5%), and Adaboost (AB) for IDH mutation (87.5%), MGMT methylation (70.8%), Ki-67 expression (86%), and EGFR amplification (81%). Ensemble classifiers showed the best performance across tasks. High-scoring radiomic features shed light on possible correlations between MRI and tumor histology.
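The resampling scheme described above (10-fold cross-validation repeated ten times) yields 100 train/test splits per candidate hyperparameter setting. The sketch below generates only the splits, as an illustration of the scheme under stated assumptions: the function name, seed, and shuffling method are hypothetical, and the grid search and classifiers themselves are not reproduced.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for each repetition
        folds = [indices[i::k] for i in range(k)]  # k near-equal folds
        for held_out in range(k):
            test = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield train, test


# 156 matches the number of patients reported in the abstract.
splits = list(repeated_kfold(156))
print(len(splits))  # 10 repeats * 10 folds
```

In a grid search, each hyperparameter setting would be scored on all 100 splits, and the mean and spread of accuracy across splits would then be compared between settings.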
Affiliation(s)
- Luca Pasquini
  - Neuroradiology Service, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, United States
  - Neuroradiology Unit, Neuroscience, Mental Health and Sensory Organs (NESMOS) Department, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
- Antonio Napolitano
  - Medical Physics Department, Bambino Gesù Children’s Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Martina Lucignani
  - Medical Physics Department, Bambino Gesù Children’s Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Emanuela Tagliente
  - Medical Physics Department, Bambino Gesù Children’s Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Francesco Dellepiane
  - Neuroradiology Unit, Neuroscience, Mental Health and Sensory Organs (NESMOS) Department, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
- Maria Camilla Rossi-Espagnet
  - Neuroradiology Unit, Neuroscience, Mental Health and Sensory Organs (NESMOS) Department, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
  - Neuroradiology Unit, Imaging Department, Bambino Gesù Children’s Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Matteo Ritrovato
  - Unit of Health Technology Assessment (HTA), Biomedical Technology Risk Manager, Bambino Gesù Children’s Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Antonello Vidiri
  - Radiology and Diagnostic Imaging Department, Regina Elena National Cancer Institute, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Veronica Villani
  - Neuro-Oncology Unit, Regina Elena National Cancer Institute, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), Rome, Italy
- Giulio Ranazzi
  - Department of Clinical and Molecular Medicine, Surgical Pathology Units, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
- Antonella Stoppacciaro
  - Department of Clinical and Molecular Medicine, Surgical Pathology Units, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
- Andrea Romano
  - Neuroradiology Unit, Neuroscience, Mental Health and Sensory Organs (NESMOS) Department, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
- Alberto Di Napoli
  - Neuroradiology Unit, Neuroscience, Mental Health and Sensory Organs (NESMOS) Department, Sant’Andrea Hospital, La Sapienza University, Rome, Italy
  - Radiology Department, Castelli Romani Hospital, Rome, Italy
- Alessandro Bozzao
  - Neuroradiology Unit, Neuroscience, Mental Health and Sensory Organs (NESMOS) Department, Sant’Andrea Hospital, La Sapienza University, Rome, Italy