1
Hu D, Wang Y, Ji G, Liu Y. Using machine learning algorithms to predict the prognosis of advanced nasopharyngeal carcinoma after intensity-modulated radiotherapy. Curr Probl Cancer 2024; 48:101040. [PMID: 37979476 DOI: 10.1016/j.currproblcancer.2023.101040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 10/09/2023] [Accepted: 11/03/2023] [Indexed: 11/20/2023]
Abstract
BACKGROUND The prognosis of advanced nasopharyngeal carcinoma (NPC) patients after intensity-modulated radiotherapy (IMRT) has not been well studied. We aimed to construct prognostic models for advanced NPC patients with stage III-IV after their first treatment with IMRT by using machine learning algorithms and to identify the most important predictors. METHODS A total of 427 patients treated in Meizhou People's Hospital in Guangdong province, China from January 1, 2013 to December 12, 2018 were enrolled in this study, with an average follow-up period of 7.16 years from July 2020 to March 2021. Candidate predictors were selected from demographics, clinical features, medical examinations and test results. Three machine learning algorithms were applied to construct advanced NPC prognostic models: logistic regression (LR), decision tree (DT), and random forest (RF). Area under the receiver operating characteristic curve (AUC) was used to evaluate the model performance. The important predictors of the optimal model for unfavourable prognosis were identified and ranked. RESULTS There were 50 (11.7%) NPC-related deaths observed in this study. The mean age of all participants was 49.39±11.29 years, of whom 299 (70.0%) were males. In general, RF showed the best predictive performance with the highest AUC (0.753, 95% CI: 0.609, 0.896), compared to LR (0.736, 95% confidence interval (CI): 0.590, 0.881), and DT (0.720, 95% CI: 0.520, 0.921). The six most important predictors identified by RF were Epstein-Barr virus deoxyribonucleic acid, aspartate aminotransferase, body mass index, age, blood glucose level, and alanine aminotransferase. CONCLUSIONS We proposed RF as a simple and accurate tool for the evaluation of the prognosis of advanced NPC patients after the treatment with IMRT in clinical settings.
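The AUC used above to compare LR, DT, and RF can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition in plain Python, not the authors' code; the function name `auc_score` and the toy labels and scores are illustrative assumptions.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event observed, 0 = no event
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based formulation would be preferable for larger data.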
Affiliation(s)
- Dan Hu
  - Department of Radiation Oncology, Center for Cancer Prevention and Treatment, Meizhou People's Hospital, Meizhou Academy of Medical Sciences, Meizhou, China
- Ying Wang
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Genxin Ji
  - The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Yu Liu
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
2
Vanmathi P, Jose D. An ensemble-based serial cascaded attention network and improved variational auto encoder for breast cancer prognosis prediction using data. Comput Methods Biomech Biomed Engin 2024; 27:98-115. [PMID: 38006210 DOI: 10.1080/10255842.2023.2280883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023]
Abstract
Breast cancer is one of the most common cancers in women and causes a large number of deaths worldwide. Early recognition lessens its impact: it can lead patients to receive surgical therapy, which significantly improves the chance of recovery, whereas late recognition can lead to death. Machine learning techniques use patient data to find links between features and outcomes and to forecast new cases. An accurate predictive framework for breast cancer is therefore urgently needed. To this end, an adaptive ensemble model is proposed for breast cancer prognosis prediction using data. First, the raw data are fetched from benchmark datasets, then cleaned and preprocessed. The preprocessed data are fed into the Improved Variational Autoencoder (IVAE), which extracts deep features. Finally, the resulting features are given as input to the Ensemble-based Serial Cascaded Attention Network (ESCANet), which is built from a Deep Temporal Convolution Network (DTCN), Bi-directional Long Short-Term Memory (BiLSTM), and a Recurrent Neural Network (RNN). The effectiveness of the model is validated and compared with conventional methodologies. The results indicate that the proposed methodology achieves strong results and increases the system's efficiency.
Affiliation(s)
- P Vanmathi
  - Full time Research Scholar, Department of ECE, KCG College of Technology, Karapakkam, Chennai, Tamil Nadu, India
- Deepa Jose
  - Professor, Department of ECE, KCG College of Technology, Karapakkam, Chennai, Tamil Nadu, India
3
Nazari E, Naderi H, Tabadkani M, ArefNezhad R, Farzin AH, Dashtiahangar M, Khazaei M, Ferns GA, Mehrabian A, Tabesh H, Avan A. Breast cancer prediction using different machine learning methods applying multi factors. J Cancer Res Clin Oncol 2023; 149:17133-17146. [PMID: 37773467 DOI: 10.1007/s00432-023-05388-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 09/01/2023] [Indexed: 10/01/2023]
Abstract
OBJECTIVE Breast cancer (BC) is a multifactorial disease and is one of the most common cancers globally. This study aimed to compare different machine learning (ML) techniques to develop a comprehensive breast cancer risk prediction model based on features of various factors. METHODS The population sample contained 810 records (115 cancer patients and 695 healthy individuals). 45 attributes out of 85 were selected based on the opinion of experts. These selected attributes are in genetic, biochemical, biomarker, gender, demographic and pathological factors. 13 Machine learning models were trained with proposed attributes and coefficient of attributes and internal relationships were calculated. RESULT Compared to other methods random forest (RF) has higher performance (accuracy 99.26%, precision 99%, and area under the curve (AUC) 99%). The results of assessing the impact and correlation of variables using the RF method based on PCA indicated that pathology, biomarker, biochemistry, gene, and demographic factors with a coefficient of 0.35, 0.23, 0.15, 0.14, and 0.13 respectively, affected the risk of BC (r2 = 0.54). CONCLUSION Breast cancer has several risk factors. Medical experts use these risk factors for early diagnosis. Therefore, identifying related risk factors and their effect can increase the accuracy of diagnosis. Considering the broad features for predicting breast cancer leads to the development of a comprehensive prediction model. In this study, using RF technique a breast cancer prediction model with 99.3% accuracy was developed based on multifactorial features.
Affiliation(s)
- Elham Nazari
  - Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hamid Naderi
  - Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahla Tabadkani
  - Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Reza ArefNezhad
  - Halal Research Center of IRI, FDA, Tehran, Iran
  - Department of Anatomy, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Majid Khazaei
  - Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
  - Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, BN1 9PH, Sussex, UK
- Amin Mehrabian
  - Warwick Medical School, University of Warwick, Coventry, UK
- Hamed Tabesh
  - Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
- Amir Avan
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
  - College of Medicine, University of Warith Al-Anbiyaa, Karbala, Iraq
4
Plisson M, Moll A, Sarrazin V, Charles D, Antoine T, Ionescu R, Koehren O, Raymond E. Methods for Inclusive Underwriting of Breast Cancer Risk with Machine Learning and Innovative Algorithms. J Insur Med 2023; 50:36-48. [PMID: 37725502 DOI: 10.17849/insm-50-1-36-48.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 03/21/2023] [Indexed: 09/21/2023]
Abstract
INTRODUCTION: Due to early detection and improved therapies, the prevalence of long-term breast cancer survivors is increasing. This has increased the need for more inclusive underwriting in individuals with a history of breast cancer. Herein, we developed an algorithm-based method to facilitate the underwriting of multiple parameters in breast cancer survivors. METHODS: Variables and data were extracted from the SEER database and analyzed using 4 different machine learning based algorithms (Logistic Regression, GA2M, Random Forest, and XGBoost) that were compared with Kaplan-Meier survival estimates. The performances of these algorithms were compared using multiple metrics (Log Loss, AUC, and SMR). In situ (non-invasive) and metastatic breast cancer were excluded from this analysis. RESULTS: Parameters including the pathological subtype, pTNM staging (T: tumor size; N: number of nodes; M: presence or absence of metastases), Scarff-Bloom-Richardson grading, and the expression of estrogen and progesterone hormone receptors were selected to predict the individual outcome at any time point from diagnosis. While all models had identical performance in terms of statistical metrics (AUC, Log Loss, and SMR), logistic regression was the only model that respected all business constraints and was intelligible for medical and underwriting users. CONCLUSION: This study provides insight for developing algorithms to set underwriter-friendly calculators for more accurate risk estimations that can be used to rationalize insurance pricing for breast cancer survivors. This study supports the development of more inclusive underwriting based on models that can encompass the heterogeneity of malignancies such as breast cancer.
Affiliation(s)
- Manuel Plisson
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
- Antoine Moll
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
- Valentine Sarrazin
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
- Denis Charles
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
  - Université de Poitiers, CRIEF
- Thibault Antoine
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
- Razvan Ionescu
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
- Odile Koehren
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
- Eric Raymond
  - SCOR Global Life, Knowledge Team, 5 Avenue Kléber, 75795 Paris Cedex 16, France
  - Université de Poitiers, CRIEF
  - Department of Oncology, Groupe Hospitalier Paris Saint Joseph, 185 Rue Raymond Losserand, 75014 Paris, France
5
González-Castro L, Chávez M, Duflot P, Bleret V, Martin AG, Zobel M, Nateqi J, Lin S, Pazos-Arias JJ, Del Fiol G, López-Nores M. Machine Learning Algorithms to Predict Breast Cancer Recurrence Using Structured and Unstructured Sources from Electronic Health Records. Cancers (Basel) 2023; 15:2741. [PMID: 37345078 DOI: 10.3390/cancers15102741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 04/26/2023] [Accepted: 05/06/2023] [Indexed: 06/23/2023] Open
Abstract
Recurrence is a critical aspect of breast cancer (BC) that is inexorably tied to mortality. Reuse of healthcare data through Machine Learning (ML) algorithms offers great opportunities to improve the stratification of patients at risk of cancer recurrence. We hypothesized that combining features from structured and unstructured sources would provide better prediction results for 5-year cancer recurrence than either source alone. We collected and preprocessed clinical data from a cohort of BC patients, resulting in 823 valid subjects for analysis. We derived three sets of features: structured information, features from free text, and a combination of both. We evaluated the performance of five ML algorithms to predict 5-year cancer recurrence and selected the best-performing to test our hypothesis. The XGB (eXtreme Gradient Boosting) model yielded the best performance among the five evaluated algorithms, with precision = 0.900, recall = 0.907, F1-score = 0.897, and area under the receiver operating characteristic AUROC = 0.807. The best prediction results were achieved with the structured dataset, followed by the unstructured dataset, while the combined dataset achieved the poorest performance. ML algorithms for BC recurrence prediction are valuable tools to improve patient risk stratification, help with post-cancer monitoring, and plan more effective follow-up. Structured data provides the best results when fed to ML algorithms. However, an approach based on natural language processing offers comparable results while potentially requiring less mapping effort.
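The precision, recall, and F1-score reported above are all derived from counts of true positives, false positives, and false negatives. The sketch below shows the binary-class definitions in plain Python; it is an illustration rather than the study's evaluation code, and the function name `precision_recall_f1` and the toy labels are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no actual positives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = recurrence within 5 years, 0 = no recurrence
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

With 2 true positives, 1 false positive, and 1 false negative, all three metrics equal 2/3 for this toy input.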
Affiliation(s)
- Marcela Chávez
  - Department of Information System Management, Centre Hospitalier Universitaire de Liège, 4000 Liège, Belgium
- Patrick Duflot
  - Department of Information System Management, Centre Hospitalier Universitaire de Liège, 4000 Liège, Belgium
- Valérie Bleret
  - Senology Department, Centre Hospitalier Universitaire de Liège, 4000 Liège, Belgium
- Marc Zobel
  - Science Department, Symptoma GmbH, 1030 Vienna, Austria
- Jama Nateqi
  - Science Department, Symptoma GmbH, 1030 Vienna, Austria
  - Department of Internal Medicine, Paracelsus Medical University, 5020 Salzburg, Austria
- Simon Lin
  - Science Department, Symptoma GmbH, 1030 Vienna, Austria
  - Department of Internal Medicine, Paracelsus Medical University, 5020 Salzburg, Austria
- José J Pazos-Arias
  - atlanTTic Research Center, Department of Telematics Engineering, University of Vigo, 36310 Vigo, Spain
- Guilherme Del Fiol
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84108, USA
- Martín López-Nores
  - atlanTTic Research Center, Department of Telematics Engineering, University of Vigo, 36310 Vigo, Spain
6
Dubey S, Tiwari G, Singh S, Goldberg S, Pinsky E. Using machine learning for healthcare treatment planning. Front Artif Intell 2023; 6:1124182. [PMID: 37181733 PMCID: PMC10167842 DOI: 10.3389/frai.2023.1124182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 04/03/2023] [Indexed: 05/16/2023] Open
Abstract
We present a methodology for using machine learning for planning treatments. As a case study, we apply the proposed methodology to Breast Cancer. Most of the application of Machine Learning to breast cancer has been on diagnosis and early detection. By contrast, our paper focuses on applying Machine Learning to suggest treatment plans for patients with different disease severity. While the need for surgery and even its type is often obvious to a patient, the need for chemotherapy and radiation therapy is not as obvious to the patient. With this in mind, the following treatment plans were considered in this study: chemotherapy, radiation, chemotherapy with radiation, and none of these options (only surgery). We use real data from more than 10,000 patients over 6 years that includes detailed cancer information, treatment plans, and survival statistics. Using this data set, we construct Machine Learning classifiers to suggest treatment plans. Our emphasis in this effort is not only on suggesting the treatment plan but on explaining and defending a particular treatment choice to the patient.
Affiliation(s)
- Snigdha Dubey
  - Department of Computer Science, Metropolitan College, Boston University, Boston, MA, United States
- Gaurav Tiwari
  - Department of Computer Science, Metropolitan College, Boston University, Boston, MA, United States
- Sneha Singh
  - Department of Computer Science, Metropolitan College, Boston University, Boston, MA, United States
- Saveli Goldberg
  - Department of Radiation Oncology Mass General Hospital, Boston, MA, United States
- Eugene Pinsky
  - Department of Computer Science, Metropolitan College, Boston University, Boston, MA, United States
7
Guttà C, Morhard C, Rehm M. Applying a GAN-based classifier to improve transcriptome-based prognostication in breast cancer. PLoS Comput Biol 2023; 19:e1011035. [PMID: 37011102 PMCID: PMC10101642 DOI: 10.1371/journal.pcbi.1011035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 04/13/2023] [Accepted: 03/17/2023] [Indexed: 04/05/2023] Open
Abstract
Established prognostic tests based on limited numbers of transcripts can identify high-risk breast cancer patients yet are approved only for individuals presenting with specific clinical features or disease characteristics. Deep learning algorithms could hold potential for stratifying patient cohorts based on full transcriptome data, yet the development of robust classifiers is hampered by the number of variables in omics datasets typically far exceeding the number of patients. To overcome this hurdle, we propose a classifier based on a data augmentation pipeline consisting of a Wasserstein generative adversarial network (GAN) with gradient penalty and an embedded auxiliary classifier to obtain a trained GAN discriminator (T-GAN-D). Applied to 1244 patients of the METABRIC breast cancer cohort, this classifier outperformed established breast cancer biomarkers in separating low- from high-risk patients (disease specific death, progression or relapse within 10 years from initial diagnosis). Importantly, the T-GAN-D also performed across independent, merged transcriptome datasets (METABRIC and TCGA-BRCA cohorts), and merging data improved overall patient stratification. In conclusion, the reiterative GAN-based training process allowed generating a robust classifier capable of stratifying low- vs high-risk patients based on full transcriptome data and across independent and heterogeneous breast cancer cohorts.
Affiliation(s)
- Cristiano Guttà
  - Institute of Cell Biology and Immunology, University of Stuttgart, Stuttgart, Germany
- Markus Rehm
  - Institute of Cell Biology and Immunology, University of Stuttgart, Stuttgart, Germany
  - Stuttgart Research Center Systems Biology, University of Stuttgart, Stuttgart, Germany
8
Zeng L, Liu L, Chen D, Lu H, Xue Y, Bi H, Yang W. The innovative model based on artificial intelligence algorithms to predict recurrence risk of patients with postoperative breast cancer. Front Oncol 2023; 13:1117420. [PMID: 36959794 PMCID: PMC10029918 DOI: 10.3389/fonc.2023.1117420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 02/16/2023] [Indexed: 03/09/2023] Open
Abstract
Purpose This study aimed to develop a machine learning model to retrospectively study and predict the recurrence risk of breast cancer patients after surgery by extracting the clinicopathological features of tumors from unstructured clinical electronic health record (EHR) data. Methods This retrospective cohort included 1,841 breast cancer patients who underwent surgical treatment. To extract the principal features associated with recurrence risk, the clinical notes and histopathology reports of patients were collected and feature engineering was used. Predictive models were then built on these features. All algorithms were implemented in Python. The accuracy of the prediction models was further verified in the test cohort. The area under the curve (AUC), precision, recall, and F1 score were adopted to evaluate the performance of each model. Results A training cohort of 1,289 patients and a test cohort of 552 patients were recruited. From 2011 to 2019, a total of 1,841 textual reports were included. For the prediction of recurrence risk, LSTM, XGBoost, and SVM had favorable accuracies of 0.89, 0.86, and 0.78, respectively. The AUC values of the micro-average ROC curve corresponding to LSTM, XGBoost, and SVM were 0.98 ± 0.01, 0.97 ± 0.03, and 0.92 ± 0.06. The LSTM model in particular outperformed the other models, producing higher values for accuracy, F1 score, macro-avg F1 score (0.87), and weighted-avg F1 score (0.89). All P values were statistically significant. Patients in the high-risk group predicted by our model were more resistant to DNA damage and microtubule targeting drugs than those in the intermediate-risk group. The predicted low-risk patients did not differ significantly from intermediate- or high-risk patients, owing to the small sample size (188 low-risk patients were predicted via our model, and only two of them were administered chemotherapy alone after surgery). The prognosis of patients predicted by our model was consistent with the actual follow-up records. Conclusions The constructed model accurately predicted the recurrence risk of breast cancer patients from EHR data and evaluated the chemoresistance and prognosis of patients. Therefore, our model can help clinicians to formulate the individualized management of breast cancer patients.
Affiliation(s)
- Lixuan Zeng
  - Department of Pathology, Harbin Medical University, Harbin, China
- Lei Liu
  - Department of Breast Surgery, The Third Affiliated Hospital of Harbin Medical University, Harbin, China
- Dongxin Chen
  - Department of Pathology, Harbin Medical University, Harbin, China
- Henghui Lu
  - Department of Dermatology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Yang Xue
  - Department of Pathology, Harbin Medical University, Harbin, China
- Hongjie Bi
  - Department of Pathology, Harbin Medical University, Harbin, China
- Weiwei Yang
  - Department of Pathology, Harbin Medical University, Harbin, China
9
Wu M, Zhao T, Zhang Q, Zhang T, Wang L, Sun G. Prognostic analysis of breast cancer in Xinjiang based on Cox proportional hazards model and two-step cluster method. Front Oncol 2023; 12:1044945. [PMID: 36733362 PMCID: PMC9887128 DOI: 10.3389/fonc.2022.1044945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 12/29/2022] [Indexed: 01/19/2023] Open
Abstract
Objective To examine the factors that affect the prognosis and survival of breast cancer patients who were diagnosed at the Affiliated Cancer Hospital of Xinjiang Medical University between 2015 and 2021, forecast the overall survival (OS), and assess the clinicopathological traits and risk level of prognosis of patients in various subgroups. Method First, a nomogram model was constructed using Cox proportional hazards models to identify the independent prognostic factors of breast cancer patients. To assess the discrimination, calibration, and clinical utility of the model, the receiver operating characteristic (ROC) curve, calibration curve, and clinical decision curve analysis (DCA) were used. Finally, using two-step cluster analysis (TCA), the patients were grouped according to the independent prognostic factors. Kaplan-Meier survival analysis was employed to compare prognostic risk among the subgroups. Result T-stage, N-stage, M-stage, molecular subtyping, type of operation, and involvement in postoperative chemotherapy were identified as the independent prognostic factors. The nomogram was subsequently constructed and confirmed. The areas under the ROC curve for predicting 1-, 3-, 5- and 7-year OS were 0.848, 0.820, 0.813, and 0.791 in the training group and 0.970, 0.898, 0.863, and 0.798 in the validation group, respectively. The calibration curves of both groups were relatively near the 45° reference line. The DCA curve further demonstrated that the nomogram has a higher clinical utility. Furthermore, using the TCA, the patients were divided into two subgroups whose survival curves were substantially different. In particular, in the group with the worse prognosis (in which the majority of patients did not undergo surgical therapy or postoperative chemotherapy), advanced T-, N-, and M-stages were more prevalent, and the total points were likewise distributed toward the high-score side. Conclusion For the survival and prognosis of breast cancer patients in Xinjiang, the nomogram constructed in this paper has a good prediction value, and the clustering results further demonstrated that the selected factors were important. This conclusion can give a scientific basis for tailored treatment and is conducive to the formulation of focused treatment regimens for patients in clinical practice.
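The Kaplan-Meier analysis mentioned above estimates a survival curve by multiplying, at each observed event time, the fraction of at-risk patients who survive that time. The following is a minimal sketch of the estimator in plain Python under the usual right-censoring convention; it is not the authors' analysis code, and the function name `kaplan_meier` and the toy follow-up data are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations[i] is the follow-up time of patient i; events[i] is 1 if the
    event (death) was observed at that time and 0 if the patient was censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in years; 0 marks a censored patient
durations = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
for time, prob in kaplan_meier(durations, events):
    print(time, round(prob, 3))
```

For this toy input the curve steps down at years 1, 2, and 3; the censored patients at years 2 and 4 reduce the at-risk count without producing a step.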
Affiliation(s)
- Mengjuan Wu
  - Country College of Public Health, Xinjiang Medical University, Urumqi, China
- Ting Zhao
  - Department of Medical Record Management, The Affiliated Cancer Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China
- Qian Zhang
  - Information Management and Big Date Center, The Affiliated Cancer Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China
- Tao Zhang
  - Country College of Public Health, Xinjiang Medical University, Urumqi, China
- Lei Wang
  - Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, China
- Gang Sun
  - Xinjiang Cancer Center/Key Laboratory of Oncology of Xinjiang Uyghur Autonomous Region, Urumqi, Xinjiang, China
  - Department of Breast and Thyroid Surgery, The Affiliated Cancer Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China
- *Correspondence: Lei Wang; Gang Sun
10
Zhang F, Zhang Y, Zhu X, Chen X, Du H, Zhang X. PregGAN: A prognosis prediction model for breast cancer based on conditional generative adversarial networks. Comput Methods Programs Biomed 2022; 224:107026. [PMID: 35872384 DOI: 10.1016/j.cmpb.2022.107026] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/01/2021] [Revised: 07/13/2022] [Accepted: 07/13/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE A generative adversarial network (GAN) is able to learn from a set of training data and generate new data with the same characteristics as the training data. Based on these characteristics, this paper developed the GAN as a tool for disease prognosis prediction and proposed a prognostic model, PregGAN, based on the conditional generative adversarial network (CGAN). METHODS The idea of PregGAN is to generate the prognosis prediction results based on the clinical data of patients. PregGAN added the clinical data as conditions to the training process. The conditions were used as input to the generator along with noise, and the generator synthesized new samples from the noise vectors and the conditions. To address the mode collapse problem during PregGAN training, the Wasserstein distance and a gradient penalty strategy were used to make the training process more stable. RESULTS In the prognosis prediction experiments using the METABRIC breast cancer dataset, PregGAN achieved good results, with an average accuracy (ACC) of 90.6% and an average AUC (area under curve) of 0.946. CONCLUSIONS Experimental results show that PregGAN is a reliable prognosis predictive model for breast cancer. Due to its strong ability to learn probability distributions, PregGAN can also be used for the prognosis prediction of other diseases.
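For orientation, the Wasserstein distance with gradient penalty mentioned above is commonly written as the following critic loss, shown here with a conditioning vector $c$ (the clinical data). This is the standard WGAN-GP form, given as a sketch; the abstract does not state whether PregGAN uses exactly this expression, and the symbols $D$, $G$, $z$, $c$, $\lambda$, and $\epsilon$ are notation assumed for illustration.

```latex
L_D = \mathbb{E}_{z \sim p_z}\big[ D(G(z \mid c) \mid c) \big]
    - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[ D(x \mid c) \big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x} \mid c) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\, G(z \mid c), \quad \epsilon \sim U[0, 1]
```

The first two terms estimate the Wasserstein distance between generated and real samples, and the third term penalizes critic gradients whose norm departs from 1 at points interpolated between real and generated samples, which is what stabilizes training.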
Affiliation(s)
- Fan Zhang
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, China; Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng 475004, China
- Yingqi Zhang
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, China
- Xiaoke Zhu
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, China
- Xiaopan Chen
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, China
- Haishun Du
  - School of Artificial Intelligence, Henan University, Kaifeng 475004, China
- Xinhong Zhang
  - School of Software, Henan University, Kaifeng 475004, China
11
Afrash MR, Bayani A, Shanbehzadeh M, Bahadori M, Kazemi-Arpanahi H. Developing the breast cancer risk prediction system using hybrid machine learning algorithms. J Educ Health Promot 2022; 11:272. [PMID: 36325225 PMCID: PMC9621357 DOI: 10.4103/jehp.jehp_42_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 02/12/2022] [Accepted: 02/21/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Breast cancer (BC) is the most common cause of cancer-related deaths in women globally. Currently, many machine learning (ML)-based predictive models have been established to assist clinicians in decision making for the prediction of BC. However, preventing risk factor formation even with having healthy lifestyle behaviors or preventing disease at early stages can significantly lead to optimal population-wide BC health. Thus, we aimed to develop a prediction model by using a genetic algorithm (GA) incorporating several ML algorithms for the prediction and early warning of BC. MATERIAL AND METHODS The data of 3168 healthy individuals and 1742 patient case records in the BC Registry Database in Ayatollah Taleghani hospital, Abadan, Iran were analyzed. First, a modified hybrid GA was used to perform feature selection and optimization of selected features. Then, with the use of selected features, several ML algorithms were trained to predict BC. Afterward, the performance of each model was measured in terms of accuracy, precision, sensitivity, specificity, and receiver operating characteristic (ROC) curve metrics. Finally, a clinical decision support system based on the best model was developed. RESULTS After performing feature selection, age, consumption of dairy products, BC family history, breast biopsy, chest X-ray, hormone therapy, alcohol consumption, being overweight, having children, and education statuses were selected as the most important features for prediction of BC. The experimental results showed that the decision tree yielded a superior performance than other ML models, with values of 99.3%, 99.5%, 98.26% for accuracy, specificity, and sensitivity, respectively. CONCLUSION The developed predictive system can accurately identify persons who are at elevated risk for BC and can be used as an essential clinical screening tool for the early prevention of BC and serve as an important tool for developing preventive health strategies.
Affiliation(s)
- Mohammad R. Afrash: Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Azadeh Bayani: Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Shanbehzadeh: Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Mohammadkarim Bahadori: Health Management Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Hadi Kazemi-Arpanahi: Department of Health Information Management and Technology, Abadan University of Medical Sciences, Abadan, Iran; Student Research Committee, Abadan University of Medical Sciences, Abadan, Iran
|
12
|
Pan LC, Wu XR, Lu Y, Zhang HQ, Zhou YL, Liu X, Liu SL, Yan QY. Artificial intelligence empowered Digital Health Technologies in Cancer Survivorship Care: a scoping review. Asia Pac J Oncol Nurs 2022; 9:100127. [PMID: 36176267 PMCID: PMC9513729 DOI: 10.1016/j.apjon.2022.100127] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Accepted: 07/29/2022] [Indexed: 12/03/2022] Open
Abstract
Objective The objectives of this systematic review are to describe the features and specific application scenarios of artificial intelligence (AI)-driven digital health technologies (DHTs) in current cancer survivorship care services, and to explore their acceptance and briefly evaluate their feasibility in application. Methods We systematically searched MEDLINE, IEEE Xplore, PubMed, Embase, the Cochrane Central Register of Controlled Trials, and Scopus for literature published from 2010 to 2022. Eligible study types were original research, descriptive studies, randomized controlled trials, pilot studies, and feasibility or acceptability studies. The included studies described the current status and effectiveness of AI-based digital medical technologies used in cancer survivorship care services. Additionally, we used the QuADS quality assessment tool to evaluate the quality of the included studies. Results A total of 43 studies that met the inclusion criteria were analyzed and qualitatively synthesized. The current status and results related to the application of AI-driven DHTs in cancer survivorship care were reviewed. Most of these studies were designed specifically for the care of breast cancer survivors and focused on recurrence or secondary cancer prediction, clinical decision support, cancer survivability prediction, population or treatment stratification, prediction of adverse reactions induced by anti-cancer treatment, and so on. Applying AI-based DHTs to cancer survivors has shown some positive outcomes, including increased motivation for patient-reported outcomes (PROs), reduced fatigue and pain levels, and improved quality of life and physical function. However, current research has mostly explored the technology development and formation (testing) phases, with small study populations and single-center trials. Therefore, conclusions about the effectiveness of AI-based DHTs in supportive cancer care cannot yet be drawn, as most applications are still in the early stages of development and feasibility testing. Conclusions While digital therapies are promising in the care of cancer patients, more high-quality studies are still needed to demonstrate their effectiveness in cancer care. Studies should explore how to develop uniform standards for measuring patient-related outcomes, ensure the scientific validity of research methods, and emphasize patient and health practitioner involvement in the development and use of technology.
Affiliation(s)
- Lu-Chen Pan: Department of Nursing, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Xiao-Ru Wu: School of Nursing, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Ying Lu: Department of Nursing, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China; School of Nursing, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Han-Qing Zhang: Health Science Center, Yangtze University, Jinzhou 434023, China
- Yao-Ling Zhou: Department of Nursing, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China; School of Nursing, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Xue Liu: School of Nursing, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Sheng-Lin Liu (corresponding author): Department of Medical Engineering, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Qiao-Yuan Yan (corresponding author): Department of Nursing, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
|
13
|
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:cancers14133215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine. Abstract Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. 
This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
|
14
|
Delpino F, Costa Â, Farias S, Chiavegatto Filho A, Arcêncio R, Nunes B. Machine learning for predicting chronic diseases: a systematic review. Public Health 2022; 205:14-25. [DOI: 10.1016/j.puhe.2022.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2021] [Revised: 10/26/2021] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
|
15
|
Zhou L, Rueda M, Alkhateeb A. Classification of Breast Cancer Nottingham Prognostic Index Using High-Dimensional Embedding and Residual Neural Network. Cancers (Basel) 2022. [PMID: 35205681 PMCID: PMC8870306 DOI: 10.3390/cancers14040934] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/18/2023] Open
Abstract
The Nottingham Prognostics Index (NPI) is a prognostics measure that predicts operable primary breast cancer survival. The NPI value is calculated based on the size of the tumor, the number of lymph nodes, and the tumor grade. Next-generation sequencing advancements have led to measuring different biological indicators called multi-omics data. The availability of multi-omics data triggered the challenge of integrating and analyzing these various biological measures to understand the progression of the diseases. High-dimensional embedding techniques are incorporated to present the features in the lower dimension, i.e., in a 2-dimensional map. The dataset consists of three -omics: gene expression, copy number alteration (CNA), and mRNA from 1885 female patients. The model creates a gene similarity network (GSN) map for each omic using t-distributed stochastic neighbor embedding (t-SNE) before being merged into the residual neural network (ResNet) classification model. The aim of this work was to (i) extract multi-omics biomarkers that are associated with the prognosis and prediction of breast cancer survival; and (ii) build a prediction model for multi-class breast cancer NPI classes. We evaluated this model and compared it to different high-dimensional embedding techniques and neural network combinations. The proposed model outperformed the other methods with an accuracy of 98.48%, and the area under the curve (AUC) equals 0.9999. The findings in the literature confirm associations between some of the extracted omics and breast cancer prognosis and survival including CDCA5, IL17RB, MUC2, NOD2 and NXPH4 from the gene expression dataset; MED30, RAD21, EIF3H and EIF3E from the CNA dataset; and CENPA, MACF1, UGT2B7 and SEMA3B from the mRNA dataset.
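The abstract above states only that the NPI is computed from tumor size, lymph node involvement, and tumor grade. As a worked illustration, the sketch below uses the commonly cited formulation, NPI = 0.2 × tumor size (cm) + lymph node stage + histological grade, where node stage is 1 for no positive nodes, 2 for 1 to 3 positive nodes, and 3 for 4 or more. This formulation comes from the general NPI literature rather than from the abstract, and the function names are hypothetical; how the authors binned NPI values into classes is not given here, so no class boundaries are assumed.

```python
def node_stage(positive_nodes: int) -> int:
    """Map a count of positive lymph nodes to the NPI node stage (1-3)."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def nottingham_prognostic_index(size_cm: float, positive_nodes: int, grade: int) -> float:
    """Commonly cited NPI: 0.2 * size (cm) + node stage + grade (1-3)."""
    if size_cm < 0:
        raise ValueError("size_cm must be non-negative")
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    return 0.2 * size_cm + node_stage(positive_nodes) + grade


# Made-up patients for illustration only.
low = nottingham_prognostic_index(size_cm=2.0, positive_nodes=0, grade=2)   # 0.4 + 1 + 2
high = nottingham_prognostic_index(size_cm=3.0, positive_nodes=5, grade=3)  # 0.6 + 3 + 3
```

A higher index corresponds to a worse expected prognosis, which is why the study can treat NPI score levels as ordered classes.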
Affiliation(s)
- Li Zhou: School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
- Maria Rueda: Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON N9B 3P4, Canada
- Abedalrhman Alkhateeb: School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada; King Hussein School of Computing Science, Princess Sumaya University for Technology, Al-Jubaiha, Amman P.O. Box 1438, Jordan
|
16
|
Classification of Breast Cancer Nottingham Prognostic Index Using High-Dimensional Embedding and Residual Neural Network. Cancers (Basel) 2022; 14:cancers14040934. [PMID: 35205681 PMCID: PMC8870306 DOI: 10.3390/cancers14040934] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 12/30/2021] [Revised: 01/29/2022] [Accepted: 02/10/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary A deep learning model based on multi-omics data to classify Nottingham prognostic Index score levels. The model represents each omic dataset using 2-dimensional map before integrating all omics maps into the prediction model. The literature confirms the relationship between the extracted omics features with the progression and survival of breast cancer. Abstract The Nottingham Prognostics Index (NPI) is a prognostics measure that predicts operable primary breast cancer survival. The NPI value is calculated based on the size of the tumor, the number of lymph nodes, and the tumor grade. Next-generation sequencing advancements have led to measuring different biological indicators called multi-omics data. The availability of multi-omics data triggered the challenge of integrating and analyzing these various biological measures to understand the progression of the diseases. High-dimensional embedding techniques are incorporated to present the features in the lower dimension, i.e., in a 2-dimensional map. The dataset consists of three -omics: gene expression, copy number alteration (CNA), and mRNA from 1885 female patients. The model creates a gene similarity network (GSN) map for each omic using t-distributed stochastic neighbor embedding (t-SNE) before being merged into the residual neural network (ResNet) classification model. The aim of this work was to (i) extract multi-omics biomarkers that are associated with the prognosis and prediction of breast cancer survival; and (ii) build a prediction model for multi-class breast cancer NPI classes. We evaluated this model and compared it to different high-dimensional embedding techniques and neural network combinations. The proposed model outperformed the other methods with an accuracy of 98.48%, and the area under the curve (AUC) equals 0.9999. 
The findings in the literature confirm associations between some of the extracted omics and breast cancer prognosis and survival including CDCA5, IL17RB, MUC2, NOD2 and NXPH4 from the gene expression dataset; MED30, RAD21, EIF3H and EIF3E from the CNA dataset; and CENPA, MACF1, UGT2B7 and SEMA3B from the mRNA dataset.
|
17
|
Construction and Validation of a Newly Prognostic Signature for CRISPR-Cas9-Based Cancer Dependency Map Genes in Breast Cancer. JOURNAL OF ONCOLOGY 2022; 2022:4566577. [PMID: 35096059 PMCID: PMC8791742 DOI: 10.1155/2022/4566577] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/24/2021] [Accepted: 12/01/2021] [Indexed: 12/11/2022]
Abstract
Cancer Dependency Map (CDM) genes comprise an extensive series of genome-scale RNAi-based loss-of-function tests; hence, it served as a method based on the CRISPR-Cas9 technique that could assist scientists in investigating potential gene functions. These CDM genes have a role in tumor cell survival and proliferation, suggesting that they may be used as new therapeutic targets for some malignant tumors. So far, there has been little research on the involvement of CDM genes in breast cancer, and only a tiny percentage of CDM genes have been studied. In this study, information on patients with breast cancer was extracted from The Cancer Genome Atlas (TCGA), from which differentially expressed CDM genes in breast cancer were determined. A variety of bioinformatics techniques were used to assess the functions and prognostic relevance of these confirmed CDM genes. In all, 290 CDM genes were found to be differentially expressed. Six CDM genes (SRF, RAD51, PMF1, EXOSC3, EXOC1, and TSEN54) were found to be associated with the prognosis of breast cancer samples. Based on the expression of the identified CDM genes and their coefficients, a prognosis model was constructed for prediction, according to which patients with breast cancer were separated into two risk groups. Those with high risk had substantially poorer overall survival (OS) than patients in the other risk group in the TCGA training set, the TCGA testing set, and an external cohort from the Gene Expression Omnibus (GEO) database. The area under the receiver operating characteristic (ROC) curve for this prognostic signature was 0.717 and 0.635 for the TCGA training and testing sets, respectively, demonstrating its reliability in predicting the prognosis of patients with breast cancer. We next created a nomogram using the six CDM genes discovered to create a therapeutically useful model. The Human Protein Atlas database was used to acquire all immunohistochemistry staining images of the discovered CDM genes. 
The proportions of tumor-infiltrating immune cells, as well as the expression levels of checkpoint genes, varied substantially between the two risk groups, according to the analyses of immune response. In conclusion, the findings of this research may aid in the understanding of the prognostic value and biological roles of CDM genes in breast cancer.
|
18
|
Bohannan ZS, Coffman F, Mitrofanova A. Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia. Comput Struct Biotechnol J 2022; 20:583-597. [PMID: 35116134 PMCID: PMC8777142 DOI: 10.1016/j.csbj.2022.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/17/2021] [Revised: 12/30/2021] [Accepted: 01/01/2022] [Indexed: 12/16/2022] Open
Abstract
High-risk pediatric B-ALL patients experience 5-year negative event rates up to 25%. Although some biomarkers of relapse are utilized in the clinic, their ability to predict outcomes in high-risk patients is limited. Here, we propose a random survival forest (RSF) machine learning model utilizing interpretable genomic inputs to predict relapse/death in high-risk pediatric B-ALL patients. We utilized whole exome sequencing profiles from 156 patients in the TARGET-ALL study (with samples collected at presentation) further stratified into training and test cohorts (109 and 47 patients, respectively). To avoid overfitting and facilitate the interpretation of machine learning results, input genomic variables were engineered using a stepwise approach involving univariable Cox models to select variables directly associated with outcomes, genomic coordinate-based analysis to select mutational hotspots, and correlation analysis to eliminate feature co-linearity. Model training identified 7 genomic regions most predictive of relapse/death-free survival. The test cohort error rate was 12.47%, and a polygenic score based on the sum of the top 7 variables effectively stratified patients into two groups, with significant differences in time to relapse/death (log-rank P = 0.001, hazard ratio = 5.41). Our model outperformed other EFS modeling approaches including an RSF using gold-standard prognostic variables (error rate = 24.35%). Validation in 174 standard-risk patients and 3 patients who failed to respond to induction therapy confirmed that our RSF model and polygenic score were specific to high-risk disease. We propose that our feature selection/engineering approach can increase the clinical interpretability of RSF, and our polygenic score could be utilized to enhance clinical decision-making in high-risk B-ALL.
Affiliation(s)
- Zachary S. Bohannan: Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
- Frederick Coffman: Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
- Antonina Mitrofanova: Rutgers, The State University of New Jersey, School of Health Professions, Department of Health Informatics, 65 Bergen Street, Suite 120, Newark, NJ 07107-1709, United States
|
19
|
Oei RW, Lyu Y, Ye L, Kong F, Du C, Zhai R, Xu T, Shen C, He X, Kong L, Hu C, Ying H. Progression-Free Survival Prediction in Patients with Nasopharyngeal Carcinoma after Intensity-Modulated Radiotherapy: Machine Learning vs. Traditional Statistics. J Pers Med 2021; 11:jpm11080787. [PMID: 34442430 PMCID: PMC8398698 DOI: 10.3390/jpm11080787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2021] [Revised: 08/08/2021] [Accepted: 08/10/2021] [Indexed: 12/24/2022] Open
Abstract
Background: The Cox proportional hazards (CPH) model is the most commonly used statistical method for nasopharyngeal carcinoma (NPC) prognostication. Recently, machine learning (ML) models are increasingly adopted for this purpose. However, only a few studies have compared the performances between CPH and ML models. This study aimed at comparing CPH with two state-of-the-art ML algorithms, namely, conditional survival forest (CSF) and DeepSurv for disease progression prediction in NPC. Methods: From January 2010 to March 2013, 412 eligible NPC patients were reviewed. The entire dataset was split into training cohort and testing cohort in a ratio of 90%:10%. Ten features from patient-related, disease-related, and treatment-related data were used to train the models for progression-free survival (PFS) prediction. The model performance was compared using the concordance index (c-index), Brier score, and log-rank test based on the risk stratification results. Results: DeepSurv (c-index = 0.68, Brier score = 0.13, log-rank test p = 0.02) achieved the best performance compared to CSF (c-index = 0.63, Brier score = 0.14, log-rank test p = 0.38) and CPH (c-index = 0.57, Brier score = 0.15, log-rank test p = 0.81). Conclusions: Both CSF and DeepSurv outperformed CPH in our relatively small dataset. ML-based survival prediction may guide physicians in choosing the most suitable treatment strategy for NPC patients.
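The concordance index (c-index) used above to compare CPH, CSF, and DeepSurv measures how often a model ranks two patients in the correct order of progression. A minimal sketch of Harrell's c-index for right-censored data follows, assuming a higher risk score means earlier expected progression. For simplicity it skips pairs with tied event times, which library implementations handle more carefully; the function name and the data are hypothetical and not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored data (simplified sketch).

    times  -- observed follow-up time per patient
    events -- 1 if progression was observed, 0 if censored
    risks  -- predicted risk score (higher = earlier expected progression)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the true order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up follow-up data for illustration only.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
c_perfect = concordance_index(times, events, [0.9, 0.4, 0.6, 0.1])
c_mixed = concordance_index(times, events, [0.5, 0.4, 0.6, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported range of 0.57 to 0.68.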
Affiliation(s)
- Ronald Wihal Oei, Yingchen Lyu, Lulu Ye, Fangfang Kong, Chengrun Du, Ruiping Zhai, Tingting Xu, Chunying Shen, Xiayun He, Lin Kong, Chaosu Hu, Hongmei Ying: Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai 200032, China; Department of Oncology, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Correspondence (Hongmei Ying): Tel.: +86-21-64175590; Fax: +86-21-6417477
|
20
|
Okagbue HI, Adamu PI, Oguntunde PE, Obasi ECM, Odetunmibi OA. Machine learning prediction of breast cancer survival using age, sex, length of stay, mode of diagnosis and location of cancer. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00572-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
|
21
|
Kim JY, Lee YS, Yu J, Park Y, Lee SK, Lee M, Lee JE, Kim SW, Nam SJ, Park YH, Ahn JS, Kang M, Im YH. Deep Learning-Based Prediction Model for Breast Cancer Recurrence Using Adjuvant Breast Cancer Cohort in Tertiary Cancer Center Registry. Front Oncol 2021; 11:596364. [PMID: 34017679 PMCID: PMC8129587 DOI: 10.3389/fonc.2021.596364] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/19/2020] [Accepted: 02/17/2021] [Indexed: 01/06/2023] Open
Abstract
Several prognosis prediction models have been developed for breast cancer (BC) patients with curative surgery, but there is still an unmet need to precisely determine BC prognosis for individual BC patients in real time. This is a retrospectively collected data analysis from adjuvant BC registry at Samsung Medical Center between January 2000 and December 2016. The initial data set contained 325 clinical data elements: baseline characteristics with demographics, clinical and pathologic information, and follow-up clinical information including laboratory and imaging data during surveillance. Weibull Time To Event Recurrent Neural Network (WTTE-RNN) by Martinsson was implemented for machine learning. We searched for the optimal window size as time-stamped inputs. To develop the prediction model, data from 13,117 patients were split into training (60%), validation (20%), and test (20%) sets. The median follow-up duration was 4.7 years and the median number of visits was 8.4. We identified 32 features related to BC recurrence and considered them in further analyses. Performance at a point of statistics was calculated using Harrell's C-index and area under the curve (AUC) at each 2-, 5-, and 7-year points. After 200 training epochs with a batch size of 100, the C-index reached 0.92 for the training data set and 0.89 for the validation and test data sets. The AUC values were 0.90 at 2-year point, 0.91 at 5-year point, and 0.91 at 7-year point. The deep learning-based final model outperformed three other machine learning-based models. In terms of pathologic characteristics, the median absolute error (MAE) and weighted mean absolute error (wMAE) showed great results of as little as 3.5%. This BC prognosis model to determine the probability of BC recurrence in real time was developed using information from the time of BC diagnosis and the follow-up period in RNN machine learning model.
Affiliation(s)
- Ji-Yeon Kim: Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Yong Seok Lee: Digital Health Business Team, Samsung SDS, Seoul, South Korea
- Jonghan Yu: Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Youngmin Park: Digital Health Business Team, Samsung SDS, Seoul, South Korea
- Se Kyung Lee: Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Minyoung Lee: Digital Health Business Team, Samsung SDS, Seoul, South Korea
- Jeong Eon Lee: Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Seok Won Kim: Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Seok Jin Nam: Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Yeon Hee Park: Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Jin Seok Ahn: Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Mira Kang: Center for Health Promotion, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Young-Hyuck Im: Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
|
22
|
Alzu’bi A, Najadat H, Doulat W, Al-Shari O, Zhou L. Predicting the recurrence of breast cancer using machine learning algorithms. MULTIMEDIA TOOLS AND APPLICATIONS 2021; 80:13787-13800. [DOI: 10.1007/s11042-020-10448-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/26/2020] [Revised: 08/24/2020] [Accepted: 12/22/2020] [Indexed: 08/29/2023]
|
23
|
Direct comparison of three different mathematical models in two independent datasets of EUSOMA certified centers to predict recurrence and survival in estrogen receptor-positive breast cancer: impact on clinical practice. Breast Cancer Res Treat 2021; 187:455-465. [PMID: 33646494 DOI: 10.1007/s10549-021-06144-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Accepted: 02/08/2021] [Indexed: 10/22/2022]
Abstract
PURPOSE Prediction algorithms that estimate survival rates for breast cancer (BC) based on risk factors and treatment could help clinicians during multidisciplinary meetings. The aim of this study was to evaluate the accuracy and clinical utility of three different scores: the Clinical Treatment Score Post-5 Years (CTS5), the PREDICT Score, and the Nottingham Prognostic Index (NPI). METHODS This is a retrospective cohort analysis conducted on prospectively recorded databases of two EUSOMA certified centers (Breast Unit of Trieste Academic Hospital and of Cremona Hospital, Italy). We included patients with Luminal BC who underwent breast surgery between 2010 and 2015, followed by 5 years of endocrine therapy with curative intent. RESULTS A total of 473 patients were enrolled in this study. ROC analysis showed fair accuracy for NPI, good accuracy for PREDICT, and optimal accuracy for CTS5 (AUC 0.7, 0.76, and 0.83, respectively). The three scores seemed strongly correlated in Spearman's rank correlation coefficient analysis. PREDICT partially overestimated OS in patients who underwent mastectomy and in pT3-4, G3 tumors. Considering DRFS as a surrogate of OS, CTS5 showed that women in the intermediate- and high-risk classes also had shorter OS (p = 0.001 and p < 0.001, respectively). Combining scores did not improve prognostic power. CONCLUSION Mathematical models may help clinicians in decision making (adjuvant therapies, CDK4/6i, genomic test's gray zones). CTS5 has the highest prognostic accuracy in predicting recurrence, while the scores predicting OS did not show substantial advances, proving that pN, G, and pT are still the most important variables in predicting OS.
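The AUC values reported above can be read as the probability that a randomly chosen patient who had the event receives a higher score than a randomly chosen patient who did not. The sketch below computes AUC directly from that pairwise definition; it is an illustration under made-up data, the function name is hypothetical, and it is not the software the authors used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels -- 1 for patients with the event, 0 for patients without
    scores -- prognostic score (higher = higher predicted risk)
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half
    return wins / (len(positives) * len(negatives))


# Made-up scores for illustration only: 5 of the 6 pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.2])
```

This pairwise form is quadratic in the number of patients, which is acceptable for cohorts of a few hundred such as the 473 patients here; rank-based formulas give the same value more efficiently.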
|
24
|
Malherbe K. Tumor Microenvironment and the Role of Artificial Intelligence in Breast Cancer Detection and Prognosis. Am J Pathol 2021; 191:1364-1373. [PMID: 33639101 DOI: 10.1016/j.ajpath.2021.01.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/11/2020] [Revised: 01/02/2021] [Accepted: 01/28/2021] [Indexed: 12/21/2022]
Abstract
A critical knowledge gap has been noted in breast cancer detection, prognosis, and evaluation regarding the relationship between the tumor microenvironment and the associated neoplasm. Artificial intelligence (AI) has multiple subsets or methods for data extraction and evaluation, including artificial neural networks, in which computational units, similar to neurons, form connections and new neural pathways during data set training. Deep machine learning and AI hold great potential to accurately assess tumor microenvironment models by employing vast data management techniques. Despite this potential, there is still much debate surrounding the appropriate and ethical curation of medical data from picture archiving and communication systems. The clinical significance of AI output depends on the human-curated data sets used for training. Integrating biomarkers, risk factors, and imaging data will allow the best predictor models for patient-based outcomes.
Affiliation(s)
- Kathryn Malherbe
  - Department of Radiography, Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa
|
25
|
Mohammadi F, Pourzamani H, Karimi H, Mohammadi M, Mohammadi M, Ardalan N, Khoshravesh R, Pooresmaeil H, Shahabi S, Sabahi M, Sadat Miryonesi F, Najafi M, Yavari Z, Mohammadi F, Teiri H, Jannati M. Artificial neural network and logistic regression modelling to characterize COVID-19 infected patients in local areas of Iran. Biomed J 2021; 44:304-316. [PMID: 34127421 PMCID: PMC7905378 DOI: 10.1016/j.bj.2021.02.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/31/2020] [Revised: 02/11/2021] [Accepted: 02/17/2021] [Indexed: 01/08/2023]
Abstract
Background COVID-19 is an infectious disease that started spreading globally at the end of 2019. Because patient characteristics and symptoms differ between regions, this research compared COVID-19 patients across 6 provinces of Iran. In addition, multilayer perceptron (MLP) neural network and logistic regression (LR) models were applied to the diagnosis of COVID-19. Methods A total of 1043 patients with suspected COVID-19 infection in Iran participated in this study. Twenty-nine characteristics, symptoms, and underlying diseases were recorded from hospitalized patients. We then compared the obtained data between confirmed cases. The data were also used to build the artificial neural network (ANN) and LR models to diagnose patients infected with COVID-19. Results Among the 750 confirmed patients, the common symptoms were fever (>37.5 °C), cough, shortness of breath, fatigue, chills, and headache. The most common underlying diseases were hypertension, diabetes, chronic obstructive pulmonary disease, and coronary heart disease. Finally, the accuracy of the ANN model in diagnosing COVID-19 infection was higher than that of the LR model. Conclusion The prevalent symptoms and underlying diseases of COVID-19 patients were similar across provinces, but the incidence of symptoms differed significantly between them. The study also demonstrated that ANN and LR models perform well in the diagnosis of COVID-19 infection.
Affiliation(s)
- Farzaneh Mohammadi
  - Department of Environmental Health Engineering, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran; Environment Research Center, Research Institute for Primordial Prevention of Non-communicable Disease, Isfahan University of Medical Sciences, Isfahan, Iran
- Hamidreza Pourzamani
  - Department of Environmental Health Engineering, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran; Environment Research Center, Research Institute for Primordial Prevention of Non-communicable Disease, Isfahan University of Medical Sciences, Isfahan, Iran
- Hossein Karimi
  - Department of Environmental Health Engineering, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran
- Maryam Mohammadi
  - Department of Management and Health Information Technology, School of Management and Medical Information Sciences, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Mohammadi
  - Department of Electrical Engineering, Shahreza University, Isfahan, Iran
- Nahid Ardalan
  - Kurdistan University of Medical Sciences, Sanandaj, Kurdistan, Iran
- Marzieh Najafi
  - Isfahan Endocrine and Metabolism Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
- Zeynab Yavari
  - Genetic and Environmental Adventures Research Center, School of Abarkouh Paramedicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Farideh Mohammadi
  - Department of Textile Engineering, Isfahan University of Technology, Isfahan, Iran
- Hakimeh Teiri
  - Department of Environmental Health Engineering, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran; Environment Research Center, Research Institute for Primordial Prevention of Non-communicable Disease, Isfahan University of Medical Sciences, Isfahan, Iran
- Mahsa Jannati
  - Graduate Student, Dept. of Civil Engineering, Lakehead University, Thunder Bay, ON, Canada
|
26
|
Boeri C, Chiappa C, Galli F, De Berardinis V, Bardelli L, Carcano G, Rovera F. Machine Learning techniques in breast cancer prognosis prediction: A primary evaluation. Cancer Med 2020; 9:3234-3243. [PMID: 32154669 PMCID: PMC7196042 DOI: 10.1002/cam4.2811] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/05/2019] [Revised: 11/28/2019] [Accepted: 12/13/2019] [Indexed: 01/13/2023]
Abstract
More than 750 000 women in Italy are surviving a diagnosis of breast cancer. A large body of literature tells us which characteristics most affect their prognosis. However, predicting each disease course, and then establishing a therapeutic plan and follow-up tailored to the patient, is still very complicated. To address this issue, a multidisciplinary approach has become widely accepted, while the Multigene Signature Panels and the Nottingham Prognostic Index are still debated options. Current technological resources make it possible to gather large amounts of data for each patient. Machine Learning (ML) allows us to draw on these data, to discover their mutual relations, and to estimate the prognosis for new instances. This study provides a primary evaluation of the application of ML to predict breast cancer prognosis. We analyzed 1021 patients who underwent surgery for breast cancer in our Institute and included 610 of them. Three outcomes were chosen: cancer recurrence (both loco-regional and systemic) and death from the disease within 32 months. We developed two types of ML models for every outcome (Artificial Neural Network and Support Vector Machine). Each ML algorithm was evaluated for accuracy (95.29%-96.86%), sensitivity (0.35-0.64), specificity (0.97-0.99), and AUC (0.804-0.916). These models might become an additional resource for evaluating the prognosis of breast cancer patients in our daily clinical practice. Before that, in line with the literature, we should increase their sensitivity by considering a wider population sample with a longer follow-up period. However, specificity, accuracy, minimal additional costs, and reproducibility are already encouraging.
Affiliation(s)
- Carlo Boeri
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
- Corrado Chiappa
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
- Federica Galli
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
- Valentina De Berardinis
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
- Laura Bardelli
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
- Giulio Carcano
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
- Francesca Rovera
  - SSD Breast Unit - ASST-Settelaghi Varese, Senology Research Center, Department of Medicine, University of Insubria, Varese, Italy
|