1. Jin Y, Lan A, Dai Y, Jiang L, Liu S. Development and testing of a random forest-based machine learning model for predicting events among breast cancer patients with a poor response to neoadjuvant chemotherapy. Eur J Med Res 2023; 28:394. [PMID: 37777809 PMCID: PMC10543332 DOI: 10.1186/s40001-023-01361-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/11/2023] [Indexed: 10/02/2023] Open
Abstract
BACKGROUND Breast cancer (BC) is the most common malignant tumor worldwide. Timely detection of tumor progression after treatment could improve patient survival. This study aimed to develop machine learning models to predict events in BC patients post-treatment, where an event was defined as (1) the first local, regional, or distant tumor relapse; (2) a diagnosis of a secondary malignant tumor; or (3) death from any cause. METHODS Patients with a response of stable disease (SD) or progressive disease (PD) after neoadjuvant chemotherapy (NAC) were selected. Clinicopathological features and survival data were recorded at 1 year and 5 years, respectively. Patients were randomly divided into a training set and a test set in a ratio of 8:2. A random forest (RF) model and a logistic regression model were built in both the 1-year cohort and the 5-year cohort, and the performance of the two models was compared. The models were validated using data from the Surveillance, Epidemiology, and End Results (SEER) database. RESULTS A total of 315 patients were included. In the 1-year cohort, 197 patients were assigned to the training set and 87 to the test set. The specificity, sensitivity, and AUC were 0.800, 0.833, and 0.810 for the RF model and 0.520, 0.833, and 0.653 for the logistic regression model. In the 5-year cohort, 132 patients were assigned to the training set and 33 to the test set. The specificity, sensitivity, and AUC were 0.882, 0.750, and 0.829 for the RF model and 0.882, 0.688, and 0.752 for the logistic regression model. In the external validation set, the specificity, sensitivity, and AUC were 0.765, 0.812, and 0.779 for the RF model and 0.833, 0.376, and 0.619 for the logistic regression model. CONCLUSION The RF model performs well in predicting events among BC patients with SD or PD after NAC. It may benefit BC patients by assisting in the detection of tumor recurrence.
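The specificity, sensitivity, and AUC values reported in this abstract are standard test-set metrics for a binary classifier. The following is a minimal Python sketch of how such metrics can be computed; it is not the authors' code, and the function names, toy labels, and toy scores are hypothetical and used only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 1 = event, 0 = no event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random event case scores higher than a
    random non-event case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true outcomes, predicted scores, and labels at a 0.5 threshold.
toy_truth = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_labels = [1 if s > 0.5 else 0 for s in toy_scores]

toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_labels)
toy_auc = auc(toy_truth, toy_scores)
```

Sensitivity and specificity depend on the chosen classification threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.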
Affiliation(s)
- Yudi Jin
  - Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
  - Department of Pathology, Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer (iCQBC), Chongqing University Cancer Hospital, Chongqing, 400030, China
- Ailin Lan
  - Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yuran Dai
  - Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Linshan Jiang
  - Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Shengchun Liu
  - Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
2. Su F, Chao J, Liu P, Zhang B, Zhang N, Luo Z, Han J. Prognostic models for breast cancer: based on logistics regression and Hybrid Bayesian Network. BMC Med Inform Decis Mak 2023; 23:120. [PMID: 37443001 PMCID: PMC10347801 DOI: 10.1186/s12911-023-02224-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 07/03/2023] [Indexed: 07/15/2023] Open
Abstract
BACKGROUND This study aimed to construct two prognostic models to predict survival in breast cancer patients, to compare the performance of the two models in the whole group and in the advanced human epidermal growth factor receptor-2-positive (HER2+) subgroup, and to determine whether the Hybrid Bayesian Network (HBN) model outperformed the logistic regression (LR) model. METHODS Breast cancer patient data were collected from the SEER database. Data processing and analysis, including data preprocessing, model construction, and validation, were performed using RStudio 4.2.0. The L_DVBN algorithm in Julia 0.4.7 and the bnlearn package in R were used to build and evaluate the HBN model. Data from patients diagnosed in 2018 (n = 23,384) were randomly split into training and testing sets in a ratio of 7:3 using the hold-out method for model construction and internal validation. External validation was performed using the 2019 dataset (n = 8128). Finally, advanced HER2+ patients (n = 395) were selected for subgroup analysis. Accuracy, calibration, and net clinical benefit were evaluated for both models. RESULTS The HBN model showed that seventeen variables were associated with survival outcome: age, tumor size, site, histologic type, radiotherapy, surgery, chemotherapy, distant metastasis, subtype, clinical stage, ER receptor, PR receptor, clinical grade, race, marital status, tumor laterality, and lymph node. The AUCs for the internal validation of the LR and HBN models were 0.831 and 0.900; the AUCs for the external validation of the LR and HBN models on the whole population were 0.786 and 0.871; the AUCs for the external validation of the two models on the subgroup population were 0.601 and 0.813. CONCLUSION The accuracy, net clinical benefit, and calibration of the HBN model were better than those of the LR model. The predictive performance of both models decreased in advanced HER2+ patients and the difference between them was greater, which suggests that the HBN model was more robust and had more stable predictive performance in the subgroup.
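A random 7:3 split of records into training and testing sets, as described in the METHODS of this abstract, might look like the following Python sketch. The study itself used R and Julia; the function name `holdout_split`, the fixed seed, and the toy record IDs are assumptions made purely for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(records: Sequence[T], train_fraction: float = 0.7,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the records with a fixed seed and cut them into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global state is untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical example: ten record IDs split 7:3.
train_ids, test_ids = holdout_split(range(10))
```

Fixing the seed makes the split reproducible, which matters when two models are compared on the same held-out records.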
Affiliation(s)
- Fan Su
  - Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
- Jianqian Chao
  - Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
  - Department of Medical Insurance, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
- Pei Liu
  - Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
- Bowen Zhang
  - Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
- Na Zhang
  - Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
- Zongyu Luo
  - Department of Medical Insurance, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
- Jiaying Han
  - Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, No. 87 Ding Jia Qiao, Central Gate Street, Gulou District, Nanjing, Jiangsu China
3. Nik Ab Kadir MN, Yaacob NM, Yusof SN, Ab Hadi IS, Musa KI, Mohd Isa SA, Bahtiar B, Adam F, Yahya MM, Hairon SM. Development of Predictive Models for Survival among Women with Breast Cancer in Malaysia. International Journal of Environmental Research and Public Health 2022; 19:15335. [PMID: 36430052 PMCID: PMC9690612 DOI: 10.3390/ijerph192215335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/10/2022] [Revised: 11/17/2022] [Accepted: 11/18/2022] [Indexed: 06/16/2023]
Abstract
Survival probabilities predicted by models developed in other countries have shown inconsistent findings among Malaysian patients. This study aimed to develop predictive models for survival among women with breast cancer in Malaysia. A retrospective cohort study was conducted involving patients diagnosed between 2012 and 2016 in seven breast cancer centres, whose survival status was followed until 31 December 2021. A total of 13 predictors were selected to model five-year survival probabilities by applying Cox proportional hazards (PH), artificial neural network (ANN), and decision tree (DT) classification analysis. A random-split dataset strategy was used to develop the models and measure their performance. Among 1006 patients, the majority were Malay, had ductal carcinoma that was hormone-sensitive and HER2-negative, were at T2 and N1 stage without metastasis, and received surgery and chemotherapy. The estimated five-year survival rate was 60.5% (95% CI: 57.6, 63.6). For Cox PH, the c-index was 0.82 for model derivation and 0.81 for validation, and the model was well calibrated. The Cox PH model outperformed the DT and ANN models in most performance indices, with the highest accuracy of 0.841. The accuracies of the DT and ANN models were 0.811 and 0.821, respectively. The Cox PH model is more useful for survival prediction in this study's setting.
Affiliation(s)
- Mohd Nasrullah Nik Ab Kadir
  - Department of Community Medicine, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
- Najib Majdi Yaacob
  - Biostatistics and Research Methodology Unit, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
- Siti Norbayah Yusof
  - Malaysian National Cancer Registry Department, National Cancer Institute, Ministry of Health Malaysia, Putrajaya 62250, Federal Territory of Putrajaya, Malaysia
- Imi Sairi Ab Hadi
  - Breast and Endocrine Surgery Unit, Department of Surgery, Hospital Raja Perempuan Zainab II, Ministry of Health Malaysia, Kota Bharu 15586, Kelantan, Malaysia
- Kamarul Imran Musa
  - Department of Community Medicine, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
- Seoparjoo Azmel Mohd Isa
  - Department of Pathology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
- Balqis Bahtiar
  - Malaysian National Cancer Registry Department, National Cancer Institute, Ministry of Health Malaysia, Putrajaya 62250, Federal Territory of Putrajaya, Malaysia
- Farzaana Adam
  - Public Health Division, Penang State Health Department, Ministry of Health Malaysia, Georgetown 10590, Penang, Malaysia
- Maya Mazuwin Yahya
  - Department of Surgery, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
- Suhaily Mohd Hairon
  - Department of Community Medicine, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian 16150, Kelantan, Malaysia
4. Bucheli Giler C, Alarcón Cano DF, Montes Escobar K. Modelo de regresión de Cox para análisis de supervivencia en pacientes con cáncer de mama en la provincia de Manabí, Ecuador. Bionatura 2021. [DOI: 10.21931/rb/2021.06.03.24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022] Open
Abstract
Breast cancer is a growing public health problem. It is the cancer with the highest incidence in women and ranked second worldwide in its impact on the general population. It is the most frequent malignant neoplasm in the female population, including in terms of cancer deaths, and it most often affects first-world and developing countries. This was an observational, descriptive, retrospective study of patients diagnosed with breast cancer at the Hospital Oncológico de Manabí, Ecuador. Six-year overall and disease-free survival was established from the time elapsed between diagnosis and the occurrence of an event or the date of last contact, with a cutoff of December 2015. Among the 403 patients, ages ranged from 15 to 90 years, with a mean of 56.08 years. Tumor size was considered: T1 accounted for 26.55% and T2 for 45.66%. Overall survival at 6 years was 80%. Patients in advanced stages had a lower probability of survival, at 43%. With the Cox regression model, it was possible to demonstrate a statistically significant association between tumor size and survival. The study shows that patients in advanced stages have a lower probability of survival, so it is imperative that health promotion efforts continue until detection is achieved at curable stages.
Affiliation(s)
- Cecilia Bucheli Giler
  - Maestría en Estadística, Instituto de Posgrado, Universidad Técnica de Manabí, Portoviejo, Ecuador
- Karime Montes Escobar
  - Departamento de Matemáticas y Estadística. Instituto de Ciencias Básicas. Universidad Técnica de Manabí, Portoviejo, 130105, Ecuador. Departamento de Estadística. Universidad de Salamanca, Salamanca, 37007, España
5. Dynamic and subtype-specific interactions between tumour burden and prognosis in breast cancer. Sci Rep 2020; 10:15445. [PMID: 32963275 PMCID: PMC7508816 DOI: 10.1038/s41598-020-72033-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2020] [Accepted: 08/17/2020] [Indexed: 12/31/2022] Open
Abstract
We investigated the relationship between the prognostic importance of anatomic tumour burden and subtypes of breast cancer using data from the Korean Breast Cancer Registry Database. Among 131,178 patients with stage I–III breast cancer, an increase in T stage profoundly increased the hazard of death in HR+/HER2− and HR−/HER2− tumours, while the presence of lymph node metastasis was more important in HR+/HER2+ and HR−/HER2+ tumours. The patterns of increasing mortality risk with tumour growth (per centimetre) and metastatic nodes (per node) were examined in 67,038 patients with a tumour diameter ≤ 7 cm and < 8 metastatic nodes. HR+/HER2− and HR−/HER2− tumours showed a persistent increase in mortality risk with an increase in tumour diameter, while the effect was modest in HER2+ tumours. Conversely, an increased number of metastatic nodes was accompanied by a persistently increased risk in HR−/HER2+ tumours, while the effect was minimal for HR−/HER2− tumours with > 3 or 4 nodes. The interactions between the prognostic significance of anatomic tumour burden and subtypes were significant. The prognostic relevance of the anatomic tumour burden was non-linear and highly dependent on the subtypes of breast cancer.
6. Shi Y, Chen W, Li C, Qi S, Zhou X, Zhang Y, Li Y, Li G. Clinicopathological characteristics and prediction of cancer-specific survival in large cell lung cancer: a population-based study. J Thorac Dis 2020; 12:2261-2269. [PMID: 32642131 PMCID: PMC7330367 DOI: 10.21037/jtd.2020.04.24] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/08/2023]
Abstract
Background To describe the demographic and clinical characteristics of large cell lung cancer (LCLC) using a population-based database, to identify the prognostic factors for cancer-specific survival (CSS) in these patients, and to develop and independently validate a nomogram for predicting CSS in LCLC based on the identified prognostic factors. Methods We extracted LCLC patient information from the Surveillance, Epidemiology, and End Results (SEER) database [2005–2014] and summarized the characteristics of the extracted factors. We used Cox proportional hazards regression to identify the prognostic factors for LCLC patients and developed the nomogram from these factors in a training cohort split from the extracted data. The nomogram was validated in an independent validation cohort from the extracted data, in which the C-index and the mean time-dependent area under the receiver operating characteristic curve (time-dependent AUC) for 1-year, 3-year, and 5-year CSS were calculated. Calibration curves were drawn to visualize the performance of the nomogram. Results A total of 4,936 patients with LCLC were identified from the SEER database. Nearly half of LCLC patients were diagnosed at stage IV; only approximately 20% of patients underwent surgery. The prognostic factors for LCLC patients included age, sex, American Joint Committee on Cancer (AJCC) stage, race, surgery, tumor size, and marital status. The calculated C-index was 0.701±0.01, and the mean time-dependent AUC for 1-year, 3-year, and 5-year CSS was 0.88. The calibration curves showed that the gap between the predicted and observed values for 1-year, 3-year, and 5-year CSS was small. Conclusions Sex, age, race, marital status, AJCC stage, surgery, and tumor size were all shown to be independent prognostic factors for CSS in LCLC. The established nomogram can provide a more precise evaluation of survival for LCLC patients and help clinicians in the individual management of patients.
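The C-index reported in this abstract measures how well predicted risk ranks patients by survival time. The abstract does not state which estimator was used, so the following Python sketch assumes the common Harrell formulation; the function name and the toy follow-up data are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell-style C-index.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and a strictly shorter follow-up time than subject j. The pair is
    concordant when subject i also has the higher predicted risk; tied risks
    count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up time, event indicator (0 = censored), predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.3, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, four of the five comparable pairs are ordered correctly.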
Affiliation(s)
- Yafei Shi
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Wei Chen
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Chunyu Li
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Shuya Qi
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Xiaowei Zhou
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Yujun Zhang
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Ying Li
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Guohui Li
  - Department of Pharmacy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
7. Phung MT, Tin Tin S, Elwood JM. Prognostic models for breast cancer: a systematic review. BMC Cancer 2019; 19:230. [PMID: 30871490 PMCID: PMC6419427 DOI: 10.1186/s12885-019-5442-6] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 08/28/2018] [Accepted: 03/06/2019] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Breast cancer is the most common cancer in women worldwide, with a great diversity in outcomes among individual patients. The ability to accurately predict a breast cancer outcome is important to patients, physicians, researchers, and policy makers. Many models have been developed and tested in different settings. We systematically reviewed the prognostic models developed and/or validated for patients with breast cancer. METHODS We conducted a systematic search in four electronic databases and some oncology websites, and a manual search in the bibliographies of the included studies. We identified original studies that were published prior to 1st January 2017, and presented the development and/or validation of models based mainly on clinico-pathological factors to predict mortality and/or recurrence in female breast cancer patients. RESULTS From the 96 articles selected from 4095 citations found, we identified 58 models, which predicted mortality (n = 28), recurrence (n = 23), or both (n = 7). The most frequently used predictors were nodal status (n = 49), tumour size (n = 42), tumour grade (n = 29), age at diagnosis (n = 24), and oestrogen receptor status (n = 21). Models were developed in Europe (n = 25), Asia (n = 13), North America (n = 12), and Australia (n = 1) between 1982 and 2016. Models were validated in the development cohorts (n = 43) and/or independent populations (n = 17), by comparing the predicted outcomes with the observed outcomes (n = 55) and/or with the outcomes estimated by other models (n = 32), or the outcomes estimated by individual prognostic factors (n = 8). The most commonly used methods were: Cox proportional hazards regression for model development (n = 32); the absolute differences between the predicted and observed outcomes (n = 30) for calibration; and C-index/AUC (n = 44) for discrimination. 
Overall, the models performed well in the development cohorts but less accurately in some independent populations, particularly in patients with high risk and young and elderly patients. An exception is the Nottingham Prognostic Index, which retains its predicting ability in most independent populations. CONCLUSIONS Many prognostic models have been developed for breast cancer, but only a few have been validated widely in different settings. Importantly, their performance was suboptimal in independent populations, particularly in patients with high risk and in young and elderly patients.
Affiliation(s)
- Minh Tung Phung
  - Epidemiology and Biostatistics, School of Population Health, The University of Auckland, Private Bag 92019, Auckland, 1142 New Zealand
- Sandar Tin Tin
  - Epidemiology and Biostatistics, School of Population Health, The University of Auckland, Private Bag 92019, Auckland, 1142 New Zealand
- J. Mark Elwood
  - Epidemiology and Biostatistics, School of Population Health, The University of Auckland, Private Bag 92019, Auckland, 1142 New Zealand