Huang Y, Mao Y, Xu L, Wen J, Chen G. Exploring risk factors for cervical lymph node metastasis in papillary thyroid microcarcinoma: construction of a novel population-based predictive model. BMC Endocr Disord 2022;22:269. [PMID: 36329470 PMCID: PMC9635156 DOI: 10.1186/s12902-022-01186-1]
[Received: 08/20/2022] [Accepted: 10/25/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND
Machine learning is a highly effective tool for model construction. We aimed to establish a machine learning-based model to predict cervical lymph node metastasis (LNM) in papillary thyroid microcarcinoma (PTMC).
METHODS
We obtained data on PTMC from the SEER database, including 10 demographic and clinicopathological characteristics. Univariate and multivariate logistic regression (LR) analyses were applied to screen the risk factors for cervical LNM in PTMC. Risk factors with P < 0.05 in multivariate LR analysis were used as modeling variables. Five machine learning (ML) algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), adaptive boosting (AdaBoost), Gaussian naive Bayes (GNB), and multi-layer perceptron (MLP), together with traditional regression analysis, were used to construct the prediction models. Finally, the area under the receiver operating characteristic curve (AUROC) was used to compare model performance.
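As an illustration of the comparison metric named above, the following is a minimal sketch of how an AUROC can be computed from predicted scores. It is not the authors' code: the function name `auroc`, the toy labels, and the two score lists are hypothetical, and the pairwise formulation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half) is one standard way to obtain the value.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 outcomes (1 = metastasis present in this toy example).
    scores: predicted probabilities or any monotone risk score.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the SEER cohort.
labels = [1, 0, 1, 0, 1, 0]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5]
model_b_scores = [0.6, 0.5, 0.4, 0.7, 0.8, 0.3]

auc_a = auroc(labels, model_a_scores)
auc_b = auroc(labels, model_b_scores)
print(f"model A AUROC = {auc_a:.3f}, model B AUROC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of cases, so for a cohort of realistic size a rank-based implementation from an established library would be the practical choice; the version here only makes the definition explicit.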
RESULTS
Univariate and multivariate LR analysis identified 9 independent risk factors for cervical LNM in PTMC: age, sex, race, marital status, region, histology, tumor size, extrathyroidal extension (ETE), and multifocality. We used these risk factors to build the ML prediction models; XGBoost achieved a higher AUROC than the other 4 ML algorithms and was therefore the best ML model. After optimization through 10-fold cross-validation, the best performance of XGBoost on the training set (AUROC: 0.809, 95% CI 0.800-0.818) exceeded that of traditional LR analysis (AUROC: 0.780, 95% CI 0.772-0.787).
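The 10-fold cross-validation mentioned above can be pictured as follows. This is a minimal sketch under stated assumptions: the abstract does not say how the folds were drawn (for example, whether they were stratified by outcome), so the helper `k_fold_indices`, the random shuffling, and the seed are all hypothetical choices made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, validation_indices) pairs.

    Assumption: plain shuffled folds without stratification.
    Each sample lands in exactly one validation fold, and fold
    sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


# Hypothetical cohort size, chosen only to show uneven fold sizes.
splits = k_fold_indices(n_samples=103, k=10)
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(fold_number, len(train), len(validation))
```

In a tuning loop, a model with a candidate hyperparameter setting would be fitted on each `train` subset and scored on the matching `validation` subset, and the mean validation AUROC across the 10 folds would guide the choice of setting.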
CONCLUSIONS
ML algorithms, especially XGBoost, showed good predictive performance. With the continuous development of artificial intelligence, ML algorithms have broad prospects in clinical prognosis prediction.