1
Wang Y, Pan Z, Li S, Cai H, Huang Y, Zhuang J, Liu X, Lu X, Guan G. Prediction and validation of pathologic complete response for locally advanced rectal cancer under neoadjuvant chemoradiotherapy based on a novel predictor using interpretable machine learning. European Journal of Surgical Oncology 2024; 50:108738. [PMID: 39395242 DOI: 10.1016/j.ejso.2024.108738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 10/01/2024] [Indexed: 10/14/2024]
Abstract
BACKGROUND Precise evaluation of pathological complete response (pCR) is essential for determining the prognosis of patients with locally advanced rectal cancer (LARC) undergoing neoadjuvant chemoradiotherapy (NCRT) and can inform the selection of subsequent treatment strategies. Most current predictive models for pCR focus primarily on pre-treatment factors, neglect the dynamic systemic changes that occur during NCRT, and are constrained by low accuracy and incompleteness. PURPOSE This study devised a novel predictor of pCR using dynamic alterations in systemic inflammation-nutritional marker indexes (SINI) during neoadjuvant therapy and developed a machine-learning model to predict pCR. METHODS Two cohorts of patients with LARC, from center one (2012 to 2017) and from center two (2020 to 2023), were integrated for analysis. This study compared dynamic changes in blood indexes before and after neoadjuvant therapy and surgical operation. A least absolute shrinkage and selection operator (LASSO) regression analysis was conducted to mitigate collinearity and identify key indexes, from which the SINI was constructed. Univariate and multiple logistic regression analyses were used to identify the independent risk factors associated with pCR. Additionally, 10 machine learning algorithms were employed to develop predictive models to assess risk. The hyperparameters of the machine learning models were optimized using a random search and 10-fold cross-validation. The models were assessed using several metrics, including the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis, calibration curves, and precision and accuracy in the internal and external validation cohorts. Additionally, SHapley Additive exPlanations (SHAP) were employed to interpret the machine learning models. RESULTS The study cohort comprised 677 patients from center one and 224 patients from center two. Six key indexes were identified, and a predictive index, SINI, was constructed. Univariate and multiple logistic regression analyses revealed that SINI, clinical T-stage, clinical N-stage, tumor size, and the distance from the anal verge were independent risk factors for pCR in patients with LARC following NCRT. The mean AUC value of the extreme gradient boosting (XGB) model in the 10-fold cross-validation of the training set was 0.877. The XGB model demonstrated superior performance in the internal and external validation sets. Specifically, in the internal test set, the XGB model achieved an AUC of 0.86, AUPRC of 0.707, accuracy of 0.82, and precision of 0.80. In the external validation set, the XGB model exhibited an AUC of 0.83, AUPRC of 0.702, accuracy of 0.81, and precision of 0.81. Additionally, the predictions generated by the XGB model were analyzed using SHAP. CONCLUSION This study developed and validated an XGB model using SINI to predict pCR in patients with LARC. The SINI-based machine learning model shows promise in accurately predicting pCR following NCRT in patients with resectable LARC, offering valuable insights for personalized treatment approaches.
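The abstract reports AUC, accuracy, and precision without showing how they are derived. As a minimal sketch (not the authors' code), the snippet below computes AUC from its pairwise-ranking definition and accuracy/precision at a fixed threshold. The function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_precision(labels, scores, threshold=0.5):
    """Threshold the scores, then compute accuracy and precision."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return correct / len(labels), precision


# Made-up labels (1 = pCR) and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
auc = roc_auc(labels, scores)
acc, prec = accuracy_and_precision(labels, scores)
print(f"AUC={auc:.3f} accuracy={acc:.3f} precision={prec:.3f}")
```

The pairwise loop is quadratic and only suitable for small examples; a rank-based implementation would be the usual choice for cohorts of several hundred patients.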
Affiliation(s)
- Ye Wang
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Zhen Pan
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Shoufeng Li
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Huajun Cai
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Ying Huang
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Jinfu Zhuang
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Xing Liu
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Xingrong Lu
- Department of Colorectal Surgery, Fujian Medical University Union Hospital, Fuzhou, China
- Guoxian Guan
- Department of Colorectal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China; Department of Colorectal Surgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fuzhou, China
2
Neagu AI, Poalelungi DG, Fulga A, Neagu M, Fulga I, Nechita A. Enhanced Immunohistochemistry Interpretation with a Machine Learning-Based Expert System. Diagnostics (Basel) 2024; 14:1853. [PMID: 39272638 PMCID: PMC11394116 DOI: 10.3390/diagnostics14171853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/26/2024] [Accepted: 08/22/2024] [Indexed: 09/15/2024] Open
Abstract
BACKGROUND In recent decades, machine-learning (ML) technologies have advanced the management of high-dimensional and complex cancer data by enabling reliable and user-friendly automated diagnostic tools for clinical applications. Immunohistochemistry (IHC) is an essential staining method that enables the identification of cellular origins by analyzing the expression of specific antigens within tissue samples. The aim of this study was to identify a model that could predict histopathological diagnoses based on specific immunohistochemical markers. METHODS The XGBoost learning model was applied, where the output variable (target variable) was the histopathological diagnosis and the predictors (independent variables influencing the target variable) were the immunohistochemical markers. RESULTS Our study demonstrated a precision rate of 85.97% within the dataset, indicating a high level of performance and suggesting that the model is generally reliable in producing accurate predictions. CONCLUSIONS This study demonstrated the feasibility and clinical efficacy of utilizing the probabilistic decision tree algorithm to differentiate tumor diagnoses according to immunohistochemistry profiles.
Affiliation(s)
- Anca Iulia Neagu
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint John Clinical Emergency Hospital for Children, 800487 Galati, Romania
- Diana Gina Poalelungi
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Ana Fulga
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Marius Neagu
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Iuliu Fulga
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint Apostle Andrew Emergency County Clinical Hospital, 177 Brailei St., 800578 Galati, Romania
- Aurel Nechita
- Faculty of Medicine and Pharmacy, Dunarea de Jos University of Galati, 35 AI Cuza St., 800010 Galati, Romania
- Saint John Clinical Emergency Hospital for Children, 800487 Galati, Romania
3
Galadima H, Anson-Dwamena R, Johnson A, Bello G, Adunlin G, Blando J. Machine Learning as a Tool for Early Detection: A Focus on Late-Stage Colorectal Cancer across Socioeconomic Spectrums. Cancers (Basel) 2024; 16:540. [PMID: 38339293 PMCID: PMC10854986 DOI: 10.3390/cancers16030540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 01/19/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] Open
Abstract
PURPOSE To assess the efficacy of various machine learning (ML) algorithms in predicting late-stage colorectal cancer (CRC) diagnoses against the backdrop of socio-economic and regional healthcare disparities. METHODS An innovative theoretical framework was developed to integrate individual- and census tract-level social determinants of health (SDOH) with sociodemographic factors. A comparative analysis of the ML models was conducted using key performance metrics such as AUC-ROC to evaluate their predictive accuracy. Spatio-temporal analysis was used to identify disparities in late-stage CRC diagnosis probabilities. RESULTS Gradient boosting emerged as the superior model, with the top predictors for late-stage CRC diagnosis being anatomic site, year of diagnosis, age, proximity to superfund sites, and primary payer. Spatio-temporal clusters highlighted geographic areas with a statistically significant high probability of late-stage diagnoses, emphasizing the need for targeted healthcare interventions. CONCLUSIONS This research underlines the potential of ML in enhancing the prognostic predictions in oncology, particularly in CRC. The gradient boosting model, with its robust performance, holds promise for deployment in healthcare systems to aid early detection and formulate localized cancer prevention strategies. The study's methodology demonstrates a significant step toward utilizing AI in public health to mitigate disparities and improve cancer care outcomes.
Affiliation(s)
- Hadiza Galadima
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
- Rexford Anson-Dwamena
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
- Ashley Johnson
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
- Ghalib Bello
- Department of Environmental Medicine & Public Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Georges Adunlin
- Department of Pharmaceutical, Social and Administrative Sciences, Samford University, Birmingham, AL 35229, USA
- James Blando
- School of Community and Environmental Health, Old Dominion University, Norfolk, VA 23529, USA
4
Zhang Y, Ma Y, Wang J, Guan Q, Yu B. Construction and validation of a clinical prediction model for deep vein thrombosis in patients with digestive system tumors based on a machine learning. Am J Cancer Res 2024; 14:155-168. [PMID: 38323284 PMCID: PMC10839316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/13/2023] [Indexed: 02/08/2024] Open
Abstract
This study developed a deep vein thrombosis (DVT) risk prediction model based on multiple machine learning methods for patients with digestive system tumors undergoing surgical treatment. Data of 1048 patients with digestive system tumors admitted to Shanxi Provincial People's Hospital (College of Shanxi Medical University) from January 2020 to January 2023 were retrospectively analyzed, and 845 cases were screened according to the inclusion and exclusion criteria. The patients were divided into a training group (586 patients) and a validation group (259 patients), and feature selection was then performed using six models: Lasso regression, XGBoost, Random Forest, Decision Tree, Support Vector Machine, and logistic regression. Predictive models were subsequently constructed as nomograms, and the predictive validity of the models was assessed using receiver operating characteristic curves, precision-recall curves, and decision-curve analysis. In the model comparison, the XGBoost model showed the largest area under the curve (AUC) on the validation set (P < 0.05), demonstrating excellent predictive performance and generalization ability. We selected the characteristic factors common to the six models to further develop the nomogram for assessing DVT risk. The model performed well in clinical validation and effectively differentiated high-risk and low-risk patients. The differences in BMI, procedure time, and D-dimer were statistically significant between patients in the thrombus group and those in the non-thrombus group (P < 0.05). However, the AUC of the XGBoost model was found to be greater than that of the nomogram model by the DeLong test (P < 0.05). BMI, procedure time, and D-dimer are critical predictors of DVT risk in patients with digestive system tumors. Our model is an adequate assessment tool for DVT risk, which can help improve the prevention and treatment of DVT.
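The step of keeping only the factors selected by all six models amounts to a set intersection. The sketch below illustrates that idea under stated assumptions: the per-model feature lists are invented so that the intersection matches the three predictors named in the abstract (BMI, procedure time, D-dimer), and every other feature name, along with the function name, is hypothetical.

```python
def common_features(selections):
    """Return the features chosen by every selector, sorted for stable output."""
    sets = [set(features) for features in selections.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical per-model selections; only BMI, procedure_time and D_dimer
# are named in the abstract, the remaining features are made up.
selections = {
    "lasso": ["BMI", "procedure_time", "D_dimer", "age"],
    "xgboost": ["BMI", "procedure_time", "D_dimer", "platelets"],
    "random_forest": ["BMI", "procedure_time", "D_dimer", "age", "platelets"],
    "decision_tree": ["BMI", "procedure_time", "D_dimer"],
    "svm": ["BMI", "procedure_time", "D_dimer", "bed_rest_days"],
    "logistic": ["BMI", "procedure_time", "D_dimer", "age"],
}

shared = common_features(selections)
print(shared)
```

Requiring agreement across all selectors is a conservative choice: it tends to yield a small, stable feature set at the cost of discarding predictors that only some models find useful.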
Affiliation(s)
- Yunfeng Zhang
- Department of Vascular Surgery, Shanxi Provincial People’s Hospital (The Fifth Clinical Medical School of Shanxi Medical University), No. 29 Shuangtasi Street, Taiyuan 030012, Shanxi, China
- Yongqi Ma
- Shanxi University of Chinese Medicine, No. 121 Daxue Street, Yuci District, Jinzhong 030619, Shanxi, China
- Jie Wang
- Department of Vascular Surgery, Shanxi Provincial People’s Hospital (The Fifth Clinical Medical School of Shanxi Medical University), No. 29 Shuangtasi Street, Taiyuan 030012, Shanxi, China
- Qiang Guan
- Department of Vascular Surgery, Shanxi Provincial People’s Hospital (The Fifth Clinical Medical School of Shanxi Medical University), No. 29 Shuangtasi Street, Taiyuan 030012, Shanxi, China
- Bo Yu
- Department of Operating Room, Affiliated Hospital of Hebei University, No. 212 Yuhua East Road, Lianchi District, Baoding 071000, Hebei, China
5
Sun J, Wu S, Mou Z, Wen J, Wei H, Zou J, Li Q, Liu Z, Xu SH, Kang M, Ling Q, Huang H, Chen X, Wang Y, Liao X, Tan G, Shao Y. Prediction model of ocular metastasis from primary liver cancer: Machine learning-based development and interpretation study. Cancer Med 2023; 12:20482-20496. [PMID: 37795569 PMCID: PMC10652349 DOI: 10.1002/cam4.6540] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/17/2022] [Revised: 08/21/2023] [Accepted: 09/05/2023] [Indexed: 10/06/2023] Open
Abstract
BACKGROUND Ocular metastasis (OM) is a rare metastatic site of primary liver cancer (PLC). The purpose of this study was to establish a clinical predictive model of OM in PLC patients based on machine learning (ML). METHODS We retrospectively collected the clinical data of 1540 PLC patients and divided them into a training set and an internal test set in a 7:3 proportion. PLC patients were divided into OM and non-ocular metastasis (NOM) groups, and univariate logistic regression analysis was performed between the two groups. The variables with p < 0.05 in the univariate logistic analysis were selected for the ML model. We constructed six ML models, which were internally verified by 10-fold cross-validation. The prediction performance of each ML model was evaluated by receiver operating characteristic (ROC) curves. We also constructed a web calculator based on the best-performing ML model to personalize the risk probability for OM. RESULTS Six variables were selected for the ML model. The extreme gradient boosting (XGB) ML model achieved the best differential diagnosis ability, with an area under the curve (AUC) = 0.993, accuracy = 0.992, sensitivity = 0.998, and specificity = 0.984. Based on these results, an online web calculator was constructed using the XGB ML model to help clinicians estimate the risk probability of OM in PLC patients. Finally, the Shapley additive explanations (SHAP) library was used to obtain the six most important risk factors for OM in PLC patients: CA125, ALP, AFP, TG, CA199, and CEA. CONCLUSION We used the XGB model to establish a risk prediction model of OM in PLC patients. The predictive model can help identify PLC patients with a high risk of OM, support early and personalized diagnosis and treatment, improve the prognosis of OM patients, and improve the quality of life of PLC patients.
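Accuracy, sensitivity, and specificity all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are assumptions for illustration and are not taken from the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall) and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positives among actual positives
        "specificity": tn / (tn + fp),   # true negatives among actual negatives
    }


# Invented counts, for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=180, fn=10)
print(metrics)
```

When the positive class is rare, as the abstract says OM is, accuracy can be dominated by the negative class, so sensitivity and specificity are worth reading alongside it.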
Affiliation(s)
- Jin‐Qi Sun
- Fuxing Hospital, The Eighth Clinical Medical College, Capital Medical University, Beijing, People's Republic of China
- Shi‐Nan Wu
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Fujian Provincial Key Laboratory of Ophthalmology and Visual Science, Eye Institute of Xiamen University, School of Medicine, Xiamen University, Xiamen, People's Republic of China
- Zheng‐Lin Mou
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Jia‐Yi Wen
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Hong Wei
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Jie Zou
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Qing‐Jian Li
- Fujian Provincial Key Laboratory of Ophthalmology and Visual Science, Eye Institute of Xiamen University, School of Medicine, Xiamen University, Xiamen, People's Republic of China
- Zhao‐Lin Liu
- Department of Ophthalmology, The First Affiliated Hospital of University of South China, Hunan Branch of The National Clinical Research Center for Ocular Disease, Hengyang, People's Republic of China
- San Hua Xu
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Min Kang
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Qian Ling
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Hui Huang
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
- Xu Chen
- Department of Ophthalmology and Visual Sciences, Maastricht University, Maastricht, Netherlands
- Yi‐Xin Wang
- School of Optometry and Vision Sciences, Cardiff University, Cardiff, UK
- Xu‐Lin Liao
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
- Gang Tan
- Department of Ophthalmology, The First Affiliated Hospital of University of South China, Hunan Branch of The National Clinical Research Center for Ocular Disease, Hengyang, People's Republic of China
- Yi Shao
- Department of Ophthalmology, The First Affiliated Hospital of Nanchang University, Jiangxi Branch of the National Clinical Research Center for Ocular Disease, Nanchang, People's Republic of China
6
Huang W, Zhang H, Ge Y, Duan S, Ma Y, Wang X, Zhou X, Zhou T, Tu W, Wang Y, Liu S, Dong P, Fan L. Radiomics-based Machine Learning Methods for Volume Doubling Time Prediction of Pulmonary Ground-glass Nodules With Baseline Chest Computed Tomography. J Thorac Imaging 2023; 38:304-314. [PMID: 37423615 DOI: 10.1097/rti.0000000000000725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
PURPOSE Reliable prediction of volume doubling time (VDT) is essential for the personalized management of pulmonary ground-glass nodules (GGNs). We aimed to determine the optimal VDT prediction method by comparing different machine learning methods based only on the baseline chest computed tomography (CT) images. MATERIALS AND METHODS Seven classical machine learning methods were evaluated in terms of their stability and performance for VDT prediction. The VDT, calculated from the preoperative and baseline CT, was divided into 2 groups with a cutoff value of 400 days. A total of 90 GGNs from 3 hospitals constituted the training set, and 86 GGNs from the fourth hospital served as the external validation set. The training set was used for feature selection and model training, and the validation set was used to evaluate the predictive performance of the model independently. RESULTS The eXtreme Gradient Boosting showed the highest predictive performance (accuracy: 0.890±0.128 and area under the ROC curve (AUC): 0.896±0.134), followed by the neural network (NNet) (accuracy: 0.865±0.103 and AUC: 0.886±0.097). Regarding stability, however, the NNet showed the highest robustness against data perturbation (relative SD of mean AUC: 10.9%). Therefore, the NNet was chosen as the final model, achieving an accuracy of 0.756 in the external validation set. CONCLUSION The NNet is a promising machine learning method to predict the VDT of GGNs, which would assist in personalized follow-up and treatment strategies for GGNs, reducing unnecessary follow-up and radiation dose.
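The abstract does not state how VDT was computed. A minimal sketch, assuming the commonly used exponential-growth formula VDT = Δt · ln 2 / ln(V2 / V1), is given below together with the relative standard deviation used as the stability measure. The function names, the example volumes, and the inclusive handling of the 400-day cutoff are assumptions, not details from the paper.

```python
import math


def volume_doubling_time(v1, v2, interval_days):
    """VDT in days under exponential growth: interval * ln 2 / ln(v2 / v1).

    Returns infinity for an unchanged volume and a negative value for a
    shrinking nodule.
    """
    if v1 <= 0 or v2 <= 0 or interval_days <= 0:
        raise ValueError("volumes and interval must be positive")
    if v1 == v2:
        return math.inf
    return interval_days * math.log(2) / math.log(v2 / v1)


def has_short_vdt(vdt_days, cutoff_days=400):
    """Assumed grouping rule: growing nodule with VDT at or below the cutoff."""
    return 0 < vdt_days <= cutoff_days


def relative_sd_percent(mean, sd):
    """Relative standard deviation (coefficient of variation) in percent."""
    return 100.0 * sd / mean


# Invented volumes in mm^3: a nodule that quadruples over 400 days.
vdt = volume_doubling_time(100.0, 400.0, 400)
print(round(vdt, 1), has_short_vdt(vdt))

# Reproduces the NNet figure quoted in the abstract (AUC 0.886 +/- 0.097).
print(round(relative_sd_percent(0.886, 0.097), 1))
```

The last line shows that the reported 10.9% relative SD is consistent with the NNet's mean AUC and standard deviation as quoted.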
Affiliation(s)
- Wenjun Huang
- School of Medical Imaging, Weifang Medical University
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- Hanxiao Zhang
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, Jiangsu
- Yanming Ge
- School of Medical Imaging, Weifang Medical University
- Medical Imaging Center, Affiliated Hospital of Weifang Medical University, Weifang
- Shaofeng Duan
- GE Healthcare, Precision Health Institution, Shanghai
- Yanqing Ma
- Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang Province
- Xiaoling Wang
- Department of Radiology, Deyang People's Hospital, Deyang, Sichuan Province, China
- Xiuxiu Zhou
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- Taohu Zhou
- School of Medical Imaging, Weifang Medical University
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- Wenting Tu
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- Yun Wang
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- Shiyuan Liu
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
- Peng Dong
- School of Medical Imaging, Weifang Medical University
- Li Fan
- Department of Radiology, Changzheng Hospital, Naval Medical University, Shanghai
7
Ma Z, Lv J, Zhu M, Yu C, Ma H, Jin G, Guo Y, Bian Z, Yang L, Chen Y, Chen Z, Hu Z, Li L, Shen H. Lung cancer risk score for ever and never smokers in China. Cancer Commun (Lond) 2023; 43:877-895. [PMID: 37410540 PMCID: PMC10397566 DOI: 10.1002/cac2.12463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 05/23/2023] [Accepted: 06/28/2023] [Indexed: 07/07/2023] Open
Abstract
BACKGROUND Most lung cancer risk prediction models were developed in European and North-American cohorts of smokers aged ≥ 55 years, while less is known about risk profiles in Asia, especially for never smokers or individuals aged < 50 years. Hence, we aimed to develop and validate a lung cancer risk estimate tool for ever and never smokers across a wide age range. METHODS Based on the China Kadoorie Biobank cohort, we first systematically selected the predictors and explored the nonlinear association of predictors with lung cancer risk using restricted cubic splines. Then, we separately developed risk prediction models to construct a lung cancer risk score (LCRS) in 159,715 ever smokers and 336,526 never smokers. The LCRS was further validated in an independent cohort over a median follow-up of 13.6 years, consisting of 14,153 never smokers and 5,890 ever smokers. RESULTS A total of 13 and 9 routinely available predictors were identified for ever and never smokers, respectively. Of these predictors, cigarettes per day and quit years showed nonlinear associations with lung cancer risk (Pnon-linear < 0.001). The curve of lung cancer incidence increased rapidly above 20 cigarettes per day and then was relatively flat until approximately 30 cigarettes per day. We also observed that lung cancer risk declined sharply within the first 5 years of quitting, and then continued to decrease but at a slower rate in the subsequent years. The 6-year areas under the receiver operating characteristic curve for the ever and never smokers' models were 0.778 and 0.733, respectively, in the derivation cohort, and 0.774 and 0.759 in the validation cohort. In the validation cohort, the 10-year cumulative incidence of lung cancer was 0.39% and 2.57% for ever smokers with low (< 166.2) and intermediate-high LCRS (≥ 166.2), respectively. Never smokers with a high LCRS (≥ 21.2) had a higher 10-year cumulative incidence rate than those with a low LCRS (< 21.2; 1.05% vs. 0.22%). An online risk evaluation tool (LCKEY; http://ccra.njmu.edu.cn/lckey/web) was developed to facilitate the use of LCRS. CONCLUSIONS The LCRS can be an effective risk assessment tool designed for ever and never smokers aged 30 to 80 years.
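The risk groups in the abstract are defined by smoking-specific LCRS cutoffs. The sketch below only illustrates that thresholding step, using the cutoffs and group labels quoted above; it does not compute the LCRS itself, and the function and dictionary names are hypothetical.

```python
# Cutoffs and group labels as quoted in the abstract; names are hypothetical.
LCRS_CUTOFFS = {
    "ever": (166.2, "intermediate-high"),
    "never": (21.2, "high"),
}


def lcrs_group(score, smoking_status):
    """Assign a risk group from an already-computed LCRS value."""
    if smoking_status not in LCRS_CUTOFFS:
        raise ValueError("smoking_status must be 'ever' or 'never'")
    cutoff, upper_label = LCRS_CUTOFFS[smoking_status]
    return upper_label if score >= cutoff else "low"


print(lcrs_group(180.0, "ever"))   # at or above 166.2
print(lcrs_group(15.0, "never"))   # below 21.2
```

Keeping separate cutoffs per smoking status mirrors the abstract's design choice of fitting separate models for ever and never smokers, whose score scales are not comparable.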
Affiliation(s)
- Zhimin Ma
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Department of Epidemiology, School of Public Health, Southeast University, Nanjing, Jiangsu, P. R. China
- Jun Lv
- Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Beijing, P. R. China
- Ministry of Education, Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Beijing, P. R. China
- Meng Zhu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Canqing Yu
- Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Beijing, P. R. China
- Hongxia Ma
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Guangfu Jin
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Yu Guo
- Chinese Academy of Medical Sciences, Beijing, P. R. China
- Zheng Bian
- Chinese Academy of Medical Sciences, Beijing, P. R. China
- Ling Yang
- Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- Yiping Chen
- Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- Zhengming Chen
- Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- Zhibin Hu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Liming Li
- Department of Epidemiology & Biostatistics, School of Public Health, Peking University, Beijing, P. R. China
- Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- Hongbing Shen
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, P. R. China
- Research Units of Cohort Study on Cardiovascular Diseases and Cancers, Chinese Academy of Medical Sciences, Beijing, P. R. China
8
Guan X, Du Y, Ma R, Teng N, Ou S, Zhao H, Li X. Construction of the XGBoost model for early lung cancer prediction based on metabolic indices. BMC Med Inform Decis Mak 2023; 23:107. [PMID: 37312179 DOI: 10.1186/s12911-023-02171-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/12/2022] [Accepted: 04/05/2023] [Indexed: 06/15/2023] Open
Abstract
BACKGROUND Lung cancer is a malignant tumour, and early diagnosis has been shown to improve the survival rate of lung cancer patients. In this study, we assessed the use of plasma metabolites as biomarkers for lung cancer diagnosis. We used a novel interdisciplinary mechanism, applied for the first time to lung cancer, to detect biomarkers for early lung cancer diagnosis by combining metabolomics and machine learning approaches. RESULTS In total, 478 lung cancer patients and 370 subjects with benign lung nodules were enrolled from a hospital in Dalian, Liaoning Province. We selected 47 serum amino acid and carnitine indicators from targeted metabolomics studies using LC‒MS/MS, together with the age and sex of the subjects as demographic indicators. After screening by a stepwise regression algorithm, 16 metrics were included. Among the machine learning algorithms, the XGBoost model showed superior predictive power (AUC = 0.81, accuracy = 75.29%, sensitivity = 74%), with the metabolic biomarkers ornithine and palmitoylcarnitine being potential biomarkers to screen for lung cancer. The machine learning model XGBoost is proposed as a tool for early lung cancer prediction. This study provides strong support for the feasibility of blood-based screening for metabolites and provides a safer, faster and more accurate tool for early diagnosis of lung cancer. CONCLUSIONS This study proposes an interdisciplinary approach combining metabolomics with a machine learning model (XGBoost) for early prediction of lung cancer. The metabolic biomarkers ornithine and palmitoylcarnitine showed significant power for early lung cancer diagnosis.
Affiliation(s)
- Xiuliang Guan: School of Public Health, Dalian Medical University, Dalian, 116000, China
- Yue Du: School of Public Health, Dalian Medical University, Dalian, 116000, China
- Rufei Ma: School of Public Health, Dalian Medical University, Dalian, 116000, China
- Nan Teng: School of Public Health, Dalian Medical University, Dalian, 116000, China
- Shu Ou: School of Public Health, Dalian Medical University, Dalian, 116000, China
- Hui Zhao: Department of Health Examination Center, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Xiaofeng Li: School of Public Health, Dalian Medical University, Dalian, 116000, China
|
9
|
Ling D, Liu A, Sun J, Wang Y, Wang L, Song X, Zhao X. Integration of IDPC Clustering Analysis and Interpretable Machine Learning for Survival Risk Prediction of Patients with ESCC. Interdiscip Sci 2023:10.1007/s12539-023-00569-9. [PMID: 37248421 DOI: 10.1007/s12539-023-00569-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 05/31/2023]
Abstract
Precise forecasting of survival risk plays a pivotal role in understanding and predicting the prognosis of patients with esophageal squamous cell carcinoma (ESCC). Existing methods suffer from insufficient fitting ability and poor interpretability. To address these issues, this work proposes a novel interpretable survival risk prediction method for ESCC patients based on extreme gradient boosting improved by the whale optimization algorithm (WOA-XGBoost) and Shapley additive explanations (SHAP). Given the imbalanced nature of the data set, adaptive synthetic sampling (ADASYN) is first used to generate samples with high survival risk. Then, an improved clustering by fast search and find of density peaks (IDPC) algorithm based on cosine distance and K nearest neighbors is used to cluster the patients. Next, the prediction model for each cluster is obtained by WOA-XGBoost, and the constructed model is visualized with SHAP to uncover the factors hidden in the structured model and improve the interpretability of the black-box model. Finally, the effectiveness of the proposed scheme is demonstrated by analyzing data collected from the First Affiliated Hospital of Zhengzhou University. The results show that the proposed method achieves superior performance, with an area under the receiver operating characteristic curve (AUROC) of 0.918 and an accuracy of 0.881.
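The abstract above does not give the IDPC formulas, so the following is only a generic sketch of the two quantities that density-peak clustering relies on, computed here with cosine distance and K nearest neighbors. The exponential density definition, the function names, and the toy points are assumptions made for illustration and should not be read as the authors' algorithm.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_density(points, k):
    """Assumed local density: exp(-mean cosine distance to the k nearest neighbors)."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(
            cosine_distance(p, q) for j, q in enumerate(points) if j != i
        )
        densities.append(math.exp(-sum(dists[:k]) / k))
    return densities


def delta_distances(points, densities):
    """Distance from each point to its nearest neighbor of strictly higher density.

    Points with no denser neighbor receive their largest distance to any point,
    following the usual density-peak convention.
    """
    deltas = []
    for i, p in enumerate(points):
        higher = [
            cosine_distance(p, q)
            for j, q in enumerate(points)
            if densities[j] > densities[i]
        ]
        if higher:
            deltas.append(min(higher))
        else:
            deltas.append(max(cosine_distance(p, q) for q in points))
    return deltas


# Made-up toy feature vectors forming two directional groups.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
rho = knn_density(points, k=1)
delta = delta_distances(points, rho)
# Candidate cluster centers are points where both rho and delta are large.
for p, r, d in zip(points, rho, delta):
    print(p, round(r, 3), round(d, 3))
```

In a density-peak scheme, points with high density and a large distance to any denser point are taken as cluster centers, and each remaining point joins the cluster of its nearest denser neighbor.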
Affiliation(s)
- Dan Ling: Henan Key Lab of Information-Based Electrical Appliances, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
- Anhao Liu: Henan Key Lab of Information-Based Electrical Appliances, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
- Junwei Sun: Henan Key Lab of Information-Based Electrical Appliances, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
- Yanfeng Wang: Henan Key Lab of Information-Based Electrical Appliances, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
- Lidong Wang: State Key Laboratory of Esophageal Cancer Prevention and Treatment and Henan Key Laboratory for Esophageal Cancer Research of The First Affiliated Hospital, Zhengzhou University, Zhengzhou, 450052, China
- Xin Song: State Key Laboratory of Esophageal Cancer Prevention and Treatment and Henan Key Laboratory for Esophageal Cancer Research of The First Affiliated Hospital, Zhengzhou University, Zhengzhou, 450052, China
- Xueke Zhao: State Key Laboratory of Esophageal Cancer Prevention and Treatment and Henan Key Laboratory for Esophageal Cancer Research of The First Affiliated Hospital, Zhengzhou University, Zhengzhou, 450052, China
|
10
|
Li Y, Zou Z, Gao Z, Wang Y, Xiao M, Xu C, Jiang G, Wang H, Jin L, Wang J, Wang HZ, Guo S, Wu J. Prediction of lung cancer risk in Chinese population with genetic-environment factor using extreme gradient boosting. Cancer Med 2022; 11:4469-4478. [PMID: 35499292 PMCID: PMC9741969 DOI: 10.1002/cam4.4800] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/12/2020] [Revised: 04/22/2022] [Accepted: 04/24/2022] [Indexed: 02/03/2023]
Abstract
BACKGROUND Detecting early-stage lung cancer is critical to reducing the lung cancer mortality rate; however, existing models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis of lung cancer in a multicenter case-control study. MATERIALS AND METHODS A total of 974 cases and 1005 controls in Shanghai and Taizhou were recruited, and 61 single nucleotide polymorphisms (SNPs) were genotyped. Multivariate logistic regression was used to calculate the association between signal SNPs and lung cancer risk. Logistic regression (LR) and extreme gradient boosting (XGBoost), a large-scale machine learning algorithm, were adopted to build the lung cancer risk model. In both models, 10-fold cross-validation was performed, and model predictive performance was evaluated by the area under the curve (AUC). RESULTS After FDR adjustment, TYMS rs3819102 and BAG6 rs1077393 were significantly associated with lung cancer risk (p < 0.05). For lung cancer risk prediction, the model built only on epidemiological factors attained an AUC of 0.703 for LR and 0.744 for XGBoost. Compared with the LR model built only on epidemiological factors, adding SNPs and applying XGBoost increased the AUC to 0.759 (p < 0.001). BAG6 rs1077393 was the most important predictor among all SNPs in the XGBoost lung cancer prediction model, followed by TERT rs2735845 and CAMKK1 rs7214723. Further stratification in lung adenocarcinoma (ADC) showed significantly improved performance, from 0.639 to 0.699 (p = 0.009), when applying XGBoost and adding SNPs to the model, while the best model for lung squamous cell carcinoma (SCC) prediction was the LR model built on epidemiological factors and SNPs (AUC = 0.833), compared with the XGBoost model (AUC = 0.816). CONCLUSION Our lung cancer risk prediction models in the Chinese population have a strong predictive ability, especially for SCC. Adding SNPs and applying the XGBoost algorithm to the epidemiology-based logistic regression risk prediction model significantly improves model performance.
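The abstract above evaluates its models by AUC under 10-fold cross-validation. The following minimal sketch shows those two building blocks in plain Python: a pairwise (Mann-Whitney style) AUC and a contiguous k-fold index split. The function names and toy data are assumptions for illustration; the study's own pipeline, shuffling, and stratification are not described in the abstract.

```python
def auc(y_true, y_score):
    """AUC as the probability that a random case scores above a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    No shuffling or stratification is done here; real analyses usually
    shuffle and stratify by outcome first.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


# Made-up toy labels and scores (not study data).
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75

# Each sample lands in exactly one test fold.
for train_idx, test_idx in k_fold_indices(10, 3):
    print(test_idx)
```

In a cross-validated comparison such as LR versus XGBoost, each model would be fitted on the train indices of every fold and scored on the matching test indices, and the per-fold AUC values would then be averaged.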
Affiliation(s)
- Yutao Li: School of Life Sciences, Fudan University, Shanghai, China
- Zixiu Zou: School of Life Sciences, Fudan University, Shanghai, China
- Zhunyi Gao: Company 6 of Basic Medical School, Navy Military Medical University, Shanghai, China
- Yi Wang: School of Life Sciences, Fudan University, Shanghai, China
- Man Xiao: Department of Biochemistry and Molecular Biology, Hainan Medical University, Haikou, China
- Chang Xu: Clinical College of Xiangnan University, Chenzhou, China
- Gengxi Jiang: Department of Thoracic Surgery, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China
- Haijian Wang: School of Life Sciences, Fudan University, Shanghai, China
- Li Jin: School of Life Sciences, Fudan University, Shanghai, China
- Jiucun Wang: School of Life Sciences, Fudan University, Shanghai, China
- Huai Zhou Wang: Department of Laboratory Diagnosis, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China
- Shicheng Guo: School of Life Sciences, Fudan University, Shanghai, China
- Junjie Wu: School of Life Sciences, Fudan University, Shanghai, China; Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China; Department of Pulmonary and Critical Care Medicine, Shanghai Geriatric Medical Center, Shanghai, China
|