1
Deng F, Zhao L, Yu N, Lin Y, Zhang L. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer. Lab Invest 2024; 104:100320. [PMID: 38158124 DOI: 10.1016/j.labinv.2023.100320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Revised: 12/05/2023] [Accepted: 12/20/2023] [Indexed: 01/03/2024] Open
Abstract
Despite the use of machine learning tools, it is challenging to properly model cause-specific deaths in colorectal cancer (CRC) patients and choose appropriate treatments. Here, we propose an interesting feature selection framework, namely union with recursive feature elimination (U-RFE), to select the union feature sets that are crucial in CRC progression-specific mortality using The Cancer Genome Atlas (TCGA) dataset. Based on the union feature sets, we compared the performance of 5 classification algorithms, including logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking, to identify the best model for classifying 4-category deaths. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain subsets containing the same number of features but not exactly the same specific features. Union analysis of the subsets was then performed to determine the final union feature set, effectively combining the advantages of different algorithms. We found that the U-RFE framework could improve various models' performance. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy, and Matthews correlation coefficient of 0.851, 0.864, 0.854, 0.864, and 0.717, respectively. The performance of the minority categories was also significantly improved. Therefore, this recursive feature elimination-based approach of feature selection improves performances of classifying CRC deaths using clinical and omics data or those using other data with high feature redundancy and imbalance.
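The two-stage idea described in this abstract (run recursive feature elimination separately with several base estimators, then take the union of the resulting subsets) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `mean_gap` and `spread` are toy importance functions standing in for the LR, SVM, and RF estimators named in the abstract, and the function names, the `step` parameter, and the tiny dataset are all hypothetical.

```python
from statistics import mean, pvariance


def rfe(X, y, importance_fn, n_keep, step=1):
    """Recursive feature elimination: re-score the surviving features
    and drop the weakest ones until only n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        scores = importance_fn(X, y, kept)  # re-evaluated every round
        n_drop = min(step, len(kept) - n_keep)
        weakest = set(sorted(kept, key=lambda j: scores[j])[:n_drop])
        kept = [j for j in kept if j not in weakest]
    return set(kept)


def union_rfe(X, y, importance_fns, n_keep, step=1):
    """Run RFE once per importance function, then take the union of the subsets."""
    subsets = [rfe(X, y, fn, n_keep, step) for fn in importance_fns]
    return set().union(*subsets)


def mean_gap(X, y, kept):
    # Toy stand-in (assumption): absolute gap between class-1 and class-0 means.
    return {
        j: abs(
            mean(r[j] for r, c in zip(X, y) if c == 1)
            - mean(r[j] for r, c in zip(X, y) if c == 0)
        )
        for j in kept
    }


def spread(X, y, kept):
    # Toy stand-in (assumption): population variance of each feature column.
    return {j: pvariance([r[j] for r in X]) for j in kept}


# Hypothetical data: 4 samples, 4 features, binary label.
X = [[0, 5, 1, 0], [0, 1, 1, 1], [1, 5, 1, 0], [1, 1, 1, 1]]
y = [0, 0, 1, 1]
selected = union_rfe(X, y, [mean_gap, spread], n_keep=1)
```

Each RFE run returns the same number of features, but the runs need not agree on which ones, so the union can be larger than the per-run target. That is consistent with the abstract's report of 298 features in the union set when the RFE target was 50.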
Affiliation(s)
- Fei Deng
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Lin Zhao
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Ning Yu
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Yuxiang Lin
  - School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Lanjing Zhang
  - Department of Biological Sciences, Rutgers University, Newark, New Jersey; Department of Pathology, Princeton Medical Center, Plainsboro, New Jersey; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey
2
Fujikawa K, Omori T, Shinno N, Hara H, Yamamoto M, Yasui M, Matsuda C, Wada H, Nishimura J, Haraguchi N, Akita H, Ohue M, Miyata H. Tumor Deposit Is an Independent Factor Predicting Early Recurrence and Poor Prognosis in Gastric Cancer. J Gastrointest Surg 2023; 27:1336-1344. [PMID: 37014588 DOI: 10.1007/s11605-023-05668-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 02/11/2023] [Indexed: 04/05/2023]
Abstract
BACKGROUND Accurate prognostic estimation is crucial; however, the prognostic value of tumor deposits in gastric cancer remains controversial. This study aimed to investigate their prognostic significance. METHODS Clinicopathological and prognostic data of 1012 gastric cancer patients who underwent R0 or R1 surgery from 2010 to 2017 at the Osaka International Cancer Institute were retrospectively reviewed. RESULTS Overall, 6.3% of patients had tumor deposits, which were associated with Borrmann type, surgical procedure, type of gastrectomy, extent of lymphadenectomy, tumor size, histology, pT, pN, pM, pStage, lymphatic invasion, vascular invasion, preoperative chemotherapy, and postoperative chemotherapy. Tumor deposit-positive patients had worse 5-year disease-free survival (32.60% vs. 92.45%) and overall survival (41.22% vs. 89.37%) than tumor deposit-negative patients. Subgroup analysis of pStage II-III patients also showed significant differences between patients with and without tumor deposits in 5-year disease-free survival (34.15% vs. 80.98%) and overall survival (43.17% vs. 75.78%). Multivariable analysis showed that older age, undifferentiated histology, deeper tumor invasion, lymph node metastasis, distant metastasis, and presence of tumor deposits were significantly correlated with early tumor recurrence and shorter survival time; these factors were identified as independent prognostic factors. The 5-year disease-free survival of tumor deposit-positive patients was significantly worse than that of patients in the pStage III group and comparable to that of patients in the pT4, pN3, and pM1 groups. The 5-year overall survival of tumor deposit-positive patients was comparable to that of the pT4, pN3, pM1, and pStage III groups. CONCLUSIONS Tumor deposits are strong and independent predictors of tumor recurrence and poor survival.
Affiliation(s)
- Kaoru Fujikawa
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Takeshi Omori
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Naoki Shinno
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Hisashi Hara
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Masaaki Yamamoto
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Masayoshi Yasui
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Chu Matsuda
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Hiroshi Wada
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Junichi Nishimura
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Naotsugu Haraguchi
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Hirofumi Akita
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Masayuki Ohue
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
- Hiroshi Miyata
  - Osaka International Cancer Institute, 3-1-69 Otemae, Tyuo-Ward, Osaka City, Osaka-Prefecture, Japan
3
Zhou X, Ji Y, Zhou J. Multiple Strategies to Develop Small Molecular KRAS Directly Bound Inhibitors. Molecules 2023; 28:3615. [PMID: 37110848 PMCID: PMC10146153 DOI: 10.3390/molecules28083615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 04/08/2023] [Accepted: 04/17/2023] [Indexed: 04/29/2023] Open
Abstract
KRAS gene mutation is widespread in tumors and plays an important role in various malignancies. Targeting KRAS mutations is regarded as the "holy grail" of targeted cancer therapies. Recently, multiple strategies, including covalent binding strategy, targeted protein degradation strategy, targeting protein and protein interaction strategy, salt bridge strategy, and multivalent strategy, have been adopted to develop KRAS direct inhibitors for anti-cancer therapy. Various KRAS-directed inhibitors have been developed, including the FDA-approved drugs sotorasib and adagrasib, KRAS-G12D inhibitor MRTX1133, and KRAS-G12V inhibitor JAB-23000, etc. The different strategies greatly promote the development of KRAS inhibitors. Herein, the strategies are summarized, which would shed light on the drug discovery for both KRAS and other "undruggable" targets.
Affiliation(s)
- Xile Zhou
  - Department of Colorectal Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, 79 Qingchun Road, Hangzhou 310003, China
- Yang Ji
  - Drug Development and Innovation Center, College of Chemistry and Life Sciences, Zhejiang Normal University, 688 Yingbin Road, Jinhua 321004, China
- Jinming Zhou
  - Drug Development and Innovation Center, College of Chemistry and Life Sciences, Zhejiang Normal University, 688 Yingbin Road, Jinhua 321004, China
4
Chen J, Zhang Z, Ni J, Sun J, Ren W, Shen Y, Shi L, Xue M. Predictive and Prognostic Assessment Models for Tumor Deposit in Colorectal Cancer Patients With No Distant Metastasis. Front Oncol 2022; 12:809277. [PMID: 35251979 PMCID: PMC8888919 DOI: 10.3389/fonc.2022.809277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/05/2021] [Accepted: 01/24/2022] [Indexed: 12/12/2022] Open
Abstract
Background Growing evidence indicates that tumor deposit (TD) is significantly associated with local recurrence, distant metastasis (DM), and poor prognosis in patients with colorectal cancer (CRC). This study aims to explore the main clinical risk factors for the presence of TD in CRC patients with no DM (CRC-NDM) and the prognostic factors for TD-positive patients after surgery. Methods The data of patients with CRC-NDM between 2010 and 2017 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. A logistic regression model was used to identify risk factors for TD presence. Fine and Gray’s competing-risk model was used to analyze prognostic factors for TD-positive CRC-NDM patients. A predictive nomogram was constructed using the multivariate logistic regression model. The concordance index (C-index), the area under the receiver operating characteristic (ROC) curve (AUC), and calibration were used to evaluate the predictive nomogram. In addition, a prognostic nomogram was built based on multivariate competing-risk regression. The C-index, calibration, and decision-curve analysis (DCA) were used to validate the prognostic model. Results The nomogram predicting the presence of TD had a C-index of 0.785 and AUCs of 0.787 and 0.782 in the training and validation sets, respectively. In the competing-risk analysis, chemotherapy (subdistribution hazard ratio (SHR) = 0.542, p < 0.001) significantly reduced CRC-specific death (CCSD). The prognostic nomogram for outcome prediction in postoperative CRC-NDM patients with TD had a C-index of 0.727. The 5-year survival of CCSD was 17.16%, 36.20%, and 63.19% in low-, medium-, and high-risk subgroups, respectively (Gray’s test, p < 0.001). Conclusions We constructed an easily applied predictive nomogram for identifying high-risk TD-positive CRC-NDM patients. In addition, a prognostic nomogram was built to help clinicians identify postoperative CRC-NDM patients with TD who are likely to have poor outcomes. For the high-risk or medium-risk subgroup, additional chemotherapy, rather than radiotherapy, may be more advantageous for TD-positive patients.
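For a binary outcome such as TD presence, the C-index and the AUC reported above measure the same thing: the fraction of positive/negative patient pairs in which the positive case receives the higher predicted score. A minimal sketch of that pairwise definition follows; the function name and the example scores are hypothetical, and the time-to-event C-index used for the competing-risk prognostic model needs censoring-aware handling that is not shown here.

```python
def concordance_auc(scores, labels):
    """AUC for a binary outcome as the share of concordant positive/negative
    pairs; tied scores count as half a concordant pair."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of TD presence and observed status.
predicted = [0.9, 0.8, 0.4, 0.3]
observed = [1, 0, 1, 0]
auc = concordance_auc(predicted, observed)  # 3 of 4 pairs concordant
```

The quadratic pair loop is chosen for readability; a rank-based formulation gives the same value more efficiently on large cohorts.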
Affiliation(s)
- Jingyu Chen
  - Department of Gastroenterology, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Institute of Gastroenterology, Zhejiang University, Hangzhou, China
- Zizhen Zhang
  - Department of Gastroenterology, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Institute of Gastroenterology, Zhejiang University, Hangzhou, China; Department of Gastrointestinal Oncology, Peking University Cancer Hospital and Institute, Beijing, China
- Jiaojiao Ni
  - Department of Gastroenterology, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Institute of Gastroenterology, Zhejiang University, Hangzhou, China
- Jiawei Sun
  - Department of Gastroenterology, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Institute of Gastroenterology, Zhejiang University, Hangzhou, China; Shulan International Medical College, Zhejiang Shuren University, Hangzhou, China
- Wenhao Ren
  - Department of Pathology, Peking University Cancer Hospital and Institute, Beijing, China
- Yan Shen
  - School of Medicine, Ningbo University, Ningbo, Zhejiang, China
- Liuhong Shi
  - Department of Ultrasound, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China
- Meng Xue
  - Department of Gastroenterology, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Institute of Gastroenterology, Zhejiang University, Hangzhou, China
5
Feng CH, Disis ML, Cheng C, Zhang L. Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models. Lab Invest 2022; 102:236-244. [PMID: 34537824 DOI: 10.1038/s41374-021-00662-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 08/10/2021] [Accepted: 08/12/2021] [Indexed: 11/09/2022] Open
Abstract
Colorectal cancer (CRC) is one of the most common cancers worldwide, and a leading cause of cancer deaths. Better classification of multicategory outcomes of CRC with clinical and omic data may help adjust treatment regimens based on an individual's risk. Here, we selected the features that were useful for classifying the four-category survival outcome of CRC using the clinical and transcriptomic data, or clinical, transcriptomic, microsatellite instability and selected oncogenic-driver data (all data) of TCGA. We also optimized multimetric feature selection to develop the best multinomial logistic regression (MLR) and random forest (RF) models that had the highest accuracy, precision, recall and F1 score, respectively. We identified 2073 differentially expressed genes of the TCGA RNASeq dataset. MLR overall outperformed RF in the multimetric feature selection. In both RF and MLR models, precision, recall and F1 score increased as the feature number increased and peaked at the feature number of 600-1000, while the models' accuracy remained stable. The best model was the MLR one with 825 features based on sum of squared coefficients using all data, and attained the best accuracy of 0.855, F1 of 0.738 and precision of 0.832, which were higher than those using clinical and transcriptomic data. The top-ranked features in the best-performing MLR model using clinical and transcriptomic data were different from those using all data. However, pathologic staging, HBS1L, TSPYL4, and TP53TG3B were the overlapping top-20 ranked features in the best models using clinical and transcriptomic, or all data. Thus, we developed a multimetric feature-selection based MLR model that outperformed RF models in classifying the four-category outcome of CRC patients. Interestingly, adding microsatellite instability and oncogenic-driver data to clinical and transcriptomic data improved the models' performance. Precision and recall of tuned algorithms may change significantly as the feature number changes, but accuracy appears insensitive to these changes.
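The abstract ranks features for the MLR model "based on sum of squared coefficients". One plausible reading, sketched below, is that each feature's coefficients are squared and summed across the outcome classes, and features are then sorted by that total. The function name, feature names, and coefficient values are hypothetical, and the abstract does not say whether inputs were standardized before fitting, which would affect any such ranking.

```python
def rank_by_squared_coefficients(coefs, names, top_k):
    """Rank features by the sum of their squared coefficients across classes.

    coefs holds one row per outcome class and one column per feature.
    """
    scores = {
        name: sum(row[j] ** 2 for row in coefs)
        for j, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    return ranked[:top_k]


# Hypothetical coefficients for a four-category outcome and three features.
names = ["feat_a", "feat_b", "feat_c"]
coefs = [
    [2, 0, 1],
    [-1, 0, 1],
    [0, 1, -1],
    [1, 0, 1],
]
top_two = rank_by_squared_coefficients(coefs, names, top_k=2)
```

Squaring removes the sign, so a feature that pushes strongly toward one class and away from another still scores highly.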
Affiliation(s)
- Mary L Disis
  - UW Medicine Cancer Vaccine Institute, University of Washington, Seattle, WA, USA
- Chao Cheng
  - Department of Medicine, Section of Epidemiology and Population Sciences, Baylor College of Medicine, Houston, TX, USA; Department of Medicine, Baylor College of Medicine, Houston, TX, USA; Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Lanjing Zhang
  - Department of Biological Sciences, Rutgers University, Newark, NJ, USA; Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
6
Xiao S, Guo J, Zhang W, Hu X, Wang R, Chen Z, Lai C. A Six-microRNA Signature Nomogram for Preoperative Prediction of Tumor Deposits in Colorectal Cancer. Int J Gen Med 2022; 15:675-687. [PMID: 35082517 PMCID: PMC8785134 DOI: 10.2147/ijgm.s346790] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/11/2021] [Accepted: 12/29/2021] [Indexed: 11/23/2022] Open
Abstract
Purpose Tumor deposits (TDs) are acknowledged negative prognostic factors in colorectal cancer (CRC), and their pathogenesis remains a puzzle. This study aimed to construct and validate a nomogram for preoperative TD prediction in CRC patients. Patients and Methods Patients from the Surveillance, Epidemiology, and End Results (SEER) and The Cancer Genome Atlas (TCGA) databases were randomly divided into training and validation sets at a sample size ratio of 7:3. Univariate logistic regression was performed to identify differentially expressed microRNAs between TDs and non-TDs. Nomograms for TD prediction were developed from the multivariate logistic regression model with least absolute shrinkage and selection operator and were validated internally in terms of accuracy, calibration, and clinical utility. Based on the target genes, pathways tightly associated with TDs were selected using enrichment analysis. Results Six clinicopathologic factors and expressions of six microRNAs (miR-614, miR-1197, miR-4770, miR-3136, miR-3173, and miR-4636) differed significantly between TDs and non-TDs CRC patients from the SEER and TCGA training sets. We compared the discrimination of two nomograms: a clinicopathologic nomogram and a six-microRNA signature nomogram. The six-microRNA signature nomogram revealed better accuracy than the clinicopathologic one for TD prediction (AUC values of 0.96 and 0.93 in the validation cohort). The calibration plots and decision curve analysis demonstrated that the six-microRNA signature nomogram had better validity and a greater prognostic benefit versus the clinicopathologic one for TD prediction. Calcium signaling pathways were closely associated with roles of the six microRNAs in TDs of CRC patients. Conclusion The six-microRNA signature nomogram can be used as an efficient tool for preoperative TD prediction in CRC patients.
Affiliation(s)
- Shihan Xiao
  - Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - International Joint Research Center of Minimally Invasive Endoscopic Technology Equipment & Standardization, Xiangya Hospital, Central South University, Changsha, Hunan Province, People’s Republic of China
- Jianping Guo
  - Department of Gastrointestinal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, People’s Republic of China
  - Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, the Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, People’s Republic of China
- Wuming Zhang
  - Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - International Joint Research Center of Minimally Invasive Endoscopic Technology Equipment & Standardization, Xiangya Hospital, Central South University, Changsha, Hunan Province, People’s Republic of China
- Xianqin Hu
  - Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - International Joint Research Center of Minimally Invasive Endoscopic Technology Equipment & Standardization, Xiangya Hospital, Central South University, Changsha, Hunan Province, People’s Republic of China
- Ran Wang
  - Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - International Joint Research Center of Minimally Invasive Endoscopic Technology Equipment & Standardization, Xiangya Hospital, Central South University, Changsha, Hunan Province, People’s Republic of China
- Zhikang Chen
  - Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - International Joint Research Center of Minimally Invasive Endoscopic Technology Equipment & Standardization, Xiangya Hospital, Central South University, Changsha, Hunan Province, People’s Republic of China
  - Hunan Key Laboratory of Precise Diagnosis and Treatment of Gastrointestinal Tumor, Xiangya Hospital Central South University, Changsha, Hunan Province, People’s Republic of China
  - Correspondence: Zhikang Chen; Chen Lai, Department of General Surgery, Xiangya Hospital, Central South University, 87th Xiangya Road, Kaifu District, Changsha, Hunan, People’s Republic of China. Tel +86-13875982443
- Chen Lai
  - Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, People’s Republic of China
  - International Joint Research Center of Minimally Invasive Endoscopic Technology Equipment & Standardization, Xiangya Hospital, Central South University, Changsha, Hunan Province, People’s Republic of China
  - Hunan Key Laboratory of Precise Diagnosis and Treatment of Gastrointestinal Tumor, Xiangya Hospital Central South University, Changsha, Hunan Province, People’s Republic of China
7
Predict multicategory causes of death in lung cancer patients using clinicopathologic factors. Comput Biol Med 2020; 129:104161. [PMID: 33307409 DOI: 10.1016/j.compbiomed.2020.104161] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/27/2020] [Revised: 11/25/2020] [Accepted: 11/29/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND Random forests (RF) is a widely used machine-learning algorithm, and outperforms many other machine learning algorithms in prediction-accuracy. But it is rarely used for predicting causes of death (COD) in cancer patients. On the other hand, multicategory COD are difficult to classify in lung cancer patients, largely because they have multiple labels (versus binary labels). METHODS We tuned RF algorithms to classify 5-category COD among the lung cancer patients in the surveillance, epidemiology and end results-18, whose lung cancers were diagnosed in 2004, for the completeness in their follow-up. The patients were randomly divided into training and validation sets (1:1 and 4:1 sample-splits). We compared the prediction accuracy of the tuned RF and multinomial logistic regression (MLR) models. RESULTS We included 42,257 qualified lung cancers in the database. The COD were lung cancer (72.41%), other causes or alive (14.43%), non-lung cancer (6.85%), cardiovascular disease (5.35%), and infection (0.96%). The tuned RF model with 300 iterations and 10 variables outperformed the MLR model (accuracy = 69.8% vs 64.6%, 1:1 sample-split), while 4:1 sample-split produced lower prediction-accuracy than 1:1 sample-split. The top-10 important factors in the RF model were sex, chemotherapy status, age (65+ vs < 65 years), radiotherapy status, nodal status, T category, histology type and laterality, all of which except T category and laterality were also important in MLR model. CONCLUSION We tuned RF models to predict 5-category CODs in lung cancer patients, and show RF outperforms MLR in prediction accuracy. We also identified the factors associated with these COD.
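The 1:1 and 4:1 sample-splits mentioned in the methods can be illustrated with a simple seeded random partition of patient indices. This is only a sketch: the function name, the seed, and the floor rounding rule are assumptions, and the abstract does not state whether the authors stratified the split by cause of death.

```python
import random


def split_indices(n, train_parts, val_parts, seed=0):
    """Randomly partition range(n) into training and validation indices
    in the ratio train_parts:val_parts (training size is rounded down)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = n * train_parts // (train_parts + val_parts)
    return idx[:n_train], idx[n_train:]


# 42,257 is the cohort size reported in the abstract; the resulting set
# sizes follow from this sketch's rounding rule, not from the paper.
train_11, val_11 = split_indices(42257, 1, 1)
train_41, val_41 = split_indices(42257, 4, 1)
```

With classes as imbalanced as those reported (infection accounts for 0.96% of cases), a purely random split can leave very few minority-class cases in the smaller set, which is one reason a stratified split is often preferred.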