1
Zhu B, Dai L, Wang H, Zhang K, Zhang C, Wang Y, Yin F, Li J, Ning E, Wang Q, Yang L, Yang H, Li R, Li J, Hu C, Wu H, Jiang H, Bai Y. Machine learning discrimination of Gleason scores below GG3 and above GG4 for HSPC patients diagnosis. Sci Rep 2024; 14:25641. [PMID: 39465343 PMCID: PMC11514210 DOI: 10.1038/s41598-024-77033-1] [Received: 03/27/2024] [Accepted: 10/18/2024]
Abstract
This study aims to develop machine learning (ML)-assisted models for analyzing datasets related to Gleason scores in prostate cancer, to conduct statistical analyses on these datasets, and to identify meaningful features. We retrospectively collected data from 717 hormone-sensitive prostate cancer (HSPC) patients at Yunnan Cancer Hospital, of whom 526 were used for modeling. Seven auxiliary models were built with Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and an artificial neural network (ANN), using 21 clinical biochemical indicators and features. Model performance was evaluated primarily by accuracy (ACC), precision (PRE), specificity (SPE), sensitivity (SEN, also called recall), F1 score, and area under the receiver operating characteristic (ROC) curve (AUC), and was visualized with confusion matrices and ROC curves. The ensemble learning methods RF, XGBoost, and AdaBoost performed best. RF achieved a training dataset score of 0.769 (95% CI: 0.759-0.835) and a testing dataset score of 0.755 (95% CI: 0.660-0.760) (AUC: 0.786, 95% CI: 0.722-0.803); XGBoost achieved a training dataset score of 0.755 (95% CI: 0.711-0.809) and a testing dataset score of 0.745 (95% CI: 0.660-0.764) (AUC: 0.777, 95% CI: 0.726-0.798); and AdaBoost scored 0.789 on the training dataset (95% CI: 0.782-0.857) and 0.774 on the testing dataset (95% CI: 0.651-0.774) (AUC: 0.799, 95% CI: 0.703-0.802). Regarding feature importance (FI) in the ensemble models, bone metastases at first visit, prostatic volume, age, and T1-T2 stage accounted for large proportions of RF's FI; fPSA, TPSA, and tumor burden accounted for large proportions of AdaBoost's FI; and f/TPSA, LDH, and testosterone had the highest proportions in XGBoost's FI.
Our findings indicate that ensemble learning methods perform well in classifying HSPC patient data, with TNM staging and fPSA serving as important classification indicators. These findings provide a useful reference for distinguishing between Gleason score groups, facilitating more accurate patient assessment and personalized treatment planning.
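The evaluation metrics listed in this abstract all derive from a binary confusion matrix, and AUC can be read as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The standard-library Python sketch below illustrates these standard definitions only; it is not the authors' code, the coding of the positive class (1 = positive, for example a hypothetical "higher-grade" group) and the 0.5 threshold are assumptions, and it does not reproduce the study's 95% confidence intervals.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Count outcomes, assuming labels are coded 1 = positive class, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def binary_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    pre = _ratio(cm.tp, cm.tp + cm.fp)  # precision
    sen = _ratio(cm.tp, cm.tp + cm.fn)  # sensitivity = recall
    return {
        "ACC": _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "PRE": pre,
        "SEN": sen,
        "SPE": _ratio(cm.tn, cm.tn + cm.fp),
        # F1 is the harmonic mean of precision and recall.
        "F1": 2 * pre * sen / (pre + sen) if pre + sen else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities (not study data).
    y = [1, 1, 1, 0, 0, 0]
    prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    pred = [1 if s >= 0.5 else 0 for s in prob]  # assumed 0.5 threshold
    print(binary_metrics(confusion_from_labels(y, pred)))
    print(roc_auc(y, prob))  # 8/9 ≈ 0.889
```

Libraries such as scikit-learn provide equivalent, better-tested functions; the hand-written version is only meant to make each formula visible.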
Affiliation(s)
- Bingyu Zhu
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Longguo Dai
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Huijian Wang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Kun Zhang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Chongjian Zhang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Yang Wang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Feiyu Yin
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Ji Li
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Enfa Ning
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Qilin Wang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Libo Yang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Hong Yang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Ruiqian Li
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Jun Li
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Chen Hu
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Hongyi Wu
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Haiyang Jiang
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
- Yu Bai
- Department of Urology I, The Third Affiliated Hospital of Kunming Medical University (Peking University Cancer Hospital Yunnan, Yunnan Cancer Hospital, Cancer Center of Yunnan Province), 519 Kunzhou Road, Kunming, 650199, Yunnan, China
10
Fine SW, Gopalan A, Leversha MA, Al-Ahmadie HA, Tickoo SK, Zhou Q, Satagopan JM, Scardino PT, Gerald WL, Reuter VE. TMPRSS2-ERG gene fusion is associated with low Gleason scores and not with high-grade morphological features. Mod Pathol 2010; 23:1325-33. [PMID: 20562851 PMCID: PMC3413944 DOI: 10.1038/modpathol.2010.120]
Abstract
TMPRSS2-ERG gene rearrangement is seen in about half of clinically localized prostate cancers, yet its prognostic implications remain controversial. Similarly, the relationship of TMPRSS2-ERG fusion to Gleason score and morphology remains uncertain. We assigned Gleason scores and recorded morphological features for 521 clinically localized prostate cancers sampled in triplicate and arrayed in eight tissue microarray blocks. Fluorescence in situ hybridization was performed to delineate TMPRSS2-ERG aberrations. Using the maximum Gleason score, based on evaluation of three cores, and the overall Gleason score, based on prostatectomy sections, Fisher's exact test was performed for tumors with TMPRSS2-ERG translocation/deletion, copy number increase (≥ 3) of the TMPRSS2-ERG region without translocation/deletion, and copy number increase with concomitant translocation/deletion. In all, 217 (42%) translocation/deletion and 30 (5.9%) copy number increase-alone cases were detected. Of the 217 translocation/deletion cases, 32 also had copy number increase. In all, 237, 200, and 75 cancers had a maximum core-specific Gleason score of 6, 7, and 8-10, respectively. Tumors with translocation/deletion tended toward lower Gleason scores than those without (P=0.002), with similar results for overall Gleason score (P=0.02); cases with copy number increase tended toward higher Gleason scores than those without (P<0.001). Tumors with Gleason score 8-10 showed lower odds of translocation/deletion (odds ratio (OR) 0.38; 95% CI 0.21-0.68) and higher odds of copy number increase alone (OR 7.33; 95% CI 2.65-20.31) or copy number increase plus translocation/deletion (OR 3.03; 95% CI 1.12-8.15) relative to tumors with Gleason score <7. No significant difference in TMPRSS2-ERG incidence was observed between patients with and without cribriform glands, glomerulations, signet-ring cells, or intraductal cancer (P=0.821, 0.095, 0.132, and 0.375, respectively).
TMPRSS2-ERG gene fusion is associated with lower core-specific and overall Gleason scores and not with high-grade morphologies. Conversely, TMPRSS2-ERG copy number increase, with or without rearrangement, is associated with higher Gleason score. These findings indicate that translocation/deletion of TMPRSS2-ERG is not associated with histological features of aggressive prostate cancer.
Affiliation(s)
- Samson W. Fine
- Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY
- Anuradha Gopalan
- Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY
- Margaret A. Leversha
- Department of Molecular Cytogenetics, Memorial Sloan-Kettering Cancer Center, New York, NY
- Satish K. Tickoo
- Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY
- Qin Zhou
- Department of Epidemiology & Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY
- Jaya M. Satagopan
- Department of Epidemiology & Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY
- Peter T. Scardino
- Department of Surgery, Memorial Sloan-Kettering Cancer Center, New York, NY
- William L. Gerald
- Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY
- Victor E. Reuter
- Department of Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY
11
Beyene J, Atenafu EG, Hamid JS, To T, Sung L. Determining relative importance of variables in developing and validating predictive models. BMC Med Res Methodol 2009; 9:64. [PMID: 19751506 PMCID: PMC2761416 DOI: 10.1186/1471-2288-9-64] [Received: 11/21/2008] [Accepted: 09/14/2009]
Abstract
Background: Multiple regression models are used in a wide range of scientific disciplines, and automated model selection procedures are frequently used to identify independent predictors. However, determining the relative importance of potential predictors and validating the fitted models for stability, predictive accuracy, and generalizability are often overlooked or done incompletely.
Methods: Using a case study aimed at predicting which children with acute lymphoblastic leukemia (ALL) are at low risk of tumor lysis syndrome (TLS), we propose and compare two strategies, bootstrapping and random splitting of the data, for ordering potential predictors by their relative importance with respect to model stability and generalizability. We also propose an approach, based on the relative increase in the percentage of explained variation and in the area under the receiver operating characteristic (ROC) curve, for developing models in which variables from our ordered list enter the model in order of importance. An additional data set aimed at identifying predictors of prostate cancer penetration is also used for illustration.
Results: Age is identified as the most important predictor of TLS. It is selected 100% of the time with the bootstrapping approach. With the random split method, it is selected 99% of the time in the training data and is significant (at the 5% level) 98% of the time in the validation data. This indicates that age is a stable predictor of TLS with good generalizability. The second most important variable is white blood cell count (WBC). Our methods also identified an important predictor of TLS that would have been missed by relying on any of the automated model selection procedures alone. A group at low risk of TLS consists of children younger than 10 years, without T-cell immunophenotype, with baseline WBC < 20 × 10⁹/L and palpable spleen < 2 cm. For the prostate cancer data set, the Gleason score and digital rectal exam are identified as the most important indicators of whether the tumor has penetrated the prostate capsule.
Conclusion: Our model selection procedures, based on bootstrap resampling and repeated random splitting, can be used to assess the strength of evidence that a variable is truly an independent and reproducible predictor. Our methods can therefore be used to develop stable, reproducible models with good performance, and can also serve as a tool for validating a predictive model. Previous biological and clinical studies support the findings of our selection and validation strategies. However, extensive simulations may be required to assess the performance of our methods under different scenarios and to check their sensitivity to random fluctuations in the data.
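The bootstrap strategy described in this abstract ranks predictors by how often a selection procedure keeps them across resamples of the data. The standard-library Python sketch below counts those selection frequencies. As an assumption made for brevity, it swaps in a toy selector (keep any predictor whose absolute Pearson correlation with the outcome is at least a threshold) in place of the automated regression-based selection procedures the authors used; any function with the same call shape could be plugged in. The data and variable names are synthetic and hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

Row = Dict[str, float]
# A selector takes a data sample and the outcome column name and returns the
# set of predictor names it keeps (in the paper, an automated selection procedure).
Selector = Callable[[Sequence[Row], str], Set[str]]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_selector(rows: Sequence[Row], outcome: str, threshold: float = 0.3) -> Set[str]:
    """Toy stand-in (assumption) for an automated model selection procedure."""
    ys = [r[outcome] for r in rows]
    predictors = [k for k in rows[0] if k != outcome]
    return {k for k in predictors
            if abs(pearson_r([r[k] for r in rows], ys)) >= threshold}


def bootstrap_selection_frequency(rows: Sequence[Row], outcome: str,
                                  selector: Selector = correlation_selector,
                                  n_boot: int = 200, seed: int = 0) -> Dict[str, float]:
    """Fraction of bootstrap resamples in which each predictor is selected,
    ordered from most to least frequently selected."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_boot):
        # Resample n rows with replacement.
        sample = [rng.choice(rows) for _ in range(len(rows))]
        counts.update(selector(sample, outcome))
    predictors = [k for k in rows[0] if k != outcome]
    freq = {k: counts[k] / n_boot for k in predictors}
    return dict(sorted(freq.items(), key=lambda kv: kv[1], reverse=True))


def make_synthetic_rows(n: int = 100, seed: int = 1) -> List[Row]:
    """Synthetic data: 'signal' tracks the binary outcome 'y', 'noise' does not."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        y = float(i % 2)
        rows.append({"signal": y + rng.gauss(0, 0.3), "noise": rng.random(), "y": y})
    return rows


if __name__ == "__main__":
    print(bootstrap_selection_frequency(make_synthetic_rows(), outcome="y"))
```

The random split strategy from the same abstract follows the same pattern, with each bootstrap resample replaced by a random training subset and a check of the selected variables on the held-out part.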
Affiliation(s)
- Joseph Beyene
- Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada