1
Zaccaria GM, Altini N, Mezzolla G, Vegliante MC, Stranieri M, Pappagallo SA, Ciavarella S, Guarini A, Bevilacqua V. SurvIAE: Survival prediction with Interpretable Autoencoders from Diffuse Large B-Cells Lymphoma gene expression data. Computer Methods and Programs in Biomedicine 2024; 244:107966. [PMID: 38091844 DOI: 10.1016/j.cmpb.2023.107966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/24/2023] [Accepted: 12/01/2023] [Indexed: 01/26/2024]
Abstract
BACKGROUND In Diffuse Large B-Cell Lymphoma (DLBCL), several methodologies are emerging to derive novel biomarkers to be incorporated in risk assessment. We built a pipeline that relies on autoencoders (AE) and Explainable Artificial Intelligence (XAI) to stratify prognosis and derive a gene-based signature.
METHODS An AE was used to learn an unsupervised representation of gene expression (GE) from three publicly available datasets, each generated with a different technology. A multi-layer perceptron (MLP) was used to classify prognosis from the latent representation. GE data were preprocessed by normalization, scaling, and standardization. Four AE architectures (Large, Medium, Small and Extra Small) were compared to find the most suitable for GE data. The joint AE-MLP classified patients on six outcomes: overall survival at 12, 36, and 60 months and progression-free survival (PFS) at 12, 36, and 60 months. XAI techniques were used to derive a gene-based signature aimed at refining the Revised International Prognostic Index (R-IPI) risk, which was validated in a fourth independent publicly available dataset. We named our tool SurvIAE: Survival prediction with Interpretable AE.
RESULTS From the latent space of the AEs, we observed that scaled and standardized data reduced the batch effect. SurvIAE models outperformed R-IPI, with a Matthews Correlation Coefficient of up to 0.42 vs. 0.18 on the validation set (PFS36) and up to 0.30 vs. 0.19 on the test set (PFS60). We selected SurvIAE-Small-PFS36 as the best model and, from its gene signature, stratified patients into three risk groups: R-IPI Poor patients with High levels of GAB1; R-IPI Poor patients with Low levels of GAB1 or R-IPI Good/Very Good patients with Low levels of GPR132; and R-IPI Good/Very Good patients with High levels of GPR132.
CONCLUSIONS SurvIAE showed the potential to derive a gene signature with translational purpose in DLBCL. The pipeline was made publicly available and can be reused for other pathologies.
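The abstract above compares models by Matthews Correlation Coefficient (MCC). As a reading aid, here is a minimal sketch of how a binary MCC can be computed from true and predicted labels. The function name `mcc` and the toy labels are illustrative assumptions and are not taken from the SurvIAE pipeline; returning 0.0 when the denominator is zero is a common convention, not something the abstract specifies.

```python
import math


def mcc(y_true, y_pred):
    """Binary Matthews Correlation Coefficient for 0/1 labels.

    Hypothetical helper for illustration only. Returns 0.0 when the
    denominator is zero (an assumed convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (made up): 1 = event within the horizon, 0 = no event.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = mcc(y_true, y_pred)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is one reason it is often preferred over accuracy when outcome classes are imbalanced.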
Affiliation(s)
- Gian Maria Zaccaria
  - Department of Electrical and Information Engineering (DEI), Polytechnic University of Bari, Via Edoardo Orabona, 4, Bari 70126, Italy
- Nicola Altini
  - Department of Electrical and Information Engineering (DEI), Polytechnic University of Bari, Via Edoardo Orabona, 4, Bari 70126, Italy
- Giuseppe Mezzolla
  - Department of Electrical and Information Engineering (DEI), Polytechnic University of Bari, Via Edoardo Orabona, 4, Bari 70126, Italy
- Maria Carmela Vegliante
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori "Giovanni Paolo II", Via O. Flacco, 65, Bari 70124, Italy
- Marianna Stranieri
  - Department of Electrical and Information Engineering (DEI), Polytechnic University of Bari, Via Edoardo Orabona, 4, Bari 70126, Italy
- Susanna Anita Pappagallo
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori "Giovanni Paolo II", Via O. Flacco, 65, Bari 70124, Italy
- Sabino Ciavarella
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori "Giovanni Paolo II", Via O. Flacco, 65, Bari 70124, Italy
- Attilio Guarini
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori "Giovanni Paolo II", Via O. Flacco, 65, Bari 70124, Italy
- Vitoantonio Bevilacqua
  - Department of Electrical and Information Engineering (DEI), Polytechnic University of Bari, Via Edoardo Orabona, 4, Bari 70126, Italy; Apulian Bioengineering srl, Via delle Violette, 14, Modugno 70026, Italy
2
Qiao X, Gu X, Liu Y, Shu X, Ai G, Qian S, Liu L, He X, Zhang J. MRI Radiomics-Based Machine Learning Models for Ki67 Expression and Gleason Grade Group Prediction in Prostate Cancer. Cancers (Basel) 2023; 15:4536. [PMID: 37760505 PMCID: PMC10526397 DOI: 10.3390/cancers15184536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 09/02/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023] Open
Abstract
PURPOSE The Ki67 index and the Gleason grade group (GGG) are vital prognostic indicators of prostate cancer (PCa). This study investigated the value of biparametric magnetic resonance imaging (bpMRI) radiomics feature-based machine learning (ML) models in predicting the Ki67 index and GGG of PCa.
METHODS A total of 122 patients with pathologically proven PCa who had undergone preoperative MRI were retrospectively included. Radiomics features were extracted from T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC) maps. Then, recursive feature elimination (RFE) was applied to remove redundant features. ML models for predicting Ki67 expression and GGG were constructed based on bpMRI and different algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), and K-nearest neighbor (KNN). The performance of the models was evaluated with receiver operating characteristic (ROC) analysis. In addition, a joint analysis of Ki67 expression and GGG was performed by assessing their Spearman correlation and calculating the diagnostic accuracy for both indices.
RESULTS The ML model based on LR and ADC + T2 (LR_ADC + T2, AUC = 0.8882) performed best in predicting Ki67 expression, and ADC_wavelet-LHH_firstorder_Maximum had the highest feature weighting. The SVM_DWI + T2 model (AUC = 0.9248) performed best in predicting GGG, and DWI_wavelet-HLL_glcm_SumAverage had the highest feature weighting. Ki67 and GGG exhibited a weak positive correlation (r = 0.382, p < 0.001), and LR_ADC + DWI had the highest diagnostic accuracy in predicting both (0.6230).
CONCLUSION The proposed ML models are suitable for predicting both Ki67 expression and GGG in PCa. This algorithm could be used to identify indolent or invasive PCa with a noninvasive, repeatable, and accurate diagnostic method.
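The models above are ranked by the area under the ROC curve (AUC). As a reading aid, the following minimal sketch computes AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy data are illustrative assumptions and do not reproduce the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Hypothetical helper for illustration only. Ties between a positive
    and a negative score count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = high Ki67 expression, 0 = low.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of this size; a sort-based rank-sum gives the same value for larger data.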
Affiliation(s)
- Xiaofeng Qiao
  - Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
- Xiling Gu
  - Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
- Yunfan Liu
  - Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
- Xin Shu
  - Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
- Guangyong Ai
  - Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
- Shuang Qian
  - Big Data and Software Engineering College, Chongqing University, Chongqing 400000, China
- Li Liu
  - Big Data and Software Engineering College, Chongqing University, Chongqing 400000, China
- Xiaojing He
  - Department of Radiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing 400010, China
- Jingjing Zhang
  - Departments of Diagnostic Radiology, National University of Singapore, Singapore 119074, Singapore
  - Clinical Imaging Research Centre, Centre for Translational Medicine, National University of Singapore, Singapore 117599, Singapore
3
Hill HA, Jain P, Ok CY, Sasaki K, Chen H, Wang ML, Chen K. Integrative Prognostic Machine Learning Models in Mantle Cell Lymphoma. Cancer Research Communications 2023; 3:1435-1446. [PMID: 37538987 PMCID: PMC10395375 DOI: 10.1158/2767-9764.crc-23-0083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 04/17/2023] [Accepted: 06/27/2023] [Indexed: 08/05/2023]
Abstract
Patients with mantle cell lymphoma (MCL), an incurable B-cell malignancy, benefit from accurate pretreatment disease stratification. We curated an extensive database of 862 patients diagnosed between 2014 and 2022. A machine learning (ML) gradient-boosted model incorporated baseline features from clinicopathologic, cytogenetic, and genomic data, with high predictive power in discriminating between patients with indolent or responsive MCL and those with aggressive disease (AUC ROC = 0.83). In addition, we utilized the gradient-boosted framework as a robust feature selection method for multivariate logistic and survival modeling. The best ML models incorporated features from clinical and genomic data types, highlighting the need for correlative molecular studies in precision oncology. As proof of concept, we launched our most accurate and practical models using an application interface, which has potential for clinical implementation. We designated the 20-feature ML model-based index the "integrative MIPI" or iMIPI and a similar 10-feature ML index the "integrative simplified MIPI" or iMIPI-s. The top 10 baseline prognostic features represented in the iMIPI-s are: lactate dehydrogenase (LDH), Ki-67%, platelet count, bone marrow involvement percentage, hemoglobin levels, the total number of observed somatic mutations, TP53 mutational status, Eastern Cooperative Oncology Group performance level, beta-2 microglobulin, and morphology. Our findings emphasize that prognostic applications and indices should include molecular features, especially TP53 mutational status. This work demonstrates the clinical utility of complex ML models and provides further evidence for existing prognostic markers in MCL.
Significance: Our model is the first to integrate a dynamic algorithm with multiple clinical and molecular features, allowing for accurate predictions of MCL disease outcomes in a large patient cohort.
Affiliation(s)
- Holly A. Hill
  - Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas
  - Department of Lymphoma and Myeloma, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
  - Department of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston School of Public Health, Houston, Texas
- Preetesh Jain
  - Department of Lymphoma and Myeloma, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Chi Young Ok
  - Department of Hematopathology, Division of Pathology-Lab Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Koji Sasaki
  - Department of Leukemia, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Han Chen
  - Department of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston School of Public Health, Houston, Texas
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Michael L. Wang
  - Department of Lymphoma and Myeloma, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Ken Chen
  - Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas
4
Prognostic Stratification of Diffuse Large B-cell Lymphoma Using Clinico-genomic Models: Validation and Improvement of the LymForest-25 Model. Hemasphere 2022; 6:e706. [PMID: 35392483 PMCID: PMC8984321 DOI: 10.1097/hs9.0000000000000706] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/07/2022] [Accepted: 03/03/2022] [Indexed: 11/25/2022] Open
5
Liu X, Lei S, Wei Q, Wang Y, Liang H, Chen L. Machine Learning-based Correlation Study between Perioperative Immunonutritional Index and Postoperative Anastomotic Leakage in Patients with Gastric Cancer. Int J Med Sci 2022; 19:1173-1183. [PMID: 35919820 PMCID: PMC9339417 DOI: 10.7150/ijms.72195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2022] [Accepted: 06/18/2022] [Indexed: 11/23/2022] Open
Abstract
Background: The immunonutritional index has shown great potential for predicting postoperative complications in various malignant diseases, while risk assessment based on machine learning (ML) methods is becoming popular in clinical practice. Early detection and prevention of postoperative anastomotic leakage (AL) play an important role in improving prognosis among patients with gastric cancer (GC).
Methods: This retrospective study included 297 patients with gastric cancer who underwent gastrectomy between 2018 and 2021 in the general surgery department of Xinhua Hospital. Perioperative clinical variables were collected to evaluate their predictive value for postoperative AL with 5 ML models. Then, AUROC was applied to identify the optimal perioperative clinical index and ML model for predicting postoperative AL.
Results: The incidence of postoperative AL was 6.1% (n=18). After training 5 ML classification models, we found that the immunonutritional index had significantly better classification ability than the inflammatory or nutritional index alone (AUROC=0.87 vs. 0.83, P=0.01; AUROC=0.87 vs. 0.68, P<0.01). Next, we found that a support vector machine (SVM), one of the ML methods, with the selected immunonutritional index showed significantly greater classification ability than the optimal univariate parameter, CRP on postoperative day 4 (AUROC=0.89 vs. 0.86, P=0.02). Also, statistical analysis revealed multiple variables with significant relevance to postoperative AL, including serum CRP and albumin on postoperative day 4, NLR, and SII.
Conclusion: This study showed that the perioperative immunonutritional index could act as an indicator for postoperative AL. Also, ML methods could significantly enhance the classification ability and could therefore be applied as a powerful tool for postoperative risk assessment in patients with GC.
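The abstract names NLR and SII without defining them. As a reading aid, the sketch below uses the definitions commonly found in the literature: neutrophil-to-lymphocyte ratio (neutrophils / lymphocytes) and systemic immune-inflammation index (platelets × neutrophils / lymphocytes). That the study uses exactly these definitions is an assumption, and the function names and sample values are hypothetical.

```python
def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio (commonly used definition, assumed here)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def sii(platelets, neutrophils, lymphocytes):
    """Systemic immune-inflammation index: platelets * neutrophils / lymphocytes.

    Commonly used definition, assumed here; all counts are taken to share
    the same unit (for example 10^9 cells per litre).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


# Made-up blood counts in 10^9/L, for illustration only.
example_nlr = nlr(neutrophils=4.0, lymphocytes=2.0)
example_sii = sii(platelets=250.0, neutrophils=4.0, lymphocytes=2.0)
```

Indices like these could then serve as input features to a classifier such as the SVM mentioned in the abstract.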
Affiliation(s)
- Xuanyu Liu
  - Department of General Surgery, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, No. 1665 Kongjiang Road, Shanghai 200092, China
- Su Lei
  - Department of General Surgery, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, No. 1665 Kongjiang Road, Shanghai 200092, China
- Qi Wei
  - Department of General Surgery, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, No. 1665 Kongjiang Road, Shanghai 200092, China
- Yizhou Wang
  - Department of General Surgery, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, No. 1665 Kongjiang Road, Shanghai 200092, China
- Haibin Liang
  - Department of General Surgery, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, No. 1665 Kongjiang Road, Shanghai 200092, China
- Lei Chen
  - Department of General Surgery, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, No. 1665 Kongjiang Road, Shanghai 200092, China