1
Fanizzi A, Comes MC, Bove S, Cavalera E, de Franco P, Di Rito A, Errico A, Lioce M, Pati F, Portaluri M, Saponaro C, Scognamillo G, Troiano I, Troiano M, Zito FA, Massafra R. Explainable prediction model for the human papillomavirus status in patients with oropharyngeal squamous cell carcinoma using CNN on CT images. Sci Rep 2024; 14:14276. [PMID: 38902523 PMCID: PMC11189928 DOI: 10.1038/s41598-024-65240-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 06/18/2024] [Indexed: 06/22/2024] [Open Access]
Abstract
Several studies have emphasised that human papillomavirus-positive and -negative (HPV+ and HPV-, respectively) oropharyngeal squamous cell carcinomas (OPSCC) have distinct molecular profiles, tumor characteristics, and disease outcomes. Different radiomics-based prediction models have been proposed, some using innovative techniques such as Convolutional Neural Networks (CNNs). Although some of these models reached encouraging predictive performance, evidence explaining the role of radiomic features in achieving a specific outcome is scarce. In this paper, we present preliminary results for an explainable CNN-based model to predict HPV status in OPSCC patients. We extracted the Gross Tumor Volume (GTV) from pre-treatment CT images of 499 patients (356 HPV+ and 143 HPV-) included in the OPC-Radiomics public dataset to train an end-to-end Inception-V3 CNN architecture. We also collected a multicentric dataset of 92 patients (43 HPV+, 49 HPV-), which was employed as an independent test set. Finally, we applied the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to highlight the most informative areas with respect to the predicted outcome. The proposed model reached an AUC value of 73.50% on the independent test set. According to the Grad-CAM algorithm, the most informative areas for the correctly classified HPV+ patients were located in the intratumoral area. Conversely, the most important areas referred to the tumor edges. Finally, because the proposed model supplements the classification accuracy with a visualization of the areas of greatest interest for predictive purposes for each case examined, it could help increase confidence in using computer-based predictive models in actual clinical practice.
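The Grad-CAM step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the feature maps of the last convolutional layer and the gradients of the class score with respect to them are already available as nested lists; the function name `grad_cam` and the toy values are hypothetical and are not taken from the paper, whose implementation on Inception-V3 is not described here.

```python
def grad_cam(feature_maps, gradients):
    """Return a Grad-CAM heatmap normalised to [0, 1].

    feature_maps, gradients: lists of K channels, each an H x W list of floats.
    gradients[k][i][j] is d(class score) / d(feature_maps[k][i][j]).
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weights: global average of the gradients over each channel.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, feature_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on an image slice.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy example: two 2x2 channels with made-up activations and gradients.
maps = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(maps, grads)
```

In practice the heatmap would be upsampled to the input resolution before being overlaid on the CT slice; that step is omitted here.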
Affiliation(s)
- Annarita Fanizzi
  - Laboratorio Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Maria Colomba Comes
  - Laboratorio Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Samantha Bove
  - Laboratorio Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Elisa Cavalera
  - Radiation Oncology Unit, Dipartimento di Oncoematologia, Ospedale Vito Fazzi, Lecce, Italy
- Paola de Franco
  - Radiation Oncology Unit, Dipartimento di Oncoematologia, Ospedale Vito Fazzi, Lecce, Italy
- Angelo Errico
  - Ospedale Monsignor Raffaele Dimiccoli, Barletta, Italy
- Marco Lioce
  - Unità Operativa Complessa di Radioterpia, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Concetta Saponaro
  - Unità Operativa Complessi di Anatomia Patologia, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Giovanni Scognamillo
  - Unità Operativa Complessa di Radioterpia, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Ippolito Troiano
  - Radiation Oncology Department, Fondazione IRCCS "Casa Sollievo della Sofferenza", San Giovanni Rotondo, Italy
- Michele Troiano
  - Radiation Oncology Department, Fondazione IRCCS "Casa Sollievo della Sofferenza", San Giovanni Rotondo, Italy
- Francesco Alfredo Zito
  - Unità Operativa Complessi di Anatomia Patologia, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Raffaella Massafra
  - Laboratorio Biostatistica e Bioinformatica, I.R.C.C.S. Istituto Tumori 'Giovanni Paolo II', Bari, Italy
2
Huang L, Zhou K, Chen S, Chen Y, Zhang J. Automatic detection of epilepsy from EEGs using a temporal convolutional network with a self-attention layer. Biomed Eng Online 2024; 23:50. [PMID: 38824547 PMCID: PMC11143608 DOI: 10.1186/s12938-024-01244-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 05/08/2024] [Indexed: 06/03/2024] [Open Access]
Abstract
BACKGROUND Over 60% of epilepsy patients globally are children, whose early diagnosis and treatment are critical for their development and can substantially reduce the disease's burden on both families and society. Numerous algorithms for automated epilepsy detection from EEGs have been proposed. Yet, the occurrence of epileptic seizures during an EEG exam cannot always be guaranteed in clinical practice. Models that exclusively use seizure EEGs for detection risk artificially inflated performance metrics. Therefore, there is a pressing need for a universally applicable model that can perform automatic epilepsy detection in a variety of complex real-world scenarios. METHOD To address this problem, we have devised a novel technique employing a temporal convolutional neural network with self-attention (TCN-SA). Our model comprises two primary components: a TCN for extracting time-variant features from EEG signals, followed by a self-attention (SA) layer that assigns importance to these features. By focusing on key features, our model achieves heightened classification accuracy for epilepsy detection. RESULTS The efficacy of our model was validated on a pediatric epilepsy dataset we collected and on the Bonn dataset, attaining an accuracy of 95.50% on our dataset, and accuracies of 97.37% (A vs. E) and 93.50% (B vs. E) on the Bonn dataset. When compared with other deep learning architectures (temporal convolutional neural network, self-attention network, and standardized convolutional neural network) using the same datasets, our TCN-SA model demonstrated superior performance in the automated detection of epilepsy. CONCLUSION The demonstrated effectiveness of the TCN-SA approach supports its potential as a valuable tool for the automated detection of epilepsy, offering significant benefits in diverse and complex real-world clinical settings.
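The two components named in this abstract, a temporal convolution followed by self-attention, can be sketched on a single-channel toy signal. This is a minimal illustration under simplifying assumptions (one scalar feature per time step, no learned projections, no residual blocks); the function names are hypothetical and the code is not the authors' TCN-SA architecture.

```python
import math


def causal_dilated_conv(signal, kernel, dilation):
    """1-D causal convolution: output[t] depends only on signal[t], signal[t - d], ..."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start act as zero padding
                total += weight * signal[idx]
        out.append(total)
    return out


def self_attention(features):
    """Scaled dot-product self-attention with one scalar feature per time step.

    Queries, keys and values are the features themselves (no learned
    projections), and the scale factor sqrt(d) equals 1 because d = 1.
    """
    attended = []
    for query in features:
        scores = [query * key for key in features]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        attended.append(sum(e / norm * v for e, v in zip(exps, features)))
    return attended


# Toy usage: a dilated convolution followed by attention over the time steps.
conv_out = causal_dilated_conv([1, 2, 3, 4], kernel=[1, 1], dilation=2)
attn_out = self_attention(conv_out)
```

Stacking convolutions with growing dilation widens the receptive field over the EEG window, while the attention step reweights time steps by their similarity to one another.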
Affiliation(s)
- Leen Huang
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Keying Zhou
  - Department of Pediatrics, Shenzhen People's Hospital, Shenzhen, 518020, Guangdong, China
  - Department of Pediatrics, Second Clinical Medical College of Jinan University, Shenzhen, 518020, Guangdong, China
  - Department of Pediatrics, First Affiliated Hospital of Southern University of Science and Technology, Shenzhen, 518020, Guangdong, China
- Siyang Chen
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
- Yanzhao Chen
  - Department of Pediatrics, Shenzhen People's Hospital, Shenzhen, 518020, Guangdong, China
  - Department of Pediatrics, Second Clinical Medical College of Jinan University, Shenzhen, 518020, Guangdong, China
  - Department of Pediatrics, First Affiliated Hospital of Southern University of Science and Technology, Shenzhen, 518020, Guangdong, China
- Jinxin Zhang
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
3
Kocak B, Keles A, Akinci D'Antonoli T. Self-reporting with checklists in artificial intelligence research on medical imaging: a systematic review based on citations of CLAIM. Eur Radiol 2024; 34:2805-2815. [PMID: 37740080 DOI: 10.1007/s00330-023-10243-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/09/2023] [Revised: 08/09/2023] [Accepted: 08/20/2023] [Indexed: 09/24/2023]
Abstract
OBJECTIVE To evaluate the usage of a well-known and widely adopted checklist, the Checklist for Artificial Intelligence in Medical Imaging (CLAIM), for self-reporting through a systematic analysis of its citations. METHODS Google Scholar, Web of Science, and Scopus were used to search for citations (date, 29 April 2023). CLAIM's use for self-reporting with proof (i.e., filled-out checklist) and other potential use cases were systematically assessed in research papers. Eligible papers were evaluated independently by two readers, with the help of automatic annotation. Item-by-item confirmation analysis on papers with checklist proof was subsequently performed. RESULTS A total of 391 unique citations were identified from three databases. Of the 118 papers included in this study, 12 (10%) provided proof of a self-reported CLAIM checklist. More than half (70; 59%) only mentioned some sort of adherence to CLAIM without providing any proof in the form of a checklist. Approximately one-third (36; 31%) cited CLAIM for reasons unrelated to their reporting or methodological adherence. Overall, the claims on 57 to 93% of the items per publication were confirmed in the item-by-item analysis, with a mean and standard deviation of 81% and 10%, respectively. CONCLUSION Only a small proportion of the publications used CLAIM as a checklist and supplied filled-out documentation; however, the self-reported checklists may contain errors and should be approached cautiously. We hope that this systematic citation analysis will raise awareness in the artificial intelligence community of the importance of proper self-reporting, and encourage researchers, journals, editors, and reviewers to take action to ensure the proper usage of checklists. CLINICAL RELEVANCE STATEMENT Only a small percentage of the publications used CLAIM for self-reporting with proof (i.e., filled-out checklist). However, the filled-out checklist proofs may contain errors, e.g., false claims of adherence, and should be approached cautiously. These may indicate inappropriate usage of checklists and necessitate further action by authorities. KEY POINTS • Of 118 eligible papers, only 12 (10%) followed the CLAIM checklist for self-reporting with proof (i.e., filled-out checklist). More than half (70; 59%) only mentioned some kind of adherence without providing any proof. • Overall, claims on 57 to 93% of the items were valid in item-by-item confirmation analysis, with a mean and standard deviation of 81% and 10%, respectively. • Even with the checklist proof, the items declared may contain errors and should be approached cautiously.
Affiliation(s)
- Burak Kocak
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Ali Keles
  - Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey
- Tugba Akinci D'Antonoli
  - Institute of Radiology and Nuclear Medicine, Cantonal Hospital Baselland, Liestal, Switzerland
4
Saikia MJ, Kuanar S, Mahapatra D, Faghani S. Multi-Modal Ensemble Deep Learning in Head and Neck Cancer HPV Sub-Typing. Bioengineering (Basel) 2023; 11:13. [PMID: 38247890 PMCID: PMC11154466 DOI: 10.3390/bioengineering11010013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/14/2023] [Accepted: 12/21/2023] [Indexed: 01/23/2024] [Open Access]
Abstract
Oropharyngeal Squamous Cell Carcinoma (OPSCC) is one of the common forms of heterogeneity in head and neck cancer. Infection with human papillomavirus (HPV) has been identified as a major risk factor for OPSCC. Therefore, differentiating HPV-positive from HPV-negative cases in OPSCC patients is an essential diagnostic factor influencing future treatment decisions. In this study, we investigated the accuracy of a deep learning-based method for image interpretation that automatically detects the HPV status of OPSCC in routinely acquired Computed Tomography (CT) and Positron Emission Tomography (PET) images. We introduce a 3D CNN-based multi-modal feature fusion architecture for HPV status prediction in primary tumor lesions. The architecture is composed of an ensemble of CNNs and merges image features in a softmax classification layer. The pipeline separately learns the intensity, contrast variation, shape, texture heterogeneity, and metabolic assessment from CT and PET tumor volume regions and fuses those multi-modal features for final HPV status classification. The precision, recall, and AUC scores of the proposed method are computed, and the results are compared with other existing models. The experimental results demonstrate that the multi-modal ensemble model with soft voting outperformed single-modality PET/CT, with an AUC of 0.76 and F1 score of 0.746 on publicly available TCGA and MAASTRO datasets. In the MAASTRO dataset, our model achieved an AUC score of 0.74 over primary tumor volumes of interest (VOIs). In the future, more extensive cohort validation may improve diagnostic accuracy and provide a preliminary assessment before biopsy.
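Soft voting, as referred to in this abstract, averages the class probabilities produced by the individual models before taking the most probable class. The following is a minimal sketch; the function name `soft_vote`, the optional weights, and the example probabilities are illustrative assumptions, not values or code from the paper.

```python
def soft_vote(model_probs, weights=None):
    """Fuse per-model class probabilities by (weighted) averaging.

    model_probs: one probability vector per model, all of the same length.
    Returns the fused probability vector and the index of the winning class.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: fused[c])
    return fused, label


# Made-up outputs of a CT model and a PET model for [HPV-negative, HPV-positive].
ct_probs = [0.6, 0.4]
pet_probs = [0.3, 0.7]
fused_probs, predicted = soft_vote([ct_probs, pet_probs])
```

Here the two models disagree, and the more confident PET output decides the fused prediction, which is the behaviour that distinguishes soft voting from a majority vote on hard labels.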
Affiliation(s)
- Manob Jyoti Saikia
  - Electrical Engineering, University of North Florida, Jacksonville, FL 32224, USA
- Shiba Kuanar
  - Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA
- Dwarikanath Mahapatra
  - Inception Institute of Artificial Intelligence, Abu Dhabi 127788, United Arab Emirates
- Shahriar Faghani
  - Department of Radiology, Mayo Clinic, Rochester, MN 55905, USA
5
Yao H, Tian L, Liu X, Li S, Chen Y, Cao J, Zhang Z, Chen Z, Feng Z, Xu Q, Zhu J, Wang Y, Guo Y, Chen W, Li C, Li P, Wang H, Luo J. Development and external validation of the multichannel deep learning model based on unenhanced CT for differentiating fat-poor angiomyolipoma from renal cell carcinoma: a two-center retrospective study. J Cancer Res Clin Oncol 2023; 149:15827-15838. [PMID: 37672075 PMCID: PMC10620299 DOI: 10.1007/s00432-023-05339-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 08/24/2023] [Indexed: 09/07/2023]
Abstract
PURPOSE There are undetectable levels of fat in fat-poor angiomyolipoma. Thus, it is often misdiagnosed as renal cell carcinoma. We aimed to develop and evaluate a multichannel deep learning model for differentiating fat-poor angiomyolipoma (fp-AML) from renal cell carcinoma (RCC). METHODS This two-center retrospective study included 320 patients from the First Affiliated Hospital of Sun Yat-Sen University (FAHSYSU) and 132 patients from the Sun Yat-Sen University Cancer Center (SYSUCC). Data from patients at FAHSYSU were divided into a development dataset (n = 267) and a hold-out dataset (n = 53). The development dataset was used to obtain the optimal combination of CT modality and input channel. The hold-out dataset and SYSUCC dataset were used for independent internal and external validation, respectively. RESULTS In the development phase, models trained on unenhanced CT images performed significantly better than those trained on enhanced CT images based on the fivefold cross-validation. The best patient-level performance, with an average area under the receiver operating characteristic curve (AUC) of 0.951 ± 0.026 (mean ± SD), was achieved using the "unenhanced CT and 7-channel" model, which was finally selected as the optimal model. In the independent internal and external validation, AUCs of 0.966 (95% CI 0.919-1.000) and 0.898 (95% CI 0.824-0.972), respectively, were obtained using the optimal model. In addition, the performance of this model was better on large tumors (≥ 40 mm) in both internal and external validation. CONCLUSION The promising results suggest that our multichannel deep learning classifier based on unenhanced whole-tumor CT images is a highly useful tool for differentiating fp-AML from RCC.
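The AUC values reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the toy labels and scores are assumptions for illustration, and the confidence intervals quoted above would need an additional procedure (for example bootstrapping) that is not shown.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model outputs, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores: three of the four positive/negative pairs
# are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The quadratic loop is adequate for cohorts of a few hundred patients; a rank-based formulation gives the same value more efficiently on large datasets.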
Affiliation(s)
- Haohua Yao
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
  - Department of Urology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Li Tian
  - Department of Medical Imaging, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, Guangzhou, China
- Xi Liu
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Shurong Li
  - Department of Radiology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Yuhang Chen
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Jiazheng Cao
  - Department of Urology, Jiangmen Central Hospital, Jiangmen, China
- Zhiling Zhang
  - Department of Urology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-Sen University Cancer Center, Guangzhou, China
- Zhenhua Chen
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Zihao Feng
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Quanhui Xu
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Jiangquan Zhu
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Yinghan Wang
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Yan Guo
  - Department of Radiology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Wei Chen
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Caixia Li
  - School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China
- Peixing Li
  - School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, China
- Huanjun Wang
  - Department of Radiology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
- Junhang Luo
  - Department of Urology, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, China
6
Song C, Chen X, Tang C, Xue P, Jiang Y, Qiao Y. Artificial intelligence for HPV status prediction based on disease-specific images in head and neck cancer: A systematic review and meta-analysis. J Med Virol 2023; 95:e29080. [PMID: 37691329 DOI: 10.1002/jmv.29080] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/07/2023] [Revised: 07/14/2023] [Accepted: 08/03/2023] [Indexed: 09/12/2023]
Abstract
Accurate early detection of the human papillomavirus (HPV) status in head and neck cancer (HNC) is crucial to identify at-risk populations, stratify patients, personalize treatment options, and predict prognosis. Artificial intelligence (AI) is an emerging tool to dissect imaging features. This systematic review and meta-analysis aimed to evaluate the performance of AI in predicting HPV positivity from HPV-associated disease images in HNC patients. A systematic literature search was conducted in databases including Ovid-MEDLINE, Embase, and Web of Science Core Collection for studies published from inception up to October 30, 2022. Search strategies included keywords such as "artificial intelligence," "head and neck cancer," "HPV," and "sensitivity & specificity." Duplicates, articles without HPV predictions, letters, scientific reports, conference abstracts, and reviews were excluded. Binary diagnostic data were extracted to generate contingency tables, which were then used to calculate the pooled sensitivity (SE), specificity (SP), area under the curve (AUC), and their 95% confidence intervals (CI). A random-effects model was used for the meta-analysis, and four subgroup analyses were further explored. In total, 22 original studies were included in the systematic review, 15 of which were eligible to generate 33 contingency tables for meta-analysis. The pooled SE and SP for all studies were 79% (95% CI: 75-82%) and 74% (95% CI: 69-78%), respectively, with an AUC of 0.83 (95% CI: 0.79-0.86). When only the contingency table with the highest accuracy was selected from each study, our analysis revealed a pooled SE of 79% (95% CI: 75-83%), SP of 75% (95% CI: 69-79%), and an AUC of 0.84 (95% CI: 0.81-0.87). Heterogeneity was moderate in the first analysis (I² for SE and SP: 51.70% and 51.01%) and low in the second (35.99% and 21.44%). This evidence-based study showed an acceptable and promising performance for AI algorithms to predict HPV status in HNC, although it was not comparable to routine p16 immunohistochemistry. The exploitation and optimization of AI algorithms warrant further research. Compared with previous studies, future studies are expected to make progress in the selection of databases, improvement of international reporting guidelines, and application of high-quality deep learning algorithms.
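The step from contingency tables to sensitivity and specificity described in this abstract can be sketched as follows. The pooling shown simply sums the counts across tables, which is a deliberately simpler stand-in for the random-effects model used in the review; the function names and the two example tables are made up for illustration.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity and specificity from one 2x2 contingency table."""
    sensitivity = tp / (tp + fn)  # true positives among all HPV-positive cases
    specificity = tn / (tn + fp)  # true negatives among all HPV-negative cases
    return sensitivity, specificity


def pool_by_counts(tables):
    """Naive pooling: sum the cells of all tables, then compute SE and SP.

    Assumption: this ignores between-study heterogeneity and is NOT the
    random-effects pooling reported in the review.
    """
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    fn = sum(t[2] for t in tables)
    tn = sum(t[3] for t in tables)
    return sensitivity_specificity(tp, fp, fn, tn)


# Two made-up tables in the order (TP, FP, FN, TN).
tables = [(40, 10, 10, 40), (30, 5, 20, 45)]
per_study = [sensitivity_specificity(*t) for t in tables]
pooled_se, pooled_sp = pool_by_counts(tables)
```

A random-effects model would instead weight each study by its within-study variance plus an estimated between-study variance, which matters when heterogeneity is moderate, as it was for the all-studies analysis above.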
Affiliation(s)
- Cheng Song
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xu Chen
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chao Tang
  - Shenzhen Maternity & Child Healthcare Hospital, Shenzhen, China
- Peng Xue
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yu Jiang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Youlin Qiao
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
7
Yao H, Zhang X. A comprehensive review for machine learning based human papillomavirus detection in forensic identification with multiple medical samples. Front Microbiol 2023; 14:1232295. [PMID: 37529327 PMCID: PMC10387549 DOI: 10.3389/fmicb.2023.1232295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 06/30/2023] [Indexed: 08/03/2023] [Open Access]
Abstract
Human papillomavirus (HPV) is a sexually transmitted virus. Cervical cancer is among the cancers with the highest incidence, and almost all patients have an HPV infection. In addition, a variety of other cancers are associated with HPV infection. HPV vaccination has gained widespread popularity in recent years with the increase in public health awareness. In this context, HPV testing needs not only to be sensitive and specific but also to trace the source of HPV infection. Through machine learning and deep learning, information from medical examinations can be used more effectively. In this review, we discuss recent advances in HPV testing in combination with machine learning and deep learning.
Affiliation(s)
- Huanchun Yao
  - Department of Cancer, Shengjing Hospital of China Medical University, Shenyang, Liaoning, China
- Xinglong Zhang
  - Department of Hematology, The Fourth Affiliated Hospital of China Medical University, Shenyang, Liaoning, China
8
Omobolaji Alabi R, Sjöblom A, Carpén T, Elmusrati M, Leivo I, Almangush A, Mäkitie AA. Application of artificial intelligence for overall survival risk stratification in oropharyngeal carcinoma: A validation of ProgTOOL. Int J Med Inform 2023; 175:105064. [PMID: 37094545 DOI: 10.1016/j.ijmedinf.2023.105064] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/01/2022] [Revised: 03/31/2023] [Accepted: 04/03/2023] [Indexed: 04/26/2023]
Abstract
BACKGROUND In recent years, there has been a surge in machine learning-based models for diagnosis and prognostication of outcomes in oncology. However, there are concerns relating to the models' reproducibility and generalizability to a separate patient cohort (i.e., external validation). OBJECTIVES This study primarily provides a validation study for a recently introduced and publicly available machine learning (ML) web-based prognostic tool (ProgTOOL) for overall survival risk stratification of oropharyngeal squamous cell carcinoma (OPSCC). Additionally, we reviewed the published studies that have utilized ML for outcome prognostication in OPSCC to examine how many of these models were externally validated; the type of external validation, the characteristics of the external dataset, and the diagnostic performance characteristics on the internal validation (IV) and external validation (EV) datasets were extracted and compared. METHODS We used a total of 163 OPSCC patients obtained from the Helsinki University Hospital to externally validate the ProgTOOL for generalizability. In addition, PubMed, OvidMedline, Scopus, and Web of Science databases were systematically searched according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULTS The ProgTOOL produced a predictive performance of 86.5% balanced accuracy, a Matthews correlation coefficient of 0.78, a Net Benefit of 0.7, and a Brier score of 0.06 for overall survival stratification of OPSCC patients as either low-chance or high-chance. In addition, out of a total of 31 studies found to have used ML for the prognostication of outcomes in OPSCC, only seven (22.6%) reported a form of EV. Three studies (42.9%) each used either temporal EV or geographical EV, while only one study (14.2%) used expert assessment as a form of EV. Most of the studies reported a reduction in performance when externally validated. CONCLUSION The performance of the model in this validation study indicates that it may be generalized, thereby bringing recommendations of the model for clinical evaluation closer to reality. However, the number of externally validated ML-based models for OPSCC is still relatively small. This significantly limits the transfer of these models for clinical evaluation and subsequently reduces the likelihood of the use of these models in daily clinical practice. As a gold standard, we recommend the use of geographical EV and validation studies to reveal biases and overfitting of these models. These recommendations are poised to facilitate the implementation of these models in clinical practice.
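Three of the metrics quoted in this abstract (balanced accuracy, Matthews correlation coefficient, and Brier score) follow directly from a confusion matrix and the predicted probabilities. The sketch below uses their standard definitions; the function names and the example counts are illustrative assumptions and do not reproduce the study's data. Net Benefit is omitted because it depends on a threshold probability that the abstract does not state.

```python
import math


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient, in [-1, 1]; 0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Made-up confusion matrix: 8 TP, 1 FP, 2 FN, 9 TN.
bal_acc = balanced_accuracy(8, 1, 2, 9)
mcc = matthews_corrcoef(8, 1, 2, 9)
brier = brier_score([0.9, 0.2], [1, 0])
```

Lower Brier scores indicate better calibrated probabilities, whereas balanced accuracy and MCC only assess the thresholded low-chance versus high-chance labels.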
Affiliation(s)
- Rasheed Omobolaji Alabi
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Anni Sjöblom
  - Department of Pathology, University of Helsinki, Helsinki, Finland
- Timo Carpén
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Pathology, University of Helsinki, Helsinki, Finland
  - Department of Otorhinolaryngology - Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Mohammed Elmusrati
  - Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Ilmo Leivo
  - University of Turku, Institute of Biomedicine, Pathology, Turku, Finland
- Alhadi Almangush
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Pathology, University of Helsinki, Helsinki, Finland
  - University of Turku, Institute of Biomedicine, Pathology, Turku, Finland
  - Faculty of Dentistry, Misurata University, Misurata, Libya
- Antti A Mäkitie
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Otorhinolaryngology - Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  - Division of Ear, Nose and Throat Diseases, Department of Clinical Sciences, Intervention and Technology, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden