1. Tsiknakis N, Trivizakis E, Vassalou EE, Papadakis GZ, Spandidos DA, Tsatsakis A, Sánchez-García J, López-González R, Papanikolaou N, Karantanas AH, Marias K. Interpretable artificial intelligence framework for COVID-19 screening on chest X-rays. Exp Ther Med 2020; 20:727-735. PMID: 32742318; PMCID: PMC7388253; DOI: 10.3892/etm.2020.8797.
Abstract
COVID-19 has led to an unprecedented healthcare crisis, with millions of infected people across the globe, often pushing infrastructures, healthcare workers and entire economies beyond their limits. The scarcity of testing kits, even in developed countries, has led to extensive research efforts towards alternative solutions with high sensitivity. Chest radiological imaging paired with artificial intelligence (AI) can offer significant advantages in the diagnosis of patients infected with the novel coronavirus. To this end, transfer learning techniques are used to overcome the limitations arising from the lack of relevant big datasets, enabling specialized models to converge on limited data, as in the case of X-rays of COVID-19 patients. In this study, we present an interpretable AI framework assessed by expert radiologists on the basis of how well its attention maps focus on diagnostically relevant image regions. The proposed transfer learning methodology achieves an overall area under the curve of 1 for a binary classification problem across a 5-fold training/testing dataset.
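The reported AUC of 1 means every COVID-19 case scored above every control in each fold. As a minimal, illustrative sketch (function name and toy data are not from the paper), the AUC of a binary classifier can be computed with the rank-based Mann-Whitney estimator:

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Each positive/negative pair contributes 1 for a win, 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfectly separated toy example yields an AUC of 1.0.
print(auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # -> 1.0
```
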
2. Trivizakis E, Tsiknakis N, Vassalou EE, Papadakis GZ, Spandidos DA, Sarigiannis D, Tsatsakis A, Papanikolaou N, Karantanas AH, Marias K. Advancing COVID-19 differentiation with a robust preprocessing and integration of multi-institutional open-repository computer tomography datasets for deep learning analysis. Exp Ther Med 2020; 20:78. PMID: 32968435; PMCID: PMC7500043; DOI: 10.3892/etm.2020.9210.
Abstract
The coronavirus pandemic and its unprecedented consequences worldwide have spurred the interest of the artificial intelligence research community. A plethora of published studies have investigated the role of imaging, such as chest X-rays and computed tomography, in the automated diagnosis of coronavirus disease 2019 (COVID-19). Open repositories of medical imaging data can play a significant role by promoting cooperation among institutes on a worldwide scale. However, they may induce limitations related to variable data quality and intrinsic differences due to the wide variety of scanner vendors and imaging parameters. In this study, a state-of-the-art custom U-Net model is presented with a Dice similarity coefficient of 99.6%, along with a transfer-learning VGG-19-based model for COVID-19 versus pneumonia differentiation exhibiting an area under the curve of 96.1%. Both significantly improved on the baseline model trained without segmentation on selected tomographic slices of the same dataset. The presented study highlights the importance of a robust preprocessing protocol for image analysis within a heterogeneous imaging dataset and assesses the potential diagnostic value of the presented COVID-19 model by comparing its performance to the state of the art.
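The Dice similarity coefficient used above to evaluate the segmentation model measures the overlap between a predicted and a reference mask. A minimal NumPy sketch (names and toy masks are illustrative):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

mask_a = np.array([[1, 1, 0], [0, 1, 0]])
mask_b = np.array([[1, 0, 0], [0, 1, 0]])
print(round(float(dice_coefficient(mask_a, mask_b)), 3))  # 2*2/(3+2) -> 0.8
```
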
3. Zaridis DI, Mylona E, Tachos N, Pezoulas VC, Grigoriadis G, Tsiknakis N, Marias K, Tsiknakis M, Fotiadis DI. Region-adaptive magnetic resonance image enhancement for improving CNN-based segmentation of the prostate and prostatic zones. Sci Rep 2023; 13:714. PMID: 36639671; PMCID: PMC9837765; DOI: 10.1038/s41598-023-27671-8.
Abstract
Automatic segmentation of the prostate and the prostatic zones on MRI remains one of the most compelling research areas. While different image enhancement techniques are emerging as powerful tools for improving the performance of segmentation algorithms, their application still lacks consensus due to contrasting evidence regarding performance improvement and cross-model stability, further hampered by the inability to explain models' predictions. In particular, for prostate segmentation, the effectiveness of image enhancement across different convolutional neural networks (CNNs) remains largely unexplored. The present work introduces a novel image enhancement method, named RACLAHE, to enhance the performance of CNN models for segmenting the prostate gland and the prostatic zones. The improvement in performance and consistency across five CNN models (U-Net, U-Net++, U-Net3+, ResU-Net and USE-Net) is compared against four popular image enhancement methods. Additionally, a methodology is proposed to explain, both quantitatively and qualitatively, the relation between saliency maps and ground-truth probability maps. Overall, RACLAHE was the most consistent image enhancement algorithm in terms of performance improvement across CNN models, with the mean increase in Dice score ranging from 3 to 9% for the different prostatic regions, while achieving minimal inter-model variability. The integration of a feature-driven methodology to explain the predictions after applying image enhancement methods enables the development of a concrete, trustworthy automated pipeline for prostate segmentation on MR images.
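RACLAHE itself is not reproduced here, but the family of methods it belongs to builds on histogram equalization. A minimal global-equalization sketch in NumPy (illustrative only; the paper's region-adaptive, contrast-limited method is considerably more involved):

```python
import numpy as np

def equalize_histogram(img):
    """Spread an 8-bit image's intensities over [0, 255] via its cumulative histogram.

    Assumes the image is not constant (otherwise the CDF range collapses).
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    lut = np.round((cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min))
    return np.clip(lut, 0, 255).astype(np.uint8)[img]

low_contrast = np.arange(100, 104, dtype=np.uint8).reshape(2, 2)  # values 100..103
print(equalize_histogram(low_contrast).max())  # 255: the full range is used
```
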
4. Tsiknakis N, Spanakis C, Tsompou P, Karanasiou G, Sakellarios A, Rigas G, Kyriakidis S, Papafaklis M, Nikopoulos S, Gijsen F, Michalis L, Fotiadis DI, Marias K. IVUS Longitudinal and Axial Registration for Atherosclerosis Progression Evaluation. Diagnostics (Basel) 2021; 11:1513. PMID: 34441447; PMCID: PMC8394087; DOI: 10.3390/diagnostics11081513.
Abstract
Intravascular ultrasound (IVUS) imaging offers accurate cross-sectional vessel information. Registering temporal IVUS pullbacks acquired at two time points can therefore assist clinicians in accurately assessing pathophysiological changes in the vessels, disease progression and the effect of treatment interventions. In this paper, we present a novel two-stage registration framework for aligning pairs of longitudinal and axial IVUS pullbacks. Initially, we use a Dynamic Time Warping (DTW)-based algorithm to align the pullbacks temporally. Subsequently, an intensity-based registration method, which uses a variant of the Harmony Search optimizer to register each matched pair of frames by maximizing their Mutual Information, is applied. The presented method is fully automated and requires only two global image-based measurements, unlike other methods that require the extraction of morphology-based features. The data used include 42 synthetically generated pullback pairs, on which the method achieves an alignment error of 0.1853 frames per pullback, a rotation error of 0.93° and a translation error of 0.0161 mm. It was also tested on 11 baseline and follow-up and 10 baseline and post-stent-deployment real IVUS pullback pairs from two clinical centres, achieving an alignment error of 4.3±3.9 for the longitudinal registration, and a distance and a rotational error of 0.56±0.323 mm and 12.4°±10.5°, respectively, for the axial registration. Although the longitudinal performance of the proposed method does not match that of the state of the art, it relies on computationally lighter steps, which is crucial in real-time applications; for the axial registration it performs on par with or better than the state of the art. The results indicate that the proposed method can support clinical decision making and diagnosis based on sequential imaging examinations.
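The first stage above matches frames of two pullbacks with dynamic time warping. A minimal pure-Python sketch of DTW over 1-D per-frame features (the feature, e.g. a lumen-area profile, and all names are illustrative, not the paper's implementation):

```python
import math

def dtw_align(a, b):
    """Dynamic time warping: minimal-cost monotonic matching of two sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack to recover the frame-to-frame matching.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return D[n][m], path[::-1]

# A repeated frame in the follow-up pullback is absorbed at zero cost.
cost, path = dtw_align([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
print(cost)  # 0.0
```
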
5. Tsiknakis N, Savvidaki E, Manikis GC, Gotsiou P, Remoundou I, Marias K, Alissandrakis E, Vidakis N. Pollen Grain Classification Based on Ensemble Transfer Learning on the Cretan Pollen Dataset. Plants (Basel) 2022; 11:919. PMID: 35406899; PMCID: PMC9002917; DOI: 10.3390/plants11070919.
Abstract
Pollen identification is an important task for the botanical certification of honey. It is performed via thorough microscopic examination of the pollen present in honey, a process called melissopalynology. However, manual examination of the images is hard, time-consuming and subject to inter- and intra-observer variability. In this study, we investigated the applicability of deep learning models for the classification of pollen-grain images into 20 pollen types, based on the Cretan Pollen Dataset. In particular, we applied transfer and ensemble learning methods to achieve an accuracy of 97.5%, a sensitivity of 96.9%, a precision of 97%, an F1 score of 96.89% and an AUC of 0.9995. However, in a preliminary case study, the best-performing model performed poorly when applied to honey-based pollen-grain images, scoring only 0.02 better than random guessing (i.e., an AUC of 0.52). This indicates that the model should be further fine-tuned on honey-based pollen-grain images to increase its effectiveness on such data.
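One common way to combine several transfer-learned classifiers into an ensemble is soft voting, i.e. averaging per-model class probabilities (a hedged sketch; the paper's exact aggregation scheme may differ):

```python
import numpy as np

def soft_vote(prob_list):
    """Average class-probability vectors from several models, then take the argmax."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return int(np.argmax(avg)), avg

# Two toy models disagree; the more confident one dominates the average.
label, avg = soft_vote([np.array([0.6, 0.4]), np.array([0.1, 0.9])])
print(label)  # 1  (average probabilities: [0.35, 0.65])
```
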
6. Matikas A, Papakonstantinou A, Loibl S, Steger GG, Untch M, Johansson H, Tsiknakis N, Hellström M, Greil R, Möbus V, Gnant M, Bergh J, Foukakis T. Benefit from dose-dense adjuvant chemotherapy for breast cancer: subgroup analyses from the randomised phase 3 PANTHER trial. Lancet Reg Health Eur 2025; 49:101162. PMID: 39703564; PMCID: PMC11652897; DOI: 10.1016/j.lanepe.2024.101162.
Abstract
Background: It is unclear whether some patients with high-risk breast cancer do not warrant adjuvant dose-dense chemotherapy because the expected absolute benefit is small.
Methods: The phase 3 PANTHER trial (NCT00798070) compared adjuvant sequential epirubicin/cyclophosphamide (EC) and docetaxel (D) administered in either a tailored dose-dense (tDD EC/D) or standard-interval schedule (FEC/D) to patients with high-risk resected early breast cancer (n = 2003). We compared outcomes across key subgroups of interest, evaluated the performance of the online prognostication and treatment-benefit estimation tool PREDICT, and conducted a subpopulation treatment effect pattern plot (STEPP) analysis. The primary endpoint was breast cancer recurrence-free survival (BCRFS).
Findings: Median follow-up was 10.3 years. Treatment with tDD EC/D improved 10-year BCRFS across all subgroups, including by menopausal status, with an absolute benefit of 2% or more, as well as in luminal (hazard ratio [HR] = 0.83, 95% confidence interval [CI] 0.65-1.05) and human epidermal growth factor receptor 2 (HER2)-positive (HR = 0.53, 95% CI 0.30-0.93), but not triple-negative, breast cancer patients (HR = 1.02, 95% CI 0.66-1.57). PREDICT underestimated overall survival in the entire population and across all subgroups. In the STEPP analysis, the absolute BCRFS benefit from tDD EC/D was stable across risk-defined subpopulations, from 3.8% in the lowest-risk patients to 3.6% in the highest-risk ones. There was no differential treatment effect over time.
Interpretation: We could not reliably identify any subgroup not benefiting from dose-dense treatment, which should be considered for patients with primary resected high-risk breast cancer.
Funding: Cancerfonden, Bröstcancerförbundet, Radiumhemmets Forskningsfonder, Amgen, Roche, sanofi-aventis.
7. Zaridis DI, Pezoulas VC, Mylona E, Kalantzopoulos CN, Tachos NS, Tsiknakis N, Matsopoulos GK, Regge D, Papanikolaou N, Tsiknakis M, Marias K, Fotiadis DI. Simplatab: An Automated Machine Learning Framework for Radiomics-Based Bi-Parametric MRI Detection of Clinically Significant Prostate Cancer. Bioengineering (Basel) 2025; 12:242. PMID: 40150706; PMCID: PMC11939345; DOI: 10.3390/bioengineering12030242.
Abstract
Background: Prostate cancer (PCa) diagnosis using MRI is often challenged by lesion variability.
Methods: This study introduces Simplatab, an open-source automated machine learning (AutoML) framework designed for, but not limited to, automating the entire machine learning pipeline to facilitate the detection of clinically significant prostate cancer (csPCa) using radiomics features. Unlike existing AutoML tools such as Auto-WEKA, Auto-Sklearn, ML-Plan, ATM, Google AutoML and TPOT, Simplatab offers a comprehensive, user-friendly framework that integrates data bias detection, feature selection, model training with hyperparameter optimization, explainable AI (XAI) analysis, and post-training detection of model vulnerabilities. Simplatab requires no coding expertise, provides detailed performance reports, and includes robust data bias detection, making it particularly suitable for clinical applications.
Results: Evaluated on a large pan-European cohort of 4816 patients from 12 clinical centers, Simplatab supports multiple machine learning algorithms. Its most distinctive features include ease of use, a user interface accessible to those with no coding experience, comprehensive reporting, XAI integration, and thorough bias assessment, all provided in a human-understandable format.
Conclusions: Our findings indicate that Simplatab can significantly enhance the usability, accountability, and explainability of machine learning in clinical settings, thereby increasing trust and accessibility for AI non-experts.
8. Scarpa F, Berto A, Tsiknakis N, Manikis G, Fotiadis DI, Marias K, Scarpa A. Automated analysis for glaucoma screening of retinal videos acquired with smartphone-based ophthalmoscope. Heliyon 2024; 10:e34308. PMID: 39816342; PMCID: PMC11734129; DOI: 10.1016/j.heliyon.2024.e34308.
Abstract
Widespread screening is crucial for the early diagnosis and treatment of glaucoma, the leading cause of visual impairment and blindness. The development of portable technologies able to image the optic nerve head, such as smartphone-based ophthalmoscopes, represents a resource for large-scale glaucoma screening: they consist of an optical device attached to a common smartphone, making the overall device cheap and easy to use. Automated analyses able to assist clinicians are crucial for fast, reproducible and accurate screening, and can promote its diffusion even among non-expert ophthalmologists. Images acquired with smartphone ophthalmoscopes differ from those acquired with a fundus camera in field of view, noise, colour, and the presence of the pupil, iris and eyelid. Consequently, algorithms specifically designed for this type of image need to be developed. We propose a completely automated analysis of retinal videos acquired with smartphone ophthalmoscopy. The proposed algorithm, based on convolutional neural networks, selects the most relevant frames in the video, segments both the optic disc and the cup, and computes the cup-to-disc ratio. The networks were partially trained on images from publicly available fundus camera datasets, modified through an original procedure to be statistically equivalent to images acquired with a smartphone ophthalmoscope. The proposed algorithm achieves good results on images acquired from healthy and pathological subjects: an accuracy ≥95% was obtained for both disc and cup segmentation, and the computed cup-to-disc ratios show good agreement with manual analysis (mean difference 9%), allowing substantial differentiation between healthy and pathological subjects.
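The cup-to-disc ratio computed by the pipeline above follows directly from the two segmentation masks. A minimal sketch of the vertical ratio from binary masks (illustrative only; rows are taken as the vertical axis, and the paper's exact definition may differ):

```python
import numpy as np

def vertical_cup_to_disc_ratio(disc_mask, cup_mask):
    """Ratio of the vertical extents of the cup and disc segmentation masks."""
    def extent(mask):
        rows = np.where(np.asarray(mask, dtype=bool).any(axis=1))[0]
        return int(rows[-1] - rows[0] + 1) if rows.size else 0
    disc = extent(disc_mask)
    return extent(cup_mask) / disc if disc else 0.0

disc = np.ones((10, 5))                 # disc spans all 10 rows
cup = np.zeros((10, 5)); cup[3:7] = 1   # cup spans 4 rows
print(vertical_cup_to_disc_ratio(disc, cup))  # 0.4
```
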
9. Zaridis DI, Mylona E, Tsiknakis N, Tachos NS, Matsopoulos GK, Marias K, Tsiknakis M, Fotiadis DI. ProLesA-Net: A multi-channel 3D architecture for prostate MRI lesion segmentation with multi-scale channel and spatial attentions. Patterns (N Y) 2024; 5:100992. PMID: 39081575; PMCID: PMC11284496; DOI: 10.1016/j.patter.2024.100992.
Abstract
Prostate cancer diagnosis and treatment rely on precise MRI lesion segmentation, which is notably challenging for small (<15 mm) and intermediate (15-30 mm) lesions. Our study introduces ProLesA-Net, a multi-channel 3D deep-learning architecture with multi-scale squeeze-and-excitation and attention gate mechanisms. Tested against six models across two datasets, ProLesA-Net significantly outperformed them on key metrics: the Dice score increased by 2.2%, the Hausdorff distance and average surface distance improved by 0.5 mm, and recall and precision were also enhanced. Specifically, for lesions under 15 mm, our model showed a notable increase in five key metrics. In summary, ProLesA-Net consistently ranked at the top, demonstrating enhanced performance and stability. This advancement addresses crucial challenges in prostate lesion segmentation, enhancing clinical decision making and expediting treatment processes.
10. Tsiknakis N, Manikis G, Tzoras E, Salgkamis D, Vidal JM, Wang K, Zaridis D, Sifakis E, Zerdes I, Bergh J, Hartman J, Acs B, Marias K, Foukakis T. Unveiling the Power of Model-Agnostic Multiscale Analysis for Enhancing Artificial Intelligence Models in Breast Cancer Histopathology Images. IEEE J Biomed Health Inform 2024; 28:5312-5322. PMID: 38865229; DOI: 10.1109/jbhi.2024.3413533.
Abstract
Developing AI models for digital pathology has traditionally relied on single-scale analysis of histopathology slides. However, a whole slide image is a rich digital representation of the tissue, captured at various magnification levels. Limiting the analysis to a single scale overlooks critical information, from intricate high-resolution cellular details to broad low-resolution tissue structures. In this study, we propose a model-agnostic multiresolution feature aggregation framework tailored for the analysis of histopathology slides in the context of breast cancer, on a multicohort dataset of 2038 patient samples. We adapted 9 state-of-the-art multiple instance learning models to our multi-scale methodology and evaluated their performance on grade prediction, TP53 mutation status prediction and survival prediction. The results demonstrate the dominance of the multiresolution methodology: concatenating the feature vectors of image patches from high (20×) and low (10×) magnifications, or linearly transforming them via a learnable layer, improves performance on all prediction tasks for both domain-specific and ImageNet-based features. By contrast, the performance of uniresolution baseline models was not consistent across domain-specific and ImageNet-based features. Moreover, we shed light on the inherent inconsistencies observed in models trained on whole tissue sections when validated against biopsy-based datasets. Despite these challenges, our findings underscore the superiority of multiresolution analysis over uniresolution methods. Finally, cross-scale analysis also benefits the explainability of attention-based architectures, since attention maps can be extracted at the tissue and cell levels, improving the interpretation of the model's decisions.
11. Tsiknakis N, Tzoras E, Zerdes I, Manikis GC, Acs B, Hartman J, Hatschek T, Foukakis T, Marias K. Multiresolution Self-Supervised Feature Integration via Attention Multiple Instance Learning for Histopathology Analysis. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. PMID: 38083519; DOI: 10.1109/embc40787.2023.10341061.
Abstract
Digital histopathology image analysis of tumor tissue sections has seen great research interest, both for automating standard diagnostic tasks and for developing novel prognostic biomarkers. However, research has mainly focused on developing uniresolution models, capturing either high-resolution cellular features or low-resolution tissue-architectural features. In addition, in the patch-based weakly supervised training of deep learning models, the features that represent intratumoral heterogeneity are lost. In this study, we propose a multiresolution attention-based multiple instance learning framework that can capture cellular and contextual features from the whole tissue for predicting patient-level outcomes. Several basic mathematical operations were examined for integrating multiresolution features, i.e., addition, mean, multiplication and concatenation. The proposed multiplication-based multiresolution model performed best (AUC = 0.864), while all multiresolution models outperformed the uniresolution baseline models (AUC = 0.669, 0.713) for breast-cancer grading. (Implementation: https://github.com/tsikup/multiresolution-clam).
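The four integration operations examined above can be written down directly for a pair of patch-level feature vectors from the two magnifications. A minimal NumPy sketch (shapes and names are illustrative, not taken from the linked implementation):

```python
import numpy as np

def fuse_features(f_high, f_low, mode):
    """Integrate same-patch feature vectors from high and low magnification."""
    f_high, f_low = np.asarray(f_high, float), np.asarray(f_low, float)
    ops = {
        "addition": lambda: f_high + f_low,
        "mean": lambda: (f_high + f_low) / 2.0,
        "multiplication": lambda: f_high * f_low,  # best-performing in the study
        "concatenation": lambda: np.concatenate([f_high, f_low]),
    }
    return ops[mode]()

high, low = [1.0, 2.0], [3.0, 4.0]
print(fuse_features(high, low, "multiplication"))        # [3. 8.]
print(fuse_features(high, low, "concatenation").shape)   # (4,)
```

Note that the first three operations keep the feature dimensionality fixed, while concatenation doubles it, which changes the input size of any downstream attention layer.
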
12. Panagiotopoulos KN, Tsiknakis N, Zaridis DI, Mavragani CP, Tzioufas AG, Fotiadis DI, Goules AV. Evaluation of minor labial salivary gland focus score in Sjögren's disease using deep learning: a tool for more efficient diagnosis and future tissue biomarker discovery. J Autoimmun 2025; 153:103418. PMID: 40262321; DOI: 10.1016/j.jaut.2025.103418.
Abstract
Background: Sjögren's disease (SjD) is histopathologically characterized by focal sialadenitis in minor labial salivary gland biopsies (mLSGB), which is evaluated using the focus score (FS). Identification of a focus score ≥1 is a critical step in the diagnostic approach and in SjD classification. Nonetheless, during mLSGB analysis, FS reporting is neglected in a staggering 17% of cases, and a degree of inter-observer variability is introduced, even among specialized university centers. Given this unmet need for reliable FS reporting, leveraging artificial intelligence in mLSGB evaluation shows encouraging potential and warrants investigation.
Methods: Minor LSGBs stained only with hematoxylin and eosin (H&E) during the evaluation of individuals with a clinical suspicion of SjD were randomly chosen from our archive. All mLSGBs were scanned digitally as whole slide images (WSI), and the final dataset was partitioned into a training (70%) and a test (30%) set. An attention-based deep learning binary classification model was employed to evaluate mLSGB positivity (FS ≥ 1 or FS < 1).
Results: The final dataset consisted of 271 mLSGBs, of which 153 (56%) had FS < 1 and 118 (44%) FS ≥ 1. In the FS ≥ 1 subset, 74 (63%) were in the FS = 1-2 range and the remaining biopsies had FS > 2, following the expected FS distribution in a typical SjD population. Our model achieved an AUC of 0.932 (0.881-0.984), sensitivity of 87% (0.733-0.944), specificity of 84% (0.71-0.915) and accuracy of 85.2% (0.763-0.912), outperforming previous works.
Conclusion: Artificial intelligence models may overcome intra-observer biases and inter-observer variability in FS evaluation, reinforcing diagnosis and biomarker discovery in SjD.
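For context, the focus score the model classifies around is conventionally defined as the number of lymphocytic foci (aggregates of ≥50 lymphocytes) per 4 mm² of glandular tissue. A trivial sketch of that definition and the resulting binary label (function names and numbers are illustrative):

```python
def focus_score(n_foci, gland_area_mm2):
    """Focus score: lymphocytic foci (>= 50 lymphocytes each) per 4 mm^2 of tissue."""
    if gland_area_mm2 <= 0:
        raise ValueError("glandular area must be positive")
    return n_foci * 4.0 / gland_area_mm2

def biopsy_label(n_foci, gland_area_mm2):
    """Binary label used by the classifier above: FS >= 1 vs FS < 1."""
    return "FS >= 1" if focus_score(n_foci, gland_area_mm2) >= 1 else "FS < 1"

print(focus_score(3, 8.0))    # 1.5
print(biopsy_label(1, 12.0))  # FS < 1  (score ~0.33)
```
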
13. Tsiknakis N, Wang K, Salgkamis D, Tzoras E, Manikis GC, Sifakis E, Bergh J, Zerdes I, Marias K, Matikas A, Foukakis T. Ensuring Model Fairness via Stratified Training: TP53 Mutation Prediction with Estrogen Receptor Stratification in Breast Histopathology. Annu Int Conf IEEE Eng Med Biol Soc 2024; 2024:1-5. PMID: 40039878; DOI: 10.1109/embc53108.2024.10782012.
Abstract
Developing AI models on medical images as decision support systems has seen a huge increase in interest during the last few years. However, most published studies have neglected to test model robustness against dataset-related biases and unbalanced variables. For example, the prevalence of TP53 mutations is higher in estrogen receptor (ER)-negative breast cancer, while most ER-positive tumors are not mutated; yet published models have been developed on the entirety of the available data without testing for such intrinsic biases, which can lead to overfitting. In this study we show that models trained for TP53 mutation prediction overfit on ER status, and that stratifying training by ER status benefits all subgroups while reducing bias and increasing generalizability and fairness. (Implementation: https://github.com/tsikup/er-stratified-training-tp53-prediction).
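One common building block of the stratification described above is to split each ER stratum separately so both groups keep the same train/test proportions (a sketch under assumptions; the paper's exact stratified-training protocol may differ, and all names are illustrative):

```python
import random
from collections import defaultdict

def stratified_split(samples, stratum_of, test_frac=0.3, seed=0):
    """Split samples so every stratum (e.g. ER status) keeps the same test fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[stratum_of(s)].append(s)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        cut = int(round(len(group) * test_frac))
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Toy cohort: 10 ER-positive and 10 ER-negative samples.
cohort = [{"id": i, "er": "pos" if i < 10 else "neg"} for i in range(20)]
train, test = stratified_split(cohort, stratum_of=lambda s: s["er"])
print(sum(s["er"] == "pos" for s in test), sum(s["er"] == "neg" for s in test))  # 3 3
```
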
14. Tsiknakis N, Spanakis C, Tsoumpou P, Karanasiou G, Sakellarios A, Rigas G, Kyriakidis S, Papafaklis MI, Nikopoulos S, Gijsen F, Michalis L, Fotiadis DI, Marias K. OCT sequence registration before and after percutaneous coronary intervention (stent implantation). Biomed Signal Process Control 2023; 79:104251. DOI: 10.1016/j.bspc.2022.104251.
15. Vidal JM, Tsiknakis N, Staaf J, Bosch A, Ehinger A, Nimeus E, Salgado R, Bai Y, Rimm DL, Hartman J, Acs B. The analytical and clinical validity of AI algorithms to score TILs in TNBC: can we use different machine learning models interchangeably? EClinicalMedicine 2024; 78:102928. PMID: 39634035; PMCID: PMC11615110; DOI: 10.1016/j.eclinm.2024.102928.
Abstract
Background: Pathologist-read tumor-infiltrating lymphocytes (TILs) have showcased their predictive and prognostic potential for early and metastatic triple-negative breast cancer (TNBC), but remain subject to variability. Artificial intelligence (AI) is a promising approach toward eliminating variability and objectively automating TILs assessment. However, demonstrating robust analytical and prognostic validity is the key challenge currently preventing their integration into clinical workflows.
Methods: We evaluated the impact of ten AI models on TILs scoring, emphasizing their distinctions in analytical and prognostic validity. Several AI-based TILs scoring models (seven developed and three previously validated) were tested in a retrospective analytical cohort and in an independent prospective cohort to compare prognostic validity against an invasive disease-free survival (IDFS) endpoint with 4 years median follow-up. The development and analytical-validity set consisted of diagnostic tissue slides of 79 women with surgically resected primary invasive TNBC tumors diagnosed between 2012 and 2016 at the Yale School of Medicine. An independent set comprising 215 TNBC patients from Sweden, diagnosed between 2010 and 2015, was used for testing prognostic validity.
Findings: A significant difference in analytical validity (Spearman's r = 0.63-0.73, p < 0.001) is highlighted across AI methodologies and training strategies. Interestingly, prognostic performance of digital TILs is demonstrated for eight out of ten AI models, even the less extensively trained ones, with similar and overlapping hazard ratios (HR) in the external validation cohort (Cox regression analysis based on the IDFS endpoint, HR = 0.40-0.47; p < 0.004).
Interpretation: The demonstrated prognostic validity of most of the AI TIL models can be attributed to the intrinsic robustness of host anti-tumor immunity (measured by TILs) as a biomarker. However, the discrepancies between AI models should not be overlooked; rather, we believe there is a critical need for an accessible, large, multi-centric dataset to serve as a benchmark ensuring the comparability and reliability of different AI tools in clinical implementation.
Funding: Nikos Tsiknakis is supported by the Swedish Research Council (Grant Number 2021-03061, Theodoros Foukakis). Balazs Acs is supported by The Swedish Society for Medical Research (Svenska Sällskapet för Medicinsk Forskning) postdoctoral grant. Roberto Salgado is supported by a grant from the Breast Cancer Research Foundation (BCRF).