1. Goessinger EV, Cerminara SE, Mueller AM, Gottfrois P, Huber S, Amaral M, Wenz F, Kostner L, Weiss L, Kunz M, Maul JT, Wespi S, Broman E, Kaufmann S, Patpanathapillai V, Treyer I, Navarini AA, Maul LV. Consistency of convolutional neural networks in dermoscopic melanoma recognition: A prospective real-world study about the pitfalls of augmented intelligence. J Eur Acad Dermatol Venereol 2024;38:945-953. PMID: 38158385. DOI: 10.1111/jdv.19777.
Abstract
BACKGROUND: Deep-learning convolutional neural networks (CNNs) have outperformed even experienced dermatologists in dermoscopic melanoma detection under controlled conditions. How real-world dermoscopic image transformations affect CNN robustness remains unexplored.
OBJECTIVES: To investigate the consistency of melanoma risk assessment by two commercially available CNNs and to derive recommendations for their current clinical use.
METHODS: A comparative cohort study was conducted from January to July 2022 at the Department of Dermatology, University Hospital Basel. Five dermoscopic images of 116 different lesions on the torso of 66 patients were captured consecutively by the same operator without deliberate rotation. Classification was performed by two CNNs (CNN-1/CNN-2). Lesions were divided into four subgroups based on their initial risk score and clinical dignity assessment. Reliability was assessed by variation and intraclass correlation coefficients. Excisions were performed for melanoma suspicion or two consecutively elevated CNN risk scores; benign lesions were confirmed by expert consensus (n = 3).
RESULTS: 117 repeated image series of 116 melanocytic lesions (2 melanomas, 16 dysplastic naevi, 29 naevi, 1 solar lentigo, 1 suspicious and 67 benign lesions) were classified. CNN-1 demonstrated superior measurement repeatability for clinically benign lesions with an initial malignant risk score (mean variation coefficient, mvc: CNN-1: 49.5 (±34.3)%; CNN-2: 71.4 (±22.5)%; p = 0.03), while CNN-2 performed better for clinically benign lesions with benign scoring (mvc: CNN-1: 49.7 (±22.7)%; CNN-2: 23.8 (±29.3)%; p = 0.002). Both systems exhibited the lowest score consistency for lesions with an initial malignant risk score but a benign clinical assessment. In this setting, averaging three initial risk scores achieved the highest sensitivity of dignity assessment (CNN-1: 94%; CNN-2: 89%). Intraclass correlation coefficients indicated 'moderate' to 'good' reliability for both systems (CNN-1: 0.80, 95% CI: 0.71-0.87, p < 0.001; CNN-2: 0.67, 95% CI: 0.55-0.77, p < 0.001).
CONCLUSIONS: User-induced image changes can significantly influence CNN classification. For clinical application, we recommend using the average of three initial risk scores. Furthermore, we advocate optimizing CNN robustness by cross-validation with repeated image sets.
TRIAL REGISTRATION: ClinicalTrials.gov (NCT04605822).
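The repeatability metrics in this abstract are simple to reproduce. The following sketch is a minimal illustration with made-up risk scores (the per-lesion values, the 0-1 score scale and the 0.5 decision threshold are assumptions, not data from the study); it computes the per-lesion coefficient of variation across repeated captures and applies the recommended rule of averaging three initial risk scores:

```python
import numpy as np

# Hypothetical CNN risk scores (0 = benign, 1 = malignant) for five
# repeated dermoscopic captures of each lesion; values are illustrative.
scores = {
    "lesion_A": np.array([0.62, 0.31, 0.55, 0.48, 0.70]),
    "lesion_B": np.array([0.08, 0.12, 0.10, 0.09, 0.11]),
}
THRESHOLD = 0.5  # assumed decision cut-off, not taken from the paper

for lesion, s in scores.items():
    # Coefficient of variation: sample std relative to mean, in percent.
    cv = 100 * s.std(ddof=1) / s.mean()
    # Recommended rule: average three initial risk scores before deciding.
    mean_of_three = s[:3].mean()
    verdict = "suspicious" if mean_of_three >= THRESHOLD else "benign"
    print(f"{lesion}: CV = {cv:.1f}%, mean of 3 = {mean_of_three:.2f} -> {verdict}")
```

Averaging smooths out the capture-to-capture variation a single score is exposed to, which is why the mean-of-three rule improves sensitivity for the unstable subgroup.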
Affiliation(s)
- E V Goessinger, S E Cerminara, A M Mueller, P Gottfrois, S Huber, M Amaral, F Wenz, L Kostner, L Weiss, M Kunz, S Wespi, E Broman, S Kaufmann, V Patpanathapillai, I Treyer, A A Navarini: Department of Dermatology, University Hospital Basel, Basel, Switzerland
- J-T Maul: Department of Dermatology, University Hospital Zurich, Zurich, Switzerland
- L V Maul: Department of Dermatology, University Hospital Basel, Basel, Switzerland, and Department of Dermatology, University Hospital Zurich, Zurich, Switzerland
2. Fliorent R, Fardman B, Podwojniak A, Javaid K, Tan IJ, Ghani H, Truong TM, Rao B, Heath C. Artificial intelligence in dermatology: advancements and challenges in skin of color. Int J Dermatol 2024;63:455-461. PMID: 38444331. DOI: 10.1111/ijd.17076.
Abstract
Artificial intelligence (AI) uses algorithms and large language models to simulate human-like problem-solving and decision-making. AI programs have recently gained widespread popularity in dermatology through online tools for the assessment, diagnosis, and treatment of skin conditions. A literature review was conducted using PubMed and Google Scholar, analyzing literature from the last 10 years through October 2023, to evaluate current AI programs used for dermatologic purposes, identify the challenges this technology faces when applied to skin of color (SOC), and propose future steps to enhance the role of AI in dermatologic practice. The challenges of applying AI to SOC stem from the underrepresentation of SOC in datasets and from issues with image quality and standardization. Because of these issues, current AI programs inevitably perform worse at identifying lesions in SOC. Additionally, only 30% of the programs identified in this review had reported data on their use in dermatology for SOC specifically. Significant development of these applications is required so that datasets accurately represent darker skin tones. More research is warranted to better understand the efficacy of AI in aiding diagnosis and treatment for SOC patients.
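One concrete starting point for the dataset work this review calls for is auditing skin-tone representation. The sketch below is a minimal example, assuming a hypothetical metadata file with one row per image and a Fitzpatrick skin-type column; the file name and column name are placeholders, not from the review:

```python
from collections import Counter
import csv

# Hypothetical metadata file: one row per image, with a Fitzpatrick
# skin-type label (I-VI). File and column names are placeholders.
with open("dataset_metadata.csv", newline="") as f:
    skin_types = [row["fitzpatrick_type"] for row in csv.DictReader(f)]

counts = Counter(skin_types)
total = sum(counts.values())
for fst in ["I", "II", "III", "IV", "V", "VI"]:
    share = 100 * counts.get(fst, 0) / total
    print(f"Fitzpatrick {fst}: {counts.get(fst, 0)} images ({share:.1f}%)")
```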
Affiliation(s)
- Brian Fardman, Kiran Javaid: Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, USA
- Isabella J Tan: Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
- Hira Ghani: Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Thu M Truong, Babar Rao: Center for Dermatology, Rutgers Robert Wood Johnson, Somerset, NJ, USA
- Candrice Heath: Lewis Katz School of Medicine at Temple University, Philadelphia, PA, USA
4. Lee KH, Lee RW, Kwon YE. Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT. Diagnostics (Basel) 2023;14:90. PMID: 38201398. PMCID: PMC10795741. DOI: 10.3390/diagnostics14010090.
Abstract
This study evaluates the diagnostic accuracy and clinical utility of two artificial intelligence (AI) techniques: Kakao Brain Artificial Neural Network for Chest X-ray Reading (KARA-CXR), an assistive technology developed using large-scale AI and large language models (LLMs), and ChatGPT, a well-known LLM. The study validated the performance of the two technologies in chest X-ray reading and explored their potential applications in medical imaging diagnosis. 2000 chest X-ray images were randomly selected from a single institution's patient database, and two radiologists evaluated the readings provided by KARA-CXR and ChatGPT. Five qualitative factors were used to evaluate the readings generated by each model: accuracy, false findings, location inaccuracies, count inaccuracies, and hallucinations. Statistical analysis showed that KARA-CXR achieved significantly higher diagnostic accuracy than ChatGPT. In the 'Acceptable' accuracy category, KARA-CXR was rated at 70.50% and 68.00% by the two observers, while ChatGPT achieved 40.50% and 47.00%; interobserver agreement was moderate for both systems (KARA-CXR 0.74, ChatGPT 0.73). For 'False Findings', KARA-CXR scored 68.00% and 68.50%, while ChatGPT scored 37.00% for both observers, with high interobserver agreement (0.96 for KARA-CXR, 0.97 for ChatGPT). In 'Location Inaccuracy' and 'Hallucinations', KARA-CXR outperformed ChatGPT by significant margins: KARA-CXR demonstrated a non-hallucination rate of 75%, significantly higher than ChatGPT's 38%, with high interobserver agreement for KARA-CXR (0.91) and moderate to high agreement for ChatGPT (0.85) in the hallucination category. In conclusion, this study demonstrates the potential of AI and large-scale language models in medical imaging and diagnostics, and shows that in the chest X-ray domain KARA-CXR achieves higher accuracy than ChatGPT.
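Interobserver agreement values such as the 0.74 and 0.96 cited above are conventionally computed as Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. A minimal sketch with invented ratings (the labels and arrays are illustrative, not the study's data):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-image ratings from two radiologists on a qualitative
# scale; the categories and values are illustrative only.
observer_1 = ["acceptable", "unacceptable", "acceptable", "questionable", "acceptable"]
observer_2 = ["acceptable", "unacceptable", "questionable", "questionable", "acceptable"]

# Cohen's kappa: 1.0 = perfect agreement, 0.0 = chance-level agreement.
kappa = cohen_kappa_score(observer_1, observer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```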
Affiliation(s)
- Ro Woon Lee: Department of Radiology, College of Medicine, Inha University, Incheon 22212, Republic of Korea
5. Kim C, Gadgil SU, DeGrave AJ, Cai ZR, Daneshjou R, Lee SI. Fostering transparent medical image AI via an image-text foundation model grounded in medical literature. medRxiv [Preprint] 2023:2023.06.07.23291119. PMID: 37398017. PMCID: PMC10312868. DOI: 10.1101/2023.06.07.23291119.
Abstract
Building trustworthy and transparent image-based medical AI systems requires the ability to interrogate data and models at all stages of the development pipeline: from training models to post-deployment monitoring. Ideally, the data and associated AI systems could be described using terms already familiar to physicians, but this requires medical datasets densely annotated with semantically meaningful concepts. Here, we present a foundation model approach, named MONET (Medical cONcept rETriever), which learns to connect medical images with text and generates dense concept annotations that enable AI-transparency tasks from model auditing to model interpretation. Dermatology provides a demanding use case for the versatility of MONET, owing to the heterogeneity of diseases, skin tones, and imaging modalities. We trained MONET on 105,550 dermatological images paired with natural-language descriptions from a large collection of medical literature. MONET can accurately annotate concepts across dermatology images, as verified by board-certified dermatologists, outperforming supervised models built on previously concept-annotated dermatology datasets. We demonstrate how MONET enables AI transparency across the entire AI development pipeline, from dataset auditing to model auditing to building inherently interpretable models.
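MONET's concept annotation follows the contrastive image-text recipe: an image is scored against each concept by the cosine similarity of their embeddings. The sketch below illustrates only that scoring step, with random stand-in vectors; the embedding dimension, the concept list and the values are assumptions, not MONET's actual components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for embeddings from a contrastively trained image encoder and
# text encoder; in MONET these would come from the trained model.
concepts = ["ulceration", "pigment network", "erythema"]
image_emb = rng.normal(size=512)                      # one dermatology image
concept_embs = rng.normal(size=(len(concepts), 512))  # one vector per concept

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity between the image and each concept description yields
# a dense, human-readable concept annotation for the image.
sims = normalize(concept_embs) @ normalize(image_emb)
for concept, score in sorted(zip(concepts, sims), key=lambda t: -t[1]):
    print(f"{concept}: {score:+.3f}")
```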
Affiliation(s)
- Chanwoo Kim, Soham U Gadgil, Su-In Lee: Paul G. Allen School of Computer Science and Engineering, University of Washington
- Alex J DeGrave: Paul G. Allen School of Computer Science and Engineering, University of Washington, and Medical Scientist Training Program, University of Washington
- Zhuo Ran Cai: Program for Clinical Research and Technology, Stanford University
- Roxana Daneshjou: Department of Dermatology, Stanford School of Medicine, and Department of Biomedical Data Science, Stanford School of Medicine