1. Hanna MG, Olson NH, Zarella M, Dash RC, Herrmann MD, Furtado LV, Stram MN, Raciti PM, Hassell L, Mays A, Pantanowitz L, Sirintrapun JS, Krishnamurthy S, Parwani A, Lujan G, Evans A, Glassy EF, Bui MM, Singh R, Souers RJ, de Baca ME, Seheult JN. Recommendations for Performance Evaluation of Machine Learning in Pathology: A Concept Paper From the College of American Pathologists. Arch Pathol Lab Med 2024; 148:e335-e361. [PMID: 38041522] [DOI: 10.5858/arpa.2023-0042-cp]
Abstract
CONTEXT.— Machine learning applications in the pathology clinical domain are emerging rapidly. As decision support systems continue to mature, laboratories will increasingly need guidance to evaluate their performance in clinical practice. Currently there are no formal guidelines to assist pathology laboratories in verification and/or validation of such systems. These recommendations are being proposed for the evaluation of machine learning systems in the clinical practice of pathology. OBJECTIVE.— To propose recommendations for performance evaluation of in vitro diagnostic tests on patient samples that incorporate machine learning as part of the preanalytical, analytical, or postanalytical phases of the laboratory workflow. Topics described include considerations for machine learning model evaluation including risk assessment, predeployment requirements, data sourcing and curation, verification and validation, change control management, human-computer interaction, practitioner training, and competency evaluation. DATA SOURCES.— An expert panel performed a review of the literature, Clinical and Laboratory Standards Institute guidance, and laboratory and government regulatory frameworks. CONCLUSIONS.— Review of the literature and existing documents enabled the development of proposed recommendations. This white paper pertains to performance evaluation of machine learning systems intended to be implemented for clinical patient testing. Further studies with real-world clinical data are encouraged to support these proposed recommendations. Performance evaluation of machine learning models is critical to verification and/or validation of in vitro diagnostic tests using machine learning intended for clinical practice.
Affiliation(s)
- Matthew G Hanna
- From the Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna, Sirintrapun)
- Niels H Olson
- The Defense Innovation Unit, Mountain View, California (Olson)
- The Department of Pathology, Uniformed Services University, Bethesda, Maryland (Olson)
- Mark Zarella
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Zarella, Seheult)
- Rajesh C Dash
- Department of Pathology, Duke University Health System, Durham, North Carolina (Dash)
- Markus D Herrmann
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston (Herrmann)
- Larissa V Furtado
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee (Furtado)
- Michelle N Stram
- The Department of Forensic Medicine, New York University, and Office of Chief Medical Examiner, New York (Stram)
- Lewis Hassell
- Department of Pathology, Oklahoma University Health Sciences Center, Oklahoma City (Hassell)
- Alex Mays
- The MITRE Corporation, McLean, Virginia (Mays)
- Liron Pantanowitz
- Department of Pathology & Clinical Labs, University of Michigan, Ann Arbor (Pantanowitz)
- Joseph S Sirintrapun
- From the Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York (Hanna, Sirintrapun)
- Anil Parwani
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus (Parwani, Lujan)
- Giovanni Lujan
- Department of Pathology, The Ohio State University Wexner Medical Center, Columbus (Parwani, Lujan)
- Andrew Evans
- Laboratory Medicine, Mackenzie Health, Toronto, Ontario, Canada (Evans)
- Eric F Glassy
- Affiliated Pathologists Medical Group, Rancho Dominguez, California (Glassy)
- Marilyn M Bui
- Departments of Pathology and Machine Learning, Moffitt Cancer Center, Tampa, Florida (Bui)
- Rajendra Singh
- Department of Dermatopathology, Summit Health, Summit Woodland Park, New Jersey (Singh)
- Rhona J Souers
- Department of Biostatistics, College of American Pathologists, Northfield, Illinois (Souers)
- Jansen N Seheult
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Zarella, Seheult)
2. Boulogne LH, Lorenz J, Kienzle D, Schön R, Ludwig K, Lienhart R, Jégou S, Li G, Chen C, Wang Q, Shi D, Maniparambil M, Müller D, Mertes S, Schröter N, Hellmann F, Elia M, Dirks I, Bossa MN, Berenguer AD, Mukherjee T, Vandemeulebroucke J, Sahli H, Deligiannis N, Gonidakis P, Huynh ND, Razzak I, Bouadjenek R, Verdicchio M, Borrelli P, Aiello M, Meakin JA, Lemm A, Russ C, Ionasec R, Paragios N, van Ginneken B, Revel MP. The STOIC2021 COVID-19 AI challenge: Applying reusable training methodologies to private data. Med Image Anal 2024; 97:103230. [PMID: 38875741] [DOI: 10.1016/j.media.2024.103230]
Abstract
Challenges drive the state of the art in automated medical image analysis. However, the limited quantity of public training data that they provide can constrain the performance of challenge solutions, and public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows solutions to be trained on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, whose goal was to predict from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, in which participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, in which participants submitted the training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve of 0.815 for discerning between severe and non-severe COVID-19. The Final phase solutions of all finalists improved upon their Qualification phase solutions.
Affiliation(s)
- Luuk H Boulogne
- Radboud university medical center, P.O. Box 9101, 6500HB Nijmegen, The Netherlands
- Julian Lorenz
- University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany
- Daniel Kienzle
- University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany
- Robin Schön
- University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany
- Katja Ludwig
- University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany
- Rainer Lienhart
- University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany
- Guang Li
- Keya medical technology co. ltd, Floor 20, Building A, 1 Ronghua South Road, Yizhuang Economic Development Zone, Daxing District, Beijing, PR China
- Cong Chen
- Keya medical technology co. ltd, Floor 20, Building A, 1 Ronghua South Road, Yizhuang Economic Development Zone, Daxing District, Beijing, PR China
- Qi Wang
- Keya medical technology co. ltd, Floor 20, Building A, 1 Ronghua South Road, Yizhuang Economic Development Zone, Daxing District, Beijing, PR China
- Derik Shi
- Keya medical technology co. ltd, Floor 20, Building A, 1 Ronghua South Road, Yizhuang Economic Development Zone, Daxing District, Beijing, PR China
- Mayug Maniparambil
- ML-Labs, Dublin City University, N210, Marconi building, Dublin City University, Glasnevin, Dublin 9, Ireland
- Dominik Müller
- University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany; Faculty of Applied Computer Science, University of Augsburg, Germany
- Silvan Mertes
- Faculty of Applied Computer Science, University of Augsburg, Germany
- Niklas Schröter
- Faculty of Applied Computer Science, University of Augsburg, Germany
- Fabio Hellmann
- Faculty of Applied Computer Science, University of Augsburg, Germany
- Miriam Elia
- Faculty of Applied Computer Science, University of Augsburg, Germany
- Ine Dirks
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Matías Nicolás Bossa
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Abel Díaz Berenguer
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Tanmoy Mukherjee
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Jef Vandemeulebroucke
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Hichem Sahli
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Nikos Deligiannis
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Panagiotis Gonidakis
- Vrije Universiteit Brussel, Department of Electronics and Informatics, Pleinlaan 2, 1050 Brussels, Belgium; imec, Kapeldreef 75, 3001 Leuven, Belgium
- Imran Razzak
- University of New South Wales, Sydney, Australia
- James A Meakin
- Radboud university medical center, P.O. Box 9101, 6500HB Nijmegen, The Netherlands
- Alexander Lemm
- Amazon Web Services, Marcel-Breuer-Str. 12, 80807 München, Germany
- Christoph Russ
- Amazon Web Services, Marcel-Breuer-Str. 12, 80807 München, Germany
- Razvan Ionasec
- Amazon Web Services, Marcel-Breuer-Str. 12, 80807 München, Germany
- Nikos Paragios
- Keya medical technology co. ltd, Floor 20, Building A, 1 Ronghua South Road, Yizhuang Economic Development Zone, Daxing District, Beijing, PR China; TheraPanacea, 75004, Paris, France
- Bram van Ginneken
- Radboud university medical center, P.O. Box 9101, 6500HB Nijmegen, The Netherlands
- Marie-Pierre Revel
- Department of Radiology, Université de Paris, APHP, Hôpital Cochin, 27 rue du Fg Saint Jacques, 75014 Paris, France
3. Bergan MB, Larsen M, Moshina N, Bartsch H, Koch HW, Aase HS, Satybaldinov Z, Haldorsen IHS, Lee CI, Hofvind S. AI performance by mammographic density in a retrospective cohort study of 99,489 participants in BreastScreen Norway. Eur Radiol 2024; 34:6298-6308. [PMID: 38528136] [PMCID: PMC11399294] [DOI: 10.1007/s00330-024-10681-z]
Abstract
OBJECTIVE To explore the ability of artificial intelligence (AI) to classify breast cancer by mammographic density in an organized screening program. MATERIALS AND METHODS We included information about 99,489 examinations from 74,941 women who participated in BreastScreen Norway, 2013-2019. All examinations were analyzed with an AI system that assigned a malignancy risk score (AI score) from 1 (lowest) to 10 (highest) for each examination. Mammographic density was classified into Volpara density grades (VDG) 1-4; VDG1 indicated fatty and VDG4 extremely dense breasts. Screen-detected and interval cancers with an AI score of 1-10 were stratified by VDG. RESULTS We found 10,406 (10.5% of the total) examinations to have an AI risk score of 10, of which 6.7% (704/10,406) were breast cancer. These cancers represented 89.7% (617/688) of the screen-detected and 44.6% (87/195) of the interval cancers. 20.3% (20,178/99,489) of the examinations were classified as VDG1 and 6.1% (6047/99,489) as VDG4. For screen-detected cancers, 84.0% (68/81; 95% CI, 74.1-91.2) had an AI score of 10 for VDG1, 88.9% (328/369; 95% CI, 85.2-91.9) for VDG2, 92.5% (185/200; 95% CI, 87.9-95.7) for VDG3, and 94.7% (36/38; 95% CI, 82.3-99.4) for VDG4. For interval cancers, the percentages with an AI score of 10 were 33.3% (3/9; 95% CI, 7.5-70.1) for VDG1 and 48.0% (12/25; 95% CI, 27.8-68.7) for VDG4. CONCLUSION The tested AI system performed well in cancer detection across all density categories, especially for extremely dense breasts. The highest proportion of screen-detected cancers with an AI score of 10 was observed for women classified as VDG4. CLINICAL RELEVANCE STATEMENT Our study demonstrates that AI can correctly classify the majority of screen-detected and about half of the interval breast cancers, regardless of breast density. KEY POINTS
• Mammographic density is important to consider in the evaluation of artificial intelligence in mammographic screening.
• Given a threshold representing about 10% of those with the highest malignancy risk score by an AI system, we found an increasing percentage of cancers with increasing mammographic density.
• Artificial intelligence risk score and mammographic density combined may help triage examinations to reduce workload for radiologists.
Affiliation(s)
- Marie Burns Bergan
- Section for Breast Cancer Screening, Cancer Registry of Norway, Norwegian Institute of Public Health, P.O. Box 5313, 0304, Oslo, Norway
- Marthe Larsen
- Section for Breast Cancer Screening, Cancer Registry of Norway, Norwegian Institute of Public Health, P.O. Box 5313, 0304, Oslo, Norway
- Nataliia Moshina
- Section for Breast Cancer Screening, Cancer Registry of Norway, Norwegian Institute of Public Health, P.O. Box 5313, 0304, Oslo, Norway
- Hauke Bartsch
- Department of Radiology, Mohn Medical Imaging and Visualization Centre (MMIV), Haukeland University Hospital, Bergen, Norway
- Henrik Wethe Koch
- Department of Radiology, Stavanger University Hospital, Stavanger, Norway
- Faculty of Health Sciences, University of Stavanger, Stavanger, Norway
- Zhanbolat Satybaldinov
- Department of Radiology, Mohn Medical Imaging and Visualization Centre (MMIV), Haukeland University Hospital, Bergen, Norway
- Ingfrid Helene Salvesen Haldorsen
- Department of Radiology, Mohn Medical Imaging and Visualization Centre (MMIV), Haukeland University Hospital, Bergen, Norway
- Section for Radiology, Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Christoph I Lee
- Department of Radiology, University of Washington School of Medicine, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington School of Public Health, Seattle, WA, USA
- Solveig Hofvind
- Section for Breast Cancer Screening, Cancer Registry of Norway, Norwegian Institute of Public Health, P.O. Box 5313, 0304, Oslo, Norway
- Department of Health and Care Sciences, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
4. Lee HJ, Lee JH, Lee JE, Na YM, Park MH, Lee JS, Lim HS. Prediction of early clinical response to neoadjuvant chemotherapy in Triple-negative breast cancer: Incorporating Radiomics through breast MRI. Sci Rep 2024; 14:21691. [PMID: 39289507] [PMCID: PMC11408492] [DOI: 10.1038/s41598-024-72581-y]
Abstract
This study assessed pretreatment breast MRI coupled with machine learning for predicting early clinical responses to neoadjuvant chemotherapy (NAC) in triple-negative breast cancer (TNBC), focusing on identifying non-responders. A retrospective analysis of 135 TNBC patients (107 responders, 28 non-responders) treated with NAC from January 2015 to October 2022 was conducted. Non-responders were defined according to RECIST guidelines. Data included clinicopathologic factors, clinical MRI findings, and radiomics features from contrast-enhanced T1-weighted images, which were used to train a stacking ensemble of 13 machine learning models. For subgroup analysis, propensity score matching was conducted to adjust for clinical disparities in NAC response. The efficacy of the models was evaluated using the area under the receiver-operating-characteristic curve (AUROC) before and after matching. The model combining clinicopathologic factors and clinical MRI findings achieved an AUROC of 0.752 (95% CI 0.644-0.860) for predicting non-responders, while radiomics-based models achieved 0.749 (95% CI 0.614-0.884). An integrated model of radiomics, clinicopathologic factors, and clinical MRI findings reached an AUROC of 0.802 (95% CI 0.699-0.905). After propensity score matching, the hierarchical order of key radiomics features remained consistent. Our study demonstrated the potential of using machine learning models based on pretreatment MRI to non-invasively predict TNBC non-responders to NAC.
Affiliation(s)
- Hyo-Jae Lee
- Department of Radiology, Chonnam National University Hospital, Gwangju, Republic of Korea
- Jeong Hoon Lee
- Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
- Jong Eun Lee
- Department of Radiology and the Research Institute of Radiology, Asan Medical Center, Seoul, Republic of Korea
- Yong Min Na
- Department of Surgery, Chonnam National University Hwasun Hospital, Hwasun, Republic of Korea
- Min Ho Park
- Department of Surgery, Chonnam National University Hwasun Hospital, Hwasun, Republic of Korea
- Chonnam National University Medical School, Gwangju, Republic of Korea
- Ji Shin Lee
- Department of Pathology, Chonnam National University Hwasun Hospital, Hwasun, Republic of Korea
- Chonnam National University Medical School, Gwangju, Republic of Korea
- Hyo Soon Lim
- Department of Radiology, Chonnam National University Hwasun Hospital, Hwasun, Republic of Korea
- Chonnam National University Medical School, Gwangju, Republic of Korea
5. Manigrasso F, Milazzo R, Russo AS, Lamberti F, Strand F, Pagnani A, Morra L. Mammography classification with multi-view deep learning techniques: Investigating graph and transformer-based architectures. Med Image Anal 2024; 99:103320. [PMID: 39244796] [DOI: 10.1016/j.media.2024.103320]
Abstract
The potential and promise of deep learning systems to provide an independent assessment and relieve radiologists' burden in screening mammography have been recognized in several studies. However, the low cancer prevalence, the need to process high-resolution images, and the need to combine information from multiple views and scales still pose technical challenges. Multi-view architectures that combine information from the four mammographic views to produce an exam-level classification score are a promising approach to the automated processing of screening mammography. However, training such architectures from exam-level labels, without relying on pixel-level supervision, requires very large datasets and may result in suboptimal accuracy. Emerging architectures such as Visual Transformers (ViT) and graph-based architectures can potentially integrate ipsilateral and contralateral breast views better than traditional convolutional neural networks, thanks to their stronger ability to model long-range dependencies. In this paper, we extensively evaluate novel transformer-based and graph-based architectures against state-of-the-art multi-view convolutional neural networks, trained in a weakly supervised setting on a middle-scale dataset, in terms of both performance and interpretability. Extensive experiments on the CSAW dataset suggest that, while transformer-based architectures outperform the others, different inductive biases lead to complementary strengths and weaknesses, as each architecture is sensitive to different signs and mammographic features. Hence, an ensemble of different architectures should be preferred over a winner-takes-all approach to achieve more accurate and robust results. Overall, the findings highlight the potential of a wide range of multi-view architectures for breast cancer classification, even in datasets of relatively modest size, although the detection of small lesions remains challenging without pixel-wise supervision or ad hoc networks.
Affiliation(s)
- Francesco Manigrasso
- Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Rosario Milazzo
- Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Alessandro Sebastian Russo
- Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Fabrizio Lamberti
- Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Fredrik Strand
- Department of Oncology-Pathology, Karolinska Institute, Stockholm, Sweden; Department of Breast Radiology, Karolinska University Hospital, Stockholm, Sweden
- Andrea Pagnani
- Politecnico di Torino, Dipartimento di Scienza Applicata e Tecnologia, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Lia Morra
- Politecnico di Torino, Dipartimento di Automatica e Informatica, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
6. Seker ME, Koyluoglu YO, Ozaydin AN, Gurdal SO, Ozcinar B, Cabioglu N, Ozmen V, Aribal E. Diagnostic capabilities of artificial intelligence as an additional reader in a breast cancer screening program. Eur Radiol 2024; 34:6145-6157. [PMID: 38388718] [PMCID: PMC11364680] [DOI: 10.1007/s00330-024-10661-3]
Abstract
OBJECTIVES We aimed to evaluate the early-detection capabilities of AI in a screening program over its duration, with a specific focus on the detection of interval cancers, the early detection of cancers with the assistance of AI from prior visits, and its impact on workload for various reading scenarios. MATERIALS AND METHODS The study included 22,621 mammograms of 8825 women within a 10-year biennial two-reader screening program. The statistical analysis focused on 5136 mammograms from 4282 women due to data retrieval issues, among whom 105 were diagnosed with breast cancer. The AI software assigned scores from 1 to 100. Histopathology results determined the ground truth, and Youden's index was used to establish a threshold. Tumor characteristics were analyzed with ANOVA and the chi-squared test, and different workflow scenarios were evaluated using bootstrapping. RESULTS The AI software achieved an AUC of 89.6% (86.1-93.2%, 95% CI). The optimal threshold was 30.44, yielding 72.38% sensitivity and 92.86% specificity. Initially, AI identified 57 screening-detected cancers (83.82%), 15 interval cancers (51.72%), and 4 missed cancers (50%). AI as a second reader could have led to earlier diagnosis in 24 patients (on average 29.92 ± 19.67 months earlier). No significant differences were found among cancer-characteristic groups. A hybrid triage workflow scenario showed a potential 69.5% reduction in workload and a 30.5% increase in accuracy. CONCLUSION This AI system exhibits high sensitivity and specificity in screening mammograms, effectively identifying interval and missed cancers and identifying 23% of cancers earlier in prior mammograms. Adopting AI as a triage mechanism has the potential to reduce workload by nearly 70%. CLINICAL RELEVANCE STATEMENT The study proposes a more efficient method for screening programs, in terms of both workload and accuracy. KEY POINTS
• Incorporating AI as a triage tool in the screening workflow improves sensitivity (72.38%) and specificity (92.86%), enhancing detection rates for interval and missed cancers.
• AI-assisted triaging is effective in differentiating low- and high-risk cases, reduces radiologist workload, and potentially enables broader screening coverage.
• AI has the potential to facilitate earlier diagnosis compared to human reading.
Affiliation(s)
- Mustafa Ege Seker
- Department of Radiology, Acibadem Mehmet Ali Aydinlar University, School of Medicine, Istanbul, Turkey
- Yilmaz Onat Koyluoglu
- Department of Radiology, Acibadem Mehmet Ali Aydinlar University, School of Medicine, Istanbul, Turkey
- Beyza Ozcinar
- Istanbul University, School of Medicine, Istanbul, Turkey
- Vahit Ozmen
- Istanbul University, School of Medicine, Istanbul, Turkey
- Erkin Aribal
- Department of Radiology, Acibadem Mehmet Ali Aydinlar University, School of Medicine, Istanbul, Turkey
7. Lee SE, Han K, Rho M, Kim EK. Artificial intelligence-based computer-aided diagnosis abnormality score trends in the serial mammography of patients with breast cancer. Eur J Radiol 2024; 178:111626. [PMID: 39024665] [DOI: 10.1016/j.ejrad.2024.111626]
Abstract
PURPOSE To explore the abnormality score trends of artificial intelligence-based computer-aided diagnosis (AI-CAD) in the serial mammography of patients up to a final diagnosis of breast cancer. METHODS From 2015 to 2019, 126 breast cancer patients who had at least two previous mammograms, obtained from 2008 up to cancer diagnosis, were included. AI-CAD was retrospectively applied to 487 previous mammograms and all abnormality scores calculated by AI-CAD were obtained. The contralateral breast of each affected breast was defined as the control group. We divided all mammograms into 6-month intervals from cancer diagnosis in reverse chronological order. A random coefficient model was used to estimate whether the chronological trend of AI-CAD abnormality scores differed between cancerous and normal breasts. Subgroup analyses were performed according to mammographic visibility, invasiveness, and molecular subtype of the invasive cancers. RESULTS The mean period from initial examination to cancer diagnosis was 6.0 years (range 1.7-10.7 years). The abnormality scores of breasts diagnosed with cancer showed a significantly increasing trend over the preceding examination period (slope 0.6 per 6 months, p for the slope < 0.001), while the contralateral normal breasts showed no trend (slope 0.03, p = 0.776). The difference in slope between the cancerous and contralateral breasts was significant (p < 0.001). For mammography-visible cancers, the abnormality scores in cancerous breasts showed a significantly increasing trend (slope 0.8, p < 0.001), while for mammography-occult cancers the trend was not significant (slope 0.1, p = 0.6). For invasive cancers, the slope of the abnormality scores showed a significantly increasing trend (slope 1.4, p = 0.002), unlike ductal carcinoma in situ (DCIS), which showed no significant trend. There was no significant difference in the slope of abnormality scores among the subtypes of invasive cancers (p = 0.418). CONCLUSION Breasts diagnosed with cancer showed an increase in AI-CAD abnormality scores across previous serial mammograms, suggesting that AI-CAD could be useful for early detection of breast cancer.
Affiliation(s)
- Si Eun Lee
- Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea
- Kyunghwa Han
- Department of Radiology, Research Institute of Radiologic Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Miribi Rho
- Department of Radiology, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
- Eun-Kyung Kim
- Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea
8. Frazer HML, Peña-Solorzano CA, Kwok CF, Elliott MS, Chen Y, Wang C, Lippey JF, Hopper JL, Brotchie P, Carneiro G, McCarthy DJ. Comparison of AI-integrated pathways with human-AI interaction in population mammographic screening for breast cancer. Nat Commun 2024; 15:7525. [PMID: 39214982] [PMCID: PMC11364867] [DOI: 10.1038/s41467-024-51725-8]
Abstract
Artificial intelligence (AI) readers of mammograms compare favourably to individual radiologists in detecting breast cancer. However, AI readers cannot perform at the level of multi-reader systems used by screening programs in countries such as Australia, Sweden, and the UK. Therefore, implementation demands human-AI collaboration. Here, we use a large, high-quality retrospective mammography dataset from Victoria, Australia to conduct detailed simulations of five potential AI-integrated screening pathways, and examine human-AI interaction effects to explore automation bias. Operating an AI reader as a second reader or as a high confidence filter improves current screening outcomes by 1.9-2.5% in sensitivity and up to 0.6% in specificity, achieving 4.6-10.9% reduction in assessments and 48-80.7% reduction in human reads. Automation bias degrades performance in multi-reader settings but improves it for single-readers. This study provides insight into feasible approaches for AI-integrated screening pathways and prospective studies necessary prior to clinical adoption.
Affiliation(s)
- Helen M L Frazer
- St Vincent's BreastScreen, St Vincent's Hospital Melbourne, Melbourne, VIC, Australia
- BreastScreen Victoria, Caulfield, VIC, Australia
- Faculty of Medicine, Dentistry & Health Sciences, University of Melbourne, Melbourne, VIC, Australia
- Carlos A Peña-Solorzano
- Bioinformatics and Cellular Genomics Unit, St Vincent's Institute of Medical Research, Fitzroy, VIC, Australia
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Chun Fung Kwok
- Bioinformatics and Cellular Genomics Unit, St Vincent's Institute of Medical Research, Fitzroy, VIC, Australia
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Michael S Elliott
- Bioinformatics and Cellular Genomics Unit, St Vincent's Institute of Medical Research, Fitzroy, VIC, Australia
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Yuanhong Chen
- School of Computer Science, Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Chong Wang
- School of Computer Science, Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Jocelyn F Lippey
- St Vincent's BreastScreen, St Vincent's Hospital Melbourne, Melbourne, VIC, Australia
- Department of Surgery, St Vincent's Hospital Melbourne, Melbourne, VIC, Australia
- Department of Surgery, University of Melbourne, Melbourne, VIC, Australia
- John L Hopper
- Centre for Epidemiology & Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia
- Peter Brotchie
- Department of Radiology, St Vincent's Hospital Melbourne, Melbourne, VIC, Australia
- Gustavo Carneiro
- School of Computer Science, Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Centre for Vision, Speech and Signal Processing (CVSSP), The University of Surrey, Surrey, UK
- Davis J McCarthy
- Bioinformatics and Cellular Genomics Unit, St Vincent's Institute of Medical Research, Fitzroy, VIC, Australia
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
Collapse
|
9
|
Adachi M, Fujioka T, Ishiba T, Nara M, Maruya S, Hayashi K, Kumaki Y, Yamaga E, Katsuta L, Hao D, Hartman M, Mengling F, Oda G, Kubota K, Tateishi U. AI Use in Mammography for Diagnosing Metachronous Contralateral Breast Cancer. J Imaging 2024; 10:211. [PMID: 39330431 PMCID: PMC11432939 DOI: 10.3390/jimaging10090211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2024] [Revised: 08/14/2024] [Accepted: 08/22/2024] [Indexed: 09/28/2024] Open
Abstract
Although several studies have been conducted on artificial intelligence (AI) use in mammography (MG), there is still a paucity of research on the diagnosis of metachronous bilateral breast cancer (BC), which is typically more challenging to diagnose. This study aimed to determine whether AI could enhance BC detection, achieving earlier or more accurate diagnoses than radiologists in cases of metachronous contralateral BC. We included patients who underwent unilateral BC surgery and subsequently developed contralateral BC. This retrospective study evaluated the AI-supported MG diagnostic system called FxMammo™. We evaluated the capability of FxMammo™ (FathomX Pte Ltd., Singapore) to diagnose BC more accurately or earlier than radiologists' assessments. This evaluation was supplemented by reviewing MG readings made by radiologists. Out of 1101 patients who underwent surgery, 10 who had initially undergone a partial mastectomy and later developed contralateral BC were analyzed. The AI system identified malignancies in six cases (60%), while radiologists identified five cases (50%). Notably, two cases (20%) were diagnosed solely by the AI system. Additionally, for these cases, the AI system had identified malignancies a year before the conventional diagnosis. This study highlights the AI system's effectiveness in diagnosing metachronous contralateral BC via MG. In some cases, the AI system diagnosed cancer earlier than radiological assessments.
Affiliation(s)
- Mio Adachi
- Department of Breast Surgery, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Tomoyuki Fujioka
- Department of Diagnostic Radiology, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Toshiyuki Ishiba
- Department of Breast Surgery, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Miyako Nara
- Ohtsuka Breast Care Clinic, Tokyo 121-0813, Japan
- Sakiko Maruya
- Department of Breast Surgery, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Kumiko Hayashi
- Department of Breast Surgery, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Yuichi Kumaki
- Department of Breast Surgery, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Emi Yamaga
- Department of Diagnostic Radiology, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Leona Katsuta
- Department of Diagnostic Radiology, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Du Hao
- Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, Singapore 119074, Singapore
- Mikael Hartman
- Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, Singapore 119074, Singapore
- Department of Surgery, National University Hospital, National University Health System, Singapore 119074, Singapore
- Institute of Data Science, National University of Singapore, Singapore 117597, Singapore
- Feng Mengling
- Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, Singapore 119074, Singapore
- Institute of Data Science, National University of Singapore, Singapore 117597, Singapore
- Goshi Oda
- Department of Breast Surgery, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan
- Kazunori Kubota
- Department of Radiology, Dokkyo Medical University Saitama Medical Center, Saitama 343-8555, Japan
- Ukihide Tateishi
- Department of Diagnostic Radiology, Tokyo Medical and Dental University Hospital, Tokyo 113-8510, Japan

10
Lu Z, Zou Q, Wang M, Han X, Shi X, Wu S, Xie Z, Ye Q, Song L, He Y, Feng Q, Zhao Y. Artificial intelligence improves the diagnosis of human leukocyte antigen (HLA)-B27-negative axial spondyloarthritis based on multi-sequence magnetic resonance imaging and clinical features. Quant Imaging Med Surg 2024; 14:5845-5860. [PMID: 39144059 PMCID: PMC11320510 DOI: 10.21037/qims-24-729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2024] [Accepted: 07/05/2024] [Indexed: 08/16/2024]
Abstract
Background Axial spondyloarthritis (axSpA) is frequently diagnosed late, particularly in human leukocyte antigen (HLA)-B27-negative patients, resulting in a missed opportunity for optimal treatment. This study aimed to develop an artificial intelligence (AI) tool, termed NegSpA-AI, using sacroiliac joint (SIJ) magnetic resonance imaging (MRI) and clinical SpA features to improve the diagnosis of axSpA in HLA-B27-negative patients. Methods We retrospectively included 454 HLA-B27-negative patients with rheumatologist-diagnosed axSpA or other diseases (non-axSpA) from the Third Affiliated Hospital of Southern Medical University and Nanhai Hospital between January 2010 and August 2021. They were divided into a training set (n=328) for 5-fold cross-validation, an internal test set (n=72), and an independent external test set (n=54). To construct a prospective test set, we further enrolled 87 patients between September 2021 and August 2023 from the Third Affiliated Hospital of Southern Medical University. MRI techniques employed included T1-weighted (T1W), T2-weighted (T2W), and fat-suppressed (FS) sequences. We developed NegSpA-AI using a deep learning (DL) network to differentiate between axSpA and non-axSpA at admission. Furthermore, we conducted a reader study involving 4 radiologists and 2 rheumatologists to evaluate and compare the performance of independent and AI-assisted clinicians. Results NegSpA-AI demonstrated superior performance compared to the independent junior rheumatologist (≤5 years of experience), achieving areas under the curve (AUCs) of 0.878 [95% confidence interval (CI): 0.786-0.971], 0.870 (95% CI: 0.771-0.970), and 0.815 (95% CI: 0.714-0.915) on the internal, external, and prospective test sets, respectively. The assistance of NegSpA-AI improved the discriminating accuracy, sensitivity, and specificity of independent junior radiologists by 7.4-11.5%, 1.0-13.3%, and 7.4-20.6% across the 3 test sets (all P<0.05).
On the prospective test set, AI assistance also improved the diagnostic accuracy, sensitivity, and specificity of independent junior rheumatologists by 7.7%, 7.7%, and 6.9%, respectively (all P<0.01). Conclusions The proposed NegSpA-AI effectively improves radiologists' interpretations of SIJ MRI and rheumatologists' diagnoses of HLA-B27-negative axSpA.
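AUCs with 95% confidence intervals, as reported above, are often computed nonparametrically: the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and the interval can come from a percentile bootstrap. A stdlib-only sketch; the resampling scheme and iteration count are illustrative choices, not necessarily the study's method:

```python
import random

def auc(pos_scores, neg_scores):
    """Nonparametric AUC: fraction of (positive, negative) pairs the model
    ranks correctly, counting ties as half a win."""
    pairs = len(pos_scores) * len(neg_scores)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / pairs

def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample cases with replacement within each class."""
    rng = random.Random(seed)
    stats = sorted(
        auc([rng.choice(pos_scores) for _ in pos_scores],
            [rng.choice(neg_scores) for _ in neg_scores])
        for _ in range(n_boot)
    )
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]
```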
Affiliation(s)
- Zixiao Lu
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics, Guangdong Province), Guangzhou, China
- Qingqing Zou
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, China
- Menghong Wang
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics, Guangdong Province), Guangzhou, China
- Xinai Han
- Department of Rheumatology and Immunology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, China
- Xingliang Shi
- Department of Rheumatology and Immunology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, China
- Shufan Wu
- Department of Hematology and Rheumatology, The Second Affiliated Hospital of Xiamen Medical College, Xiamen, China
- Zhuoyao Xie
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics, Guangdong Province), Guangzhou, China
- Qiang Ye
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics, Guangdong Province), Guangzhou, China
- Liwen Song
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics, Guangdong Province), Guangzhou, China
- Yi He
- Department of Rheumatology and Immunology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, China
- Qianjin Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, China
- Yinghua Zhao
- Department of Radiology, The Third Affiliated Hospital of Southern Medical University (Academy of Orthopedics, Guangdong Province), Guangzhou, China

11
Partridge GJW, Darker I, James JJ, Satchithananda K, Sharma N, Valencia A, Teh W, Khan H, Muscat E, Michell MJ, Chen Y. How long does it take to read a mammogram? Investigating the reading time of digital breast tomosynthesis and digital mammography. Eur J Radiol 2024; 177:111535. [PMID: 38852330 DOI: 10.1016/j.ejrad.2024.111535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Revised: 05/16/2024] [Accepted: 05/27/2024] [Indexed: 06/11/2024]
Abstract
PURPOSE To analyse digital breast tomosynthesis (DBT) reading times in the screening setting, compared to 2D full-field digital mammography (FFDM), and investigate the impact of reader experience and professional group on interpretation times. METHOD Reading time data were recorded in the PROSPECTS Trial, a prospective randomised trial comparing DBT plus FFDM or synthetic 2D mammography (S2D) to FFDM alone, in the National Health Service (NHS) breast screening programme, from January 2019-February 2023. Time to read DBT+FFDM or DBT+S2D and FFDM alone was calculated per case and reading times were compared between modalities using dependent T-tests. Reading times were compared between readers from different professional groups (radiologists and radiographer readers) and experience levels using independent T-tests. The learning curve effect of using DBT in screening on reading time was investigated using a Kruskal-Wallis test. RESULTS Forty-eight readers interpreted 1,242 FFDM batches (34,210 FFDM cases) and 973 DBT batches (13,983 DBT cases). DBT reading time was doubled compared to FFDM (2.09 ± 0.64 min vs. 0.98 ± 0.30 min; p < 0.001), and DBT+S2D reading was longer than DBT+FFDM (2.24 ± 0.62 min vs. 2.04 ± 0.46 min; p = 0.006). No difference was identified in reading time between radiologists and radiographers (2.06 ± 0.71 min vs. 2.14 ± 0.46 min, respectively; p = 0.71). Readers with five or more years of experience reading DBT were quicker than those with less experience (1.86 ± 0.56 min vs. 2.37 ± 0.65 min; p = 0.008), and DBT reading time decreased within the first 9 months of accrued screening experience (p = 0.01). CONCLUSIONS DBT reading times were double those of FFDM in the screening setting, but there was a short learning curve effect with readers showing significant improvements in reading times within the first nine months of DBT experience. ClinicalTrials.gov Identifier: NCT03733106.
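The per-case modality comparison above uses dependent (paired) t-tests, appropriate because the same readers contribute times under both conditions. A minimal stdlib sketch; the reader times below are made-up illustrative numbers, not trial data:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Dependent (paired) t-test: t statistic and degrees of freedom
    computed from the per-case differences."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1

# e.g. per-reader mean reading time (minutes) for DBT vs FFDM batches
dbt = [2.1, 1.9, 2.4, 2.0, 2.3, 1.8]
ffdm = [1.0, 0.9, 1.2, 0.95, 1.1, 0.85]
t, df = paired_t(dbt, ffdm)
print(f"t = {t:.2f} on {df} df")
```

The t statistic would then be compared against the t distribution with n-1 degrees of freedom to obtain the p-value.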
Affiliation(s)
- George J W Partridge
- University of Nottingham, School of Medicine, Translational Medical Sciences, Clinical Sciences Building, City Hospital Campus, Hucknall Road, Nottingham, NG5 1PB, United Kingdom
- Iain Darker
- University of Nottingham, School of Medicine, Translational Medical Sciences, Clinical Sciences Building, City Hospital Campus, Hucknall Road, Nottingham, NG5 1PB, United Kingdom
- Jonathan J James
- Nottingham University Hospitals NHS Trust, Nottingham Breast Institute, City Hospital Campus, Hucknall Road, Nottingham NG5 1PB, United Kingdom
- Keshthra Satchithananda
- Department of Breast Radiology and National Breast Screening Training Centre, King's College Hospital, Denmark Hill, London SE5 9RS, United Kingdom
- Nisha Sharma
- Leeds Breast Screening Unit, Leeds Teaching Hospital, York Road, Leeds, LS14 6UH, United Kingdom
- Alexandra Valencia
- Avon Breast Screening, Bristol Breast Care Centre, Bristol, BS10 5NB, United Kingdom
- William Teh
- North London Breast Screening Service, Edgware Community Hospital, London, HA8 9BA, United Kingdom
- Humaira Khan
- City, Sandwell and Walsall Breast Screening Service, Birmingham City Hospital, B18 7QH, United Kingdom
- Elizabeth Muscat
- South West London Breast Screening Service, St George's Hospital, London, SW17 0QT, United Kingdom
- Michael J Michell
- Department of Breast Radiology and National Breast Screening Training Centre, King's College Hospital, Denmark Hill, London SE5 9RS, United Kingdom
- Yan Chen
- University of Nottingham, School of Medicine, Translational Medical Sciences, Clinical Sciences Building, City Hospital Campus, Hucknall Road, Nottingham, NG5 1PB, United Kingdom.

12
Zeng A, Houssami N, Noguchi N, Nickel B, Marinovich ML. Frequency and characteristics of errors by artificial intelligence (AI) in reading screening mammography: a systematic review. Breast Cancer Res Treat 2024; 207:1-13. [PMID: 38853221 PMCID: PMC11230971 DOI: 10.1007/s10549-024-07353-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Accepted: 04/24/2024] [Indexed: 06/11/2024]
Abstract
PURPOSE Artificial intelligence (AI) for reading breast screening mammograms could potentially replace (some) human-reading and improve screening effectiveness. This systematic review aims to identify and quantify the types of AI errors to better understand the consequences of implementing this technology. METHODS Electronic databases were searched for external validation studies of the accuracy of AI algorithms in real-world screening mammograms. Descriptive synthesis was performed on error types and frequency. False negative proportions (FNP) and false positive proportions (FPP) were pooled within AI positivity thresholds using random-effects meta-analysis. RESULTS Seven retrospective studies (447,676 examinations; published 2019-2022) met inclusion criteria. Five studies reported AI error as false negatives or false positives. Pooled FPP decreased incrementally with increasing positivity threshold (71.83% [95% CI 69.67, 73.90] at Transpara 3 to 10.77% [95% CI 8.34, 13.79] at Transpara 9). Pooled FNP increased incrementally from 0.02% [95% CI 0.01, 0.03] (Transpara 3) to 0.12% [95% CI 0.06, 0.26] (Transpara 9), consistent with a trade-off with FPP. Heterogeneity within thresholds reflected algorithm version and completeness of the reference standard. Other forms of AI error were reported rarely (location error and technical error in one study each). CONCLUSION AI errors are largely interpreted in the framework of test accuracy. FP and FN errors show expected variability not only by positivity threshold, but also by algorithm version and study quality. Reporting of other forms of AI errors is sparse, despite their potential implications for adoption of the technology. Considering broader types of AI error would add nuance to reporting that can inform inferences about AI's utility.
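The pooled false positive and false negative proportions within each positivity threshold come from random-effects meta-analysis. One common estimator is DerSimonian-Laird; the stdlib-only sketch below pools raw proportions for illustration (real analyses typically transform proportions, e.g. logit or Freeman-Tukey, and apply continuity corrections for zero cells, so treat this as a simplified assumption, not the review's exact method):

```python
import math

def pool_proportions_dl(events, totals):
    """DerSimonian-Laird random-effects pooling of study proportions.
    Illustrative: pools raw proportions; studies with zero or all events
    would need a continuity correction before the variance is defined."""
    p = [e / n for e, n in zip(events, totals)]
    var = [pi * (1 - pi) / n for pi, n in zip(p, totals)]  # within-study variance
    w = [1 / v for v in var]
    p_fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, p))  # heterogeneity
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(p) - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)
```

When between-study heterogeneity (tau2) is zero, the estimate collapses to the fixed-effect inverse-variance average.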
Affiliation(s)
- Aileen Zeng
- The Daffodil Centre, The University of Sydney, a Joint Venture with Cancer Council New South Wales, Sydney, NSW, Australia
- School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Westmead Applied Research Centre and Faculty of Medicine and Health, The University of Sydney, Westmead, NSW, Australia
- Nehmat Houssami
- The Daffodil Centre, The University of Sydney, a Joint Venture with Cancer Council New South Wales, Sydney, NSW, Australia
- School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Naomi Noguchi
- School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Brooke Nickel
- Wiser Healthcare, Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Sydney Health Literacy Lab, Sydney School of Public Health, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
- M Luke Marinovich
- The Daffodil Centre, The University of Sydney, a Joint Venture with Cancer Council New South Wales, Sydney, NSW, Australia.
- School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia.

13
Díaz-Grijuela E, Hernández A, Caballero C, Fernandez R, Urtasun R, Gulak M, Astigarraga E, Barajas M, Barreda-Gómez G. From Lipid Signatures to Cellular Responses: Unraveling the Complexity of Melanoma and Furthering Its Diagnosis and Treatment. MEDICINA (KAUNAS, LITHUANIA) 2024; 60:1204. [PMID: 39202486 PMCID: PMC11356604 DOI: 10.3390/medicina60081204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/01/2024] [Revised: 07/19/2024] [Accepted: 07/22/2024] [Indexed: 09/03/2024]
Abstract
Recent advancements in mass spectrometry have significantly enhanced our understanding of complex lipid profiles, opening new avenues for oncological diagnostics. This review highlights the importance of lipidomics in the comprehension of certain metabolic pathways and its potential for the detection and characterization of various cancers, in particular melanoma. Through detailed case studies, we demonstrate how lipidomic analysis has led to significant breakthroughs in the identification and understanding of cancer types and its potential for detecting unique biomarkers that are instrumental in its diagnosis. Additionally, this review addresses the technical challenges and future perspectives of these methodologies, including their potential expansion and refinement for clinical applications. The discussion underscores the critical role of lipidomic profiling in advancing cancer diagnostics, proposing a new paradigm in how we approach this devastating disease, with particular emphasis on its application in comparative oncology.
Affiliation(s)
- Roberto Fernandez
- IMG Pharma Biotech, Research and Development Division, 48170 Zamudio, Spain;
- Raquel Urtasun
- Biochemistry Area, Department of Health Science, Universidad Pública de Navarra, 31006 Pamplona, Spain; (R.U.); (M.B.)
- Egoitz Astigarraga
- Betternostics SL, 31110 Noáin, Spain; (E.D.-G.); (A.H.); (C.C.)
- IMG Pharma Biotech, Research and Development Division, 48170 Zamudio, Spain;
- Miguel Barajas
- Biochemistry Area, Department of Health Science, Universidad Pública de Navarra, 31006 Pamplona, Spain; (R.U.); (M.B.)
- Gabriel Barreda-Gómez
- Betternostics SL, 31110 Noáin, Spain; (E.D.-G.); (A.H.); (C.C.)
- IMG Pharma Biotech, Research and Development Division, 48170 Zamudio, Spain;

14
Żydowicz WM, Skokowski J, Marano L, Polom K. Navigating the Metaverse: A New Virtual Tool with Promising Real Benefits for Breast Cancer Patients. J Clin Med 2024; 13:4337. [PMID: 39124604 PMCID: PMC11313674 DOI: 10.3390/jcm13154337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2024] [Revised: 05/22/2024] [Accepted: 07/22/2024] [Indexed: 08/12/2024] Open
Abstract
Breast cancer (BC), affecting both women and men, is a complex disease in which early diagnosis plays a crucial role in successful treatment and enhances patient survival rates. The Metaverse, a virtual world, may offer new, personalized approaches to diagnosing and treating BC. Although Artificial Intelligence (AI) is still in its early stages, its rapid advancement indicates potential applications within the healthcare sector, including consolidating patient information in one accessible location. This could provide physicians with more comprehensive insights into disease details. Leveraging the Metaverse could facilitate clinical data analysis and improve the precision of diagnosis, potentially allowing for more tailored treatments for BC patients. However, while this article highlights the possible transformative impacts of virtual technologies on BC treatment, it is important to approach these developments with cautious optimism, recognizing the need for further research and validation to ensure enhanced patient care with greater accuracy and efficiency.
Affiliation(s)
- Weronika Magdalena Żydowicz
- Department of General Surgery and Surgical Oncology, “Saint Wojciech” Hospital, “Nicolaus Copernicus” Health Center, Jana Pawła II 50, 80-462 Gdańsk, Poland; (W.M.Ż.); (J.S.)
- Jaroslaw Skokowski
- Department of General Surgery and Surgical Oncology, “Saint Wojciech” Hospital, “Nicolaus Copernicus” Health Center, Jana Pawła II 50, 80-462 Gdańsk, Poland; (W.M.Ż.); (J.S.)
- Department of Medicine, Academy of Applied Medical and Social Sciences, Akademia Medycznych I Spolecznych Nauk Stosowanych (AMiSNS), 2 Lotnicza Street, 82-300 Elbląg, Poland;
- Luigi Marano
- Department of General Surgery and Surgical Oncology, “Saint Wojciech” Hospital, “Nicolaus Copernicus” Health Center, Jana Pawła II 50, 80-462 Gdańsk, Poland; (W.M.Ż.); (J.S.)
- Department of Medicine, Academy of Applied Medical and Social Sciences, Akademia Medycznych I Spolecznych Nauk Stosowanych (AMiSNS), 2 Lotnicza Street, 82-300 Elbląg, Poland;
- Karol Polom
- Department of Medicine, Academy of Applied Medical and Social Sciences, Akademia Medycznych I Spolecznych Nauk Stosowanych (AMiSNS), 2 Lotnicza Street, 82-300 Elbląg, Poland;
- Department of Gastrointestinal Surgical Oncology, Greater Poland Cancer Centre, Garbary 15, 61-866 Poznan, Poland

15
Zhu B, Yang Y. Quality assessment of abdominal CT images: an improved ResNet algorithm with dual-attention mechanism. Am J Transl Res 2024; 16:3099-3107. [PMID: 39114678 PMCID: PMC11301486 DOI: 10.62347/wkns8633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2024] [Accepted: 05/19/2024] [Indexed: 08/10/2024]
Abstract
OBJECTIVES To enhance medical image classification using a Dual-attention ResNet model and investigate the impact of attention mechanisms on model performance in a clinical setting. METHODS We utilized a dataset of medical images and implemented a Dual-attention ResNet model, integrating self-attention and spatial attention mechanisms. The model was trained and evaluated using binary and five-level quality classification tasks, leveraging standard evaluation metrics. RESULTS Our findings demonstrated substantial performance improvements with the Dual-attention ResNet model in both classification tasks. In the binary classification task, the model achieved an accuracy of 0.940, outperforming the conventional ResNet model. Similarly, in the five-level quality classification task, the Dual-attention ResNet model attained an accuracy of 0.757, highlighting its efficacy in capturing nuanced distinctions in image quality. CONCLUSIONS The integration of attention mechanisms within the ResNet model resulted in significant performance enhancements, showcasing its potential for improving medical image classification tasks. These results underscore the promising role of attention mechanisms in facilitating more accurate and discriminative analysis of medical images, thus holding substantial promise for clinical applications in radiology and diagnostics.
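The self-attention mechanism integrated into the model above weighs every position of a feature representation against every other before classification. Its core computation, scaled dot-product attention, can be sketched in pure Python (identity Q/K/V projections for brevity; a real network such as the paper's Dual-attention ResNet learns those projections and adds a separate spatial attention branch over convolutional feature maps):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention: each output position is a
    softmax-weighted average of all input positions."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out
```

Because the weights are a convex combination, every output coordinate stays within the range spanned by the inputs.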
Affiliation(s)
- Boying Zhu
- Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yuanyuan Yang
- Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
- University of Chinese Academy of Sciences, Beijing 100049, China

16
Lu G, Tian R, Yang W, Liu R, Liu D, Xiang Z, Zhang G. Deep learning radiomics based on multimodal imaging for distinguishing benign and malignant breast tumours. Front Med (Lausanne) 2024; 11:1402967. [PMID: 39036101 PMCID: PMC11257849 DOI: 10.3389/fmed.2024.1402967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2024] [Accepted: 06/14/2024] [Indexed: 07/23/2024] Open
Abstract
Objectives This study aimed to develop a deep learning radiomic model using multimodal imaging to differentiate benign and malignant breast tumours. Methods Multimodality imaging data, including ultrasonography (US), mammography (MG), and magnetic resonance imaging (MRI), from 322 patients (112 with benign breast tumours and 210 with malignant breast tumours) with histopathologically confirmed breast tumours were retrospectively collected between December 2018 and May 2023. Based on multimodal imaging, the experiment was divided into three parts: traditional radiomics, deep learning radiomics, and feature fusion. We tested the performance of seven classifiers, namely, SVM, KNN, random forest, extra trees, XGBoost, LightGBM, and LR, on different feature models. Through feature fusion using ensemble and stacking strategies, we obtained the optimal classification model for benign and malignant breast tumours. Results In terms of traditional radiomics, the ensemble fusion strategy achieved the highest accuracy, AUC, and specificity, with values of 0.892, 0.942 [0.886-0.996], and 0.956 [0.873-1.000], respectively. The early fusion strategy with US, MG, and MRI achieved the highest sensitivity of 0.952 [0.887-1.000]. In terms of deep learning radiomics, the stacking fusion strategy achieved the highest accuracy, AUC, and sensitivity, with values of 0.937, 0.947 [0.887-1.000], and 1.000 [0.999-1.000], respectively. The early fusion strategies of US+MRI and US+MG achieved the highest specificity of 0.954 [0.867-1.000]. In terms of feature fusion, the ensemble and stacking approaches of the late fusion strategy achieved the highest accuracy of 0.968. In addition, stacking achieved the highest AUC and specificity, which were 0.997 [0.990-1.000] and 1.000 [0.999-1.000], respectively. The traditional radiomic and depth features of US+MG + MR achieved the highest sensitivity of 1.000 [0.999-1.000] under the early fusion strategy. 
Conclusion This study demonstrated the potential of integrating deep learning and radiomic features with multimodal images. As a single modality, MRI based on radiomic features achieved greater accuracy than US or MG. The US and MG models achieved higher accuracy with transfer learning than the single-mode or radiomic models. The combined traditional radiomic and deep features of US+MG+MRI achieved the highest sensitivity under the early fusion strategy, showed higher diagnostic performance, and provided more valuable information for differentiating between benign and malignant breast tumours.
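The early- and late-fusion strategies compared in this study differ only in where the modalities are combined: before the classifier (concatenated features) or after it (combined probabilities). A minimal sketch of both, using the modality names from the abstract; the equal weighting is an illustrative assumption, whereas the paper's ensemble and stacking strategies learn how to combine the base outputs:

```python
def early_fusion(features_by_modality):
    """Early fusion: concatenate per-modality feature vectors into one
    input for a single downstream classifier."""
    fused = []
    for modality in sorted(features_by_modality):
        fused.extend(features_by_modality[modality])
    return fused

def late_fusion(prob_by_modality, weights=None):
    """Late fusion (ensemble-style): combine per-modality malignancy
    probabilities; equal weights unless specified."""
    if weights is None:
        weights = {m: 1.0 / len(prob_by_modality) for m in prob_by_modality}
    return sum(weights[m] * p for m, p in prob_by_modality.items())

probs = {"US": 0.91, "MG": 0.74, "MRI": 0.85}
print(f"fused malignancy probability: {late_fusion(probs):.3f}")
```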
Affiliation(s)
- Guoxiu Lu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, China
- Ronghui Tian
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Wei Yang
- Department of Radiology, Cancer Hospital of China Medical University, Liaoning Cancer Hospital and Institute, Shenyang, Liaoning, China
- Ruibo Liu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Dongmei Liu
- Department of Ultrasound, Beijing Shijitan Hospital, Capital Medical University, Beijing, China
- Zijie Xiang
- Biomedical Engineering, Shenyang University of Technology, Shenyang, Liaoning, China
- Guoxu Zhang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, China

17
Çelik L, Aribal E. The efficacy of artificial intelligence (AI) in detecting interval cancers in the national screening program of a middle-income country. Clin Radiol 2024; 79:e885-e891. [PMID: 38649312 DOI: 10.1016/j.crad.2024.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Revised: 03/14/2024] [Accepted: 03/21/2024] [Indexed: 04/25/2024]
Abstract
AIM We aimed to investigate the efficiency and accuracy of an artificial intelligence (AI) algorithm for detecting interval cancers in a middle-income country's national screening program. MATERIAL AND METHODS A total of 2,129,486 mammograms reported as BIRADS 1 and 2 were matched with the national cancer registry for interval cancers (IC). The IC group consisted of 442 cases, of which 36 were excluded due to having mammograms incompatible with the AI system. A control group of 446 women with two negative consequent mammograms was defined as time-proven normal and constituted the normal group. The cancer risk scores of both groups were determined from 1 to 10 with the AI system. The sensitivity and specificity values of the AI system were defined in terms of IC detection. The IC group was divided into subgroups with six-month intervals according to their time from screening to diagnosis: 0-6 months, 6-12 months, 12-18 months, and 18-24 months. The diagnostic performance of the AI system for all patients was evaluated using receiver operating characteristics (ROC) curve analysis. The diagnostic performance of the AI system for major and minor findings that expert readers determined was re-evaluated. RESULTS AI labeled 53% of ICs with the highest score of 10. The sensitivity of AI in detecting ICs was 53.7% and 38.5% at specificities of 90% and 95%, respectively. Area under the curve (AUC) of AI in detecting major signs was 0.93 (95% CI: 0.90-0.95) with a sensitivity of 81.6% and 72.4% at specificities of 90% and 95%, respectively (95% CI: 0.73-0.88 and 95% CI: 0.60-0.82 respectively) and minor signs was 0.87 (95% CI: 0.87-0.92) with a sensitivity of 70% and 53% at a specificity of 90% and 95%, respectively (95% CI: 0.65-0.82 and 95% CI: 0.52-0.71 respectively). In subgroup analysis for time to diagnosis, the AUC value of the AI system was higher in the 0-6 month period than in later periods. 
CONCLUSION This study showed the potential of AI in detecting ICs in initial mammograms and reducing human errors and undetected cancers.
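The study's operating points (sensitivity at 90% and 95% specificity, derived from 1-10 risk scores) can be illustrated with a minimal sketch. The function name and toy scores below are hypothetical, not taken from the study:

```python
import numpy as np

def sensitivity_at_specificity(scores_pos, scores_neg, target_spec=0.90):
    """Pick the lowest score threshold whose specificity on the negative
    group reaches the target, then report sensitivity at that threshold.
    Scores are 'cancer risk' values; higher means more suspicious."""
    thresholds = np.unique(np.concatenate([scores_pos, scores_neg]))
    for t in thresholds:  # np.unique returns thresholds in ascending order
        spec = np.mean(scores_neg < t)       # negatives correctly below threshold
        if spec >= target_spec:
            sens = np.mean(scores_pos >= t)  # positives flagged at/above threshold
            return t, spec, sens
    return None
```

A system reporting integer scores, as here, only has a few achievable operating points, so the realized specificity is the first one at or above the target.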
Affiliation(s)
- L Çelik
- Maltepe University Hospital, Feyzullah cad 39, Maltepe, 34843, Istanbul, Turkey.
- E Aribal
- Acibadem University, School of Medicine, 34752, Istanbul, Turkey; Acibadem Altunizade Hospital, Tophanelioglu cad 13, Altunizade, 34662, Istanbul, Turkey.

18
Bhalla D, Rangarajan K, Chandra T, Banerjee S, Arora C. Reproducibility and Explainability of Deep Learning in Mammography: A Systematic Review of Literature. Indian J Radiol Imaging 2024; 34:469-487. [PMID: 38912238 PMCID: PMC11188703 DOI: 10.1055/s-0043-1775737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/25/2024] Open
Abstract
Background Although abundant literature is currently available on the use of deep learning for breast cancer detection in mammography, the quality of such literature is widely variable. Purpose To evaluate published literature on breast cancer detection in mammography for reproducibility and to ascertain best practices for model design. Methods The PubMed and Scopus databases were searched to identify records that described the use of deep learning to detect lesions or classify images into cancer or noncancer. A modification of the Quality Assessment of Diagnostic Accuracy Studies tool (mQUADAS-2) was developed for this review and applied to the included studies. Results of reported studies (area under the curve [AUC] of the receiver operating characteristic [ROC] curve, sensitivity, specificity) were recorded. Results A total of 12,123 records were screened, of which 107 fit the inclusion criteria. Training and test datasets, the key idea behind model architecture, and results were recorded for these studies. Based on mQUADAS-2 assessment, 103 studies had a high risk of bias due to nonrepresentative patient selection. Four studies were of adequate quality, of which three trained their own model and one used a commercial network. Ensemble models were used in two of these. Common strategies for model training included patch classifiers, image classification networks (ResNet in 67%), and object detection networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening dataset, and it reached 0.945 (0.919-0.968) on an enriched subset. Higher AUC (0.955) and specificity (98.5%) were reached when combined radiologist and artificial intelligence readings were used than with either alone. None of the studies provided explainability beyond localization accuracy, and none studied the interaction between AI and radiologists in a real-world setting.
Conclusion While deep learning holds much promise in mammography interpretation, evaluation in a reproducible clinical setting and explainable networks are the need of the hour.
Affiliation(s)
- Deeksha Bhalla
- Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Krithika Rangarajan
- Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Tany Chandra
- Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Subhashis Banerjee
- Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi, India
- Chetan Arora
- Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi, India

19
Lee KS, Choi E, Cho SI, Park S, Ryu J, Puche AV, Ma M, Park J, Jung W, Ro J, Kim S, Park G, Song S, Ock CY, Choe G, Park JH. An artificial intelligence-powered PD-L1 combined positive score (CPS) analyser in urothelial carcinoma alleviating interobserver and intersite variability. Histopathology 2024; 85:81-91. [PMID: 38477366 DOI: 10.1111/his.15176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Revised: 02/20/2024] [Accepted: 02/29/2024] [Indexed: 03/14/2024]
Abstract
AIMS Immune checkpoint inhibitors targeting programmed death-ligand 1 (PD-L1) have shown promising clinical outcomes in urothelial carcinoma (UC). The combined positive score (CPS) quantifies PD-L1 22C3 expression in UC, but it can vary between pathologists due to the consideration of both immune and tumour cell positivity. METHODS AND RESULTS An artificial intelligence (AI)-powered PD-L1 CPS analyser was developed using 1,275,907 cells and 6,175.42 mm2 of tissue annotated by pathologists, extracted from 400 PD-L1 22C3-stained whole slide images of UC. We validated the AI model on 543 UC PD-L1 22C3 cases collected from three institutions. There were 446 cases (82.1%) where the CPS results (CPS ≥10 or <10) were in complete agreement between three pathologists, and 486 cases (89.5%) where the AI-powered CPS results matched the consensus of two or more pathologists. In the pathologists' assessment of the CPS, statistically significant differences were noted depending on the source hospital (P = 0.003). Three pathologists re-evaluated discrepant cases with AI-powered CPS results. After using the AI as a guide and revising, complete agreement increased to 93.9%. The AI model contributed to improving concordance between pathologists across various factors, including hospital, specimen type, pathologic T stage, histologic subtype, and dominant PD-L1-positive cell type. In the revised results, the evaluation discordance among slides from different hospitals was mitigated. CONCLUSION This study suggests that AI models can help reduce discrepancies between pathologists in quantifying immunohistochemistry, including PD-L1 22C3 CPS, especially when evaluating data from different institutions, such as in a telepathology setting.
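For context, the combined positive score referenced above follows the standard 22C3 convention: PD-L1-positive tumour cells plus positive immune cells, divided by viable tumour cells, times 100, capped at 100. A minimal sketch with a hypothetical function name (a cell-level AI analyser would supply the three counts from its detections):

```python
def combined_positive_score(pos_tumor_cells, pos_immune_cells, viable_tumor_cells):
    """CPS per the standard PD-L1 22C3 scoring convention, capped at 100."""
    cps = 100.0 * (pos_tumor_cells + pos_immune_cells) / viable_tumor_cells
    return min(100.0, cps)
```

The study's CPS ≥10 cut-off then reduces to a simple comparison against the returned value.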
Affiliation(s)
- Kyu Sang Lee
- Department of Pathology, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam-si, Republic of Korea
- Euno Choi
- Department of Pathology, Ewha Womans University Mokdong Hospital, Ewha Womans University College of Medicine, Seoul, Republic of Korea
- Gheeyoung Choe
- Department of Pathology, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam-si, Republic of Korea
- Jeong Hwan Park
- Department of Pathology, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul, Republic of Korea

20
Johansson JV, Engström E. 'Humans think outside the pixels' - Radiologists' perceptions of using artificial intelligence for breast cancer detection in mammography screening in a clinical setting. Health Informatics J 2024; 30:14604582241275020. [PMID: 39155239 DOI: 10.1177/14604582241275020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/20/2024]
Abstract
OBJECTIVE This study aimed to explore radiologists' views on using an artificial intelligence (AI) tool named ScreenTrustCAD (used with Philips equipment) as a diagnostic decision support tool in mammography screening during a clinical trial at Capio Sankt Göran Hospital, Sweden. METHODS We conducted semi-structured interviews with seven breast imaging radiologists, analyzed using inductive thematic content analysis. RESULTS We identified three main thematic categories: AI in society, reflecting views on AI's contribution to the healthcare system; AI-human interactions, addressing the radiologists' self-perceptions when using the AI and its potential challenges to their profession; and AI as a tool among others. The radiologists were generally positive towards AI, and they felt comfortable handling its sometimes-ambiguous outputs and erroneous evaluations. While they did not feel that it would undermine their profession, they preferred using it as a complementary reader rather than an independent one. CONCLUSION The results suggested that breast radiology could become a launch pad for AI in healthcare. We recommend that this exploratory work on subjective perceptions be complemented by quantitative assessments to generalize the findings.
Affiliation(s)
- Jennifer Viberg Johansson
- Department of Public Health and Caring Sciences, Centre for Research Ethics & Bioethics, Uppsala University, Uppsala, Sweden
- Emma Engström
- Institute for Futures Studies, Stockholm, Sweden; Department of Urban Planning and Environment, KTH Royal Institute of Technology, Stockholm, Sweden

21
Ahsen ME, Vogel R, Stolovitzky G. Optimal linear ensemble of binary classifiers. BIOINFORMATICS ADVANCES 2024; 4:vbae093. [PMID: 39011276 PMCID: PMC11249386 DOI: 10.1093/bioadv/vbae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 05/03/2024] [Accepted: 06/13/2024] [Indexed: 07/17/2024]
Abstract
Motivation The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights in an unsupervised way, which are optimal under the assumption of class-conditioned independent classifier predictions. When it is possible to use labels, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering and Methods (DREAM) challenges. We also propose an application of sMOCA for transfer learning where we use pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data. Availability and implementation GitHub repository, https://github.com/robert-vogel/moca.
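The class-conditioned independence assumption behind the MOCA weights can be sketched with a generic naive-Bayes log-likelihood-ratio ensemble. This is an illustrative stand-in under that assumption, not the authors' exact algorithm (their reference implementation is in the GitHub repository cited above); function names and the toy data are hypothetical:

```python
import numpy as np

def fit_llr_weights(preds, labels, eps=1e-6):
    """Supervised weights for a linear ensemble of binary classifiers.
    preds: (n_classifiers, n_samples) array of 0/1 votes.
    labels: (n_samples,) array of 0/1 ground truth.
    Under class-conditional independence, the Bayes-optimal combination
    weights each vote by its log-likelihood ratio."""
    sens = np.clip(preds[:, labels == 1].mean(axis=1), eps, 1 - eps)
    spec = np.clip((1 - preds[:, labels == 0]).mean(axis=1), eps, 1 - eps)
    w_pos = np.log(sens / (1 - spec))   # evidence added by a positive vote
    w_neg = np.log((1 - sens) / spec)   # evidence added by a negative vote
    return w_pos, w_neg

def ensemble_score(preds, w_pos, w_neg):
    """Per-sample aggregate log-likelihood ratio; higher favors class 1."""
    return preds.T @ w_pos + (1 - preds).T @ w_neg
```

An accurate classifier receives a large positive weight for its positive votes; an anti-correlated one ends up with inverted weights, so the ensemble still extracts signal from it.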
Affiliation(s)
- Mehmet Eren Ahsen
- Department of Business Administration, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, United States
- Department of Biomedical and Translational Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, United States
- Robert Vogel
- Thomas J. Watson Research Center, IBM, New York, NY 10598, United States
- Department of Integrated Structural and Computational Biology, Scripps Research, La Jolla, CA 92037, United States

22
Kolla L, Parikh RB. Uses and limitations of artificial intelligence for oncology. Cancer 2024; 130:2101-2107. [PMID: 38554271 PMCID: PMC11170282 DOI: 10.1002/cncr.35307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Revised: 02/19/2024] [Accepted: 03/15/2024] [Indexed: 04/01/2024]
Abstract
Modern artificial intelligence (AI) tools built on high-dimensional patient data are reshaping oncology care, helping to improve goal-concordant care, decrease cancer mortality rates, and increase workflow efficiency and scope of care. However, data-related concerns and human biases that seep into algorithms during development and post-deployment phases affect performance in real-world settings, limiting the utility and safety of AI technology in oncology clinics. To this end, the authors review the current potential and limitations of predictive AI for cancer diagnosis and prognostication, as well as of generative AI, specifically modern chatbots, which interface with patients and clinicians. They conclude the review with a discussion of ongoing challenges and regulatory opportunities in the field.
Affiliation(s)
- Likhitha Kolla
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ravi B. Parikh
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA

23
Gim N, Wu Y, Blazes M, Lee CS, Wang RK, Lee AY. A Clinician's Guide to Sharing Data for AI in Ophthalmology. Invest Ophthalmol Vis Sci 2024; 65:21. [PMID: 38864811 PMCID: PMC11174091 DOI: 10.1167/iovs.65.6.21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Accepted: 05/17/2024] [Indexed: 06/13/2024] Open
Abstract
Data is the cornerstone of using AI models, because their performance directly depends on the diversity, quantity, and quality of the data used for training. Using AI presents unique potential, particularly in medical applications that involve rich data such as ophthalmology, encompassing a variety of imaging methods, medical records, and eye-tracking data. However, sharing medical data comes with challenges because of regulatory issues and privacy concerns. This review explores traditional and nontraditional data sharing methods in medicine, focusing on previous works in ophthalmology. Traditional methods involve direct data transfer, whereas newer approaches prioritize security and privacy by sharing derived datasets, creating secure research environments, or using model-to-data strategies. We examine each method's mechanisms, variations, recent applications in ophthalmology, and their respective advantages and disadvantages. By empowering medical researchers with insights into data sharing methods and considerations, this review aims to assist informed decision-making while upholding ethical standards and patient privacy in medical AI development.
Affiliation(s)
- Nayoon Gim
- Department of Ophthalmology, University of Washington, Seattle, WA, United States
- The Roger and Angie Karalis Retina Center, Seattle, Washington, United States
- Department of Bioengineering, University of Washington, Seattle, WA, United States
- Yue Wu
- Department of Ophthalmology, University of Washington, Seattle, WA, United States
- The Roger and Angie Karalis Retina Center, Seattle, Washington, United States
- Marian Blazes
- Department of Ophthalmology, University of Washington, Seattle, WA, United States
- The Roger and Angie Karalis Retina Center, Seattle, Washington, United States
- Cecilia S. Lee
- Department of Ophthalmology, University of Washington, Seattle, WA, United States
- The Roger and Angie Karalis Retina Center, Seattle, Washington, United States
- Ruikang K. Wang
- Department of Ophthalmology, University of Washington, Seattle, WA, United States
- Department of Bioengineering, University of Washington, Seattle, WA, United States
- Aaron Y. Lee
- Department of Ophthalmology, University of Washington, Seattle, WA, United States
- The Roger and Angie Karalis Retina Center, Seattle, Washington, United States

24
Lee SE, Hong H, Kim EK. Diagnostic performance with and without artificial intelligence assistance in real-world screening mammography. Eur J Radiol Open 2024; 12:100545. [PMID: 38293282 PMCID: PMC10825593 DOI: 10.1016/j.ejro.2023.100545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Revised: 12/27/2023] [Accepted: 12/29/2023] [Indexed: 02/01/2024] Open
Abstract
Purpose To evaluate artificial intelligence-based computer-aided diagnosis (AI-CAD) for screening mammography, we analyzed the diagnostic performance of radiologists by alternately providing and withholding AI-CAD results every month. Methods This retrospective study was approved by the institutional review board with a waiver for informed consent. Between August 2020 and May 2022, 1819 consecutive women (mean age 50.8 ± 9.4 years) with 2061 screening mammography and ultrasound examinations performed on the same day in a single institution were included. Radiologists interpreted screening mammography in clinical practice with AI-CAD results provided or withheld alternately by month. The AI-CAD results were retrospectively obtained for analysis even when withheld from radiologists. The diagnostic performances of the radiologists and stand-alone AI-CAD were compared, and the performances of radiologists with and without AI-CAD assistance were also compared, by cancer detection rate, recall rate, sensitivity, specificity, accuracy, and area under the receiver-operating-characteristic curve (AUC). Results Twenty-nine breast cancer patients and 1790 women without cancer were included. The diagnostic performance of the radiologists did not differ significantly with and without AI-CAD assistance. Radiologists with AI-CAD assistance showed the same sensitivity (76.5%) and similar specificity (92.3% vs 93.8%), AUC (0.844 vs 0.851), and recall rates (8.8% vs 7.4%) compared to stand-alone AI-CAD. Radiologists without AI-CAD assistance showed lower specificity (91.9% vs 94.6%) and accuracy (91.5% vs 94.1%) and higher recall rates (8.6% vs 5.9%, all p < 0.05) compared to stand-alone AI-CAD. Conclusion Radiologists showed no significant difference in diagnostic performance when both screening mammography and ultrasound were performed with or without AI-CAD assistance for mammography.
However, without AI-CAD assistance, radiologists showed lower specificity and accuracy and higher recall rates compared to stand-alone AI-CAD.
Affiliation(s)
- Eun-Kyung Kim
- Correspondence to: Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, 363, Dongbaekjukjeon-daero, Giheung-gu, Yongin-si, Gyeonggi-do, Korea.

25
Kühl J, Elhakim MT, Stougaard SW, Rasmussen BSB, Nielsen M, Gerke O, Larsen LB, Graumann O. Population-wide evaluation of artificial intelligence and radiologist assessment of screening mammograms. Eur Radiol 2024; 34:3935-3946. [PMID: 37938386 PMCID: PMC11166831 DOI: 10.1007/s00330-023-10423-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Revised: 10/09/2023] [Accepted: 10/14/2023] [Indexed: 11/09/2023]
Abstract
OBJECTIVES To validate an AI system for standalone breast cancer detection on an entire screening population in comparison to first-reading breast radiologists. MATERIALS AND METHODS All mammography screenings performed between August 4, 2014, and August 15, 2018, in the Region of Southern Denmark with follow-up within 24 months were eligible. Screenings were assessed as normal or abnormal by breast radiologists through double reading with arbitration. For an AI decision of normal or abnormal, two AI-score cut-off points were applied by matching at the mean sensitivity (AIsens) and specificity (AIspec) of first readers. Accuracy measures were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and recall rate (RR). RESULTS The sample included 249,402 screenings (149,495 women) and 2033 breast cancers (72.6% screen-detected cancers, 27.4% interval cancers). AIsens had lower specificity (97.5% vs 97.7%; p < 0.0001) and PPV (17.5% vs 18.7%; p = 0.01) and a higher RR (3.0% vs 2.8%; p < 0.0001) than first readers. AIspec was comparable to first readers in terms of all accuracy measures. Both AIsens and AIspec detected significantly fewer screen-detected cancers (1166 (AIsens), 1156 (AIspec) vs 1252; p < 0.0001) but found more interval cancers compared to first readers (126 (AIsens), 117 (AIspec) vs 39; p < 0.0001), with varying types of cancers detected across multiple subgroups. CONCLUSION Standalone AI can detect breast cancer at an accuracy level equivalent to the standard of first readers when the AI threshold point was matched at first-reader specificity. However, AI and first readers detected a different composition of cancers. CLINICAL RELEVANCE STATEMENT Replacing first readers with AI at an appropriate cut-off score could be feasible.
AI-detected cancers not detected by radiologists suggest a potential increase in the number of cancers detected if AI is implemented to support double reading within screening, although the clinicopathological characteristics of detected cancers would not change significantly. KEY POINTS • Standalone AI cancer detection was compared to first readers in a double-read mammography screening population. • Standalone AI matched at first reader specificity showed no statistically significant difference in overall accuracy but detected different cancers. • With an appropriate threshold, AI-integrated screening can increase the number of detected cancers with similar clinicopathological characteristics.
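The AIsens/AIspec matching described above amounts to placing the AI's decision cut-off at a quantile of its score distribution. A sketch under that reading, with hypothetical function names and toy scores:

```python
import numpy as np

def cutoff_matching_specificity(scores_normal, reader_specificity):
    """AI flags 'abnormal' when score >= cutoff. Taking the cutoff at the
    reader-specificity quantile of the normal-exam score distribution makes
    the AI's specificity approximately equal to the reader's (AIspec)."""
    return np.quantile(scores_normal, reader_specificity)

def cutoff_matching_sensitivity(scores_cancer, reader_sensitivity):
    """Analogously for AIsens: the flag rate among cancers at or above the
    cutoff should equal the reader's sensitivity."""
    return np.quantile(scores_cancer, 1 - reader_sensitivity)
```

The two cut-offs generally differ, which is why the study reports separate accuracy profiles for AIsens and AIspec.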
Affiliation(s)
- Johanne Kühl
- Department of Clinical Research, University of Southern Denmark, Kløvervænget 10, 2nd floor, 5000, Odense C, Denmark
- Mohammad Talal Elhakim
- Department of Clinical Research, University of Southern Denmark, Kløvervænget 10, 2nd floor, 5000, Odense C, Denmark.
- Department of Radiology, Odense University Hospital, Kløvervænget 47, Ground Floor, 5000, Odense C, Denmark.
- Sarah Wordenskjold Stougaard
- Department of Clinical Research, University of Southern Denmark, Kløvervænget 10, 2nd floor, 5000, Odense C, Denmark
- Benjamin Schnack Brandt Rasmussen
- Department of Clinical Research, University of Southern Denmark, Kløvervænget 10, 2nd floor, 5000, Odense C, Denmark
- Department of Radiology, Odense University Hospital, Kløvervænget 47, Ground Floor, 5000, Odense C, Denmark
- CAI-X - Centre for Clinical Artificial Intelligence, Odense University Hospital, Kløvervænget 8C, 5000, Odense C, Denmark
- Mads Nielsen
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100, Copenhagen, Denmark
- Oke Gerke
- Department of Clinical Research, University of Southern Denmark, Kløvervænget 10, 2nd floor, 5000, Odense C, Denmark
- Department of Nuclear Medicine, Odense University Hospital, Kløvervænget 47, 5000, Odense C, Denmark
- Lisbet Brønsro Larsen
- Department of Radiology, Odense University Hospital, Kløvervænget 47, Ground Floor, 5000, Odense C, Denmark
- Ole Graumann
- Department of Clinical Research, University of Southern Denmark, Kløvervænget 10, 2nd floor, 5000, Odense C, Denmark
- Department of Radiology, Aarhus University Hospital, Palle Juul-Jensens Blvd. 99, 8200, Aarhus N, Denmark
- Department of Clinical Research, Aarhus University, Palle Juul-Jensens Blvd. 99, 8200, Aarhus N, Denmark

26
Abstract
OBJECTIVES To evaluate the improvement of mammography interpretation for novice and experienced radiologists assisted by two commercial AI software. METHODS We compared the performance of two AI software (AI-1 and AI-2) in two experienced and two novice readers for 200 mammographic examinations (80 cancer cases). Two reading sessions were conducted within 4 weeks. The readers rated the likelihood of malignancy (range, 1-7) and the percentage probability of malignancy (range, 0-100%), with and without AI assistance. Differences in AUROC, sensitivity, and specificity were analyzed. RESULTS Mean AUROC increased in both novice (0.86 to 0.90 with AI-1 [p = 0.005]; 0.91 with AI-2 [p < 0.001]) and experienced readers (0.87 to 0.92 with AI-1 [p < 0.001]; 0.90 with AI-2 [p = 0.004]). Sensitivities increased from 81.3 to 88.8% with AI-1 (p = 0.027) and to 91.3% with AI-2 (p = 0.005) in novice readers, and from 81.9 to 90.6% with AI-1 (p = 0.001) and to 87.5% with AI-2 (p = 0.016) in experienced readers. Specificity did not decrease significantly in both novice (p > 0.999, both) and experienced readers (p > 0.999 with AI-1 and 0.282 with AI-2). There was no significant difference in the performance change depending on the type of AI software (p > 0.999). CONCLUSION Commercial AI software improved the diagnostic performance of both novice and experienced readers. The type of AI software used did not significantly impact performance changes. Further validation with a larger number of cases and readers is needed. CLINICAL RELEVANCE STATEMENT Commercial AI software effectively aided mammography interpretation irrespective of the experience level of human readers. KEY POINTS • Mammography interpretation remains challenging and is subject to a wide range of interobserver variability. • In this multi-reader study, two commercial AI software improved the sensitivity of mammography interpretation by both novice and experienced readers. 
The type of AI software used did not significantly impact performance changes. • Commercial AI software may effectively support mammography interpretation irrespective of the experience level of human readers.
Affiliation(s)
- Hee Jeong Kim
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea
- Woo Jung Choi
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea.
- Hye Yun Gwon
- Department of Radiology, Hallym University Sacred Heart Hospital, 22, Gwanpyeong-Ro 170-Gil, Dongan-Gu, Anyang-Si, Gyeonggi-Do, 14068, South Korea
- Seo Jin Jang
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea
- Eun Young Chae
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea
- Hee Jung Shin
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea
- Joo Hee Cha
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea
- Hak Hee Kim
- Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, South Korea

27
Abstract
OBJECTIVES The artificial intelligence competition in healthcare at TEKNOFEST-2022 provided a platform to address the complex multi-class classification challenge of abdominal emergencies using computer vision techniques. This manuscript aimed to comprehensively present the methodologies for data preparation, annotation procedures, and rigorous evaluation metrics, and to introduce a meticulously curated abdominal emergencies data set to researchers. METHODS The data set underwent a comprehensive central screening procedure employing diverse algorithms, extracted from the e-Nabız (Pulse) and National Teleradiology System of the Republic of Türkiye, Ministry of Health. The data set was fully anonymized and subsequently annotated by a group of ten experienced radiologists. The evaluation process was executed by calculating F1 scores, derived from the intersection over union values between the predicted bounding boxes and the corresponding ground truth (GT) bounding boxes. Baseline performance metrics were established by computing the average of the highest five F1 scores. RESULTS Observations indicated a progressive decline in F1 scores as the threshold value increased. Furthermore, class 6 (abdominal aortic aneurysm/dissection) was relatively straightforward to detect compared to other classes, with class 5 (acute diverticulitis) presenting the most formidable challenge. It is noteworthy, however, that when all achieved outcomes for all classes were considered at a threshold of 0.5, the data set's complexity and associated challenges became pronounced. CONCLUSION This data set's significance lies in its pioneering provision of labels and GT-boxes for six classes, fostering opportunities for researchers. CLINICAL RELEVANCE STATEMENT The prompt identification and timely intervention in cases of emergent medical conditions hold paramount significance.
Through the application of AI, patient care can be improved and the potential for errors minimized, particularly in high-caseload scenarios. KEY POINTS • The data set used in the artificial intelligence competition in healthcare (TEKNOFEST-2022) provides a 6-class data set of abdominal CT images consisting of a great variety of abdominal emergencies. • This data set is compiled from the National Teleradiology System data repository of the emergency radiology departments of 459 hospitals. • Radiological data on abdominal emergencies is scarce in the literature, and this annotated competition data set can be a valuable resource for further studies and new AI models.
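The evaluation described (F1 scores derived from intersection-over-union values between predicted and ground-truth boxes) can be sketched as follows. The greedy one-to-one matching shown is one common convention and is an assumption here, since the competition's exact matching rule is not given in the abstract:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_f1(preds, gts, thr=0.5):
    """Greedy one-to-one matching: a prediction is a true positive when it
    overlaps a not-yet-matched ground-truth box with IoU >= thr."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    fp = len(preds) - tp
    fn = len(gts) - tp
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
```

Raising `thr` makes matches harder to earn, which is consistent with the reported decline in F1 as the threshold increases.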
Collapse
Affiliation(s)
- Ural Koç
- Department of Radiology, Ankara Bilkent City Hospital, Ankara, Türkiye
- Ebru Akçapınar Sezer
- Artificial Intelligence Division, Department of Computer Engineering, Hacettepe University, Ankara, Türkiye
- Yasin Yarbay
- General Directorate of Health Information Systems, Ministry of Health, Ankara, Türkiye
- Onur Taydaş
- Department of Radiology, Faculty of Medicine, Sakarya University, Sakarya, Türkiye
- Ahmet Yalçın
- Department of Radiology, Faculty of Medicine, Erzurum Atatürk University, Erzurum, Türkiye
- Şehnaz Evrimler
- Department of Radiology, Ankara Etlik City Hospital, Ankara, Türkiye
- Uğur Kesimal
- Department of Radiology, Ankara Training and Research Hospital, Ankara, Türkiye
- Dilara Atasoy
- Department of Radiology, Sivas Numune State Hospital, Sivas, Türkiye
- Meltem Oruç
- Department of Radiology, Karaman Training and Research Hospital, Karaman, Türkiye
- Mustafa Ertuğrul
- Department of Radiology, Ürgüp State Hospital, Nevşehir, Türkiye
- Emrah Karakaş
- General Directorate of Health Information Systems, Ministry of Health, Ankara, Türkiye
- Nihat Barış Sebik
- General Directorate of Health Information Systems, Ministry of Health, Ankara, Türkiye
- Özgür Sezer
- General Directorate of Health Information Systems, Ministry of Health, Ankara, Türkiye
- Şahin Aydın
- General Directorate of Health Information Systems, Ministry of Health, Ankara, Türkiye
- Songül Varlı
- Health Institutes of Türkiye, İstanbul, Türkiye
- Department of Computer Engineering, Yıldız Technical University, İstanbul, Türkiye
- Erhan Akdoğan
- Health Institutes of Türkiye, İstanbul, Türkiye
- Department of Mechatronics Engineering, Faculty of Mechanical Engineering, Yıldız Technical University, İstanbul, Türkiye
- Mustafa Mahir Ülgü
- General Directorate of Health Information Systems, Ministry of Health, Ankara, Türkiye
28
Di Sarno L, Caroselli A, Tonin G, Graglia B, Pansini V, Causio FA, Gatto A, Chiaretti A. Artificial Intelligence in Pediatric Emergency Medicine: Applications, Challenges, and Future Perspectives. Biomedicines 2024; 12:1220. [PMID: 38927427 PMCID: PMC11200597 DOI: 10.3390/biomedicines12061220]
Abstract
The dawn of Artificial intelligence (AI) in healthcare stands as a milestone in medical innovation. Different medical fields are heavily involved, and pediatric emergency medicine is no exception. We conducted a narrative review structured in two parts. The first part explores the theoretical principles of AI, providing all the necessary background to feel confident with these new state-of-the-art tools. The second part presents an informative analysis of AI models in pediatric emergencies. We examined PubMed and Cochrane Library from inception up to April 2024. Key applications include triage optimization, predictive models for traumatic brain injury assessment, and computerized sepsis prediction systems. In each of these domains, AI models outperformed standard methods. The main barriers to a widespread adoption include technological challenges, but also ethical issues, age-related differences in data interpretation, and the paucity of comprehensive datasets in the pediatric context. Future feasible research directions should address the validation of models through prospective datasets with more numerous sample sizes of patients. Furthermore, our analysis shows that it is essential to tailor AI algorithms to specific medical needs. This requires a close partnership between clinicians and developers. Building a shared knowledge platform is therefore a key step.
Affiliation(s)
- Lorenzo Di Sarno
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- The Italian Society of Artificial Intelligence in Medicine (SIIAM), 00165 Rome, Italy
- Anya Caroselli
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Giovanna Tonin
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00168 Rome, Italy
- Benedetta Graglia
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Valeria Pansini
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00168 Rome, Italy
- Francesco Andrea Causio
- The Italian Society of Artificial Intelligence in Medicine (SIIAM), 00165 Rome, Italy
- Section of Hygiene and Public Health, Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Antonio Gatto
- The Italian Society of Artificial Intelligence in Medicine (SIIAM), 00165 Rome, Italy
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00168 Rome, Italy
- Antonio Chiaretti
- Department of Pediatrics, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- The Italian Society of Artificial Intelligence in Medicine (SIIAM), 00165 Rome, Italy
29
Ramwala OA, Lowry KP, Cross NM, Hsu W, Austin CC, Mooney SD, Lee CI. Establishing a Validation Infrastructure for Imaging-Based Artificial Intelligence Algorithms Before Clinical Implementation. J Am Coll Radiol 2024:S1546-1440(24)00451-4. [PMID: 38789066 DOI: 10.1016/j.jacr.2024.04.027]
Abstract
With promising artificial intelligence (AI) algorithms receiving FDA clearance, the potential impact of these models on clinical outcomes must be evaluated locally before their integration into routine workflows. Robust validation infrastructures are pivotal to inspecting the accuracy and generalizability of these deep learning algorithms to ensure both patient safety and health equity. Protected health information concerns, intellectual property rights, and diverse requirements of models impede the development of rigorous external validation infrastructures. The authors propose various suggestions for addressing the challenges associated with the development of efficient, customizable, and cost-effective infrastructures for the external validation of AI models at large medical centers and institutions. The authors present comprehensive steps to establish an AI inferencing infrastructure outside clinical systems to examine the local performance of AI algorithms before health practice or systemwide implementation and promote an evidence-based approach for adopting AI models that can enhance radiology workflows and improve patient outcomes.
Affiliation(s)
- Ojas A Ramwala
- Department of Biomedical Informatics and Medical Education, University of Washington School of Medicine, Seattle, Washington
- Kathryn P Lowry
- Department of Radiology, University of Washington School of Medicine, Seattle, Washington
- Nathan M Cross
- Vice Chair of Informatics, Department of Radiology, University of Washington School of Medicine, Seattle, Washington
- William Hsu
- Department of Radiological Sciences, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California; Department of Bioengineering, University of California, Los Angeles, Samueli School of Engineering, Los Angeles, California; Deputy Editor, Radiology: Artificial Intelligence
- Sean D Mooney
- Director, Center for Information Technology, National Institutes of Health, Bethesda, Maryland
- Christoph I Lee
- Department of Radiology, University of Washington School of Medicine, Seattle, Washington; Department of Health Systems and Population Health, University of Washington School of Public Health, Seattle, Washington; Director, Northwest Screening and Cancer Outcomes Research Enterprise, University of Washington; Deputy Editor, JACR
30
Klempka A, Schröder A, Neumayer P, Groden C, Clausen S, Hetjens S. Cranial Computer Tomography with Photon Counting and Energy-Integrated Detectors: Objective Comparison in the Same Patients. Diagnostics (Basel) 2024; 14:1019. [PMID: 38786317 PMCID: PMC11119038 DOI: 10.3390/diagnostics14101019]
Abstract
This study provides an objective comparison of cranial computed tomography (CT) imaging quality and radiation dose between photon counting detectors (PCCTs) and energy-integrated detectors (EIDs). We retrospectively analyzed 158 CT scans from 76 patients, employing both detector types on the same individuals to ensure a consistent comparison. Our analysis focused on the Computed Tomography Dose Index and the Dose-Length Product together with the contrast-to-noise ratio and the signal-to-noise ratio for brain gray and white matter. We utilized standardized imaging protocols and consistent patient positioning to minimize variables. PCCT showed a potential for higher image quality and lower radiation doses, as highlighted by this study, thus achieving diagnostic clarity with reduced radiation exposure, underlining its significance in patient care, particularly for patients requiring multiple scans. The results demonstrated that while both systems were effective, PCCT offered enhanced imaging and patient safety in neuroradiological evaluations.
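The image-quality comparison above rests on signal-to-noise and contrast-to-noise ratios computed from gray- and white-matter regions of interest. A minimal sketch of the commonly used definitions follows; the study's exact formulas are not given here, so the ROI handling and the pooled-noise form of CNR are assumptions.

```python
import numpy as np

def snr(roi):
    # Signal-to-noise ratio of a tissue ROI: mean attenuation (HU)
    # divided by the standard deviation (noise) within the ROI.
    return np.mean(roi) / np.std(roi)

def cnr(roi_gm, roi_wm):
    # Contrast-to-noise ratio between gray- and white-matter ROIs,
    # using the pooled noise of the two regions as the denominator.
    noise = np.sqrt((np.std(roi_gm) ** 2 + np.std(roi_wm) ** 2) / 2)
    return abs(np.mean(roi_gm) - np.mean(roi_wm)) / noise
```

With pixel values drawn from matched ROIs on the PCCT and EID scans of the same patient, these two numbers give the per-scan quality metrics that the study then compares alongside CTDI and DLP dose figures.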
Affiliation(s)
- Anna Klempka
- Department of Neuroradiology, University Medical Centre Mannheim, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany
- Alexander Schröder
- Department of Neuroradiology, University Medical Centre Mannheim, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany
- Philipp Neumayer
- Department of Neuroradiology, University Medical Centre Mannheim, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany
- Christoph Groden
- Department of Neuroradiology, University Medical Centre Mannheim, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany
- Sven Clausen
- Department of Radiation Oncology, University Medical Centre Mannheim, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany
- Svetlana Hetjens
- Department of Medical Statistics and Biomathematics, Medical Faculty Mannheim, University of Heidelberg, 68167 Mannheim, Germany
31
Pedemonte S, Tsue T, Mombourquette B, Truong Vu YN, Matthews T, Morales Hoil R, Shah M, Ghare N, Zingman-Daniels N, Holley S, Appleton CM, Su J, Wahl RL. A Semiautonomous Deep Learning System to Reduce False Positives in Screening Mammography. Radiol Artif Intell 2024; 6:e230033. [PMID: 38597785 PMCID: PMC11140506 DOI: 10.1148/ryai.230033]
Abstract
Purpose To evaluate the ability of a semiautonomous artificial intelligence (AI) model to identify screening mammograms not suspicious for breast cancer and reduce the number of false-positive examinations. Materials and Methods The deep learning algorithm was trained using 123 248 two-dimensional digital mammograms (6161 cancers) and a retrospective study was performed on three nonoverlapping datasets of 14 831 screening mammography examinations (1026 cancers) from two U.S. institutions and one U.K. institution (2008-2017). The stand-alone performance of humans and AI was compared. Human plus AI performance was simulated to examine reductions in the cancer detection rate, number of examinations, false-positive callbacks, and benign biopsies. Metrics were adjusted to mimic the natural distribution of a screening population, and bootstrapped CIs and P values were calculated. Results Retrospective evaluation on all datasets showed minimal changes to the cancer detection rate with use of the AI device (noninferiority margin of 0.25 cancers per 1000 examinations: U.S. dataset 1, P = .02; U.S. dataset 2, P < .001; U.K. dataset, P < .001). On U.S. dataset 1 (11 592 mammograms; 101 cancers; 3810 female patients; mean age, 57.3 years ± 10.0 [SD]), the device reduced screening examinations requiring radiologist interpretation by 41.6% (95% CI: 40.6%, 42.4%; P < .001), diagnostic examination callbacks by 31.1% (95% CI: 28.7%, 33.4%; P < .001), and benign needle biopsies by 7.4% (95% CI: 4.1%, 12.4%; P < .001). On U.S. dataset 2 (1362 mammograms; 330 cancers; 1293 female patients; mean age, 55.4 years ± 10.5), the corresponding reductions were 19.5% (95% CI: 16.9%, 22.1%; P < .001), 11.9% (95% CI: 8.6%, 15.7%; P < .001), and 6.5% (95% CI: 0.0%, 19.0%; P = .08), respectively. On the U.K. dataset (1877 mammograms; 595 cancers; 1491 female patients; mean age, 63.5 years ± 7.1), they were 36.8% (95% CI: 34.4%, 39.7%; P < .001), 17.1% (95% CI: 5.9%, 30.1%; P < .001), and 5.9% (95% CI: 2.9%, 11.5%; P < .001), respectively. Conclusion This work demonstrates the potential of a semiautonomous breast cancer screening system to reduce false positives, unnecessary procedures, patient anxiety, and medical expenses. Keywords: Artificial Intelligence, Semiautonomous Deep Learning, Breast Cancer, Screening Mammography Supplemental material is available for this article. Published under a CC BY 4.0 license.
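The abstract reports bootstrapped CIs for the simulated reductions. A percentile-bootstrap sketch for one such proportion (e.g., the fraction of examinations the device routes away from radiologist review) might look like the following; it is illustrative only, not the authors' statistical code, and the resampling scheme is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def bootstrap_ci(flags, n_boot=2000, alpha=0.05):
    # Percentile bootstrap CI for a proportion. `flags` is a 0/1 array
    # marking, per examination, whether the event of interest occurred.
    flags = np.asarray(flags)
    stats = [np.mean(rng.choice(flags, size=len(flags), replace=True))
             for _ in range(n_boot)]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return np.mean(flags), (lo, hi)
```

Bootstrapping the per-examination flags rather than assuming a parametric form is one common way to obtain the kind of CIs quoted above; the study may well have used a different resampling unit (e.g., per patient).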
Affiliation(s)
- Stefano Pedemonte, Trevor Tsue, Brent Mombourquette, Yen Nhi Truong Vu, Thomas Matthews, Rodrigo Morales Hoil, Meet Shah, Nikita Ghare, Naomi Zingman-Daniels, Susan Holley, Catherine M. Appleton, Jason Su, Richard L. Wahl
- From Whiterabbit.ai, 3930 Freedom Cir, Santa Clara, CA 95054 (S.P., T.T., B.M., Y.N.T.V., T.M., R.M.H., M.S., N.G., N.Z.D., J.S.); Onsite Women's Health, Westfield, Mass (S.H.); SSM Health, St Louis, Mo (C.M.A.); and Mallinckrodt Institute of Radiology, Washington University School of Medicine, St Louis, Mo (R.L.W.)
32
Lotter W, Hassett MJ, Schultz N, Kehl KL, Van Allen EM, Cerami E. Artificial Intelligence in Oncology: Current Landscape, Challenges, and Future Directions. Cancer Discov 2024; 14:711-726. [PMID: 38597966 PMCID: PMC11131133 DOI: 10.1158/2159-8290.cd-23-1199]
Abstract
Artificial intelligence (AI) in oncology is advancing beyond algorithm development to integration into clinical practice. This review describes the current state of the field, with a specific focus on clinical integration. AI applications are structured according to cancer type and clinical domain, focusing on the four most common cancers and tasks of detection, diagnosis, and treatment. These applications encompass various data modalities, including imaging, genomics, and medical records. We conclude with a summary of existing challenges, evolving solutions, and potential future directions for the field. SIGNIFICANCE AI is increasingly being applied to all aspects of oncology, where several applications are maturing beyond research and development to direct clinical integration. This review summarizes the current state of the field through the lens of clinical translation along the clinical care continuum. Emerging areas are also highlighted, along with common challenges, evolving solutions, and potential future directions for the field.
Affiliation(s)
- William Lotter
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Michael J. Hassett
- Harvard Medical School, Boston, MA, USA
- Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Nikolaus Schultz
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kenneth L. Kehl
- Harvard Medical School, Boston, MA, USA
- Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Eliezer M. Van Allen
- Harvard Medical School, Boston, MA, USA
- Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ethan Cerami
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
33
Kwon MR, Chang Y, Ham SY, Cho Y, Kim EY, Kang J, Park EK, Kim KH, Kim M, Kim TS, Lee H, Kwon R, Lim GY, Choi HR, Choi J, Kook SH, Ryu S. Screening mammography performance according to breast density: a comparison between radiologists versus standalone intelligence detection. Breast Cancer Res 2024; 26:68. [PMID: 38649889 PMCID: PMC11036604 DOI: 10.1186/s13058-024-01821-w]
Abstract
BACKGROUND Artificial intelligence (AI) algorithms for the independent assessment of screening mammograms have not been well established in a large screening cohort of Asian women. We compared the performance of screening digital mammography considering breast density, between radiologists and AI standalone detection among Korean women. METHODS We retrospectively included 89,855 Korean women who underwent their initial screening digital mammography from 2009 to 2020. Breast cancer within 12 months of the screening mammography was the reference standard, according to the National Cancer Registry. Lunit software was used to determine the probability of malignancy scores, with a cutoff of 10% for breast cancer detection. The AI's performance was compared with that of the final Breast Imaging Reporting and Data System category, as recorded by breast radiologists. Breast density was classified into four categories (A-D) based on the radiologist and AI-based assessments. The performance metrics (cancer detection rate [CDR], sensitivity, specificity, positive predictive value [PPV], recall rate, and area under the receiver operating characteristic curve [AUC]) were compared across breast density categories. RESULTS Mean participant age was 43.5 ± 8.7 years; 143 breast cancer cases were identified within 12 months. The CDRs (1.1/1000 examination) and sensitivity values showed no significant differences between radiologist and AI-based results (69.9% [95% confidence interval [CI], 61.7-77.3] vs. 67.1% [95% CI, 58.8-74.8]). However, the AI algorithm showed better specificity (93.0% [95% CI, 92.9-93.2] vs. 77.6% [95% CI, 61.7-77.9]), PPV (1.5% [95% CI, 1.2-1.9] vs. 0.5% [95% CI, 0.4-0.6]), recall rate (7.1% [95% CI, 6.9-7.2] vs. 22.5% [95% CI, 22.2-22.7]), and AUC values (0.8 [95% CI, 0.76-0.84] vs. 0.74 [95% CI, 0.7-0.78]) (all P < 0.05). 
Radiologist and AI-based results showed the best performance in the non-dense category; the CDR and sensitivity were higher for radiologists in the heterogeneously dense category (P = 0.059). However, the specificity, PPV, and recall rate consistently favored AI-based results across all categories, including the extremely dense category. CONCLUSIONS AI-based software showed slightly lower sensitivity, although the difference was not statistically significant. However, it outperformed radiologists in recall rate, specificity, PPV, and AUC, with disparities most prominent in extremely dense breast tissue.
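The metrics compared in this study (CDR, sensitivity, specificity, PPV, and recall rate) can all be derived from the 2x2 confusion counts of a screening round, with cancer within 12 months as the reference standard. A small illustrative sketch under that assumption (names are ours, not the study's code):

```python
def screening_metrics(tp, fp, fn, tn):
    # Summary metrics for one screening reader (human or AI) from the
    # 2x2 confusion counts against the 12-month cancer reference.
    n = tp + fp + fn + tn
    return {
        "cdr_per_1000": 1000 * tp / n,   # cancer detection rate
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # PPV among recalled examinations
        "recall_rate": (tp + fp) / n,    # fraction of examinations recalled
    }
```

Computing this dictionary once for the radiologists' BI-RADS calls and once for the AI's 10%-score cutoff, overall and within each breast-density category, reproduces the shape of the comparison reported above.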
Affiliation(s)
- Mi-Ri Kwon
- Department of Radiology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Yoosoo Chang
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Department of Occupational and Environmental Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Department of Clinical Research Design & Evaluation, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
- Soo-Youn Ham
- Department of Radiology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Yoosun Cho
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Eun Young Kim
- Department of Surgery, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jeonggyu Kang
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Minjeong Kim
- Lunit Inc, Seoul, Republic of Korea
- Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Ria Kwon
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Institute of Medical Research, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Ga-Young Lim
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Institute of Medical Research, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Hye Rin Choi
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Institute of Medical Research, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- JunHyeok Choi
- School of Mechanical Engineering, Sunkyungkwan University, Seoul, Republic of Korea
- Shin Ho Kook
- Department of Radiology, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, South Korea
- Seungho Ryu
- Center for Cohort Studies, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Samsung Main Building B2, 250, Taepyung-ro 2ga, Jung-gu, 04514, Seoul, South Korea
- Department of Occupational and Environmental Medicine, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Department of Clinical Research Design & Evaluation, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea
34
Carriero A, Groenhoff L, Vologina E, Basile P, Albera M. Deep Learning in Breast Cancer Imaging: State of the Art and Recent Advancements in Early 2024. Diagnostics (Basel) 2024; 14:848. [PMID: 38667493 PMCID: PMC11048882 DOI: 10.3390/diagnostics14080848]
Abstract
The rapid advancement of artificial intelligence (AI) has significantly impacted various aspects of healthcare, particularly in the medical imaging field. This review focuses on recent developments in the application of deep learning (DL) techniques to breast cancer imaging. DL models, a subset of AI algorithms inspired by human brain architecture, have demonstrated remarkable success in analyzing complex medical images, enhancing diagnostic precision, and streamlining workflows. DL models have been applied to breast cancer diagnosis via mammography, ultrasonography, and magnetic resonance imaging. Furthermore, DL-based radiomic approaches may play a role in breast cancer risk assessment, prognosis prediction, and therapeutic response monitoring. Nevertheless, several challenges have limited the widespread adoption of AI techniques in clinical practice, emphasizing the importance of rigorous validation, interpretability, and technical considerations when implementing DL solutions. By examining fundamental concepts in DL techniques applied to medical imaging and synthesizing the latest advancements and trends, this narrative review aims to provide valuable and up-to-date insights for radiologists seeking to harness the power of AI in breast cancer care.
Affiliation(s)
- Léon Groenhoff
- Radiology Department, Maggiore della Carità Hospital, 28100 Novara, Italy
35
Lee SE, Hong H, Kim EK. Positive Predictive Values of Abnormality Scores From a Commercial Artificial Intelligence-Based Computer-Aided Diagnosis for Mammography. Korean J Radiol 2024; 25:343-350. [PMID: 38528692 PMCID: PMC10973732 DOI: 10.3348/kjr.2023.0907]
Abstract
OBJECTIVE Artificial intelligence-based computer-aided diagnosis (AI-CAD) is increasingly used in mammography. While the continuous scores of AI-CAD have been related to malignancy risk, the understanding of how to interpret and apply these scores remains limited. We investigated the positive predictive values (PPVs) of the abnormality scores generated by a deep learning-based commercial AI-CAD system and analyzed them in relation to clinical and radiological findings. MATERIALS AND METHODS From March 2020 to May 2022, 656 breasts from 599 women (mean age 52.6 ± 11.5 years, including 0.6% [4/599] high-risk women) who underwent mammography and received positive AI-CAD results (Lunit Insight MMG, abnormality score ≥ 10) were retrospectively included in this study. Univariable and multivariable analyses were performed to evaluate the associations between the AI-CAD abnormality scores and clinical and radiological factors. The breasts were subdivided according to the abnormality scores into groups 1 (10-49), 2 (50-69), 3 (70-89), and 4 (90-100) using the optimal binning method. The PPVs were calculated for all breasts and subgroups. RESULTS Diagnostic indications and positive imaging findings by radiologists were associated with higher abnormality scores in the multivariable regression analysis. The overall PPV of AI-CAD was 32.5% (213/656) for all breasts, including 213 breast cancers, 129 breasts with benign biopsy results, and 314 breasts with benign outcomes in the follow-up or diagnostic studies. In the screening mammography subgroup, the PPVs were 18.6% (58/312) overall and 5.1% (12/235), 29.0% (9/31), 57.9% (11/19), and 96.3% (26/27) for score groups 1, 2, 3, and 4, respectively. The PPVs were significantly higher in women with diagnostic indications (45.1% [155/344]), palpability (51.9% [149/287]), fatty breasts (61.2% [60/98]), and certain imaging findings (masses with or without calcifications and distortion). 
CONCLUSION PPV increased with increasing AI-CAD abnormality scores. The PPVs of AI-CAD satisfied the acceptable PPV range according to Breast Imaging-Reporting and Data System for screening mammography and were higher for diagnostic mammography.
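The subgroup PPVs above are plain ratios of confirmed cancers to AI-positive breasts within each abnormality-score bin. As an illustrative check (using only the screening-subgroup counts reported in this abstract, not the study's data), the arithmetic can be reproduced as:

```python
# Screening-subgroup counts reported in the abstract:
# (biopsy-confirmed cancers, AI-positive breasts) per abnormality-score bin.
score_groups = {
    "1 (score 10-49)":  (12, 235),
    "2 (score 50-69)":  (9, 31),
    "3 (score 70-89)":  (11, 19),
    "4 (score 90-100)": (26, 27),
}

for group, (cancers, positives) in score_groups.items():
    # PPV = true positives / all test-positives within the bin
    print(f"group {group}: PPV = {cancers}/{positives} = {100 * cancers / positives:.1f}%")

total_cancers = sum(c for c, _ in score_groups.values())
total_positives = sum(n for _, n in score_groups.values())
print(f"overall screening PPV = {total_cancers}/{total_positives} = "
      f"{100 * total_cancers / total_positives:.1f}%")
# → 5.1%, 29.0%, 57.9%, 96.3% per bin; 18.6% overall, matching the abstract
```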
Affiliation(s)
- Si Eun Lee: Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea
- Hanpyo Hong: Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea
- Eun-Kyung Kim: Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea
36
Lo ZJ, Mak MHW, Liang S, Chan YM, Goh CC, Lai T, Tan A, Thng P, Rodriguez J, Weyde T, Smit S. Development of an explainable artificial intelligence model for Asian vascular wound images. Int Wound J 2024; 21:e14565. [PMID: 38146127 PMCID: PMC10961881 DOI: 10.1111/iwj.14565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2023] [Accepted: 12/04/2023] [Indexed: 12/27/2023] Open
Abstract
Chronic wounds contribute to significant healthcare and economic burden worldwide. Wound assessment remains challenging given its complex and dynamic nature. The use of artificial intelligence (AI) and machine learning methods in wound analysis is promising. Explainable modelling can help its integration and acceptance in healthcare systems. We aim to develop an explainable AI model for analysing vascular wound images among an Asian population. Two thousand nine hundred and fifty-seven wound images from a vascular wound image registry from a tertiary institution in Singapore were utilized. The dataset was split into training, validation and test sets. Wound images were classified into four types (neuroischaemic ulcer [NIU], surgical site infections [SSI], venous leg ulcers [VLU], pressure ulcer [PU]), measured with automatic estimation of width, length and depth and segmented into 18 wound and peri-wound features. Data pre-processing was performed using oversampling and augmentation techniques. Convolutional and deep learning models were utilized for model development. The model was evaluated with accuracy, F1 score and receiver operating characteristic (ROC) curves. Explainability methods were used to interpret AI decision reasoning. A web browser application was developed to demonstrate results of the wound AI model with explainability. After development, the model was tested on an additional 15,476 unlabelled images to evaluate effectiveness. After the development on the training and validation dataset, the model performance on unseen labelled images in the test set achieved an AUROC of 0.99 for wound classification with mean accuracy of 95.9%. For wound measurements, the model achieved AUROC of 0.97 with mean accuracy of 85.0% for depth classification, and AUROC of 0.92 with mean accuracy of 87.1% for width and length determination. For wound segmentation, an AUROC of 0.95 and mean accuracy of 87.8% was achieved.
Testing on unlabelled images, the model confidence score for wound classification was 82.8% with an explainability score of 60.6%. Confidence score was 87.6% for depth classification with 68.0% explainability score, while width and length measurement obtained 93.0% accuracy score with 76.6% explainability. Confidence score for wound segmentation was 83.9%, while explainability was 72.1%. Using explainable AI models, we have developed an algorithm and application for analysis of vascular wound images from an Asian population with accuracy and explainability. With further development, it can be utilized as a clinical decision support system and integrated into existing healthcare electronic systems.
Affiliation(s)
- Zhiwen Joseph Lo: Department of Surgery, Woodlands Health, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Yam Meng Chan: Department of General Surgery, Tan Tock Seng Hospital, Singapore
- Cheng Cheng Goh: Wound and Stoma Care, Nursing Speciality, Tan Tock Seng Hospital, Singapore
- Tina Lai: Wound and Stoma Care, Nursing Speciality, Tan Tock Seng Hospital, Singapore
- Audrey Tan: Wound and Stoma Care, Nursing Speciality, Tan Tock Seng Hospital, Singapore
- Patrick Thng: AITIS - Advanced Intelligence and Technology Innovations, London, United Kingdom
- Jorge Rodriguez: AITIS - Advanced Intelligence and Technology Innovations, London, United Kingdom
- Tillman Weyde: AITIS - Advanced Intelligence and Technology Innovations, London, United Kingdom
- Sylvia Smit: AITIS - Advanced Intelligence and Technology Innovations, London, United Kingdom
37
Guo Y, Zhang H, Yuan L, Chen W, Zhao H, Yu QQ, Shi W. Machine learning and new insights for breast cancer diagnosis. J Int Med Res 2024; 52:3000605241237867. [PMID: 38663911 PMCID: PMC11047257 DOI: 10.1177/03000605241237867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2023] [Accepted: 02/21/2024] [Indexed: 04/28/2024] Open
Abstract
Breast cancer (BC) is the most prominent form of cancer among females all over the world. The current methods of BC detection include X-ray mammography, ultrasound, computed tomography, magnetic resonance imaging, positron emission tomography and breast thermographic techniques. More recently, machine learning (ML) tools have been increasingly employed in diagnostic medicine for their high efficiency in detection and intervention. The subsequent imaging features and mathematical analyses can then be used to generate ML models, which stratify, differentiate and detect benign and malignant breast lesions. Given its marked advantages, radiomics is a frequently used tool in recent research and clinics. Artificial neural networks and deep learning (DL) are novel forms of ML that evaluate data using computer simulation of the human brain. DL directly processes unstructured information, such as images, sounds and language, and performs precise clinical image stratification, medical record analyses and tumour diagnosis. Herein, this review thoroughly summarizes prior investigations on the application of medical images for the detection and intervention of BC using radiomics, DL and ML. The aim was to provide guidance to scientists regarding the use of artificial intelligence and ML in research and the clinic.
Affiliation(s)
- Ya Guo: Department of Oncology, Jining No.1 People's Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Heng Zhang: Department of Laboratory Medicine, Shandong Daizhuang Hospital, Jining, Shandong Province, China
- Leilei Yuan: Department of Oncology, Jining No.1 People's Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Weidong Chen: Department of Oncology, Jining No.1 People's Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Haibo Zhao: Department of Oncology, Jining No.1 People's Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Qing-Qing Yu: Phase I Clinical Research Centre, Jining No.1 People's Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Wenjie Shi: Molecular and Experimental Surgery, University Clinic for General-, Visceral-, Vascular- and Trans-Plantation Surgery, Medical Faculty University Hospital Magdeburg, Otto von Guericke University, Magdeburg, Germany
38
Africano G, Arponen O, Rinta-Kiikka I, Pertuz S. Transfer learning for the generalization of artificial intelligence in breast cancer detection: a case-control study. Acta Radiol 2024; 65:334-340. [PMID: 38115699 DOI: 10.1177/02841851231218960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
BACKGROUND Some researchers have questioned whether artificial intelligence (AI) systems maintain their performance when used for women from populations not considered during the development of the system. PURPOSE To evaluate the impact of transfer learning as a way of improving the generalization of AI systems in the detection of breast cancer. MATERIAL AND METHODS This retrospective case-control Finnish study involved 191 women diagnosed with breast cancer and 191 matched healthy controls. We selected a state-of-the-art AI system for breast cancer detection trained using a large US dataset. The selected baseline system was evaluated in two experimental settings. First, we examined our private Finnish sample as an independent test set that had not been considered in the development of the system (unseen population). Second, the baseline system was retrained to attempt to improve its performance in the unseen population by means of transfer learning. To analyze performance, we used areas under the receiver operating characteristic curve (AUCs) with DeLong's test. RESULTS Two versions of the baseline system were considered: ImageOnly and Heatmaps. The ImageOnly and Heatmaps versions yielded mean AUC values of 0.82±0.008 and 0.88±0.003 in the US dataset and 0.56 (95% CI=0.50-0.62) and 0.72 (95% CI=0.67-0.77) when evaluated in the unseen population, respectively. The retrained systems achieved AUC values of 0.61 (95% CI=0.55-0.66) and 0.69 (95% CI=0.64-0.75), respectively. There was no statistically significant difference between the baseline and retrained systems. CONCLUSION Transfer learning with a small study sample did not yield a significant improvement in the generalization of the system.
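DeLong's test, used in this study, compares two correlated AUCs measured on the same cases. As a rough sketch of that comparison (not the authors' code; DeLong's analytic variance is replaced here by a paired case-level bootstrap), the empirical AUC can be computed from the Mann-Whitney statistic:

```python
import random

def auc(pos_scores, neg_scores):
    # Empirical AUC: probability that a positive case outscores a negative
    # one (Mann-Whitney U divided by n_pos * n_neg); ties count as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def paired_bootstrap_auc_diff(cases, n_boot=1000, seed=0):
    # cases: list of (label, score_model_a, score_model_b) for the same women.
    # Resampling whole cases preserves the pairing between the two models,
    # which is the role DeLong's covariance term plays in the analytic test.
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(cases) for _ in cases]
        pos_a = [a for y, a, _ in sample if y == 1]
        neg_a = [a for y, a, _ in sample if y == 0]
        pos_b = [b for y, _, b in sample if y == 1]
        neg_b = [b for y, _, b in sample if y == 0]
        diffs.append(auc(pos_a, neg_a) - auc(pos_b, neg_b))
    diffs.sort()
    # Approximate 95% interval for the AUC difference.
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]
```

With real paired scores, a 95% interval for the AUC difference that excludes zero would indicate a significant difference, mirroring the conclusion drawn from DeLong's test.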
Affiliation(s)
- Gerson Africano: School of Electrical, Electronics and Telecommunications Engineering, Universidad Industrial de Santander, Bucaramanga, Colombia
- Otso Arponen: Department of Radiology, Tampere University Hospital, Tampere, Finland; Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Irina Rinta-Kiikka: Department of Radiology, Tampere University Hospital, Tampere, Finland; Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Said Pertuz: School of Electrical, Electronics and Telecommunications Engineering, Universidad Industrial de Santander, Bucaramanga, Colombia
39
Flory MN, Napel S, Tsai EB. Artificial Intelligence in Radiology: Opportunities and Challenges. Semin Ultrasound CT MR 2024; 45:152-160. [PMID: 38403128 DOI: 10.1053/j.sult.2024.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/27/2024]
Abstract
Artificial intelligence's (AI) emergence in radiology elicits both excitement and uncertainty. AI holds promise for improving radiology with regard to clinical practice, education, and research opportunities. Yet, AI systems are trained on select datasets that can contain bias and inaccuracies. Radiologists must understand these limitations and engage with AI developers at every step of the process - from algorithm initiation and design to development and implementation - to maximize the benefits and minimize the harms that this technology can enable.
Affiliation(s)
- Marta N Flory: Department of Radiology, Stanford University School of Medicine, Center for Academic Medicine, Palo Alto, CA
- Sandy Napel: Department of Radiology, Stanford University School of Medicine, Center for Academic Medicine, Palo Alto, CA
- Emily B Tsai: Department of Radiology, Stanford University School of Medicine, Center for Academic Medicine, Palo Alto, CA
40
Dimitri P, Savage MO. Artificial intelligence in paediatric endocrinology: conflict or cooperation. J Pediatr Endocrinol Metab 2024; 37:209-221. [PMID: 38183676 DOI: 10.1515/jpem-2023-0554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/17/2023] [Accepted: 12/18/2023] [Indexed: 01/08/2024]
Abstract
Artificial intelligence (AI) in medicine is transforming healthcare by automating system tasks, assisting in diagnostics, predicting patient outcomes and personalising patient care, founded on the ability to analyse vast datasets. In paediatric endocrinology, AI has been developed for diabetes, for insulin dose adjustment, detection of hypoglycaemia and retinopathy screening; bone age assessment and thyroid nodule screening; the identification of growth disorders; the diagnosis of precocious puberty; and the use of facial recognition algorithms in conditions such as Cushing syndrome, acromegaly, congenital adrenal hyperplasia and Turner syndrome. AI can also predict those most at risk from childhood obesity by stratifying future interventions to modify lifestyle. AI will facilitate personalised healthcare by integrating data from 'omics' analysis, lifestyle tracking, medical history, laboratory and imaging, therapy response and treatment adherence from multiple sources. As data acquisition and processing becomes fundamental, data privacy and protecting children's health data is crucial. Minimising algorithmic bias generated by AI analysis for rare conditions seen in paediatric endocrinology is an important determinant of AI validity in clinical practice. AI cannot create the patient-doctor relationship or assess the wider holistic determinants of care. Children have individual needs and vulnerabilities and are considered in the context of family relationships and dynamics. Importantly, whilst AI provides value through augmenting efficiency and accuracy, it must not be used to replace clinical skills.
Affiliation(s)
- Paul Dimitri: Department of Paediatric Endocrinology, Sheffield Children's NHS Foundation Trust, Sheffield, UK
- Martin O Savage: Centre for Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine & Dentistry, Queen Mary University of London, London, UK
41
Lokaj B, Pugliese MT, Kinkel K, Lovis C, Schmid J. Barriers and facilitators of artificial intelligence conception and implementation for breast imaging diagnosis in clinical practice: a scoping review. Eur Radiol 2024; 34:2096-2109. [PMID: 37658895 PMCID: PMC10873444 DOI: 10.1007/s00330-023-10181-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2023] [Revised: 06/07/2023] [Accepted: 07/10/2023] [Indexed: 09/05/2023]
Abstract
OBJECTIVE Although artificial intelligence (AI) has demonstrated promise in enhancing breast cancer diagnosis, the implementation of AI algorithms in clinical practice encounters various barriers. This scoping review aims to identify these barriers and facilitators to highlight key considerations for developing and implementing AI solutions in breast cancer imaging. METHOD A literature search was conducted from 2012 to 2022 in six databases (PubMed, Web of Science, CINHAL, Embase, IEEE, and ArXiv). The articles were included if some barriers and/or facilitators in the conception or implementation of AI in breast clinical imaging were described. We excluded research only focusing on performance, or with data not acquired in a clinical radiology setup and not involving real patients. RESULTS A total of 107 articles were included. We identified six major barriers related to data (B1), black box and trust (B2), algorithms and conception (B3), evaluation and validation (B4), legal, ethical, and economic issues (B5), and education (B6), and five major facilitators covering data (F1), clinical impact (F2), algorithms and conception (F3), evaluation and validation (F4), and education (F5). CONCLUSION This scoping review highlighted the need to carefully design, deploy, and evaluate AI solutions in clinical practice, involving all stakeholders to yield improvement in healthcare. CLINICAL RELEVANCE STATEMENT The identification of barriers and facilitators with suggested solutions can guide and inform future research, and stakeholders to improve the design and implementation of AI for breast cancer detection in clinical practice. KEY POINTS • Six major identified barriers were related to data; black-box and trust; algorithms and conception; evaluation and validation; legal, ethical, and economic issues; and education. • Five major identified facilitators were related to data, clinical impact, algorithms and conception, evaluation and validation, and education. 
• Coordinated implication of all stakeholders is required to improve breast cancer diagnosis with AI.
Affiliation(s)
- Belinda Lokaj: Geneva School of Health Sciences, HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Marie-Thérèse Pugliese: Geneva School of Health Sciences, HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
- Karen Kinkel: Réseau Hospitalier Neuchâtelois, Neuchâtel, Switzerland
- Christian Lovis: Faculty of Medicine, University of Geneva, Geneva, Switzerland; Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Jérôme Schmid: Geneva School of Health Sciences, HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
42
Rabinovici-Cohen S, Fridman N, Weinbaum M, Melul E, Hexter E, Rosen-Zvi M, Aizenberg Y, Porat Ben Amy D. From Pixels to Diagnosis: Algorithmic Analysis of Clinical Oral Photos for Early Detection of Oral Squamous Cell Carcinoma. Cancers (Basel) 2024; 16:1019. [PMID: 38473377 DOI: 10.3390/cancers16051019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 02/22/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
Oral squamous cell carcinoma (OSCC) accounts for more than 90% of oral malignancies. Despite numerous advancements in understanding its biology, the mean five-year survival rate of OSCC is still very poor at about 50%, with even lower rates when the disease is detected at later stages. We investigate the use of clinical photographic images taken by common smartphones for the automated detection of OSCC cases and for the identification of suspicious cases mimicking cancer that require an urgent biopsy. We perform a retrospective study on a cohort of 1470 patients drawn from both hospital records and online academic sources. We examine various deep learning methods for the early detection of OSCC cases as well as for the detection of suspicious cases. Our results demonstrate the efficacy of these methods in both tasks, providing a comprehensive understanding of the patient's condition. When evaluated on holdout data, the model to predict OSCC achieved an AUC of 0.96 (CI: 0.91, 0.98), with a sensitivity of 0.91 and specificity of 0.81. When the data are stratified based on lesion location, we find that our models can provide enhanced accuracy (AUC 1.00) in differentiating specific groups of patients that have lesions in the lingual mucosa, floor of mouth, or posterior tongue. These results underscore the potential of leveraging clinical photos for the timely and accurate identification of OSCC.
Affiliation(s)
- Naomi Fridman: TIMNA-Big Data Research Platform Unit, Ministry of Health, Jerusalem 9446724, Israel; Department of Industrial Engineering & Management, Ariel University, Ariel 40700, Israel
- Michal Weinbaum: TIMNA-Big Data Research Platform Unit, Ministry of Health, Jerusalem 9446724, Israel
- Eli Melul: TIMNA-Big Data Research Platform Unit, Ministry of Health, Jerusalem 9446724, Israel
- Efrat Hexter: IBM Research-Israel, Mount Carmel, Haifa 3498825, Israel
- Michal Rosen-Zvi: IBM Research-Israel, Mount Carmel, Haifa 3498825, Israel; Faculty of Medicine, The Hebrew University, Jerusalem 91120, Israel
- Yelena Aizenberg: Oral Medicine Unit, Department of Oral and Maxillofacial Surgery, Tzafon Medical Center, Poriya 15208, Israel
- Dalit Porat Ben Amy: Oral Medicine Unit, Department of Oral and Maxillofacial Surgery, Tzafon Medical Center, Poriya 15208, Israel; The Azrieli Faculty of Medicine, Bar-Ilan University, Ramat Gan 5290002, Israel
43
Viberg Johansson J, Dembrower K, Strand F, Grauman Å. Women's perceptions and attitudes towards the use of AI in mammography in Sweden: a qualitative interview study. BMJ Open 2024; 14:e084014. [PMID: 38355190 PMCID: PMC10868248 DOI: 10.1136/bmjopen-2024-084014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Accepted: 02/02/2024] [Indexed: 02/16/2024] Open
Abstract
BACKGROUND Understanding women's perspectives can help to create an effective and acceptable artificial intelligence (AI) implementation for triaging mammograms, ensuring a high proportion of screening-detected cancer. This study aimed to explore Swedish women's perceptions and attitudes towards the use of AI in mammography. METHOD Semistructured interviews were conducted with 16 women recruited in the spring of 2023 at Capio S:t Görans Hospital, Sweden, during an ongoing clinical trial of AI in screening (ScreenTrustCAD, NCT04778670) with Philips equipment. The interview transcripts were analysed using inductive thematic content analysis. RESULTS In general, women viewed AI as an excellent complementary tool to help radiologists in their decision-making, rather than a complete replacement of their expertise. To trust the AI, the women requested a thorough evaluation, transparency about AI usage in healthcare, and the involvement of a radiologist in the assessment. They would rather accept the worry of being called in more often for scans than risk an overlooked sign of cancer. They expressed substantial trust in the healthcare system if the implementation of AI were to become standard practice. CONCLUSION The findings suggest that the interviewed women, in general, hold a positive attitude towards the implementation of AI in mammography; nonetheless, they expect and demand more from an AI than from a radiologist. Effective communication regarding the role and limitations of AI is crucial to ensure that patients understand the purpose and potential outcomes of AI-assisted healthcare.
Affiliation(s)
- Jennifer Viberg Johansson: Centre for Research Ethics & Bioethics (CRB), Department of Public Health and Caring Sciences, Uppsala University, Uppsala, Sweden
- Karin Dembrower: Capio S:t Görans Hospital, Stockholm, Sweden; Department of Oncology-Pathology, Karolinska Institute, Stockholm, Sweden
- Fredrik Strand: Department of Oncology-Pathology, Karolinska Institute, Stockholm, Sweden
- Åsa Grauman: Centre for Research Ethics & Bioethics (CRB), Department of Public Health and Caring Sciences, Uppsala University, Uppsala, Sweden
44
Hou R, Lo JY, Marks JR, Hwang ES, Grimm LJ. Classification performance bias between training and test sets in a limited mammography dataset. PLoS One 2024; 19:e0282402. [PMID: 38324545 PMCID: PMC10849231 DOI: 10.1371/journal.pone.0282402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Accepted: 08/22/2023] [Indexed: 02/09/2024] Open
Abstract
OBJECTIVES To assess the performance bias caused by sampling data into training and test sets in a mammography radiomics study. METHODS Mammograms from 700 women were used to study upstaging of ductal carcinoma in situ. The dataset was repeatedly shuffled and split into training (n = 400) and test cases (n = 300) forty times. For each split, cross-validation was used for training, followed by an assessment of the test set. Logistic regression with regularization and support vector machine were used as the machine learning classifiers. For each split and classifier type, multiple models were created based on radiomics and/or clinical features. RESULTS Area under the curve (AUC) performances varied considerably across the different data splits (e.g., radiomics regression model: train 0.58-0.70, test 0.59-0.73). Performances for regression models showed a tradeoff where better training led to worse testing and vice versa. Cross-validation over all cases reduced this variability, but required samples of 500+ cases to yield representative estimates of performance. CONCLUSIONS In medical imaging, clinical datasets are often limited to relatively small size. Models built from different training sets may not be representative of the whole dataset. Depending on the selected data split and model, performance bias could lead to inappropriate conclusions that might influence the clinical significance of the findings. ADVANCES IN KNOWLEDGE Performance bias can result from model testing when using limited datasets. Optimal strategies for test set selection should be developed to ensure study conclusions are appropriate.
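The forty-shuffle design described above can be illustrated with a toy version (synthetic scores and a pure-Python AUC; an assumption-laden sketch, not the study's radiomics pipeline) showing how test-set performance fluctuates across random splits of a limited cohort:

```python
import random

def auc(pos, neg):
    # Empirical AUC via the Mann-Whitney statistic; ties count as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(42)
# Synthetic 700-case cohort with overlapping class score distributions,
# mimicking the outputs of an imperfect classifier.
cases = [(1, rng.gauss(1.0, 1.0)) for _ in range(280)] + \
        [(0, rng.gauss(0.0, 1.0)) for _ in range(420)]

aucs = []
for _ in range(40):              # forty shuffles, echoing the study design
    rng.shuffle(cases)
    test = cases[:300]           # 300-case test split, as in the paper
    pos = [s for y, s in test if y == 1]
    neg = [s for y, s in test if y == 0]
    aucs.append(auc(pos, neg))

print(f"test AUC across 40 splits: {min(aucs):.3f} to {max(aucs):.3f}")
```

Even with the scoring model held fixed, the test AUC moves from split to split purely through sampling; training on different subsets, as in the study, adds further variability on top of this.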
Affiliation(s)
- Rui Hou: Department of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Department of Radiology, Duke University Medical Center, Duke University, Durham, North Carolina, United States of America
- Joseph Y. Lo: Department of Radiology, Duke University Medical Center, Duke University, Durham, North Carolina, United States of America
- Jeffrey R. Marks: Department of Surgery, Duke University Medical Center, Duke University, Durham, North Carolina, United States of America
- E. Shelley Hwang: Department of Surgery, Duke University Medical Center, Duke University, Durham, North Carolina, United States of America
- Lars J. Grimm: Department of Radiology, Duke University Medical Center, Duke University, Durham, North Carolina, United States of America
45
Bassi E, Russo A, Oliboni E, Zamboni F, De Santis C, Mansueto G, Montemezzi S, Foti G. The role of an artificial intelligence software in clinical senology: a mammography multi-reader study. LA RADIOLOGIA MEDICA 2024; 129:202-210. [PMID: 38082194 DOI: 10.1007/s11547-023-01751-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Accepted: 11/07/2023] [Indexed: 02/21/2024]
Abstract
PURPOSE To evaluate the diagnostic role of dedicated AI software in detecting anomalous breast findings on mammography and tomosynthesis images in the clinical setting, both stand-alone and as an aid to four readers. METHODS A total of 210 patients with complete clinical and radiologic records were retrospectively analyzed. Pathology was used as the reference standard for patients undergoing surgery or biopsy, and a 1-year follow-up was used to confirm no change in the remaining patients. The image evaluation was performed by four readers with different levels of experience (a junior and three senior breast radiologists) using a 5-point Likert scale moving from 1 (definitively no cancer) to 5 (definitively cancer). The positivity of mammograms was assessed based on the presence of any breast lesion (masses, architectural distortions, asymmetries, calcifications), including malignant and benign ones. A multi-reader multi-case analysis was performed. A p value < 0.05 was considered statistically significant. RESULTS The stand-alone AI system achieved an accuracy of 71% (69% sensitivity and 73% specificity), which is overall lower than the values achieved by the readers without AI. However, with the aid of AI, a significant increase in accuracy (p value = 0.004) and specificity (p value = 0.04) was achieved for the less experienced radiologist and one senior radiologist. CONCLUSION The use of AI software as a second reader for breast lesion assessment could play a crucial role in the clinical setting by increasing sensitivity and specificity, especially for less experienced radiologists.
Affiliation(s)
- Enrica Bassi: Department of Radiology, Verona University Hospital, Verona, Italy
- Anna Russo: Department of Radiology, IRCCS Sacro Cuore Hospital, Via Don A. Sempreboni 10, 37024, Negrar (VR), Italy
- Eugenio Oliboni: Department of Radiology, IRCCS Sacro Cuore Hospital, Via Don A. Sempreboni 10, 37024, Negrar (VR), Italy
- Federico Zamboni: Department of Radiology, IRCCS Sacro Cuore Hospital, Via Don A. Sempreboni 10, 37024, Negrar (VR), Italy
- Cecilia De Santis: Department of Radiology, IRCCS Sacro Cuore Hospital, Via Don A. Sempreboni 10, 37024, Negrar (VR), Italy
- Stefania Montemezzi: Department of Radiology, Azienda Ospedaliera Universitaria Integrata, Verona, Italy
- Giovanni Foti: Department of Radiology, IRCCS Sacro Cuore Hospital, Via Don A. Sempreboni 10, 37024, Negrar (VR), Italy
46
Shen X, He Z, Shi Y, Liu T, Yang Y, Luo J, Tang X, Chen B, Xu S, Zhou Y, Xiao J, Qin Y. Development and Validation of an Automated Classification System for Osteonecrosis of the Femoral Head Using Deep Learning Approach: A Multicenter Study. J Arthroplasty 2024; 39:379-386.e2. [PMID: 37572719 DOI: 10.1016/j.arth.2023.08.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Revised: 08/01/2023] [Accepted: 08/03/2023] [Indexed: 08/14/2023] Open
Abstract
BACKGROUND Accurate classification can facilitate the selection of appropriate interventions to delay the progression of osteonecrosis of the femoral head (ONFH). This study aimed to perform the classification of ONFH through a deep learning approach. METHODS We retrospectively sampled 1,806 midcoronal magnetic resonance images (MRIs) of 1,337 hips from 4 institutions. Of these, 1,472 midcoronal MRIs of 1,155 hips were divided into training, validation, and test datasets with a ratio of 7:1:2 to develop a convolutional neural network (CNN) model. An additional 334 midcoronal MRIs of 182 hips were used to perform external validation. The predictive performance of the CNN and the review panel was also compared. RESULTS A multiclass CNN model was successfully developed. In internal validation, the overall accuracy of the CNN for predicting the severity of ONFH based on the Japanese Investigation Committee classification was 87.8%. The macroaverage values of area under the curve (AUC), precision, recall, and F-value were 0.90, 84.8%, 84.8%, and 84.6%, respectively. In external validation, the overall accuracy of the CNN was 83.8%. The macroaverage values of AUC, precision, recall, and F-value were 0.87, 79.5%, 80.5%, and 79.9%, respectively. In a human-machine comparison study, the CNN outperformed or matched the performance of the deputy chief orthopaedic surgeons. CONCLUSION The CNN is feasible and robust for classifying ONFH and correctly locating the necrotic area. These findings suggest that classifying ONFH using deep learning with high accuracy and generalizability may aid in predicting femoral head collapse and clinical decision-making.
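The macroaverage values quoted above are unweighted means of per-class metrics, so each Japanese Investigation Committee grade counts equally regardless of its frequency. A generic sketch (not the authors' pipeline; the grades are just class labels here) of macro-averaged precision, recall, and F-value:

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 for a multiclass problem:
    compute each metric per class, then take the unweighted mean."""
    classes = sorted(set(y_true) | set(y_pred))
    precs, recs, f1s = [], [], []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precs) / n, sum(recs) / n, sum(f1s) / n
```

Unlike a micro or support-weighted average, a rare grade that the model handles poorly pulls the macro figure down just as much as a common one, which makes macro averaging a conservative choice for imbalanced clinical grading data.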
Affiliation(s)
- Xianyue Shen: Department of Orthopedics, The Second Hospital of Jilin University, Changchun, Jilin province, PR China
- Ziling He: College of Computer Science and Technology, Jilin University, Changchun, Jilin province, PR China
- Yi Shi: Department of Orthopedics, The Second Hospital of Anhui Medical University, Hefei, Anhui province, PR China
- Tong Liu: Department of Orthopedics, China-Japan Union Hospital of Jilin University, Changchun, Jilin province, PR China
- Yuhui Yang: Department of Orthopedics, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, Guangdong province, PR China
- Jia Luo: College of Computer Science and Technology, Jilin University, Changchun, Jilin province, PR China
- Xiongfeng Tang: Department of Orthopedics, The Second Hospital of Jilin University, Changchun, Jilin province, PR China
- Bo Chen: Department of Orthopedics, The Second Hospital of Jilin University, Changchun, Jilin province, PR China
- Shenghao Xu: Department of Orthopedics, China-Japan Union Hospital of Jilin University, Changchun, Jilin province, PR China
- You Zhou: College of Software, Jilin University, Changchun, Jilin province, PR China
- Jianlin Xiao: Department of Orthopedics, China-Japan Union Hospital of Jilin University, Changchun, Jilin province, PR China
- Yanguo Qin: Department of Orthopedics, The Second Hospital of Jilin University, Changchun, Jilin province, PR China
47
Lee SE, Yoon JH, Son NH, Han K, Moon HJ. Screening in Patients With Dense Breasts: Comparison of Mammography, Artificial Intelligence, and Supplementary Ultrasound. AJR Am J Roentgenol 2024; 222:e2329655. [PMID: 37493324] [DOI: 10.2214/ajr.23.29655]
Abstract
BACKGROUND. Screening mammography has decreased performance in patients with dense breasts. Supplementary screening ultrasound is a recommended option in such patients, although it has yielded mixed results in prior investigations. OBJECTIVE. The purpose of this article is to compare the performance characteristics of screening mammography alone, standalone artificial intelligence (AI), ultrasound alone, and mammography in combination with AI and/or ultrasound in patients with dense breasts. METHODS. This retrospective study included 1325 women (mean age, 53 years) with dense breasts who underwent both screening mammography and supplementary breast ultrasound within a 1-month interval from January 2017 to December 2017; prior mammography and prior ultrasound examinations were available for comparison in 91.2% and 91.8% of patients, respectively. Mammography and ultrasound examinations were interpreted by one of 15 radiologists (five staff; 10 fellows); clinical reports were used for the present analysis. A commercial AI tool was used to retrospectively evaluate mammographic examinations for the presence of cancer. Screening performances were compared among mammography, AI, ultrasound, and test combinations using generalized estimating equations. Benign diagnoses required 24 months or longer of imaging stability. RESULTS. Twelve cancers (six invasive ductal carcinoma; six ductal carcinoma in situ) were diagnosed. Mammography, standalone AI, and ultrasound showed cancer detection rates (per 1000 patients) of 6.0, 6.8, and 6.0 (all p > .05); recall rates of 4.4%, 11.9%, and 9.2% (all p < .05); sensitivity of 66.7%, 75.0%, and 66.7% (all p > .05); specificity of 96.2%, 88.7%, and 91.3% (all p < .05); and accuracy of 95.9%, 88.5%, and 91.1% (all p < .05). Mammography with AI, mammography with ultrasound, and mammography with both ultrasound and AI showed cancer detection rates of 7.5, 9.1, and 9.1 (all p > .05); recall rates of 14.9%, 11.7%, and 21.4% (all p < .05); sensitivity of 83.3%, 100.0%, and 100.0% (all p > .05); specificity of 85.8%, 89.1%, and 79.4% (all p < .05); and accuracy of 85.7%, 89.2%, and 79.5% (all p < .05). CONCLUSION. Mammography with supplementary ultrasound showed higher accuracy, higher specificity, and a lower recall rate than mammography with AI and than mammography with both ultrasound and AI. CLINICAL IMPACT. The findings do not show a benefit of AI when screening mammography is performed with supplementary breast ultrasound in patients with dense breasts.
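All of the rate-style metrics in this abstract derive from a single patient-level 2x2 table. The sketch below shows the arithmetic; the counts are an approximate reconstruction from the reported mammography-alone figures (12 cancers among 1,325 patients, 8 detected), not taken from the paper's data tables:

```python
def screening_metrics(tp, fp, fn, tn):
    """Patient-level screening metrics from a 2x2 table of
    true/false positives and negatives."""
    n = tp + fp + fn + tn
    return {
        "cdr_per_1000": 1000 * tp / n,           # cancer detection rate
        "recall_rate_pct": 100 * (tp + fp) / n,  # screens flagged for workup
        "sensitivity_pct": 100 * tp / (tp + fn),
        "specificity_pct": 100 * tn / (tn + fp),
        "accuracy_pct": 100 * (tp + tn) / n,
    }

# Approximate reconstruction for mammography alone: 8 of 12 cancers
# detected and roughly 50 false-positive recalls among 1,325 patients.
m = screening_metrics(tp=8, fp=50, fn=4, tn=1263)
```

Rounding these values to one decimal reproduces the reported mammography row (6.0 per 1000, 4.4% recall, 66.7% sensitivity, 96.2% specificity, 95.9% accuracy), which is a useful sanity check when reading such tables.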
Affiliation(s)
- Si Eun Lee: Department of Radiology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Korea
- Jung Hyun Yoon: Department of Radiology, Research Institute of Radiologic Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Nak-Hoon Son: Department of Statistics, Keimyung University, Daegu, South Korea
- Kyunghwa Han: Department of Radiology, Research Institute of Radiologic Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- Hee Jung Moon: Department of Radiology, Wonju Severance Christian Hospital, Yonsei University Wonju College of Medicine, 20 Ilsan-ro, Wonju 220-701, Korea
48
Antweiler D, Albiez D, Bures D, Hosters B, Jovy-Klein F, Nickel K, Reibel T, Schramm J, Sander J, Antons D, Diehl A. [Use of AI-based applications by hospital staff: task profiles and qualification requirements]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2024; 67:66-75. [PMID: 38032516] [PMCID: PMC10776476] [DOI: 10.1007/s00103-023-03817-x]
Abstract
BACKGROUND Artificial intelligence (AI) is becoming increasingly important for the future development of hospitals. To unlock the large potential of AI, the job profiles of hospital staff need to be extended with AI and digitization skills through targeted qualification measures. This affects both medical and non-medical processes along the entire value chain in hospitals. The aim of this paper is to provide an overview of the skills required to deal with smart technologies in a clinical context and to present measures for training employees. METHODS As part of the "SmartHospital.NRW" project, we conducted a literature review as well as interviews and workshops with experts in 2022. AI technologies and fields of application were identified. RESULTS Key findings include adapted and new task profiles, synergies and dependencies between individual task profiles, and the need for comprehensive interdisciplinary and interprofessional exchange when using AI-based applications in hospitals. DISCUSSION Our article shows that hospitals need to promote digital health literacy among staff at an early stage and, at the same time, recruit technology- and AI-savvy staff. Interprofessional exchange formats and accompanying change management are essential for the use of AI in hospitals.
Affiliation(s)
- Dario Antweiler: Fraunhofer Institut für Intelligente Analyse und Informationssysteme IAIS, Abteilung Knowledge Discovery, Schloss Birlinghoven 1, 53757 Sankt Augustin, Germany
- Daniela Albiez: Fraunhofer Institut für Intelligente Analyse und Informationssysteme IAIS, Abteilung Adaptive Reflective Teams, Sankt Augustin, Germany
- Dominik Bures: Stabsstelle Digitale Transformation, Universitätsmedizin Essen, Essen, Germany
- Bernadette Hosters: Stabsstelle Entwicklung und Forschung Pflege, Universitätsmedizin Essen, Essen, Germany
- Florian Jovy-Klein: Institut für Technologie- und Innovationsmanagement, RWTH Aachen, Aachen, Germany
- Kilian Nickel: Fraunhofer Institut für Intelligente Analyse und Informationssysteme IAIS, Abteilung Adaptive Reflective Teams, Sankt Augustin, Germany
- Thomas Reibel: Institut für Technologie- und Innovationsmanagement, RWTH Aachen, Aachen, Germany
- Johanna Schramm: Stabsstelle Entwicklung und Forschung Pflege, Universitätsmedizin Essen, Essen, Germany
- Jil Sander: Stabsstelle Digitale Transformation, Universitätsmedizin Essen, Essen, Germany
- David Antons: Institut für Technologie- und Innovationsmanagement, RWTH Aachen, Aachen, Germany
- Anke Diehl: Stabsstelle Digitale Transformation, Universitätsmedizin Essen, Essen, Germany
49
Kolasa K, Admassu B, Hołownia-Voloskova M, Kędzior KJ, Poirrier JE, Perni S. Systematic reviews of machine learning in healthcare: a literature review. Expert Rev Pharmacoecon Outcomes Res 2024; 24:63-115. [PMID: 37955147] [DOI: 10.1080/14737167.2023.2279107]
Abstract
INTRODUCTION The increasing availability of data and computing power has made machine learning (ML) a viable approach to faster, more efficient healthcare delivery. METHODS A systematic literature review (SLR) of published SLRs evaluating ML applications in healthcare settings published between 1 January 2010 and 27 March 2023 was conducted. RESULTS In total, 220 SLRs covering 10,462 ML algorithms were reviewed. The main applications of ML in medicine related to clinical prediction and disease prognosis in oncology and neurology using imaging data. Accuracy, specificity, and sensitivity were reported in 56%, 28%, and 25% of SLRs, respectively. Internal and external validation were reported in 53% and less than 1% of cases, respectively. The most common modeling approach was neural networks (2,454 ML algorithms), followed by support vector machines and random forests/decision trees (1,578 and 1,522 ML algorithms, respectively). EXPERT OPINION The review indicated considerable reporting gaps in ML performance as well as in internal and external validation. Greater accessibility of healthcare data for developers could ensure faster adoption of ML algorithms into clinical practice.
Affiliation(s)
- Katarzyna Kolasa: Division of Health Economics and Healthcare Management, Kozminski University, Warsaw, Poland
- Bisrat Admassu: Division of Health Economics and Healthcare Management, Kozminski University, Warsaw, Poland
50
Elhakim MT, Stougaard SW, Graumann O, Nielsen M, Lång K, Gerke O, Larsen LB, Rasmussen BSB. Breast cancer detection accuracy of AI in an entire screening population: a retrospective, multicentre study. Cancer Imaging 2023; 23:127. [PMID: 38124111] [PMCID: PMC10731688] [DOI: 10.1186/s40644-023-00643-x]
Abstract
BACKGROUND Artificial intelligence (AI) systems have been proposed as a replacement for the first reader in double reading within mammography screening. We aimed to assess the cancer detection accuracy of an AI system in a Danish screening population. METHODS We retrieved a consecutive screening cohort from the Region of Southern Denmark including all participating women between August 4, 2014, and August 15, 2018. Screening mammograms were processed by a commercial AI system, and detection accuracy was evaluated in two scenarios, Standalone AI and AI-integrated screening replacing the first reader, with first reader and double reading with arbitration (combined reading) as comparators, respectively. Two AI-score cut-off points were applied by matching mean first-reader sensitivity (AIsens) and specificity (AIspec). The reference standard was histopathology-proven breast cancer or cancer-free follow-up within 24 months. Coprimary endpoints were sensitivity and specificity, and secondary endpoints were positive predictive value (PPV), negative predictive value (NPV), recall rate, and arbitration rate. Accuracy estimates were compared using McNemar's test or the exact binomial test. RESULTS Out of 272,008 screening mammograms from 158,732 women, 257,671 (94.7%) with adequate image data were included in the final analyses. Sensitivity and specificity were 63.7% (95% CI 61.6%-65.8%) and 97.8% (97.7%-97.8%) for first reader, and 73.9% (72.0%-75.8%) and 97.9% (97.9%-98.0%) for combined reading, respectively. Standalone AIsens showed a lower specificity (-1.3%) and PPV (-6.1%) and a higher recall rate (+1.3%) compared with first reader (p < 0.0001 for all), while Standalone AIspec had a lower sensitivity (-5.1%; p < 0.0001), PPV (-1.3%; p = 0.01), and NPV (-0.04%; p = 0.0002). Compared with combined reading, Integrated AIsens achieved higher sensitivity (+2.3%; p = 0.0004) but lower specificity (-0.6%) and PPV (-3.9%) as well as a higher recall rate (+0.6%) and arbitration rate (+2.2%; p < 0.0001 for all). Integrated AIspec showed no significant difference in any outcome measure apart from a slightly higher arbitration rate (p < 0.0001). Subgroup analyses showed higher detection of interval cancers by Standalone AI and Integrated AI at both thresholds (p < 0.0001 for all), with a varying composition of detected cancers across multiple subgroups of tumour characteristics. CONCLUSIONS Replacing the first reader in double reading with an AI could be feasible, but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and managing workload.
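Paired-reader comparisons like these hinge only on the discordant screens, i.e. those classified positive by one reading strategy but not the other. A minimal stdlib sketch of the exact (binomial) form of McNemar's test follows; the counts in the example are hypothetical, not the study's:

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from discordant-pair counts.

    b: screens positive under the first strategy only
    c: screens positive under the second strategy only
    Under H0 the discordant outcomes follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the smaller tail, cap at 1

# Hypothetical example: 10 screens flagged only by AI vs 2 only by the reader.
p_value = mcnemar_exact_p(2, 10)
```

Concordant screens (flagged by both or by neither) cancel out of the test entirely, which is why very large cohorts can still yield modest p-values when the two strategies mostly agree.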
Affiliation(s)
- Mohammad Talal Elhakim: Department of Radiology, Odense University Hospital, Kløvervaenget 47, Entrance 27, Ground floor, 5000 Odense C, Denmark; Department of Clinical Research, University of Southern Denmark, Kløvervaenget 10, Entrance 112, 2nd floor, 5000 Odense C, Denmark
- Sarah Wordenskjold Stougaard: Department of Clinical Research, University of Southern Denmark, Kløvervaenget 10, Entrance 112, 2nd floor, 5000 Odense C, Denmark
- Ole Graumann: Department of Clinical Research, University of Southern Denmark, Kløvervaenget 10, Entrance 112, 2nd floor, 5000 Odense C, Denmark; Department of Radiology, Aarhus University Hospital, Palle Juul-Jensens Blvd. 99, 8200 Aarhus N, Denmark; Department of Clinical Research, Aarhus University, Palle Juul-Jensens Blvd. 99, 8200 Aarhus N, Denmark
- Mads Nielsen: Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 København Ø, Denmark
- Kristina Lång: Department of Translational Medicine, Lund University, Inga Maria Nilssons gata 47, SE-20502 Malmö, Sweden; Unilabs Mammography Unit, Skåne University Hospital, Jan Waldenströms gata 22, SE-20502 Malmö, Sweden
- Oke Gerke: Department of Clinical Research, University of Southern Denmark, Kløvervaenget 10, Entrance 112, 2nd floor, 5000 Odense C, Denmark; Department of Nuclear Medicine, Odense University Hospital, Kløvervaenget 47, Entrance 44, 5000 Odense C, Denmark
- Lisbet Brønsro Larsen: Department of Radiology, Odense University Hospital, Kløvervaenget 47, Entrance 27, Ground floor, 5000 Odense C, Denmark
- Benjamin Schnack Brandt Rasmussen: Department of Radiology, Odense University Hospital, Kløvervaenget 47, Entrance 27, Ground floor, 5000 Odense C, Denmark; Department of Clinical Research, University of Southern Denmark, Kløvervaenget 10, Entrance 112, 2nd floor, 5000 Odense C, Denmark; CAI-X - Centre for Clinical Artificial Intelligence, Odense University Hospital, Kløvervaenget 8C, Entrance 102, 5000 Odense C, Denmark