1
Kim S, An H, Cho HW, Min KJ, Hong JH, Lee S, Song JY, Lee JK, Lee NW. Pivotal Clinical Study to Evaluate the Efficacy and Safety of Assistive Artificial Intelligence-Based Software for Cervical Cancer Diagnosis. J Clin Med 2023; 12:4024. [PMID: 37373717 DOI: 10.3390/jcm12124024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Revised: 05/24/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Colposcopy is the gold standard diagnostic tool for identifying cervical lesions. However, the accuracy of colposcopy depends on the proficiency of the colposcopist. Machine learning algorithms using an artificial intelligence (AI) system can quickly process large amounts of data and have been successfully applied in several clinical situations. This study evaluated the feasibility of an AI system as an assistive tool for diagnosing high-grade cervical intraepithelial neoplasia lesions compared to the human interpretation of cervical images. This two-center, crossover, double-blind, randomized controlled trial included 886 randomly selected images. Four colposcopists (two proficient and two inexperienced) independently evaluated cervical images, once with and once without the aid of the Cerviray AI® system (AIDOT, Seoul, Republic of Korea). The AI aid demonstrated improved areas under the curve on the localization receiver-operating characteristic curve compared with the colposcopy impressions of colposcopists (difference 0.12, 95% confidence interval, 0.10-0.14, p < 0.001). Sensitivity and specificity also improved when using the AI system (89.18% vs. 71.33%, p < 0.001; and 96.68% vs. 92.16%, p < 0.001, respectively). Additionally, the classification accuracy rate improved with the aid of AI (86.40% vs. 75.45%; p < 0.001). Overall, the AI system could be used as an assistive diagnostic tool for both proficient and inexperienced colposcopists in cervical cancer screenings to estimate the impression and location of pathologic lesions. Further utilization of this system could help inexperienced colposcopists confirm where to perform a biopsy to diagnose high-grade lesions.
Affiliation(s)
- Seongmin Kim
- Gynecologic Cancer Center, CHA Ilsan Medical Center, CHA University College of Medicine, 1205 Jungang-ro, Ilsandong-gu, Goyang-si 10414, Republic of Korea
- Hyonggin An
- Department of Biostatistics, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Hyun-Woong Cho
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Kyung-Jin Min
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Jin-Hwa Hong
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Sanghoon Lee
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Jae-Yun Song
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Jae-Kwan Lee
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
- Nak-Woo Lee
- Department of Obstetrics and Gynecology, Korea University College of Medicine, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
2
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, Back T, Chesus M, Corrado GS, Darzi A, Etemadi M, Garcia-Vicente F, Gilbert FJ, Halling-Brown M, Hassabis D, Jansen S, Karthikesalingam A, Kelly CJ, King D, Ledsam JR, Melnick D, Mostofi H, Peng L, Reicher JJ, Romera-Paredes B, Sidebottom R, Suleyman M, Tse D, Young KC, De Fauw J, Shetty S. International evaluation of an AI system for breast cancer screening. Nature 2020; 577:89-94. [PMID: 31894144 DOI: 10.1038/s41586-019-1799-6] [Citation(s) in RCA: 1022] [Impact Index Per Article: 255.5] [Received: 07/27/2019] [Accepted: 11/05/2019] [Indexed: 02/07/2023]
Abstract
Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful [1]. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives [2]. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
Affiliation(s)
- Hutan Ashrafian
- Department of Surgery and Cancer, Imperial College London, London, UK
- Institute of Global Health Innovation, Imperial College London, London, UK
- Ara Darzi
- Department of Surgery and Cancer, Imperial College London, London, UK
- Institute of Global Health Innovation, Imperial College London, London, UK
- Cancer Research UK Imperial Centre, Imperial College London, London, UK
- Fiona J Gilbert
- Department of Radiology, Cambridge Biomedical Research Centre, University of Cambridge, Cambridge, UK
- Sunny Jansen
- Verily Life Sciences, South San Francisco, CA, USA
- Richard Sidebottom
- The Royal Marsden Hospital, London, UK
- Thirlestaine Breast Centre, Cheltenham, UK
3
JOURNAL CLUB: Computer-Aided Detection of Lung Nodules on CT With a Computerized Pulmonary Vessel Suppressed Function. AJR Am J Roentgenol 2018; 210:480-488. [PMID: 29336601 DOI: 10.2214/ajr.17.18718] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 12/18/2022]
Abstract
OBJECTIVE: The purpose of this study is to evaluate radiologists' performance in detecting actionable nodules on chest CT when aided by a pulmonary vessel image-suppressed function and a computer-aided detection (CADe) system. MATERIALS AND METHODS: A novel computerized pulmonary vessel image-suppressed function with a built-in CADe (VIS/CADe) system was developed to assist radiologists in interpreting thoracic CT images. Twelve radiologists participated in a comparative study without and with the VIS/CADe using 324 cases (involving 95 cancers and 83 benign nodules). The ratio of nodule-free cases to cases with nodules was 2:1 in the study. Localization ROC (LROC) methods were used for analysis. RESULTS: In a stand-alone test, the VIS/CADe system detected 89.5% and 82.0% of malignant nodules and all nodules no smaller than 5 mm, respectively. The false-positive rate per CT study was 0.58. For the reader study, the mean area under the LROC curve (LROC-AUC) for the detection of lung cancer significantly increased from 0.633 when unaided by VIS/CADe to 0.773 when aided by VIS/CADe (p < 0.01). For the detection of all clinically actionable nodules, the mean LROC-AUC significantly increased from 0.584 when unaided by VIS/CADe to 0.692 when aided by VIS/CADe (p < 0.01). Radiologists detected 80.0% of cancers with VIS/CADe versus 64.45% of cancers unaided (p < 0.01); specificity decreased from 89.9% to 84.4% (p < 0.01). Radiologist interpretation time significantly decreased by 26%. CONCLUSION: The VIS/CADe system significantly increased radiologists' detection of cancers and actionable nodules with somewhat lower specificity. With use of the VIS/CADe system, radiologists reduced their interpretation time by approximately one-fourth. Our study suggests that the technique has the potential to assist radiologists in the detection of additional actionable nodules on thoracic CT.
4
Zanca F, Hillis SL, Claus F, Van Ongeval C, Celis V, Provoost V, Yoon HJ, Bosmans H. Correlation of free-response and receiver-operating-characteristic area-under-the-curve estimates: results from independently conducted FROC/ROC studies in mammography. Med Phys 2012; 39:5917-29. [PMID: 23039631 DOI: 10.1118/1.4747262] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023] Open
Abstract
PURPOSE: To study, from independently conducted free-response receiver operating characteristic (FROC) and receiver operating characteristic (ROC) experiments, fixed-reader associations between three estimators: the area under the alternative FROC (AFROC) curve computed from FROC data, the area under the ROC curve computed from FROC highest rating data, and the area under the ROC curve computed from confidence-of-disease ratings. METHODS: Two hundred mammograms, 100 of which were abnormal, were processed by two image-processing algorithms and interpreted by four radiologists under the FROC paradigm. From the FROC data, inferred-ROC data were derived, using the highest rating assumption. Eighteen months afterwards, the images were interpreted by the same radiologists under the conventional ROC paradigm; conventional-ROC data (in contrast to inferred-ROC data) were obtained. FROC and ROC (inferred, conventional) data were analyzed using the nonparametric area under the curve (AUC) of the AFROC and ROC curves, respectively. Pearson correlation was used to quantify the degree of association between the modality-specific AUC indices, and standard errors were computed using the bootstrap-after-bootstrap method. The magnitude of the correlations was assessed by comparison with computed Obuchowski-Rockette fixed-reader correlations. RESULTS: Average Pearson correlations (with 95% confidence intervals in square brackets) were: Corr(FROC, inferred ROC) = 0.76 [0.64, 0.84] > Corr(inferred ROC, conventional ROC) = 0.40 [0.18, 0.58] > Corr(FROC, conventional ROC) = 0.32 [0.16, 0.46]. CONCLUSIONS: Correlation between FROC and inferred-ROC data AUC estimates was high. Correlation between inferred- and conventional-ROC AUC was similar to the correlation between two modalities for a single reader using one estimation method, suggesting that the highest rating assumption might be questionable.
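The two ingredients named in this abstract, the highest rating assumption and a nonparametric AUC, can be illustrated with a minimal Python sketch. It assumes the common Mann-Whitney (Wilcoxon) form of the empirical AUC and a floor rating of 0 for unmarked cases; the function names and the mark ratings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def empirical_auc(normal: Sequence[float], abnormal: Sequence[float]) -> float:
    """Nonparametric (Mann-Whitney) AUC: P(abnormal > normal) + 0.5 * P(tie)."""
    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal) * len(abnormal))


def highest_rating(marks: Sequence[float], unmarked: float = 0.0) -> float:
    """Inferred-ROC rating of a case: its highest mark rating.

    A case with no marks receives the floor value `unmarked` (an assumption).
    """
    return max(marks) if marks else unmarked


# Hypothetical FROC mark ratings per case (1-4 scale); an empty list means no marks.
normal_cases = [[], [1.0], [2.0, 1.0], []]
abnormal_cases = [[3.0, 1.0], [4.0], [], [2.0]]

normal_scores = [highest_rating(m) for m in normal_cases]
abnormal_scores = [highest_rating(m) for m in abnormal_cases]
auc = empirical_auc(normal_scores, abnormal_scores)
print(f"inferred-ROC AUC = {auc:.5f}")
```

Note that the inferred rating discards whether the highest-rated mark actually fell on a lesion, which is one reason inferred-ROC and location-aware figures of merit can disagree.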
Affiliation(s)
- Federica Zanca
- Department of Radiology, University Hospitals Leuven, Leuven, Belgium.
5
Chakraborty DP, Haygood TM, Ryan J, Marom EM, Evanoff M, McEntee MF, Brennan PC. Quantifying the clinical relevance of a laboratory observer performance paradigm. Br J Radiol 2012; 85:1287-302. [PMID: 22573296 DOI: 10.1259/bjr/45866310] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022] Open
Abstract
OBJECTIVE: Laboratory observer performance measurements, namely receiver operating characteristic (ROC) and free-response ROC (FROC), differ from actual clinical interpretations in several respects, which could compromise their clinical relevance. The objective of this study was to develop a method for quantifying the clinical relevance of a laboratory paradigm and apply it to compare the ROC and FROC paradigms in a nodule detection task. METHODS: The original prospective interpretations of 80 digital chest radiographs were classified by the truth panel as correct (C=1) or incorrect (C=0), depending on correlation with additional imaging, and the average of C was interpreted as the clinical figure of merit. FROC data were acquired for 21 radiologists and ROC data were inferred using the highest ratings. The areas under the ROC and alternative FROC curves were used as laboratory figures of merit. Bootstrap analysis was conducted to estimate conventional agreement measures between laboratory and clinical figures of merit. Also computed was a pseudovalue-based image-level correctness measure of the laboratory interpretations, whose association with C, as measured by the area (rAUC) under an appropriately defined relevance ROC curve, serves as a measure of the clinical relevance of a laboratory paradigm. RESULTS: Low correlations (e.g. κ=0.244) and near chance level rAUC values (e.g. 0.598), attributable to differences between the clinical and laboratory paradigms, were observed. The absolute width of the confidence interval was 0.38 for the interparadigm differences of the conventional measures and 0.14 for the difference of the rAUCs. CONCLUSION: The rAUC measure was consistent with the traditional measures but was more sensitive to the differences in clinical relevance. A new relevance ROC method for quantifying the clinical relevance of a laboratory paradigm is proposed.
Affiliation(s)
- D P Chakraborty
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
6
Abstract
A common task in medical imaging is assessing whether a new imaging system, or a variant of an existing one, is an improvement over an existing imaging technology. Imaging systems are generally quite complex, consisting of several components (for example, image acquisition hardware, image processing and display hardware and software, and image interpretation by radiologists), each of which can affect performance. Although it may appear odd to include the radiologist as a "component" of the imaging chain, the effect of the human interpretation has to be included because the radiologist's decision determines subsequent patient care. Physical measurements, such as modulation transfer function and signal-to-noise ratio, are useful for characterizing the nonhuman parts of the imaging chain under idealized and often unrealistic conditions, such as uniform background phantoms and target objects with sharp edges. Measuring the performance of the entire imaging chain, including the radiologist, using real clinical images requires different methods that fall under the rubric of observer performance methods or "ROC" analysis and that involve collecting rating data on images. The purpose of this work is to review recent developments in this field, particularly with respect to the free-response method, where location information is also collected.
Affiliation(s)
- Dev P Chakraborty
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA.
7
Chakraborty DP. Recent developments in imaging system assessment methodology, FROC analysis and the search model. Nuclear Instruments & Methods in Physics Research. Section A, Accelerators, Spectrometers, Detectors and Associated Equipment 2011; 648 Supplement 1:S297-S301. [PMID: 21804679 PMCID: PMC3144765 DOI: 10.1016/j.nima.2010.11.042] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/31/2023]
Abstract
A frequent problem in imaging is assessing whether a new imaging system is an improvement over an existing standard. Observer performance methods, in particular the receiver operating characteristic (ROC) paradigm, are widely used in this context. In ROC analysis lesion location information is not used and consequently scoring ambiguities can arise in tasks, such as nodule detection, involving finding localized lesions. This paper reviews progress in the free-response ROC (FROC) paradigm in which the observer marks and rates suspicious regions and the location information is used to determine whether lesions were correctly localized. Reviewed are FROC data analysis, a search-model for simulating FROC data, predictions of the model and a method for estimating the parameters. The search model parameters are physically meaningful quantities that can guide system optimization.
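The mark-and-rate scoring described in this abstract can be sketched in Python as follows. This is a simplified illustration, not the paper's analysis or its search model: the circular proximity criterion (`radius`), the rule that repeated marks on one lesion count once at the highest rating, and all coordinates and ratings are assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

Mark = Tuple[float, float, float]   # (x, y, confidence rating)
Lesion = Tuple[float, float]        # (x, y) of the lesion centre
Image = Tuple[List[Mark], List[Lesion]]


def score_image(marks: List[Mark], lesions: List[Lesion],
                radius: float) -> Tuple[List[float], List[float]]:
    """Split marks into lesion localizations (LL) and non-lesion localizations (NL).

    A mark within `radius` of a lesion centre counts as an LL for the first such
    lesion. Each lesion is credited once, with the highest rating it received.
    """
    best: Dict[int, float] = {}
    nl_ratings: List[float] = []
    for x, y, rating in marks:
        hit = None
        for i, (lx, ly) in enumerate(lesions):
            if math.hypot(x - lx, y - ly) <= radius:
                hit = i
                break
        if hit is None:
            nl_ratings.append(rating)
        else:
            best[hit] = max(best.get(hit, rating), rating)
    return list(best.values()), nl_ratings


def froc_point(images: List[Image], radius: float,
               threshold: float) -> Tuple[float, float]:
    """Return one FROC operating point (NLF, LLF) for marks rated >= threshold.

    NLF = non-lesion localizations per image; LLF = fraction of lesions localized.
    """
    n_lesions = sum(len(lesions) for _, lesions in images)
    ll = nl = 0
    for marks, lesions in images:
        ll_ratings, nl_ratings = score_image(marks, lesions, radius)
        ll += sum(r >= threshold for r in ll_ratings)
        nl += sum(r >= threshold for r in nl_ratings)
    return nl / len(images), ll / n_lesions


# Hypothetical data set of three images.
images: List[Image] = [
    ([(10.0, 10.0, 4.0), (50.0, 50.0, 2.0)], [(12.0, 11.0)]),  # one hit, one false mark
    ([(30.0, 30.0, 1.0)], []),                                  # normal image, one false mark
    ([], [(20.0, 20.0)]),                                       # lesion present, no marks
]
nlf, llf = froc_point(images, radius=5.0, threshold=2.0)
print(f"NLF = {nlf:.3f} per image, LLF = {llf:.3f}")
```

Sweeping `threshold` over the rating scale traces out the empirical FROC curve; a conventional ROC rating would collapse each image to a single number and lose the distinction between the correctly and incorrectly placed marks in the first image.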
8
Freedman MT, Lo SCB, Seibel JC, Bromley CM. Lung Nodules: Improved Detection with Software That Suppresses the Rib and Clavicle on Chest Radiographs. Radiology 2011; 260:265-73. [DOI: 10.1148/radiol.11100153] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 11/11/2022]
9
Chakraborty DP. Clinical relevance of the ROC and free-response paradigms for comparing imaging system efficacies. Radiation Protection Dosimetry 2010; 139:37-41. [PMID: 20139268 PMCID: PMC2868120 DOI: 10.1093/rpd/ncq017] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/28/2023]
Abstract
Observer performance studies are widely used to assess medical imaging systems. Unlike technical/engineering measurements, observer performance studies include the entire imaging chain and the radiologist. However, the widely used receiver operating characteristic (ROC) method ignores lesion localisation information. The free-response ROC (FROC) method uses the location information to appropriately reward or penalise correct or incorrect localisations, respectively. This paper describes a method for improving the clinical relevance of FROC studies. The method consists of assigning appropriate risk values to the different lesions that may be present on a single image. A high-risk lesion is one that is critical to detect and act upon, and is assigned a higher risk value than a low-risk lesion, one that is relatively innocuous. Instead of simply counting the number of lesions that are detected, as is done in conventional FROC analysis, a risk-weighted count is used. This has the advantage of rewarding detections of high-risk lesions commensurately more than detections of lower-risk lesions. Simulations were used to demonstrate that the new method, termed case-based analysis, results in a higher figure of merit for an expert who detects more high-risk lesions than for a naive observer who detects more low-risk lesions, even though both detect the same total number of lesions. Conventional free-response analysis is unable to distinguish between the two types of observers. This paper also comments on the issue of clinical relevance of ROC analysis vs. FROC for tasks that involve lesion localisation.
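The difference between a plain lesion count and a risk-weighted count can be shown with a small Python sketch. It is a toy illustration of the weighting idea only, not the paper's case-based figure of merit; the risk values, the normalisation by total risk, and the function names are assumptions.

```python
from typing import Sequence


def detected_count(detected: Sequence[bool]) -> int:
    """Conventional count: every detected lesion contributes equally."""
    return sum(detected)


def risk_weighted_score(risks: Sequence[float], detected: Sequence[bool]) -> float:
    """Fraction of the total lesion risk on a case that the observer detected."""
    total = sum(risks)
    return sum(r for r, d in zip(risks, detected) if d) / total


# Hypothetical case with one high-risk lesion and two low-risk lesions.
risks = [0.6, 0.2, 0.2]
expert = [True, True, False]   # finds the high-risk lesion and one low-risk lesion
naive = [False, True, True]    # finds only the two low-risk lesions

print(detected_count(expert), detected_count(naive))
print(risk_weighted_score(risks, expert), risk_weighted_score(risks, naive))
```

Both observers detect two lesions, so the plain count cannot separate them, while the weighted score credits the expert with roughly twice the detected risk of the naive observer.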
10
Popescu LM. Model for the detection of signals in images with multiple suspicious locations. Med Phys 2009; 35:5565-74. [PMID: 19175114 DOI: 10.1118/1.3002413] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022] Open
Abstract
A signal detection model is presented that combines a signal model and a noise model, providing mathematical descriptions of the frequency of appearance of the signals and of the signal-like features naturally occurring in the background. We derive expressions for the likelihood functions for the whole ensemble of observed suspicious locations, in various possible combinations of signals and false signal candidates. As a result, this formalism is able to describe several new types of detection tests using likelihood ratio statistics, including a global image abnormality test and an individual signal detection test. The model also provides an alternative mechanism that selects the combination of signal and noise feature candidates with the maximum likelihood. These tests can be analyzed with a variety of operating characteristic curves (ROC, LROC, FROC, etc.). In the mathematical formalism of the model, all the details characterizing the suspicious features are reduced to a single scalar function, which we name the signal specificity function, representing the frequency that a signal takes a certain value relative to the frequency of having a false signal with the same value in an image of given size. The signal specificity function ranks the degree of suspiciousness of the features found, and can be used to unify all the suspicious feature characteristics into a single score, to which the usual decision conventions can then be applied, as in Swensson's detection model [Med. Phys. 23, 1709-1725 (1996)]. We present several examples in which these tests are compared. We also show how the signal specificity function can be used to model various degrees of accuracy of the observer's knowledge about image noise and signal statistical properties. Aspects concerning modeling of the human observer are also discussed.
Affiliation(s)
- Lucreţiu M Popescu
- Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6021, USA.
11
Chakraborty DP. Validation and statistical power comparison of methods for analyzing free-response observer performance studies. Acad Radiol 2008; 15:1554-66. [PMID: 19000872 DOI: 10.1016/j.acra.2008.07.018] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Received: 06/24/2008] [Revised: 07/16/2008] [Accepted: 07/17/2008] [Indexed: 11/26/2022]
Abstract
RATIONALE AND OBJECTIVES: The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator. MATERIALS AND METHODS: A free-response data simulator is described that can model a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms, or two human observers, interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated. RESULTS: All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA ≈ NP) > ROC. For the CAD algorithm, the ranking was (NP ≈ IDCA) > (JAFROC-1 ≈ JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power. CONCLUSION: Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.