1. Hegdé J. Deep learning can be used to train naïve, nonprofessional observers to detect diagnostic visual patterns of certain cancers in mammograms: a proof-of-principle study. J Med Imaging (Bellingham) 2020;7:022410. [PMID: 32042860] [PMCID: PMC6998757] [DOI: 10.1117/1.jmi.7.2.022410]
Abstract
The scientific, clinical, and pedagogical significance of devising methodologies to train nonprofessional subjects to recognize diagnostic visual patterns in medical images has been broadly recognized. However, systematic approaches to doing so remain poorly established. Using mammography as an exemplar case, we use a series of experiments to demonstrate that deep learning (DL) techniques can, in principle, be used to train naïve subjects to reliably detect certain diagnostic visual patterns of cancer in medical images. In the main experiment, subjects were required to learn to detect statistical visual patterns diagnostic of cancer in mammograms using only the mammograms and feedback provided following the subjects’ response. We found not only that the subjects learned to perform the task at statistically significant levels, but also that their eye movements related to image scrutiny changed in a learning-dependent fashion. Two additional, smaller exploratory experiments suggested that allowing subjects to re-examine the mammogram in light of various items of diagnostic information may help further improve DL of the diagnostic patterns. Finally, a fourth small, exploratory experiment suggested that the image information learned was similar across subjects. Together, these results prove the principle that DL methodologies can be used to train nonprofessional subjects to reliably perform those aspects of medical image perception tasks that depend on visual pattern recognition expertise.
Affiliation(s)
- Jay Hegdé: Augusta University, Medical College of Georgia, Departments of Neuroscience and Regenerative Medicine and Ophthalmology, Augusta, Georgia, United States
2. Gandomkar Z, Tay K, Brennan PC, Kozuch E, Mello-Thoms C. Can eye-tracking metrics be used to better pair radiologists in a mammogram reading task? Med Phys 2018;45:4844-4856. [PMID: 30168153] [DOI: 10.1002/mp.13161]
Abstract
PURPOSE To propose a framework for optimally pairing radiologists reading mammograms based on their search patterns. MATERIALS AND METHODS Four experienced and four less-experienced radiologists assessed 120 cases (59 with cancers) while their eye positions were tracked. Fourteen eye-tracking metrics were extracted to quantify the differences among the radiologists' visual search patterns. For each radiologist and each metric, the less-experienced and expert readers were ranked by the similarity of their gaze patterns (from most different to most similar). Less-experienced readers and experts were also ranked by the area under the receiver operating characteristic curve (AUC) achieved after pairing (the best possible ranking). Using Kendall's tau distance, the rankings based on the different metrics were compared with the best possible ranking. Using the paired Wilcoxon signed-rank test, the AUC values obtained with the best pairing were compared with those obtained when pairing by each metric. Finally, we investigated the robustness of the pairing strategies against small sample sizes. RESULTS For ranking the experienced radiologists, eight metrics produced rankings as good as the best possible ranking. For the less-experienced readers, only one metric produced a ranking comparable to the best possible ranking. The AUC values of pairings based on these metrics did not differ significantly from the best pairing scenario. Compared with pairings based on the cognitive metrics, the ranking based on AUC values varied more with sample size, suggesting that it is less robust against small sample sizes. CONCLUSION Different pairings have different effects on performance; some are detrimental, while others improve the performance of the pair. Using the suggested cognitive metrics, pairings can be optimized even with a small dataset.
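The ranking comparison described in the METHODS can be sketched with a small Kendall's-tau-distance routine. The reader names and rankings below are hypothetical; the paired AUC comparison would then use a Wilcoxon signed-rank test (e.g. `scipy.stats.wilcoxon`), not shown here:

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Kendall's tau distance: the number of item pairs ordered
    differently by the two rankings (0 = identical rankings)."""
    pos_a = {r: i for i, r in enumerate(rank_a)}
    pos_b = {r: i for i, r in enumerate(rank_b)}
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is discordant when the two rankings disagree on its order
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant

# Hypothetical rankings of four expert partners for one less-experienced reader:
best_ranking   = ["E1", "E2", "E3", "E4"]  # ranked by AUC achieved after pairing
metric_ranking = ["E1", "E3", "E2", "E4"]  # ranked by gaze-pattern similarity
distance = kendall_tau_distance(best_ranking, metric_ranking)  # -> 1
```

A metric whose ranking has a small distance to the AUC-based ranking is a good stand-in for the (data-hungry) best possible pairing.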
Affiliation(s)
- Ziba Gandomkar: Discipline of Medical Imaging and Radiation Sciences, Image Optimisation and Perception Group (MIOPeG), The University of Sydney, Sydney, NSW, Australia
- Kevin Tay: Medical Imaging Department, Prince of Wales Hospital, Randwick, NSW, Australia
- Patrick C Brennan: Discipline of Medical Imaging and Radiation Sciences, Image Optimisation and Perception Group (MIOPeG), The University of Sydney, Sydney, NSW, Australia
- Emma Kozuch: University of Notre Dame, Notre Dame, Indiana 46556, USA
- Claudia Mello-Thoms: Discipline of Medical Imaging and Radiation Sciences, Image Optimisation and Perception Group (MIOPeG), The University of Sydney, Sydney, NSW, Australia; Department of Radiology, The University of Iowa, Iowa City, IA 52242, USA
3. Zhang J, Lo JY, Kuzmiak CM, Ghate SV, Yoon SC, Mazurowski MA. Using computer-extracted image features for modeling of error-making patterns in detection of mammographic masses among radiology residents. Med Phys 2014;41:091907. [PMID: 25186394] [DOI: 10.1118/1.4892173]
Abstract
PURPOSE Mammography is the most widely accepted and utilized screening modality for early breast cancer detection. Providing high-quality mammography education to radiology trainees is essential, since excellent interpretation skills are needed to ensure the highest benefit of screening mammography for patients. The authors have previously proposed a computer-aided education system based on trainee models that relate human-assessed image characteristics to trainee error. In this study, the authors propose to build trainee models that use features automatically extracted from images by computer vision algorithms to predict the likelihood of the trainee missing each mass. This computer vision-based approach to trainee modeling will allow large databases of mammograms to be searched automatically to identify challenging cases for each trainee. METHODS The authors' algorithm for predicting the likelihood of missing a mass consists of three steps. First, a mammogram is segmented into air, pectoral muscle, fatty tissue, dense tissue, and mass using automated segmentation algorithms. Second, 43 features are extracted using computer vision algorithms for each abnormality identified by experts. Third, error-making models (classifiers) are applied to predict the likelihood of trainees missing the abnormality based on the extracted features. The models are developed individually for each trainee using his/her previous reading data. The authors evaluated the predictive performance of the proposed algorithm using data from a reader study in which 10 subjects (7 residents and 3 novices) and 3 experts read 100 mammographic cases. Receiver operating characteristic (ROC) methodology was applied for the evaluation. RESULTS The average area under the ROC curve (AUC) of the error-making models for the task of predicting which masses will be detected and which will be missed was 0.607 (95% CI, 0.564-0.650). This value was statistically significantly different from 0.5 (p < 0.0001). For the 7 residents only, the AUC of the models was 0.590 (95% CI, 0.537-0.642), also significantly higher than 0.5 (p = 0.0009). Thus, in general, the authors' models predicted which masses would be detected and which would be missed better than chance. CONCLUSIONS The authors proposed an algorithm able to predict which masses will be detected and which will be missed by each individual trainee. This confirms the existence of error-making patterns in the detection of masses among radiology trainees. Furthermore, the proposed methodology will allow difficult cases to be selected for each trainee automatically and efficiently.
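The per-trainee pipeline (features per mass → miss-likelihood score → ROC evaluation) can be sketched as follows. The two-feature masses and the centroid-based scorer are illustrative stand-ins for the paper's 43 computer-vision features and trained classifiers; the rank-sum AUC, however, is the standard formulation:

```python
def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability that
    a randomly chosen missed mass (label 1) scores above a detected one (0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def miss_likelihood(features, missed_centroid, detected_centroid):
    """Score a mass by its relative closeness to the trainee's 'missed' centroid."""
    d_miss = sum((f - c) ** 2 for f, c in zip(features, missed_centroid)) ** 0.5
    d_det = sum((f - c) ** 2 for f, c in zip(features, detected_centroid)) ** 0.5
    return d_det / (d_miss + d_det + 1e-12)

# Centroids fit on one trainee's past reads (toy features: contrast, size)
missed_centroid, detected_centroid = (0.25, 0.25), (0.75, 0.80)

# Held-out masses for the same trainee: (features, 1 = missed / 0 = detected)
held_out = [((0.2, 0.2), 1), ((0.3, 0.4), 1), ((0.8, 0.8), 0), ((0.6, 0.7), 0)]
scores = [miss_likelihood(f, missed_centroid, detected_centroid) for f, _ in held_out]
labels = [y for _, y in held_out]
print(auc_score(labels, scores))  # -> 1.0 on this toy data
```

On real reading data the separation is of course far weaker; the study's reported AUC of 0.607 means the models beat chance (0.5) but leave substantial overlap between missed and detected masses.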
Affiliation(s)
- Jing Zhang: Department of Radiology, Duke University School of Medicine, Durham, North Carolina 27705
- Joseph Y Lo: Department of Radiology, Duke University School of Medicine, Durham, North Carolina 27705; Duke Cancer Institute, Durham, North Carolina 27710; Departments of Biomedical Engineering and Electrical & Computer Engineering, Duke University, Durham, North Carolina 27705; Medical Physics Graduate Program, Duke University, Durham, North Carolina 27705
- Cherie M Kuzmiak: Department of Radiology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, North Carolina 27599
- Sujata V Ghate: Department of Radiology, Duke University School of Medicine, Durham, North Carolina 27705
- Sora C Yoon: Department of Radiology, Duke University School of Medicine, Durham, North Carolina 27705
- Maciej A Mazurowski: Department of Radiology, Duke University School of Medicine, Durham, North Carolina 27705; Duke Cancer Institute, Durham, North Carolina 27710; Medical Physics Graduate Program, Duke University, Durham, North Carolina 27705
4. Zhang J, Silber JI, Mazurowski MA. Modeling false positive error making patterns in radiology trainees for improved mammography education. J Biomed Inform 2015;54:50-57. [PMID: 25640462] [DOI: 10.1016/j.jbi.2015.01.007]
Abstract
INTRODUCTION While mammography contributes notably to earlier detection of breast cancer, it has its limitations, including a large number of false positive exams. Improved radiology education could help alleviate this issue. Toward this goal, in this paper we propose an algorithm for modeling false positive error making among radiology trainees. Identifying troublesome locations for the trainees could focus their training and in turn improve their performance. METHODS The proposed algorithm predicts locations that are likely to result in a false positive error for each trainee, based on the trainee's previous annotations. The algorithm consists of three steps. First, suspicious false positive locations are identified in mammograms by a Difference of Gaussian filter, and the suspicious regions are segmented by computer vision-based segmentation algorithms. Second, 133 features are extracted for each suspicious region to describe its distinctive characteristics. Third, a random forest classifier is applied to the extracted features to predict the likelihood of the trainee making a false positive error. The classifier is trained on the trainee's previous annotations. We evaluated the algorithm using data from a reader study in which 3 experts and 10 trainees interpreted 100 mammographic cases. RESULTS The algorithm identified locations where a trainee would commit a false positive error with accuracy higher than an algorithm selecting such locations at random. Specifically, our algorithm found false positive locations with 40% accuracy when only 1 location was selected per case for each trainee, and with 12% accuracy when 10 locations were selected. The accuracies for randomly identified locations were 0% in both scenarios. CONCLUSIONS In this first study on the topic, we built computer models able to find locations at which a trainee will make a false positive error in images the trainee has not previously seen. Presenting trainees with such locations, rather than randomly selected ones, may improve their educational outcomes.
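The first step above, the Difference of Gaussian filter, is a band-pass operator that responds most strongly to blob-like structures whose scale lies between its two smoothing widths. A minimal 1-D sketch (the synthetic scanline and sigmas are illustrative; the paper applies the filter to full 2-D mammograms before segmentation and random forest classification):

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three sigma."""
    radius = int(3 * sigma)
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def convolve(signal, kernel):
    """Convolve, replicating the border values at the edges."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def difference_of_gaussians(signal, sigma_fine, sigma_coarse):
    """Band-pass response: fine smoothing minus coarse smoothing."""
    fine = convolve(signal, gaussian_kernel(sigma_fine))
    coarse = convolve(signal, gaussian_kernel(sigma_coarse))
    return [f - c for f, c in zip(fine, coarse)]

# Synthetic scanline: flat background with a small bright blob at index 30
scanline = [0.1] * 64
for i, bump in [(29, 0.4), (30, 0.6), (31, 0.4)]:
    scanline[i] += bump

dog = difference_of_gaussians(scanline, 1.0, 3.0)
peak = max(range(len(dog)), key=lambda i: dog[i])  # -> 30, the blob centre
```

Thresholding the DoG response (rather than taking a single peak) yields the candidate locations that are then segmented and described by the 133 per-region features.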
Affiliation(s)
- Jing Zhang: Department of Radiology, Duke University School of Medicine, Durham, NC, United States; Computer Science Department, Lamar University, Beaumont, TX, United States
- James I Silber: Department of Biomedical Engineering, Duke University Pratt School of Engineering, Durham, NC, United States
- Maciej A Mazurowski: Department of Radiology, Duke University School of Medicine, Durham, NC, United States; Duke Cancer Institute, United States; Duke Medical Physics Program, United States
5. Grimm LJ, Kuzmiak CM, Ghate SV, Yoon SC, Mazurowski MA. Radiology resident mammography training: interpretation difficulty and error-making patterns. Acad Radiol 2014;21:888-892. [PMID: 24928157] [DOI: 10.1016/j.acra.2014.01.025]
Abstract
RATIONALE AND OBJECTIVES The purpose of this study was to better understand the concept of mammography difficulty and how it affects radiology resident performance. MATERIALS AND METHODS Seven radiology residents and three expert breast imagers reviewed 100 mammograms, consisting of bilateral mediolateral oblique and craniocaudal views, using a research workstation. The cases consisted of normal, benign, and malignant findings. Participants identified abnormalities and scored the difficulty and malignant potential of each case. Resident performance (sensitivity, specificity, and area under the receiver operating characteristic curve [AUC]) was calculated for self- and expert-assessed high- and low-difficulty cases. RESULTS For cases classified by self-assessed difficulty, the resident AUCs were 0.667 for high-difficulty and 0.771 for low-difficulty cases (P = .010). Resident sensitivities were 0.707 for high- and 0.614 for low-difficulty cases (P = .113). Resident specificities were 0.583 for high- and 0.905 for low-difficulty cases (P < .001). For cases classified by expert-assessed difficulty, the resident AUCs were 0.583 for high- and 0.783 for low-difficulty cases (P = .001). Resident sensitivities were 0.558 for high- and 0.796 for low-difficulty cases (P < .001). Resident specificities were 0.714 for high- and 0.740 for low-difficulty cases (P = .807). CONCLUSIONS Increased self- and expert-assessed difficulty is associated with decreased resident performance in mammography. However, while the lower performance on self-assessed difficult cases is due to a decrease in specificity, on expert-assessed difficult cases it is due to a decrease in sensitivity. These trends suggest that educators should include a mix of self- and expert-assessed difficult cases in educational materials to maximize the effect of training on resident performance and confidence.
6. Grimm LJ, Ghate SV, Yoon SC, Kuzmiak CM, Kim C, Mazurowski MA. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features. Med Phys 2014;41:031909. [DOI: 10.1118/1.4866379]
7. Voisin S, Pinto F, Morin-Ducote G, Hudson KB, Tourassi GD. Predicting diagnostic error in radiology via eye-tracking and image analytics: preliminary investigation in mammography. Med Phys 2013;40:101906. [DOI: 10.1118/1.4820536]
Affiliation(s)
- Sophie Voisin: Biomedical Science and Engineering Center, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
- Frank Pinto: School of Engineering, Science, and Technology, Virginia State University, Petersburg, Virginia 23806
- Garnetta Morin-Ducote: Department of Radiology, University of Tennessee Medical Center at Knoxville, Knoxville, Tennessee 37920
- Kathleen B. Hudson: Department of Radiology, University of Tennessee Medical Center at Knoxville, Knoxville, Tennessee 37920
- Georgia D. Tourassi: Biomedical Science and Engineering Center, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
8. Tourassi G, Voisin S, Paquit V, Krupinski E. Investigating the link between radiologists' gaze, diagnostic decision, and image content. J Am Med Inform Assoc 2013;20:1067-1075. [PMID: 23788627] [DOI: 10.1136/amiajnl-2012-001503]
Abstract
OBJECTIVE To investigate machine learning for linking image content, human perception, cognition, and error in the diagnostic interpretation of mammograms. METHODS Gaze data and diagnostic decisions were collected from three breast imaging radiologists and three radiology residents who reviewed 20 screening mammograms while wearing a head-mounted eye-tracker. Image analysis was performed in mammographic regions that attracted the radiologists' attention and in all abnormal regions. Machine learning algorithms were investigated to develop predictive models that link (i) image content with gaze, (ii) image content and gaze with cognition, and (iii) image content, gaze, and cognition with diagnostic error. Both group-based and individualized models were explored. RESULTS By pooling the data from all readers, machine learning produced highly accurate predictive models linking image content, gaze, and cognition. Linking these, in turn, with diagnostic error was also supported to some extent. Merging readers' gaze metrics and cognitive opinions with computer-extracted image features identified 59% of the readers' diagnostic errors while confirming 97.3% of their correct diagnoses. The readers' individual perceptual and cognitive behaviors could be adequately predicted by modeling the behavior of others, although personalized tuning was in many cases beneficial for capturing individual behavior more accurately. CONCLUSIONS There is clearly an interaction between radiologists' gaze, diagnostic decisions, and image content, and it can be modeled with machine learning algorithms.
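The headline figures in the RESULTS (59% of diagnostic errors identified, 97.3% of correct diagnoses confirmed) are, respectively, the sensitivity and specificity of the error-flagging model over reader-case pairs. A minimal sketch of computing them, with hypothetical model flags:

```python
def error_flag_rates(flags, is_error):
    """Fraction of diagnostic errors flagged (identified) and of correct
    reads left unflagged (confirmed), computed over reader-case pairs."""
    flagged_errors = [f for f, e in zip(flags, is_error) if e]
    correct_reads = [f for f, e in zip(flags, is_error) if not e]
    identified = sum(flagged_errors) / len(flagged_errors)
    confirmed = sum(1 - f for f in correct_reads) / len(correct_reads)
    return identified, confirmed

# Hypothetical outputs of a trained model over 10 reader-case pairs:
is_error = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # ground truth: was the read wrong?
flags    = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # model prediction: flag as error?
identified, confirmed = error_flag_rates(flags, is_error)
# Here the model catches 2 of 3 errors and confirms 6 of 7 correct reads
```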
Affiliation(s)
- Georgia Tourassi: Oak Ridge National Laboratory, Biomedical Science and Engineering Center, Oak Ridge, Tennessee, USA