1
Castner N, Arsiwala-Scheppach L, Mertens S, Krois J, Thaqi E, Kasneci E, Wahl S, Schwendicke F. Expert gaze as a usability indicator of medical AI decision support systems: a preliminary study. NPJ Digit Med 2024; 7:199. [PMID: 39068241] [PMCID: PMC11283514] [DOI: 10.1038/s41746-024-01192-8]
Abstract
Given the current state of medical artificial intelligence (AI) and perceptions towards it, collaborative systems are becoming the preferred choice for clinical workflows. This work aims to address expert interaction with medical AI support systems to gain insight into how these systems can be better designed with the user in mind. As eye tracking metrics have been shown to be robust indicators of usability, we employ them to evaluate usability and user interaction with medical AI support systems. We use expert gaze to assess experts' interaction with an AI software for caries detection in bitewing x-ray images. We compared standard viewing of bitewing images without AI support versus viewing where AI support could be freely toggled on and off. We found that experts turned the AI on for roughly 25% of the total inspection task, and generally turned it on halfway through the course of the inspection. Gaze behavior showed that when supported by AI, more attention was dedicated to user interface elements related to the AI support, with more frequent transitions from the image itself to these elements. When considering that expert visual strategy is already optimized for fast and effective image inspection, such interruptions in attention can lead to increased time needed for the overall assessment. Gaze analysis provided valuable insights into an AI's usability for medical image inspection. Further analyses of these tools, and of how to delineate metrical measures of their usability, should be developed.
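The transition analysis described here — counting how often gaze moves between the radiograph and AI-related interface elements — can be sketched from a labeled fixation sequence. A minimal illustration (the AOI labels and the sequence below are hypothetical, not data from the study):

```python
from collections import Counter

def transition_counts(fixation_aois):
    """Count transitions between consecutive areas of interest (AOIs),
    ignoring repeated fixations within the same AOI."""
    transitions = Counter()
    for a, b in zip(fixation_aois, fixation_aois[1:]):
        if a != b:
            transitions[(a, b)] += 1
    return transitions

# Hypothetical fixation sequence: image region vs. AI-overlay UI panel
seq = ["image", "image", "ai_panel", "image", "ai_panel", "ai_panel", "image"]
print(transition_counts(seq))  # each direction occurs twice here
```

Comparing such counts between the AI-on and AI-off conditions is one way to quantify the attention interruptions the abstract reports.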
Affiliation(s)
- Nora Castner
- Carl Zeiss Vision International GmbH, Tübingen, Germany.
- University of Tübingen, Tübingen, Germany.
- Sarah Mertens
- Charité - Universitätsmedizin, Oral Diagnostics, Digital Health and Services Research, Berlin, Germany
- Joachim Krois
- Charité - Universitätsmedizin, Oral Diagnostics, Digital Health and Services Research, Berlin, Germany
- Enkeleda Thaqi
- Technical University of Munich, Human-Centered Technologies for Learning, Munich, Germany
- Enkelejda Kasneci
- Technical University of Munich, Human-Centered Technologies for Learning, Munich, Germany
- Siegfried Wahl
- Carl Zeiss Vision International GmbH, Tübingen, Germany
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Falk Schwendicke
- Ludwig Maximilian University, Operative, Preventative and Pediatric Dentistry and Periodontology, Munich, Germany
2
Cherian J, Ray S, Taele P, Koh JI, Hammond T. Exploring the Impact of the NULL Class on In-the-Wild Human Activity Recognition. Sensors (Basel) 2024; 24:3898. [PMID: 38931682] [PMCID: PMC11207638] [DOI: 10.3390/s24123898]
Abstract
Monitoring activities of daily living (ADLs) plays an important role in measuring and responding to a person's ability to manage their basic physical needs. Effective recognition systems for monitoring ADLs must successfully recognize naturalistic activities that also realistically occur at infrequent intervals. However, existing systems primarily focus on either recognizing more separable, controlled activity types or are trained on balanced datasets where activities occur more frequently. In our work, we investigate the challenges associated with applying machine learning to an imbalanced dataset collected from a fully in-the-wild environment. This analysis shows that the combination of preprocessing techniques to increase recall and postprocessing techniques to increase precision can result in more desirable models for tasks such as ADL monitoring. In a user-independent evaluation using in-the-wild data, these techniques resulted in a model that achieved an event-based F1-score of over 0.9 for brushing teeth, combing hair, walking, and washing hands. This work tackles fundamental challenges in machine learning that will need to be addressed in order for these systems to be deployed and reliably work in the real world.
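Event-based scoring, unlike frame-based scoring, credits a predicted activity segment that overlaps a ground-truth segment. A minimal sketch of an event-based F1-score under one common overlap criterion (any temporal overlap counts as a hit; the exact matching rule used by the authors may differ):

```python
def event_f1(true_events, pred_events):
    """Event-based F1: a ground-truth event counts as a hit if any
    predicted event overlaps it in time; a predicted event with no
    overlap is a false positive. Events are (start, end) tuples."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    tp = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)
    fp = sum(not any(overlaps(p, t) for t in true_events) for p in pred_events)
    fn = len(true_events) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical activity events in seconds
truth = [(0, 10), (20, 30)]
preds = [(1, 5), (40, 50)]
print(event_f1(truth, preds))  # 1 hit, 1 miss, 1 false alarm -> 0.5
```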
Affiliation(s)
- Tracy Hammond
- Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA; (S.R.); (P.T.); (J.I.K.)
3
Arsiwala-Scheppach LT, Castner N, Rohrer C, Mertens S, Kasneci E, Cejudo Grano de Oro JE, Krois J, Schwendicke F. Gaze patterns of dentists while evaluating bitewing radiographs. J Dent 2023; 135:104585. [PMID: 37301462] [DOI: 10.1016/j.jdent.2023.104585]
Abstract
OBJECTIVES Understanding dentists' gaze patterns on radiographs may allow us to unravel sources of their limited accuracy and develop strategies to mitigate them. We conducted an eye tracking experiment to characterize dentists' scanpaths and thus their gaze patterns when assessing bitewing radiographs to detect primary proximal carious lesions. METHODS 22 dentists assessed a median of nine bitewing images each, resulting in 170 datasets after excluding data with poor quality of gaze recording. Fixation was defined as an area of attentional focus related to visual stimuli. We calculated time to first fixation, fixation count, average fixation duration, and fixation frequency. Analyses were performed for the entire image and stratified by (1) presence of carious lesions and/or restorations and (2) lesion depth (E1/2: outer/inner enamel; D1-3: outer-inner third of dentin). We also examined the transitional nature of the dentists' gaze. RESULTS Dentists had more fixations on teeth with lesions and/or restorations (median=138 [interquartile range=87, 204]) than teeth without them (32 [15, 66]), p<0.001. Notably, teeth with lesions had longer fixation durations (407 milliseconds [242, 591]) than those with restorations (289 milliseconds [216, 337]), p<0.001. Time to first fixation was longer for teeth with E1 lesions (17,128 milliseconds [8813, 21,540]) than for lesions of other depths (p = 0.049). The highest number of fixations was on teeth with D2 lesions (43 [20, 51]) and the lowest on teeth with E1 lesions (5 [1, 37]), p<0.001. Generally, a systematic tooth-by-tooth gaze pattern was observed. CONCLUSIONS As hypothesized, while visually inspecting bitewing radiographic images, dentists employed a heightened focus on certain image features/areas, relevant to the assigned task. Also, they generally examined the entire image in a systematic tooth-by-tooth pattern.
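The four fixation metrics named in the methods can be computed directly from a list of fixations. A minimal sketch, assuming each fixation is given as an (onset, duration) pair in milliseconds (the data below are invented for illustration):

```python
def fixation_metrics(fixations, trial_duration_ms):
    """Summarise fixations given as (onset_ms, duration_ms) pairs:
    time to first fixation, fixation count, average fixation duration,
    and fixation frequency (fixations per second)."""
    if not fixations:
        return None
    onsets = [onset for onset, _ in fixations]
    durations = [dur for _, dur in fixations]
    return {
        "time_to_first_fixation_ms": min(onsets),
        "fixation_count": len(fixations),
        "mean_fixation_duration_ms": sum(durations) / len(durations),
        "fixation_frequency_hz": len(fixations) / (trial_duration_ms / 1000.0),
    }

# Hypothetical 3-second trial with three fixations
fx = [(250, 300), (600, 200), (900, 400)]
m = fixation_metrics(fx, trial_duration_ms=3000)
print(m)
```

In practice these would be computed per area of interest (tooth, lesion, restoration) rather than per trial, to support the stratified analyses described above.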
Affiliation(s)
- Lubaina T Arsiwala-Scheppach
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Germany; ITU/WHO Focus Group AI on Health, Topic Group Dental Diagnostics and Digital Dentistry, Switzerland.
- Nora Castner
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
- Csaba Rohrer
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Germany
- Sarah Mertens
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Germany
- Enkelejda Kasneci
- Department of Computer Science, Technical University of Munich, Germany
- Jose Eduardo Cejudo Grano de Oro
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Germany
- Joachim Krois
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Germany; ITU/WHO Focus Group AI on Health, Topic Group Dental Diagnostics and Digital Dentistry, Switzerland
- Falk Schwendicke
- Department of Oral Diagnostics, Digital Health and Health Services Research, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Germany; ITU/WHO Focus Group AI on Health, Topic Group Dental Diagnostics and Digital Dentistry, Switzerland
4
Suman AA, Russo C, Carrigan A, Nalepka P, Liquet-Weiland B, Newport RA, Kumari P, Di Ieva A. Spatial and time domain analysis of eye-tracking data during screening of brain magnetic resonance images. PLoS One 2021; 16:e0260717. [PMID: 34855867] [PMCID: PMC8639086] [DOI: 10.1371/journal.pone.0260717]
Abstract
INTRODUCTION Eye-tracking research has been widely used in radiology applications. Prior studies exclusively analysed either temporal or spatial eye-tracking features, both of which alone do not completely characterise the spatiotemporal dynamics of radiologists' gaze features. PURPOSE Our research aims to quantify human visual search dynamics in both domains during brain stimuli screening to explore the relationship between reader characteristics and stimuli complexity. The methodology can be used to discover strategies to aid trainee radiologists in identifying pathology, and to select regions of interest for machine vision applications. METHOD The study was performed using eye-tracking data 5 seconds in duration from 57 readers (15 Brain-experts, 11 Other-experts, 5 Registrars and 26 Naïves) for 40 neuroradiological images as stimuli (i.e., 20 normal and 20 pathological brain MRIs). The visual scanning patterns were analysed by calculating the fractal dimension (FD) and Hurst exponent (HE) using re-scaled range (R/S) and detrended fluctuation analysis (DFA) methods. The FD was used to measure the spatial geometrical complexity of the gaze patterns, and the HE analysis was used to measure participants' focusing skill. Focusing skill refers to the persistence/anti-persistence of the participants' gaze on the stimulus over time. Pathological and normal stimuli were analysed separately both at the "First Second" and full "Five Seconds" viewing durations. RESULTS All experts were more focused and had a higher visual search complexity compared to Registrars and Naïves. This was seen in both the pathological and normal stimuli in the first and five second analyses. The Brain-experts subgroup was shown to achieve better focusing skill than Other-experts due to their domain-specific expertise. Indeed, the FDs found when viewing pathological stimuli were higher than those in normal ones. Viewing normal stimuli resulted in an increase in FD in the five-second data, unlike pathological stimuli, for which FD did not change. In contrast to the FDs, the scanpath HEs of pathological and normal stimuli were similar. However, participants' gaze was more focused for "Five Seconds" than "First Second" data. CONCLUSIONS The HE analysis of the scanpaths belonging to all experts showed that they have greater focus than Registrars and Naïves. This may be related to their higher visual search complexity than non-experts due to their training and expertise.
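The Hurst exponent via rescaled-range (R/S) analysis, one of the two methods named in this abstract, is estimated by fitting log(R/S) against log(window size). A minimal stdlib sketch (the DFA variant is omitted, and the window sizes are illustrative choices, not the authors'):

```python
import math

def hurst_rs(series, min_chunk=8):
    """Estimate the Hurst exponent of a 1-D series by rescaled-range
    (R/S) analysis: average R/S over non-overlapping windows at several
    sizes n, then fit log(R/S) ~ H * log(n) by least squares."""
    n = len(series)
    sizes, rs_vals = [], []
    size = min_chunk
    while size <= n // 2:
        rs_list = []
        for start in range(0, n - size + 1, size):
            chunk = series[start:start + size]
            mean = sum(chunk) / size
            dev = [x - mean for x in chunk]
            # cumulative deviation from the window mean
            cum, c = [], 0.0
            for d in dev:
                c += d
                cum.append(c)
            r = max(cum) - min(cum)          # range of cumulative deviation
            s = math.sqrt(sum(d * d for d in dev) / size)  # std deviation
            if s > 0:
                rs_list.append(r / s)
        if rs_list:
            sizes.append(size)
            rs_vals.append(sum(rs_list) / len(rs_list))
        size *= 2
    # least-squares slope of log(R/S) against log(n) = Hurst exponent
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in rs_vals]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# A monotone series is strongly persistent (H near 1); a strictly
# alternating series is strongly anti-persistent (H near 0)
print(hurst_rs(list(range(512))))
print(hurst_rs([1.0, -1.0] * 128))
```

The same slope-fitting idea, applied to box counts over a 2-D point set rather than to a time series, underlies the fractal dimension used for the spatial analysis.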
Affiliation(s)
- Abdulla Al Suman
- Computational NeuroSurgery (CNS) Lab, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Carlo Russo
- Computational NeuroSurgery (CNS) Lab, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Ann Carrigan
- School of Psychological Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Centre for Elite Performance, Expertise and Training, Macquarie University, Sydney, Australia
- Patrick Nalepka
- School of Psychological Sciences, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Centre for Elite Performance, Expertise and Training, Macquarie University, Sydney, Australia
- Benoit Liquet-Weiland
- Department of Mathematics and Statistics, Faculty of Science and Engineering, Macquarie University, Sydney, Australia
- Robert Ahadizad Newport
- Computational NeuroSurgery (CNS) Lab, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Poonam Kumari
- Computational NeuroSurgery (CNS) Lab, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Antonio Di Ieva
- Computational NeuroSurgery (CNS) Lab, Faculty of Medicine, Health, and Human Sciences, Macquarie University, Sydney, Australia
- Centre for Elite Performance, Expertise and Training, Macquarie University, Sydney, Australia
5
Analysis of the Visual Perception of Female Breast Aesthetics and Symmetry. Plast Reconstr Surg 2019; 144:1257-1266. [DOI: 10.1097/prs.0000000000006292]
6
Abstract
Breast cancer is the most common cancer among females worldwide and large volumes of breast images are produced and interpreted annually. As long as radiologists interpret these images, the diagnostic accuracy will be limited by human factors and both false-positive and false-negative errors might occur. By understanding visual search in breast images, we may be able to identify causes of diagnostic errors, find ways to reduce them, and also provide a better education to radiology residents. Many visual search studies in breast radiology have been devoted to mammography. These studies showed that 70% of missed lesions on mammograms attract radiologists' visual attention and that a plethora of different reasons, such as satisfaction of search, incorrect background sampling, and incorrect first impression can cause diagnostic errors in the interpretation of mammograms. Recently, highly accurate tools, which rely on both eye-tracking data and the content of the mammogram, have been proposed to provide feedback to the radiologists. Improving these tools and determining the optimal pathway to integrate them in the radiology workflow could be a possible line of future research. Moreover, in the past few years deep learning has improved the diagnostic accuracy of computerized diagnostic tools, and visual search studies will be required to understand how radiologists interact with the prompts from these tools, and to identify the best way to utilize them. Visual search in other breast imaging modalities, such as breast ultrasound and digital breast tomosynthesis, has so far received less attention, probably due to the associated complexities of eye-tracking monitoring and data analysis. For example, in digital breast tomosynthesis, scrolling through the image results in longer trials, adds a new factor to the study's complexity, and makes calculation of gaze parameters more difficult. However, considering the wide utilization of three-dimensional imaging modalities, more visual search studies involving reading stack-view examinations are required in the future. To conclude, in the past few decades visual search studies have provided an extensive understanding of the underlying reasons for diagnostic errors in breast radiology and characterized differences between experts' and novices' visual search patterns. Further visual search studies are required to investigate radiologists' interaction with relatively newer imaging modalities and artificial intelligence tools.
Affiliation(s)
- Ziba Gandomkar
- BreastScreen Reader Assessment Strategy (BREAST), Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
- Claudia Mello-Thoms
- Department of Radiology, Carver College of Medicine, University of Iowa, Iowa City, IA, US
7
Moreira FC, Aihara AY, Lederman HM, Pisa IT, Tenório JM. Cognitive map to support the diagnosis of solitary bone tumors in pediatric patients. Radiol Bras 2018; 51:297-302. [PMID: 30369656] [PMCID: PMC6198841] [DOI: 10.1590/0100-3984.2017.0121]
Abstract
Affiliation(s)
- Felipe Costa Moreira
- Department of Health Informatics, Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-Unifesp), São Paulo, SP, Brazil
- André Yui Aihara
- Department of Diagnostic Imaging, Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-Unifesp), São Paulo, SP, Brazil
- Henrique Manoel Lederman
- Department of Diagnostic Imaging, Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-Unifesp), São Paulo, SP, Brazil
- Ivan Torres Pisa
- Department of Health Informatics, Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-Unifesp), São Paulo, SP, Brazil
- Josceli Maria Tenório
- Department of Health Informatics, Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-Unifesp), São Paulo, SP, Brazil
8
Gandomkar Z, Tay K, Brennan PC, Mello-Thoms C. Recurrence quantification analysis of radiologists' scanpaths when interpreting mammograms. Med Phys 2018; 45:3052-3062. [PMID: 29694675] [DOI: 10.1002/mp.12935]
Abstract
PURPOSE The purpose of this study was to propose a classifier based on recurrence quantification analysis (RQA) metrics for distinguishing experts' scanpaths from those of less-experienced readers and to explore the association of spatiotemporal dynamics of the mammographic scanpaths with the characteristics of cases and radiologists using RQA metrics. MATERIALS AND METHODS Eye movements were recorded from eight radiologists (two cohorts: four experienced and four less-experienced) while reading 120 mammograms (59 cancer, 61 normal). Ten RQA measures were extracted for each recorded scanpath. The measures described the temporal distribution of recurrent fixations as well as laminar and deterministic eye movements. Recurrent fixations are fixations that are located close to a previously fixated point in a scanpath. Deterministic eye movements represent looking back and forth between two locations, while laminar eye movements indicate detailed scanning of an area with consecutive fixations. The RQA metrics along with six conventional eye-tracking parameters were used to construct a classifier for distinguishing experts' scanpaths from those of less-experienced readers. Leave-one-out cross validation was used for evaluating the classifier. For each reader cohort, an ANOVA analysis was performed to study the relationship of RQA measures with breast density, case pathology, readers' expertise, and readers' decisions on the case. The proportions of laminar and deterministic movements involving fixations in the location of lesions were also compared between the two reader cohorts using two-proportion z-tests. RESULTS All RQA measures differed significantly between scanpaths of experienced readers and those of less-experienced readers. The classifier achieved an area under the receiver operating characteristic curve of 0.89 (0.87-0.91) for detecting experts' scanpaths. Proportionately more refixations and laminar and deterministic sequences were in the location of the lesion for the experienced cohort compared to the less-experienced cohort (all P-values < 0.001). Eight and four RQA measures differed between normal and cancer cases for the experienced and less-experienced readers, respectively. None of the metrics differed between fatty and dense breasts for the less-experienced readers, while two measures resulted in a significant difference for the experienced readers. For experts, six measures differed significantly between true negatives and false positives and nine were significantly different between true positives and false negatives. For the less-experienced cohort, the corresponding figures were seven and one measures, respectively. CONCLUSION The RQA measures can quantify the differences between experienced and less-experienced radiologists. They also capture differences among experts' scanpaths related to case pathology and radiologists' decisions on the case.
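RQA starts from a recurrence matrix marking pairs of fixations that fall within a spatial radius of each other; measures such as recurrence rate, determinism, and laminarity are then derived from its structure. A minimal sketch of the matrix and the recurrence rate (coordinates and radius below are hypothetical; determinism and laminarity, which count diagonal and vertical line structures in the matrix, are omitted for brevity):

```python
def recurrence_matrix(fixations, radius):
    """Binary recurrence matrix: entry (i, j) is 1 when fixations i and
    j (as (x, y) pixel coordinates) lie within `radius` of each other."""
    n = len(fixations)
    return [[1 if ((fixations[i][0] - fixations[j][0]) ** 2 +
                   (fixations[i][1] - fixations[j][1]) ** 2) ** 0.5 <= radius
             else 0
             for j in range(n)] for i in range(n)]

def recurrence_rate(rm):
    """Share of recurrent pairs among all off-diagonal pairs (i < j)."""
    n = len(rm)
    rec = sum(rm[i][j] for i in range(n) for j in range(i + 1, n))
    total = n * (n - 1) // 2
    return rec / total if total else 0.0

# Hypothetical scanpath: two locations, each revisited once
fixs = [(0, 0), (100, 100), (1, 1), (101, 101)]
rm = recurrence_matrix(fixs, radius=5)
print(recurrence_rate(rm))  # 2 of the 6 pairs recur
```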
Affiliation(s)
- Ziba Gandomkar
- Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging and Radiation Sciences, The University of Sydney, Sydney, NSW, Australia
- Kevin Tay
- Medical Imaging Department, Prince of Wales Hospital, Randwick, NSW, Australia
- Patrick C Brennan
- Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging and Radiation Sciences, The University of Sydney, Sydney, NSW, Australia
- Claudia Mello-Thoms
- Image Optimisation and Perception Group (MIOPeG), Discipline of Medical Imaging and Radiation Sciences, The University of Sydney, Sydney, NSW, Australia; Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
9
Alamudun F, Paulus P, Yoon HJ, Tourassi G. Modeling sequential context effects in diagnostic interpretation of screening mammograms. J Med Imaging (Bellingham) 2018; 5:031408. [PMID: 29564370] [PMCID: PMC5858736] [DOI: 10.1117/1.jmi.5.3.031408]
Abstract
Prior research has shown that physicians’ medical decisions can be influenced by sequential context, particularly in cases where successive stimuli exhibit similar characteristics when analyzing medical images. This type of systematic error is known to psychophysicists as sequential context effect as it indicates that judgments are influenced by features of and decisions about the preceding case in the sequence of examined cases, rather than being based solely on the peculiarities unique to the present case. We determine if radiologists experience some form of context bias, using screening mammography as the use case. To this end, we explore correlations between previous perceptual behavior and diagnostic decisions and current decisions. We hypothesize that a radiologist’s visual search pattern and diagnostic decisions in previous cases are predictive of the radiologist’s current diagnostic decisions. To test our hypothesis, we tasked 10 radiologists of varied experience to conduct blind reviews of 100 four-view screening mammograms. Eye-tracking data and diagnostic decisions were collected from each radiologist under conditions mimicking clinical practice. Perceptual behavior was quantified using the fractal dimension of gaze scanpath, which was computed using the Minkowski–Bouligand box-counting method. To test the effect of previous behavior and decisions, we conducted a multifactor fixed-effects ANOVA. Further, to examine the predictive value of previous perceptual behavior and decisions, we trained and evaluated a predictive model for radiologists’ current diagnostic decisions. ANOVA tests showed that previous visual behavior, characterized by fractal analysis, previous diagnostic decisions, and image characteristics of previous cases are significant predictors of current diagnostic decisions. 
Additionally, predictive modeling of diagnostic decisions showed an overall improvement in prediction error when the model is trained on additional information about previous perceptual behavior and diagnostic decisions.
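The Minkowski–Bouligand box-counting method mentioned in this abstract estimates fractal dimension from how the number of occupied grid boxes scales with box size. A minimal sketch over 2-D gaze points (the grid sizes and point sets below are illustrative, not the study's parameters):

```python
import math

def box_counting_dimension(points, sizes):
    """Minkowski-Bouligand (box-counting) dimension estimate: count the
    grid boxes of side s touched by the point set for each s, then fit
    log N(s) against log(1/s) by least squares."""
    counts = []
    for s in sizes:
        boxes = {(int(x // s), int(y // s)) for x, y in points}
        counts.append(len(boxes))
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Sanity checks: a straight line has dimension 1, a filled square 2
line = [(i, 0) for i in range(1024)]
square = [(i, j) for i in range(64) for j in range(64)]
print(box_counting_dimension(line, sizes=[2, 4, 8, 16, 32]))    # ~1.0
print(box_counting_dimension(square, sizes=[2, 4, 8, 16]))      # ~2.0
```

A real scanpath falls between these extremes, which is what makes the dimension usable as a summary of visual-search complexity.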
Affiliation(s)
- Folami Alamudun
- Oak Ridge National Laboratory, Computational Sciences and Engineering Division, Oak Ridge, Tennessee, United States; Oak Ridge National Laboratory, Health Data Sciences Institute, Oak Ridge, Tennessee, United States
- Paige Paulus
- University of Tennessee, Department of Mechanical, Aerospace, and Biomedical Engineering, Knoxville, Tennessee, United States
- Hong-Jun Yoon
- Oak Ridge National Laboratory, Computational Sciences and Engineering Division, Oak Ridge, Tennessee, United States; Oak Ridge National Laboratory, Health Data Sciences Institute, Oak Ridge, Tennessee, United States
- Georgia Tourassi
- Oak Ridge National Laboratory, Computational Sciences and Engineering Division, Oak Ridge, Tennessee, United States; Oak Ridge National Laboratory, Health Data Sciences Institute, Oak Ridge, Tennessee, United States; University of Tennessee, Department of Mechanical, Aerospace, and Biomedical Engineering, Knoxville, Tennessee, United States