1. Mohamed Selim A, Barz M, Bhatti OS, Alam HMT, Sonntag D. A review of machine learning in scanpath analysis for passive gaze-based interaction. Front Artif Intell 2024; 7:1391745. PMID: 38903158; PMCID: PMC11188426; DOI: 10.3389/frai.2024.1391745.
Abstract
The scanpath is an important concept in eye tracking. It refers to a person's eye movements over a period of time, commonly represented as a series of alternating fixations and saccades. Machine learning has been increasingly used for the automatic interpretation of scanpaths over the past few years, particularly in research on passive gaze-based interaction, i.e., interfaces that implicitly observe and interpret human eye movements, with the goal of improving the interaction. This literature review investigates research on machine learning applications in scanpath analysis for passive gaze-based interaction between 2012 and 2022, starting from 2,425 publications and focussing on 77 publications. We provide insights on research domains and common learning tasks in passive gaze-based interaction and present common machine learning practices from data collection and preparation to model selection and evaluation. We discuss commonly followed practices and identify gaps and challenges, especially concerning emerging machine learning topics, to guide future research in the field.
Affiliation(s)
- Abdulrahman Mohamed Selim
  - German Research Center for Artificial Intelligence (DFKI), Interactive Machine Learning Department, Saarbrücken, Germany
- Michael Barz
  - German Research Center for Artificial Intelligence (DFKI), Interactive Machine Learning Department, Saarbrücken, Germany
  - Applied Artificial Intelligence, University of Oldenburg, Oldenburg, Germany
- Omair Shahzad Bhatti
  - German Research Center for Artificial Intelligence (DFKI), Interactive Machine Learning Department, Saarbrücken, Germany
- Hasan Md Tusfiqur Alam
  - German Research Center for Artificial Intelligence (DFKI), Interactive Machine Learning Department, Saarbrücken, Germany
- Daniel Sonntag
  - German Research Center for Artificial Intelligence (DFKI), Interactive Machine Learning Department, Saarbrücken, Germany
  - Applied Artificial Intelligence, University of Oldenburg, Oldenburg, Germany
2. Ma X, Liu Y, Clariana R, Gu C, Li P. From eye movements to scanpath networks: A method for studying individual differences in expository text reading. Behav Res Methods 2023; 55:730-750. PMID: 35445941; PMCID: PMC10027820; DOI: 10.3758/s13428-022-01842-3.
Abstract
Eye movements have been examined as an index of attention and comprehension during reading for over 30 years. Although eye-movement measurements are acknowledged as reliable indicators of readers' comprehension skill, few studies have analyzed eye-movement patterns using network science. In this study, we offer a new approach to analyzing eye-movement data. Specifically, we recorded visual scanpaths while participants read expository science text and used these to construct scanpath networks that reflect readers' processing of the text. Results showed that low-ability and high-ability readers' scanpath networks exhibited distinctive properties, reflected in different network metrics including density, centrality, small-worldness, transitivity, and global efficiency. Such patterns provide a new way to show how skilled readers, compared with less skilled readers, process information more efficiently. Implications of our analyses are discussed in light of current theories of reading comprehension.
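The construction the abstract describes, turning a fixation sequence into a graph and summarizing it with network metrics, can be sketched as follows. This is a hypothetical reconstruction using networkx, not the authors' code; the area-of-interest (AOI) labels and the exact metric choices are assumptions.

```python
import networkx as nx

def scanpath_network_metrics(fixated_aois):
    """Build a directed scanpath network from a sequence of fixated AOIs
    (e.g., words or sentences) and compute summary graph metrics."""
    g = nx.DiGraph()
    # Each saccade between consecutive fixations becomes a directed edge.
    for src, dst in zip(fixated_aois, fixated_aois[1:]):
        if src != dst:  # ignore refixations within the same AOI
            g.add_edge(src, dst)
    ug = g.to_undirected()  # transitivity/efficiency are defined on undirected graphs
    return {
        "density": nx.density(g),
        "transitivity": nx.transitivity(ug),
        "global_efficiency": nx.global_efficiency(ug),
        "mean_degree_centrality": sum(nx.degree_centrality(g).values()) / g.number_of_nodes(),
    }

# Hypothetical scanpath over four text regions.
metrics = scanpath_network_metrics(["w1", "w2", "w3", "w1", "w3", "w4"])
print(metrics)
```

Comparing these metric vectors across readers is then an ordinary group-difference analysis.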
Affiliation(s)
- Xiaochuan Ma
  - Department of Psychology, The Pennsylvania State University, Moore Building, University Park, PA, 16802, USA
- Yikang Liu
  - Department of Biomedical Engineering, The Pennsylvania State University, Millennium Science Complex, University Park, PA, 16802, USA
- Roy Clariana
  - Department of Learning and Performance Systems, Keller Building, The Pennsylvania State University, University Park, PA, 16802, USA
- Chanyuan Gu
  - Department of Chinese and Bilingual Studies, Faculty of Humanities, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
- Ping Li
  - Department of Chinese and Bilingual Studies, Faculty of Humanities, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
3. Newport RA, Russo C, Liu S, Suman AA, Di Ieva A. SoftMatch: Comparing scanpaths using combinatorial spatio-temporal sequences with fractal curves. Sensors (Basel, Switzerland) 2022; 22:7438. PMID: 36236535; PMCID: PMC9570610; DOI: 10.3390/s22197438.
Abstract
Recent studies that match one person's eye gaze patterns with another's rely heavily on string editing methods borrowed from early work in bioinformatics. Previous studies have shown string editing methods to be susceptible to false negative results when matching mutated genes or unordered regions of interest in scanpaths. Even as new methods have emerged for matching amino acids using novel combinatorial techniques, scanpath matching is still limited by a traditional collinear approach. This approach reduces the ability to discriminate between free viewing scanpaths of two people looking at the same stimulus because of the heavy weight placed on linearity. To overcome this limitation, we here introduce a new method called SoftMatch to compare pairs of scanpaths. SoftMatch diverges from traditional scanpath matching in two ways: first, it preserves locality by using fractal curves to reduce dimensionality from 2D Cartesian (x, y) coordinates to 1D (h) Hilbert distances; second, it takes a combinatorial approach to fixation matching, using discrete Fréchet distance measurements between segments of scanpath fixation sequences. The name SoftMatch loosely abbreviates this matching of "sequences of fixations over time". Results indicate high degrees of statistical and substantive significance when scoring matches between scanpaths made during free-form viewing of unfamiliar stimuli. This method can be applied to better understand bottom-up perceptual processes, extending to scanpath outlier detection, expertise analysis, pathological screening, and salience prediction.
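The two ingredients the abstract names can be sketched minimally: a Hilbert-curve mapping from 2-D fixation coordinates to 1-D distances, and the discrete Fréchet distance between fixation sequences. The grid size and the example fixations are assumptions for illustration; this is not the published SoftMatch implementation, which additionally matches combinatorially over sequence segments.

```python
from functools import lru_cache

def xy2d(n, x, y):
    """Map an (x, y) point on an n-by-n grid (n a power of two) to its
    1-D distance along the Hilbert curve, preserving spatial locality."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so the recursion stays consistent
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two 1-D sequences
    (Eiter & Mannila dynamic program)."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = abs(p[i] - q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(p) - 1, len(q) - 1)

# Two hypothetical fixation sequences on a 256 x 256 stimulus,
# reduced from (x, y) to Hilbert distances before matching.
scan_a = [(10, 12), (80, 75), (200, 190)]
scan_b = [(12, 10), (82, 70), (205, 188)]
h_a = tuple(xy2d(256, x, y) for x, y in scan_a)
h_b = tuple(xy2d(256, x, y) for x, y in scan_b)
print(discrete_frechet(h_a, h_b))
```

A small Fréchet distance between the Hilbert-encoded sequences then indicates spatially and temporally similar scanpaths.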
Affiliation(s)
- Robert Ahadizad Newport
  - Faculty of Medicine, Health and Human Sciences, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
  - Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
- Carlo Russo
  - Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
- Sidong Liu
  - Faculty of Medicine, Health and Human Sciences, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
  - Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
- Abdulla Al Suman
  - Faculty of Medicine, Health and Human Sciences, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
  - Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
- Antonio Di Ieva
  - Faculty of Medicine, Health and Human Sciences, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
  - Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Macquarie University, Balaclava Road, Sydney, NSW 2109, Australia
4. Gandomkar Z, Brennan PC, Suleiman ME. Optimizing Radiologic Detection of COVID-19. Artif Intell Med 2022. DOI: 10.1007/978-3-030-64573-1_285.
5. Li T, Gandomkar Z, Trieu PDY, Lewis SJ, Brennan PC. Differences in lesion interpretation between radiologists in two countries: Lessons from a digital breast tomosynthesis training test set. Asia Pac J Clin Oncol 2021; 18:441-447. PMID: 34811880; DOI: 10.1111/ajco.13686.
Abstract
INTRODUCTION: In many Western countries, there is good evidence documenting the performance of radiologists reading digital breast tomosynthesis (DBT) images. However, the diagnostic efficiency of Chinese radiologists using DBT, particularly the types of errors made and the types of cancers missed, is understudied. This study aims to investigate the pattern of diagnostic errors across different lesion types produced by Chinese radiologists diagnosing from DBT images, with Australian radiologists used as a benchmark. METHODS: Twelve Chinese radiologists read a DBT test set and located each perceived cancer lesion. True positives, false positives (FP), true negatives, and false negatives (FN) were recorded. The same test set was also read by 14 Australian radiologists. Z-scores and Pearson correlations were used to compare the interpretation of lesions and the identification of normal appearances between the two groups of radiologists. RESULTS: Architectural distortions (p < .001) and stellate masses (p = .02) were more difficult for Chinese radiologists to correctly diagnose than for their Australian counterparts. Chinese readers categorised more FPs as discrete masses (p < .001) and fewer FPs as architectural distortions (p < .001) compared with Australian radiologists. The percentages of FNs for each cancer case were not correlated (r = 0.37, p = .18), but the percentages of FPs for each normal case were moderately correlated (r = 0.52, p = .02) between the two groups of readers. CONCLUSIONS: Architectural distortions and stellate masses were challenging for Chinese radiologists reading DBT. Our findings suggest the need to develop training and education programs focusing on imaging cases tailored to specific groups of readers with particular interpretation patterns.
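The between-group comparisons reported here (e.g., correct diagnosis rates for architectural distortions) rest on z-scores for differences between proportions. A generic two-proportion z-test can be sketched as follows; the counts below are invented for illustration and are not the study's data.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for the difference between two proportions,
    e.g., lesion-type detection rates in two reader groups."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)                     # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided normal p-value
    return z, p_value

# Hypothetical counts: group 1 detects 30/100 distortions, group 2 detects 50/100.
z, p = two_proportion_z_test(30, 100, 50, 100)
```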
Affiliation(s)
- Tong Li
  - BreastScreen Reader Assessment Strategy, Medical Imaging Science, School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, New South Wales, Australia
- Ziba Gandomkar
  - Medical Imaging Science, School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, New South Wales, Australia
- Phuong Dung Yun Trieu
  - BreastScreen Reader Assessment Strategy, Medical Imaging Science, School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, New South Wales, Australia
- Sarah J Lewis
  - BreastScreen Reader Assessment Strategy, Medical Imaging Science, School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, New South Wales, Australia
  - Medical Imaging Science, School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, New South Wales, Australia
- Patrick C Brennan
  - Medical Imaging Science, School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, New South Wales, Australia
6. Gandomkar Z, Siviengphanom S, Ekpo EU, Suleiman M, Taba ST, Li T, Xu D, Evans KK, Lewis SJ, Wolfe JM, Brennan PC. Global processing provides malignancy evidence complementary to the information captured by humans or machines following detailed mammogram inspection. Sci Rep 2021; 11:20122. PMID: 34635726; PMCID: PMC8505651; DOI: 10.1038/s41598-021-99582-5.
Abstract
The information captured by the gist signal, which refers to radiologists' first impression arising from an initial global image processing, is poorly understood. We examined whether the gist signal can provide information complementary to data captured by radiologists (experiment 1) or computer algorithms (experiment 2) based on detailed mammogram inspection. In the first experiment, 19 radiologists assessed a case set twice: once based on a half-second image presentation (i.e., the gist signal) and once in the usual viewing condition. Their performances in the two viewing conditions were compared using repeated measures correlation (rm-corr). The cancer cases (19 cases × 19 readers) exhibited a non-significant trend with rm-corr = 0.012 (p = 0.82, CI: -0.09, 0.12). For normal cases (41 cases × 19 readers), a weak correlation of rm-corr = 0.238 (p < 0.001, CI: 0.17, 0.30) was found. In the second experiment, we combined the abnormality score from a state-of-the-art deep learning-based tool (DL) with the radiological gist signal using a support vector machine (SVM). To obtain the gist signal, 53 radiologists assessed images based on a half-second image presentation. The SVM performance was assessed using leave-one-out cross-validation for each radiologist and for an average reader, whose gist responses were the mean abnormality scores given by all 53 readers to each image. For the average reader, the AUCs for gist, DL, and the SVM were 0.76 (CI: 0.62-0.86), 0.79 (CI: 0.63-0.89), and 0.88 (CI: 0.79-0.94), respectively. For all readers with a gist AUC significantly better than chance level, the SVM outperformed DL. The gist signal provided malignancy evidence with no or weak associations with the information captured by humans in normal radiologic reporting, which involves detailed mammogram inspection. Adding the gist signal to a state-of-the-art deep learning-based tool improved its performance for breast cancer detection.
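The second experiment's pipeline, fusing two per-case scores with an SVM and scoring it case-by-case with leave-one-out cross-validation, can be sketched with scikit-learn. The data below are synthetic stand-ins for the gist and DL scores; the kernel and other SVM settings are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 60
y = rng.integers(0, 2, n)                 # 1 = cancer, 0 = normal (synthetic labels)
gist = y + rng.normal(0.0, 1.0, n)        # stand-in for a reader's gist score
dl = y + rng.normal(0.0, 0.7, n)          # stand-in for the DL abnormality score
X = np.column_stack([gist, dl])

# Combine the two scores with an SVM; LOOCV yields one held-out probability per case.
clf = SVC(probability=True, random_state=0)
probs = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
auc = roc_auc_score(y, probs)
print("LOOCV AUC:", auc)
```

The held-out probabilities are then pooled into a single ROC curve, mirroring how a per-reader AUC would be computed.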
Affiliation(s)
- Ziba Gandomkar
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Somphone Siviengphanom
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Ernest U Ekpo
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Mo'ayyad Suleiman
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Seyedamir Tavakoli Taba
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Tong Li
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Dong Xu
  - School of Electrical and Information Engineering, Faculty of Engineering, University of Sydney, Sydney, NSW, 2006, Australia
- Karla K Evans
  - Department of Psychology, University of York, York, UK
- Sarah J Lewis
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
- Jeremy M Wolfe
  - Harvard Medical School, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- Patrick C Brennan
  - Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, 2006, Australia
7. Wolfe JM, Wu CC, Li J, Suresh SB. What do experts look at and what do experts find when reading mammograms? J Med Imaging (Bellingham) 2021; 8:045501. PMID: 34277890; DOI: 10.1117/1.jmi.8.4.045501.
Abstract
Purpose: Radiologists sometimes fail to report clearly visible, clinically significant findings. Eye tracking can provide insight into the causes of such errors. Approach: We tracked the eye movements of 17 radiologists searching for masses in 80 mammograms (60 with masses). Results: Errors were classified using the Kundel et al. (1978) taxonomy: search errors (target never fixated), recognition errors (fixated <500 ms), or decision errors (fixated >500 ms). Error proportions replicated Krupinski (1996): search 25%, recognition 25%, and decision 50%. Interestingly, we found few differences between experts and residents in accuracy or eye movement metrics. Error categorization depends on the definition of the useful field of view (UFOV) around fixation. We explored different UFOV definitions based on targeting saccades and search saccades. Targeting saccades averaged slightly longer than search saccades. Of most interest, we found that the probability that the eyes would move to the target on the next saccade, or even on one of the next three saccades, was strikingly low (~33%, even when the eyes were <2 deg from the target). This makes it clear that observers do not fully process everything within a UFOV. Using a probabilistic UFOV, we find, unsurprisingly, that observers cover more of the image when no target is present than when it is found. Interestingly, we do not find evidence that observers cover too little of the image on trials when they miss the target. Conclusions: These results indicate that many errors in mammography reflect failed deployment of attention, not failure to fixate clinically significant locations.
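The three-way error taxonomy described above maps directly onto cumulative dwell time within some UFOV radius of the missed target. A minimal sketch, assuming a hypothetical 2.5-degree UFOV radius, the 500 ms threshold from the abstract, and invented fixation data:

```python
def classify_miss(fixations, target, dwell_threshold_ms=500, ufov_deg=2.5):
    """Classify a missed lesion using the Kundel et al. (1978) taxonomy.
    fixations: list of (x_deg, y_deg, duration_ms); target: (x, y) lesion centre.
    ufov_deg: assumed radius of the useful field of view around each fixation."""
    tx, ty = target
    # Total dwell accumulated while the target fell inside the UFOV.
    dwell = sum(
        dur for x, y, dur in fixations
        if ((x - tx) ** 2 + (y - ty) ** 2) ** 0.5 <= ufov_deg
    )
    if dwell == 0:
        return "search error"        # target never fixated
    if dwell < dwell_threshold_ms:
        return "recognition error"   # fixated, but < 500 ms
    return "decision error"          # fixated >= 500 ms yet not reported

# Hypothetical miss: one 250 ms fixation 1 degree from the lesion.
miss_type = classify_miss([(3.0, 0.0, 250)], target=(2.0, 0.0))
```

Because the classification hinges on `ufov_deg`, varying that radius reproduces the paper's point that error categorization depends on the UFOV definition.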
Affiliation(s)
- Jeremy M Wolfe
  - Brigham and Women's Hospital, Boston, Massachusetts, United States
  - Harvard Medical School, Cambridge, Massachusetts, United States
- Chia-Chien Wu
  - Brigham and Women's Hospital, Boston, Massachusetts, United States
  - Harvard Medical School, Cambridge, Massachusetts, United States
- Jonathan Li
  - Melbourne Medical School, Melbourne, Victoria, Australia
- Sneha B Suresh
  - Brigham and Women's Hospital, Boston, Massachusetts, United States
8. Optimizing Radiologic Detection of COVID-19. Artif Intell Med 2021. DOI: 10.1007/978-3-030-58080-3_285-1.
9. Gandomkar Z, Brennan PC, Mello-Thoms C. Computer-assisted nuclear atypia scoring of breast cancer: A preliminary study. J Digit Imaging 2019; 32:702-712. PMID: 30719586; PMCID: PMC6737167; DOI: 10.1007/s10278-019-00181-8.
Abstract
Inter-pathologist agreement for nuclear atypia scoring of breast cancer is poor. To address this problem, previous studies suggested criteria for describing the variations in the appearance of tumor cells relative to normal cells. However, these criteria were still assessed subjectively by pathologists. Other studies used quantitative computer-extracted features for scoring, but the application of these tools is limited because further improvement in their accuracy is required. This study proposes COMPASS (COMputer-assisted analysis combined with Pathologist's ASSessment) for reproducible nuclear atypia scoring. COMPASS relies both on cytological criteria assessed subjectively by pathologists and on computer-extracted textural features. Using machine learning, COMPASS combines these two sets of features and outputs a nuclear atypia score. COMPASS's performance was evaluated using 300 images for which expert-consensus reference nuclear pleomorphism scores were available; the images were scanned by two scanners from different vendors. A personalized model was built for each of three pathologists, who gave scores for six atypia-related criteria for each image. Leave-one-out cross-validation (LOOCV) was used, and COMPASS was trained and tested for each pathologist separately. Percentage agreement between COMPASS and the reference nuclear scores was 93.8%, 92.9%, and 93.1% for the three pathologists. COMPASS's performance in nuclear grading was almost identical for both scanners, with Cohen's kappa ranging from 0.80 to 0.86 across pathologists and scanners. Independently, the images were also assessed by two experienced senior pathologists; Cohen's kappa for COMPASS was comparable to that for the two senior pathologists (0.79 and 0.68).
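The agreement statistic reported here, Cohen's kappa, is computed from two raters' label vectors as below. This is a generic implementation for illustration, not tied to the COMPASS data; the example scores are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters'
    categorical labels (e.g., nuclear atypia scores of 1-3)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of cases where the raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical atypia scores from a model and a reference pathologist.
kappa = cohens_kappa([1, 2, 3, 2, 1, 3], [1, 2, 3, 3, 1, 3])
```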
Affiliation(s)
- Ziba Gandomkar
  - Discipline of Medical Imaging and Radiation Sciences, Medical Image Optimisation and Perception Group (MIOPeG), The University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, Australia
- Patrick C Brennan
  - Discipline of Medical Imaging and Radiation Sciences, Medical Image Optimisation and Perception Group (MIOPeG), The University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, Australia
- Claudia Mello-Thoms
  - Discipline of Medical Imaging and Radiation Sciences, Medical Image Optimisation and Perception Group (MIOPeG), The University of Sydney, 512/Block M, Cumberland Campus, Sydney, NSW, Australia
  - Carver College of Medicine, Department of Radiology, University of Iowa, Iowa City, IA, USA
10.
Abstract
Breast cancer is the most common cancer among females worldwide, and large volumes of breast images are produced and interpreted annually. As long as radiologists interpret these images, diagnostic accuracy will be limited by human factors, and both false-positive and false-negative errors may occur. By understanding visual search in breast images, we may be able to identify causes of diagnostic errors, find ways to reduce them, and provide better education to radiology residents. Many visual search studies in breast radiology have been devoted to mammography. These studies showed that 70% of missed lesions on mammograms attract radiologists' visual attention and that a plethora of different reasons, such as satisfaction of search, incorrect background sampling, and incorrect first impressions, can cause diagnostic errors in the interpretation of mammograms. Recently, highly accurate tools that rely on both eye-tracking data and the content of the mammogram have been proposed to provide feedback to radiologists. Improving these tools and determining the optimal pathway to integrate them into the radiology workflow could be a possible line of future research. Moreover, in the past few years deep learning has improved the diagnostic accuracy of computerized diagnostic tools, and visual search studies will be required to understand how radiologists interact with the prompts from these tools and to identify the best way to utilize them. Visual search in other breast imaging modalities, such as breast ultrasound and digital breast tomosynthesis, has so far received less attention, probably due to the associated complexities of eye-tracking monitoring and data analysis. For example, in digital breast tomosynthesis, scrolling through the image results in longer trials, adds a new factor to a study's complexity, and makes the calculation of gaze parameters more difficult. However, considering the wide utilization of three-dimensional imaging modalities, more visual search studies involving the reading of stack-view examinations are required in the future. To conclude, in the past few decades visual search studies have provided extensive understanding of the underlying reasons for diagnostic errors in breast radiology and have characterized differences between experts' and novices' visual search patterns. Further visual search studies are required to investigate radiologists' interaction with relatively newer imaging modalities and artificial intelligence tools.
Affiliation(s)
- Ziba Gandomkar
  - BreastScreen Reader Assessment Strategy (BREAST), Discipline of Medical Imaging Sciences, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
- Claudia Mello-Thoms
  - Department of Radiology, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
11. Gandomkar Z, Tay K, Brennan PC, Kozuch E, Mello-Thoms C. Can eye-tracking metrics be used to better pair radiologists in a mammogram reading task? Med Phys 2018; 45:4844-4856. PMID: 30168153; DOI: 10.1002/mp.13161.
Abstract
PURPOSE: To propose a framework for the optimal pairing of radiologists reading mammograms based on their search patterns. MATERIALS AND METHODS: Four experienced and four less-experienced radiologists were asked to assess 120 cases (59 with cancers) while their eye positions were tracked. Fourteen eye-tracking metrics were extracted to quantify the differences among radiologists' visual search patterns. For each radiologist and metric, less-experienced radiologists and expert readers were ranked by the similarity of their gaze patterns (from most different to most similar). Less-experienced readers and experts were also ranked by the values of the area under the receiver operating characteristic curve (AUC) after pairing (the best possible ranking). Using the Kendall's tau distance, rankings based on the different metrics were compared with the best possible ranking. Using the paired Wilcoxon signed-rank test, the AUC values from the best pairing were compared with those from pairings based on the different metrics. Finally, we investigated the robustness of the pairing strategies against small sample sizes. RESULTS: For ranking the experienced radiologists, results from eight metrics were as good as the best possible ranking. For the less-experienced radiologists, only one metric resulted in a ranking comparable to the best possible ranking. The AUC values of pairings based on these metrics did not differ significantly from the best pairing scenario. Compared with pairings based on the cognitive metrics, the ranking based on AUC values varied more with sample size, suggesting that it is less robust against small sample sizes. CONCLUSION: Different pairings may have different effects on performance; some are detrimental while others improve the performance of the pair. Using the suggested cognitive metrics, pairings can be optimized even with a small dataset.
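The comparison of each metric-based ranking to the best (AUC-based) ranking uses the Kendall tau distance, i.e., the number of pairwise disagreements between two rankings. A small sketch with hypothetical reader rankings (the reader names and rank values are invented):

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count discordant pairs between two rankings of the same items.
    rank_a, rank_b: dicts mapping item -> rank position (1 = first)."""
    return sum(
        # A pair is discordant when the two rankings order it oppositely.
        (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
        for i, j in combinations(rank_a, 2)
    )

best = {"reader1": 1, "reader2": 2, "reader3": 3, "reader4": 4}
by_metric = {"reader1": 2, "reader2": 1, "reader3": 3, "reader4": 4}
dist = kendall_tau_distance(best, by_metric)
```

A distance of 0 means the metric reproduces the best possible ranking exactly; the maximum distance for n items is n(n-1)/2 (a fully reversed ranking).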
Affiliation(s)
- Ziba Gandomkar
  - Discipline of Medical Imaging and Radiation Sciences, Image Optimisation and Perception Group (MIOPeG), The University of Sydney, Sydney, NSW, Australia
- Kevin Tay
  - Medical Imaging Department, Prince of Wales Hospital, Randwick, NSW, Australia
- Patrick C Brennan
  - Discipline of Medical Imaging and Radiation Sciences, Image Optimisation and Perception Group (MIOPeG), The University of Sydney, Sydney, NSW, Australia
- Emma Kozuch
  - University of Notre Dame, Notre Dame, Indiana, 46556, USA
- Claudia Mello-Thoms
  - Discipline of Medical Imaging and Radiation Sciences, Image Optimisation and Perception Group (MIOPeG), The University of Sydney, Sydney, NSW, Australia
  - Department of Radiology, The University of Iowa, Iowa City, IA, 52242, USA