1. Human attention guided explainable artificial intelligence for computer vision models. Neural Netw 2024; 177:106392. [PMID: 38788290] [DOI: 10.1016/j.neunet.2024.106392]
Abstract
Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI that is faithful to models and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, namely FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending the current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods when applied to object detection models generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how to best combine explanatory information from the models to enhance explanation plausibility by using trainable activation functions and smoothing kernels to maximize the similarity between XAI saliency map and human attention map. The proposed XAI methods were evaluated on widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, and it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods for object detection models.
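The gradient-weighted channel pooling at the heart of the Grad-CAM family that FullGrad-CAM and FullGrad-CAM++ extend can be sketched in a few lines. This is a generic illustration with random placeholder activations and gradients (assumed shapes), not the FullGrad-CAM formulation itself:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM-style map: weight each channel's activation by the
    spatially averaged gradient of the class score, then ReLU."""
    # activations, gradients: (C, H, W) from one convolutional layer
    weights = gradients.mean(axis=(1, 2))             # (C,) channel importances
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0)                          # keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                              # normalize to [0, 1]
    return cam

rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))            # placeholder conv activations
grads = rng.standard_normal((8, 7, 7))  # placeholder class-score gradients
saliency = grad_cam(acts, grads)
print(saliency.shape)  # (7, 7)
```

The object-specific variants above restrict this computation to a detector's per-object outputs; the pooling-then-ReLU step is the shared core.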
2. A hybrid CNN-RNN approach for survival analysis in a Lung Cancer Screening study. Heliyon 2023; 9:e18695. [PMID: 37600411] [PMCID: PMC10432611] [DOI: 10.1016/j.heliyon.2023.e18695]
Abstract
In this study, we present a hybrid CNN-RNN approach to investigate long-term survival of subjects in a lung cancer screening study. Subjects who died of cardiovascular and respiratory causes were identified, whereby the CNN model was used to capture imaging features in the CT scans and the RNN model was used to investigate time series and thus global information. To account for heterogeneity in patients' follow-up times, two different variants of LSTM models were evaluated, each incorporating different strategies to address irregularities in follow-up time. The models were trained on subjects who died of cardiovascular and respiratory causes and a control cohort matched for age, gender, and smoking history. The combined model achieves an AUC of 0.76, which outperforms humans at cardiovascular mortality prediction. The corresponding F1 and Matthews Correlation Coefficient are 0.63 and 0.42, respectively. The generalisability of the model is further validated on an 'external' cohort. The same models were applied to survival analysis with the Cox Proportional Hazard model. It was demonstrated that incorporating the follow-up history can lead to improvement in survival prediction. The Cox neural network achieves an IPCW C-index of 0.75 on the internal dataset and 0.69 on an external dataset. Delineating subjects at increased risk of cardiorespiratory mortality can alert clinicians to request further, more detailed functional or imaging studies to improve the assessment of cardiorespiratory disease burden. Such strategies may uncover unsuspected and under-recognised pathologies, thereby potentially reducing patient morbidity.
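The concordance index used to assess these survival models can be illustrated with plain Harrell's C (the study reports the IPCW-weighted variant; this sketch and its toy cohort are illustrative only):

```python
def c_index(times, events, risk):
    """Harrell's concordance index: the fraction of comparable pairs whose
    predicted risks are ordered consistently with observed survival times.
    (The IPCW variant additionally reweights for censoring; omitted here.)"""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:          # a pair is comparable only if the
            continue               # earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# toy cohort: higher predicted risk should mean earlier death
times  = [2, 4, 6, 8]
events = [1, 1, 0, 1]   # 1 = death observed, 0 = censored
risk   = [0.9, 0.7, 0.2, 0.1]
print(c_index(times, events, risk))  # 1.0 (perfectly ordered)
```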
3. Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization. Neural Netw 2023; 161:228-241. [PMID: 36774862] [DOI: 10.1016/j.neunet.2023.01.025]
Abstract
Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, one challenge it faces is interpretability when applied to real-world problems. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either computationally expensive, and thus cannot satisfy the real-time requirements of real-world scenarios, or cannot produce interpretable saliency maps for RL policies. In this work, we propose Distillation with selective Input Gradient Regularization (DIGR), an approach that uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computational efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout) and CARLA Autonomous Driving, to demonstrate the importance and effectiveness of our approach.
4. Deep learning-based image deconstruction method with maintained saliency. Neural Netw 2022; 155:224-241. [PMID: 36081196] [DOI: 10.1016/j.neunet.2022.08.015]
Abstract
Visual properties that primarily attract bottom-up attention are collectively referred to as saliency. In this study, to understand the neural activity involved in top-down and bottom-up visual attention, we aim to prepare pairs of natural and unnatural images with common saliency. For this purpose, we propose an image transformation method based on deep neural networks that can generate new images while maintaining a consistent feature map, in particular the saliency map. This is an ill-posed problem because the transformation from an image to its corresponding feature map can be many-to-one, and in our particular case, various images would share the same saliency map. Although stochastic image generation has the potential to solve such ill-posed problems, most existing methods focus on adding diversity to the overall style/touch information while maintaining the naturalness of the generated images. In contrast, we developed a new image transformation method that incorporates higher-dimensional latent variables so that the generated images appear unnatural with less context information but retain a high diversity of local image structures. Although such high-dimensional latent spaces are prone to collapse, we proposed a new regularization based on Kullback-Leibler divergence to avoid collapsing the latent distribution. We also conducted human experiments using our newly prepared natural and corresponding unnatural images, measuring overt eye movements and functional magnetic resonance imaging signals, and found that those images induced distinctive neural activities related to top-down and bottom-up attentional processing.
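The KL-divergence regularization against latent collapse can be illustrated with the standard closed-form term for a diagonal Gaussian against a standard-normal prior; the study's exact regularizer may differ, so treat this as a generic sketch of the idea:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian:
    0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1). Penalizing this keeps the
    latent distribution anchored to the prior instead of collapsing."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

# A latent already matching the prior incurs zero penalty:
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))      # 0.0
# A latent drifted to mean 2 in every dimension is penalized:
print(kl_to_standard_normal(np.full(4, 2.0), np.zeros(4)))  # 8.0
```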
5. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011-2022). Comput Methods Programs Biomed 2022; 226:107161. [PMID: 36228495] [DOI: 10.1016/j.cmpb.2022.107161]
Abstract
BACKGROUND AND OBJECTIVES Artificial intelligence (AI) has branched out to various applications in healthcare, such as health services management, predictive medicine, clinical decision-making, and patient data and diagnostics. Although AI models have achieved human-like performance, their use is still limited because they are seen as a black box. This lack of trust remains the main reason for their low use in practice, especially in healthcare. Hence, explainable artificial intelligence (XAI) has been introduced as a technique that can provide confidence in a model's prediction by explaining how the prediction is derived, thereby encouraging the use of AI systems in healthcare. The primary goal of this review is to identify areas of healthcare that require more attention from the XAI research community. METHODS Multiple journal databases were thoroughly searched following the PRISMA 2020 guidelines. Studies not published in Q1 journals, which are highly credible, were excluded. RESULTS In this review, we surveyed 99 Q1 articles covering the following XAI techniques: SHAP, LIME, GradCAM, LRP, fuzzy classifiers, EBM, CBR, rule-based systems, and others. CONCLUSION We discovered that detecting abnormalities in 1D biosignals and identifying key text in clinical notes are areas that require more attention from the XAI research community. We hope this review will encourage the development of a holistic cloud system for a smart city.
6. SSPNet: An interpretable 3D-CNN for classification of schizophrenia using phase maps of resting-state complex-valued fMRI data. Med Image Anal 2022; 79:102430. [PMID: 35397470] [DOI: 10.1016/j.media.2022.102430]
Abstract
Convolutional neural networks (CNNs) have shown promising results in classifying individuals with mental disorders such as schizophrenia using resting-state fMRI data. However, complex-valued fMRI data are rarely used, since the additional phase data introduce high levels of noise even though the phase is potentially useful information for classification. As such, we propose to use spatial source phase (SSP) maps derived from complex-valued fMRI data as the CNN input. The SSP maps are not only less noisy, but also more sensitive to spatial activation changes caused by mental disorders than magnitude maps. We build a 3D-CNN framework with two convolutional layers (named SSPNet) to fully explore the 3D structure and voxel-level relationships in the SSP maps. Two interpretability modules, consisting of saliency map generation and gradient-weighted class activation mapping (Grad-CAM), are incorporated into the well-trained SSPNet to provide additional information helpful for understanding the output. Experimental results from classifying schizophrenia patients (SZs) and healthy controls (HCs) show that the proposed SSPNet significantly improved accuracy and AUC compared to a CNN using magnitude maps extracted from either magnitude-only (by 23.4 and 23.6% for DMN) or complex-valued fMRI data (by 10.6 and 5.8% for DMN). SSPNet captured more prominent HC-SZ differences in saliency maps, and Grad-CAM localized all contributing brain regions with opposite strengths for HCs and SZs within SSP maps. These results indicate the potential of SSPNet as a sensitive tool that may be useful for the development of brain-based biomarkers of mental disorders.
7. Saliency map-guided hierarchical dense feature aggregation framework for breast lesion classification using ultrasound image. Comput Methods Programs Biomed 2022; 215:106612. [PMID: 35033757] [DOI: 10.1016/j.cmpb.2021.106612]
Abstract
Deep learning methods, especially convolutional neural networks, have advanced the breast lesion classification task using breast ultrasound (BUS) images. However, constructing a highly accurate classification model still remains challenging due to the complex patterns, relatively low contrast, and fuzzy boundaries between lesion regions (i.e., foreground) and the surrounding tissues (i.e., background). Few studies have separated foreground and background to learn domain-specific representations and then fused them to improve model performance. In this paper, we propose a saliency map-guided hierarchical dense feature aggregation framework for breast lesion classification using BUS images. Specifically, we first generate saliency maps for foreground and background via super-pixel clustering and multi-scale region grouping. Then, a triple-branch network, including two feature extraction branches and a feature aggregation branch, is constructed to learn and fuse discriminative representations under the guidance of priors provided by the saliency maps. In particular, the two feature extraction branches take the original image and the corresponding saliency map as input for extracting foreground- and background-specific representations. Subsequently, a hierarchical feature aggregation branch receives and fuses the features from different stages of the two feature extraction branches for lesion classification in a task-oriented manner. The proposed model was evaluated on three datasets using 5-fold cross-validation, and experimental results demonstrate that it outperforms several state-of-the-art deep learning methods on breast lesion diagnosis using BUS images.
8. Evaluation of reconstructed auricles by convolutional neural networks. J Plast Reconstr Aesthet Surg 2022; 75:2293-2301. [PMID: 35183463] [DOI: 10.1016/j.bjps.2022.01.037]
Abstract
The difficulty in determining which structures are crucial to ensure a natural-looking ear has been plaguing surgeons for many years. This preliminary study explores the feasibility of training convolutional neural network (CNN) models to evaluate a reconstructed auricle as accurately as a human would. By visualizing the attention of the trained models, criteria for the design of a natural-looking auricle can be established. A total of 400 pictures were evaluated by 20 volunteers, and 20 labeled datasets were generated, which were then used to train ResNet models that had been pre-trained on ImageNet. The saliency maps and occlusion maps of each trained model were calculated to capture the attention of the models. The average accuracy of the 20 models was 0.8245 ± 0.0356 (>0.80), and the evaluation results of the trained model and the medical student showed a significant correlation (P < 0.05). In the attention visualizations of auricles labeled as normal, the distribution of the highlighted portions corresponded to a linear contour of the helix, the inferior crura of the antihelix, and the contour of the concha. A CNN can provide an evaluation of a reconstructed auricle in a manner similar to that of a medical student. Saliency maps generated by the CNN demonstrate a subjective view that was consistent with professional opinion.
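The occlusion maps used above follow a simple recipe: mask each image region in turn and record how much the model's score drops. A minimal sketch with a stand-in scoring function (not the trained ResNet):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, fill=0.0):
    """Occlusion sensitivity: gray out each patch in turn and record how much
    the model's score drops; large drops mark regions the model relies on."""
    h, w = image.shape
    base = score_fn(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat

# Stand-in "model": scores the mean brightness of the top-left quadrant,
# so occluding that quadrant should dominate the resulting map.
score = lambda img: img[:8, :8].mean()
img = np.ones((16, 16))
heat = occlusion_map(img, score, patch=8)
print(heat)  # only the top-left cell shows a score drop
```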
9. An awareness-dependent mapping of saliency in the human visual system. Neuroimage 2021; 247:118864. [PMID: 34965453] [DOI: 10.1016/j.neuroimage.2021.118864]
Abstract
The allocation of exogenously cued spatial attention is governed by a saliency map. Yet, how salience is mapped when multiple salient stimuli are present simultaneously, and how this mapping interacts with awareness remains unclear. These questions were addressed here using either visible or invisible displays presenting two foreground stimuli (whose bars were oriented differently from the bars in the otherwise uniform background): a high salience target and a distractor of varied, lesser salience. Interference, or not, by the distractor with the effective salience of the target served to index a graded or non-graded nature of salience mapping, respectively. The invisible and visible displays were empirically validated by a two-alternative forced choice test (detecting the quadrant of the target) demonstrating subjects' performance at or above chance level, respectively. By combining psychophysics, fMRI, and effective connectivity analysis, we found a graded distribution of salience with awareness, changing to a non-graded distribution without awareness. Crucially, we further revealed that the graded distribution was contingent upon feedback from the posterior intraparietal sulcus (pIPS, especially from the right pIPS), whereas the non-graded distribution was innate to V1. Together, this awareness-dependent mapping of saliency reconciles several previous, seemingly contradictory findings regarding the nature of the saliency map.
10. Statistical modeling of dynamic eye-tracking experiments: Relative importance of visual stimulus elements for gaze behavior in the multi-group case. Behav Res Methods 2021; 53:2650-2667. [PMID: 34027596] [PMCID: PMC8613156] [DOI: 10.3758/s13428-021-01576-8]
Abstract
This paper presents a model that allows group comparisons of gaze behavior while watching dynamic video stimuli. The model is based on the approach of Coutrot and Guyader (2017) and forms a master saliency map from linear combinations of feature maps. The feature maps in the model are, for example, the dynamically salient contents of a video stimulus or predetermined areas of interest. The model takes into account temporal aspects of the stimuli, which is a crucial difference from other common models. The multi-group extension of the model introduced here makes it possible to obtain relative importance plots, which visualize the effect of a specific feature of a stimulus on the attention and visual behavior of two or more experimental groups. These plots are interpretable summaries of data with high spatial and temporal resolution. This approach differs from many common methods for comparing gaze behavior between natural groups, which usually only include single-dimensional features such as the duration of fixation on a particular part of the stimulus. The method is illustrated by contrasting a sample of persons with particularly high cognitive abilities (high achievement on IQ tests) with a control group on a psycholinguistic task on the conceptualization of motion events. In the example, we find no substantive differences in relative importance, but more exploratory gaze behavior in the highly gifted group. The code, videos, and eye-tracking data we used for this study are available online.
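The core idea, a master saliency map formed as a linear combination of feature maps, can be sketched as a least-squares fit of per-feature weights to an observed fixation-density map. The actual model is fit with temporal structure and likelihood-based methods, so this static numpy analogue is illustrative only:

```python
import numpy as np

def fit_master_map(feature_maps, fixation_density):
    """Fit weights of a linear combination of feature maps to an observed
    fixation-density map by least squares; the per-feature weights play the
    role of the 'relative importance' values in the model described above."""
    X = np.stack([f.ravel() for f in feature_maps], axis=1)  # (pixels, features)
    y = fixation_density.ravel()
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    master = (X @ w).reshape(fixation_density.shape)         # master saliency map
    return w, master

rng = np.random.default_rng(1)
f1, f2 = rng.random((10, 10)), rng.random((10, 10))  # two synthetic feature maps
density = 0.7 * f1 + 0.3 * f2                        # gaze map with known weights
w, master = fit_master_map([f1, f2], density)
print(np.round(w, 3))  # recovers ~ [0.7, 0.3]
```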
11. Atomoxetine modulates the contribution of low-level signals during free viewing of natural images in rhesus monkeys. Neuropharmacology 2020; 182:108377. [PMID: 33137343] [DOI: 10.1016/j.neuropharm.2020.108377]
Abstract
Visuo-spatial attentional orienting is fundamental to selectively processing behaviorally relevant information, depending on both low-level visual attributes of stimuli in the environment and higher-level factors, such as goals, expectations and prior knowledge. Growing evidence suggests an impact of the locus-cœruleus-norepinephrine (LC-NE) system on attentional orienting that depends on task context. Nonetheless, most previous studies used visual displays encompassing a target and various distractors, often preceded by cues to orient the attentional focus. This emphasizes the contribution of goal-driven processes at the expense of other factors related to the stimulus content. Here, we aimed to determine the impact of NE on attentional orienting in more naturalistic conditions, using complex images and without any explicit task manipulation. We tested the effects of injections of atomoxetine (ATX), a NE reuptake inhibitor, on four monkeys during free viewing of images belonging to three categories: landscapes, monkey faces and scrambled images. Analyses of the gaze exploration patterns revealed, first, that the monkeys spent more time on each fixation under ATX compared to the control condition, regardless of the image content. Second, we found that, depending on the image content, ATX modulated the impact of low-level visual salience on attentional orienting. This effect correlated with the effect of ATX on the number and duration of fixations. Taken together, our results demonstrate that ATX adjusts the contribution of salience to attentional orienting depending on the image content, indicative of its role in balancing stimulus-driven and top-down control during free viewing of complex stimuli.
12. Classification of schizophrenia and normal controls using 3D convolutional neural network and outcome visualization. Schizophr Res 2019; 212:186-195. [PMID: 31395487] [DOI: 10.1016/j.schres.2019.07.034]
Abstract
BACKGROUND The recent deep learning-based studies on the classification of schizophrenia (SCZ) using MRI data rely on manual extraction of feature vector, which destroys the 3D structure of MRI data. In order to both identify SCZ and find relevant biomarkers, preserving the 3D structure in classification pipeline is critical. OBJECTIVES The present study investigated whether the proposed 3D convolutional neural network (CNN) model produces higher accuracy compared to the support vector machine (SVM) and other 3D-CNN models in distinguishing individuals with SCZ spectrum disorders (SSDs) from healthy controls. We sought to construct saliency map using class saliency visualization (CSV) method. METHODS Task-based fMRI data were obtained from 103 patients with SSDs and 41 normal controls. To preserve spatial locality, we used 3D activation map as input for the 3D convolutional autoencoder (3D-CAE)-based CNN model. Data on 62 patients with SSDs were used for unsupervised pretraining with 3D-CAE. Data on the remaining 41 patients and 41 normal controls were processed for training and testing with CNN. The performance of our model was analyzed and compared with SVM and other 3D-CNN models. The learned CNN model was visualized using CSV method. RESULTS Using task-based fMRI data, our model achieved 84.15%∼84.43% classification accuracies, outperforming SVM and other 3D-CNN models. The inferior and middle temporal lobes were identified as key regions for classification. CONCLUSIONS Our findings suggest that the proposed 3D-CAE-based CNN can classify patients with SSDs and controls with higher accuracy compared to other models. Visualization of salient regions provides important clinical information.
13. Plant disease identification using explainable 3D deep learning on hyperspectral images. Plant Methods 2019; 15:98. [PMID: 31452674] [PMCID: PMC6702735] [DOI: 10.1186/s13007-019-0479-8]
Abstract
BACKGROUND Hyperspectral imaging is emerging as a promising approach for plant disease identification. The large and possibly redundant information contained in hyperspectral data cubes makes deep learning based identification of plant diseases a natural fit. Here, we deploy a novel 3D deep convolutional neural network (DCNN) that directly assimilates the hyperspectral data. Furthermore, we interrogate the learnt model to produce physiologically meaningful explanations. We focus on an economically important disease, charcoal rot, a soil-borne fungal disease that affects the yield of soybean crops worldwide. RESULTS Based on hyperspectral imaging of inoculated and mock-inoculated stem images, our 3D DCNN has a classification accuracy of 95.73% and an infected-class F1 score of 0.87. Using the concept of a saliency map, we visualize the most sensitive pixel locations, and show that the spatial regions with visible disease symptoms are overwhelmingly chosen by the model for classification. We also find that the most sensitive wavelengths used by the model for classification are in the near-infrared region (NIR), which is also the spectral range commonly used for determining the vegetative health of a plant. CONCLUSION The use of an explainable deep learning model not only provides high accuracy, but also provides physiological insight into model predictions, thus generating confidence in those predictions. These explained predictions lend themselves to eventual use in precision agriculture and research applications on automated phenotyping platforms.
14. Saliency model based on a neural population for integrating figure direction and organizing Border Ownership. Neural Netw 2018; 110:33-46. [PMID: 30481686] [DOI: 10.1016/j.neunet.2018.10.015]
Abstract
Attentional selection is a function of the brain that momentarily allocates computational resources to the most important part of a visual scene. Saliency map models have been used to predict the location of attentional selection and gaze. Border Ownership (BO) indicates the direction of the figure with respect to the border. I here propose a biologically plausible saliency model based on a neural population that integrates the activities of intermediate-level visual areas with neurons selective for BO. A variety of BO organizations produces a population of model neurons that represent the grouping structure. In the model I propose, the interactions and population responses of these model neurons underlie the determination of saliency and the accurate prediction of gaze location. I tested 100 patterns of BO organizations and found that the proposed saliency model not only reproduced the characteristics of perceptual organization but also captured object locations in natural images. Furthermore, the saliency model based on the population responses of the BO organization significantly improved gaze prediction accuracy compared with previous saliency-based models. These results suggest a crucial role for a wide variety of BO organizations and neural population coding in determining the saliency that mediates attentional selection and in predicting gaze location.
15. Attentive pointing in natural scenes correlates with other measures of attention. Vision Res 2017; 135:54-64. [PMID: 28427890] [PMCID: PMC5488873] [DOI: 10.1016/j.visres.2017.04.001]
Abstract
Finger pointing is a natural human behavior frequently used to draw attention to specific parts of sensory input. Since this pointing behavior is likely preceded and/or accompanied by the deployment of attention by the pointing person, we hypothesize that pointing can be used as a natural means of providing self-reports of attention and, in the case of visual input, visual salience. We here introduce a new method for assessing attentional choice by asking subjects to point to and tap the first place they look at on an image appearing on an electronic tablet screen. Our findings show that the tap data are well-correlated with other measures of attention, including eye fixations and selections of interesting image points, as well as with predictions of a saliency map model. We also develop an analysis method for comparing attentional maps (including fixations, reported points of interest, finger pointing, and computed salience) that takes into account the error in estimating those maps from a finite number of data points. This analysis strengthens our original findings by showing that the measured correlation between attentional maps drawn from identical underlying processes is systematically underestimated. The underestimation is strongest when the number of samples is small but it is always present. Our analysis method is not limited to data from attentional paradigms but, instead, it is broadly applicable to measures of similarity made between counts of multinomial data or probability distributions.
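The underestimation effect described above is easy to reproduce: two count maps sampled from the same underlying attention distribution correlate imperfectly, and more weakly the fewer the samples. A small numpy demonstration with a synthetic distribution (not the study's data or its exact correction method):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(64)
p /= p.sum()  # one underlying attention distribution over 64 spatial bins

def empirical_corr(n):
    """Correlation between two count maps of n samples each, both drawn
    from the SAME distribution p: identical processes, finite data."""
    a = rng.multinomial(n, p)
    b = rng.multinomial(n, p)
    return np.corrcoef(a, b)[0, 1]

few, many = empirical_corr(100), empirical_corr(100_000)
print(few < many)  # sparser sampling drags the measured correlation down
```

Even though both maps come from an identical process, the measured correlation stays below 1 and shrinks as samples get scarce, which is exactly the bias the paper's analysis method accounts for.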
16. A Computer-Aided Diagnosis System for Measuring Carotid Artery Intima-Media Thickness (IMT) Using Quaternion Vectors. J Med Syst 2016; 40:149. [PMID: 27137786] [DOI: 10.1007/s10916-016-0507-4]
Abstract
This study investigates adjustable distant fuzzy c-means segmentation of carotid Doppler images, together with quaternion-based convolution filters and saliency mapping procedures. We developed imaging software to simplify the measurement of carotid artery intima-media thickness (IMT) on saliency mapping images. Additionally, specialists evaluated the resulting images and compared them with the saliency mapping images. We analyzed 25 carotid Doppler images obtained by the Department of Cardiology at Fırat University. After implementing fuzzy c-means segmentation and quaternion-based convolution on all Doppler images, we obtained a representation that can be analyzed easily by doctors using a bottom-up saliency model. These methods were applied to the 25 carotid Doppler images and then interpreted by specialists. Color-filtering methods were used to obtain carotid color images. Saliency mapping was performed on the obtained images, and the carotid artery IMT was detected and interpreted on the images from both methods as well as the raw images, as shown in the Results. These results were evaluated using the Mean Square Error (MSE) against the raw IMT images, and the best-performing method was Quaternion-Based Saliency Mapping (QBSM), with MSEs of 0.0014 and 0.000191 mm² for artery lumen diameters and plaque diameters in carotid arteries, respectively. We found that computer-based image processing methods applied to carotid Doppler images could aid doctors in their decision-making process. The software we developed could ease the measurement of carotid IMT for cardiologists and help them evaluate their findings.
|
17
|
Predicting the eye fixation locations in the gray scale images in the visual scenes with different semantic contents. Cogn Neurodyn 2016; 10:31-47. [PMID: 26834860 DOI: 10.1007/s11571-015-9357-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2015] [Revised: 08/22/2015] [Accepted: 09/30/2015] [Indexed: 10/23/2022] Open
Abstract
In recent years, there has been considerable interest in visual attention models (saliency maps of visual attention). These models can be used to predict eye fixation locations and thus have many applications, leading to better performance in machine vision systems. Most such models need improvement because they rely on bottom-up computation that ignores top-down image semantic content and often does not match actual eye fixation locations. In this study, we recorded the eye movements (i.e., fixations) of fourteen individuals who viewed images of natural (e.g., landscape, animal) and man-made (e.g., building, vehicle) scenes. We extracted the fixation locations of eye movements in the two image categories. After extracting the fixation areas (a patch around each fixation location), we compared the characteristics of these areas with those of non-fixation areas; the features extracted from each patch were orientation and spatial frequency. After the feature-extraction phase, different statistical classifiers were trained to predict eye fixation locations from these features. This study connects eye-tracking results to automatic prediction of salient regions of images. The results showed that it is possible to predict eye fixation locations from the image patches around subjects' fixation points.
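The per-patch analysis described here, orientation plus spatial frequency, might look like the following numpy sketch. The exact descriptors are assumptions for illustration, not the paper's feature definitions.

```python
import numpy as np

def patch_features(patch):
    """Orientation histogram and spatial-frequency centroid for one patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # gradient-orientation histogram, weighted by edge strength
    orient = np.arctan2(gy, gx)
    hist, _ = np.histogram(orient, bins=8, range=(-np.pi, np.pi), weights=mag)
    # radial spatial-frequency centroid of the Fourier magnitude spectrum
    F = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    F[patch.shape[0] // 2, patch.shape[1] // 2] = 0.0   # drop the DC term
    yy, xx = np.indices(patch.shape)
    r = np.hypot(yy - patch.shape[0] / 2, xx - patch.shape[1] / 2)
    freq = (r * F).sum() / (F.sum() + 1e-12)
    return hist / (hist.sum() + 1e-12), freq
```

A fixation-vs-non-fixation classifier would then be trained on these feature vectors; the specific classifier is left open, as in the abstract.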
|
18
|
A proto-object based saliency model in three-dimensional space. Vision Res 2016; 119:42-9. [PMID: 26739278 DOI: 10.1016/j.visres.2015.12.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2015] [Revised: 12/16/2015] [Accepted: 12/20/2015] [Indexed: 10/22/2022]
Abstract
Most models of visual saliency operate on two-dimensional images, using elementary image features such as intensity, color, or orientation. The human visual system, however, needs to function in complex three-dimensional environments, where depth information is often available and may be used to guide the bottom-up attentional selection process. In this report we extend a model of proto-object based saliency to include depth information and evaluate its performance on three separate three-dimensional eye tracking datasets. Our results show that the additional depth information provides a small, but statistically significant, improvement in the model's ability to predict perceptual saliency (eye fixations) in natural scenes. The computational mechanisms of our model have direct neural correlates, and our results provide further evidence that proto-objects help to establish perceptual organization of the scene.
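The depth extension can be illustrated with a toy sketch: a centre-surround depth-contrast channel blended into an existing 2-D saliency map. This is a schematic stand-in for the proto-object model; the function names, window radii, and blend weight are hypothetical.

```python
import numpy as np

def box_mean(m, r):
    """Mean over a (2r+1)^2 window via rolled sums (wrap-around padding)."""
    return sum(np.roll(np.roll(m, dy, 0), dx, 1)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)) / (2 * r + 1) ** 2

def depth_augmented_saliency(sal2d, depth, r_c=1, r_s=4, w=0.3):
    """Blend a centre-surround depth-contrast channel into 2-D saliency."""
    contrast = np.abs(box_mean(depth, r_c) - box_mean(depth, r_s))
    contrast = contrast / (contrast.max() + 1e-12)
    return (1 - w) * sal2d + w * contrast
```

With a flat depth map the depth channel vanishes and the 2-D saliency passes through scaled by `1 - w`, matching the small additive role depth plays in the reported results.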
|
19
|
Saliency-based gaze prediction based on head direction. Vision Res 2015; 117:59-66. [PMID: 26475088 DOI: 10.1016/j.visres.2015.10.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2015] [Revised: 10/01/2015] [Accepted: 10/05/2015] [Indexed: 11/23/2022]
Abstract
Despite decades of attempts to create a model for predicting gaze locations by using saliency maps, a highly accurate gaze prediction model for general conditions has yet to be devised. In this study, we propose a gaze prediction method based on head direction that can improve the accuracy of any model. We used a probability distribution of eye position based on head direction (static eye-head coordination) and added this information to a model of saliency-based visual attention. Using empirical data on eye and head directions while observers were viewing natural scenes, we estimated a probability distribution of eye position. We then combined the relationship between eye position and head direction with visual saliency to predict gaze locations. The model showed that information on head direction improved the prediction accuracy. Further, there was no difference in the gaze prediction accuracy between the two models using information on head direction with and without eye-head coordination. Therefore, information on head direction is useful for predicting gaze location when it is available. Furthermore, this gaze prediction model can be applied relatively easily to many daily situations such as during walking.
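The combination step described here amounts to weighting the saliency map by an eye-position prior conditioned on head direction. A minimal numpy sketch, assuming a Gaussian prior centred on the head direction; the function name and `sigma` are illustrative.

```python
import numpy as np

def gaze_map(saliency, head_xy, sigma=20.0):
    """Weight a saliency map by a Gaussian eye-position prior centred on
    the head direction (static eye-head coordination), then renormalise."""
    h, w = saliency.shape
    yy, xx = np.indices((h, w))
    prior = np.exp(-((xx - head_xy[0]) ** 2 + (yy - head_xy[1]) ** 2)
                   / (2 * sigma ** 2))
    combined = saliency * prior
    return combined / combined.sum()
```

On a uniform saliency map the prediction collapses to the head-direction prior, which is the degenerate case of the relationship the study exploits.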
|
20
|
Airport detection in remote sensing images: a method based on saliency map. Cogn Neurodyn 2013; 7:143-54. [PMID: 24427198 PMCID: PMC3595433 DOI: 10.1007/s11571-012-9223-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2012] [Revised: 09/17/2012] [Accepted: 09/19/2012] [Indexed: 10/27/2022] Open
Abstract
Airport detection has recently attracted considerable attention owing to its applications and importance in military and civil aviation, but the complicated background around airports makes detection difficult. This paper presents a new method for airport detection in remote sensing images. Unlike methods that analyze images pixel by pixel, we introduce a visual attention mechanism into airport detection, greatly improving its efficiency. First, the Hough transform is used to judge whether an airport exists in an image. An improved graph-based visual saliency model is then applied to compute the saliency map and extract regions of interest (ROIs). Finally, the airport target is detected from scale-invariant feature transform (SIFT) features extracted from each ROI and classified with a hierarchical discriminant regression tree. Experimental results show that the proposed method is faster and more accurate than existing methods, with a lower false-alarm rate and better noise robustness.
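The first stage, detecting the long straight runways that signal a possible airport, can be sketched as a plain Hough line accumulator. This is a generic textbook implementation for illustration, not the paper's code.

```python
import numpy as np

def hough_peak_votes(edges, n_theta=90):
    """Rho-theta Hough accumulator over a binary edge map; a strong peak
    means many edge pixels lie on one straight line (e.g. a runway)."""
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*edges.shape)))
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for t_idx, t in enumerate(thetas):
        # each edge pixel votes for the line rho = x*cos(t) + y*sin(t)
        rho = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag
        np.add.at(acc[:, t_idx], rho, 1)
    return acc.max()
```

Thresholding the peak vote count would gate the rest of the pipeline (saliency ROIs, SIFT features, classification), as the abstract outlines.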
|
21
|
Bottom-up attention: pulsed PCA transform and pulsed cosine transform. Cogn Neurodyn 2011; 5:321-32. [PMID: 23115590 PMCID: PMC3193976 DOI: 10.1007/s11571-011-9155-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2010] [Revised: 04/05/2011] [Accepted: 04/08/2011] [Indexed: 11/25/2022] Open
Abstract
In this paper we propose a computational model of bottom-up visual attention based on a pulsed principal component analysis (PCA) transform, which simply exploits the signs of the PCA coefficients to generate spatial and motion saliency. We further extend the pulsed PCA transform to a pulsed cosine transform that is not only data-independent but also very fast to compute. The proposed model has the following biological plausibilities. First, the PCA projection vectors in the model can be obtained by using the Hebbian rule in neural networks. Second, the outputs of the pulsed PCA transform, which are inherently binary, simulate neuronal pulses in the human brain. Third, like many Fourier transform-based approaches, our model accomplishes cortical center-surround suppression in the frequency domain. Experimental results on psychophysical patterns and natural images show that the proposed model is more effective in saliency detection and predicts human eye fixations better than state-of-the-art attention models.
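The pulsed cosine transform is concrete enough to sketch: take the 2-D DCT of the image, keep only the signs of the coefficients, invert, and square. A numpy-only sketch under those assumptions; the DCT basis is built explicitly, and the final Gaussian smoothing used in practice is omitted.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix, built explicitly to stay numpy-only."""
    k, i = np.indices((n, n))
    C = np.cos(np.pi * (i + 0.5) * k / n) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)
    return C

def pct_saliency(img):
    """Pulsed cosine transform: sign of the 2-D DCT, inverted and squared."""
    Cr = dct_matrix(img.shape[0])
    Cc = dct_matrix(img.shape[1])
    coeffs = Cr @ img @ Cc.T                 # 2-D DCT-II
    pulsed = Cr.T @ np.sign(coeffs) @ Cc     # inverse transform of the signs
    sal = pulsed ** 2
    return sal / sal.max()
```

Because only the binary sign pattern survives, the map is data-independent in the sense the abstract describes: no basis has to be learned from the input statistics.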
|
22
|
Visual saliency: a biologically plausible contourlet-like frequency domain approach. Cogn Neurodyn 2010; 4:189-98. [PMID: 21886671 DOI: 10.1007/s11571-010-9122-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2010] [Revised: 06/17/2010] [Accepted: 06/17/2010] [Indexed: 10/19/2022] Open
Abstract
In this paper we propose a fast frequency domain saliency detection method that is also biologically plausible, referred to as frequency domain divisive normalization (FDN). We show that the initial feature extraction stage, common to all spatial domain approaches, can be simplified to a Fourier transform with a contourlet-like grouping of coefficients, and that saliency detection can be achieved in the frequency domain. Specifically, we show that divisive normalization, a model of cortical surround inhibition, can be conducted in the frequency domain. Since Fourier coefficients are global in space, we extend this model by conducting piecewise FDN (PFDN) on overlapping local patches to provide better biological plausibility. Not only do FDN and PFDN outperform current state-of-the-art methods in eye fixation prediction, they are also faster. Speed and simplicity are advantages of our frequency domain approach, and its biological plausibility is the main contribution of our paper.
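The core idea, dividing each Fourier coefficient by the pooled energy of its spectral neighbourhood, can be sketched as follows. The pooling window and normalisation are illustrative simplifications of FDN, not the authors' implementation (which groups coefficients contourlet-style rather than over a square window).

```python
import numpy as np

def fdn_saliency(img, k=3, eps=1e-6):
    """Frequency-domain divisive normalisation sketch: each Fourier
    coefficient is divided by the pooled energy of a k-by-k spectral
    neighbourhood, a stand-in for cortical surround inhibition."""
    F = np.fft.fft2(img)
    E = np.abs(F) ** 2
    # pool energy over the neighbourhood via wrapped (rolled) sums
    pool = sum(np.roll(np.roll(E, dy, 0), dx, 1)
               for dy in range(-(k // 2), k // 2 + 1)
               for dx in range(-(k // 2), k // 2 + 1))
    Fn = F / np.sqrt(pool + eps)
    sal = np.abs(np.fft.ifft2(Fn)) ** 2
    return sal / sal.max()
```

The piecewise variant (PFDN) would simply apply the same routine to overlapping local patches and blend the results.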
|