1. Jiao J, Alsharid M, Drukker L, Papageorghiou AT, Zisserman A, Noble JA. Audio-visual modelling in a clinical setting. Sci Rep 2024; 14:15569. [PMID: 38971838] [PMCID: PMC11227581] [DOI: 10.1038/s41598-024-66160-4]
Abstract
Auditory and visual signals are two primary perceptual modalities that are usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals, which are usually speech audio. In this study, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without relying on dense supervisory annotations from human experts for model training. A simple yet effective multi-modal self-supervised learning framework is presented for this purpose. The proposed approach is able to help find standard anatomical planes, predict the focus position of the sonographer's gaze, and localise anatomical regions of interest during ultrasound imaging. Experimental analysis on a large-scale clinical multi-modal ultrasound video dataset shows that the proposed representation learning method provides good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully supervised solutions. Being able to learn such medical representations in a self-supervised manner will contribute to several aspects, including a better understanding of obstetric imaging, training of new sonographers, more effective assistive tools for human experts, and enhancement of the clinical workflow.
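Note: the abstract does not specify the training objective. A common choice for this kind of multi-modal self-supervised learning is a contrastive loss that pulls embeddings of co-occurring audio and video clips together; the sketch below is a generic illustration of that idea (the encoder outputs and the temperature value are assumptions, not taken from the paper).

```python
import torch
import torch.nn.functional as F

def audio_visual_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """InfoNCE-style loss: audio/video embeddings from the same clip are
    positives; every other pairing in the batch serves as a negative."""
    a = F.normalize(audio_emb, dim=-1)              # (B, D)
    v = F.normalize(video_emb, dim=-1)              # (B, D)
    logits = a @ v.t() / temperature                # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric cross-entropy over both retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```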
Affiliation(s)
- Jianbo Jiao: Department of Engineering Science, University of Oxford, Oxford, UK; School of Computer Science, University of Birmingham, Birmingham, UK.
- Mohammad Alsharid: Department of Engineering Science, University of Oxford, Oxford, UK; Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates.
- Lior Drukker: Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK; Rabin Medical Center, Tel-Aviv University Faculty of Medicine, Tel Aviv, Israel.
- Aris T Papageorghiou: Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK.
- Andrew Zisserman: Department of Engineering Science, University of Oxford, Oxford, UK.
- J Alison Noble: Department of Engineering Science, University of Oxford, Oxford, UK.
2. Li X, Zhao H, Wu D, Liu Q, Tang R, Li L, Xu Z, Lyu X. SLMFNet: Enhancing land cover classification of remote sensing images through selective attentions and multi-level feature fusion. PLoS One 2024; 19:e0301134. [PMID: 38743645] [PMCID: PMC11093330] [DOI: 10.1371/journal.pone.0301134]
Abstract
Land cover classification (LCC) is of paramount importance for assessing environmental changes in remote sensing images (RSIs) as it involves assigning categorical labels to ground objects. The growing availability of multi-source RSIs presents an opportunity for intelligent LCC through semantic segmentation, offering a comprehensive understanding of ground objects. Nonetheless, the heterogeneous appearances of terrains and objects contribute to significant intra-class variance and inter-class similarity at various scales, adding complexity to this task. In response, we introduce SLMFNet, an innovative encoder-decoder segmentation network that adeptly addresses this challenge. To mitigate the sparse and imbalanced distribution of RSIs, we incorporate selective attention modules (SAMs) aimed at enhancing the distinguishability of learned representations by integrating contextual affinities within spatial and channel domains through a compact number of matrix operations. Specifically, the selective position attention module (SPAM) employs spatial pyramid pooling (SPP) to resample feature anchors and compute contextual affinities. In tandem, the selective channel attention module (SCAM) concentrates on capturing channel-wise affinity. Initially, feature maps are aggregated into fewer channels, followed by the generation of pairwise channel attention maps between the aggregated channels and all channels. To harness fine-grained details across multiple scales, we introduce a multi-level feature fusion decoder with data-dependent upsampling (MLFD) to meticulously recover and merge feature maps at diverse scales using a trainable projection matrix. Empirical results on the ISPRS Potsdam and DeepGlobe datasets underscore the superior performance of SLMFNet compared to various state-of-the-art methods. Ablation studies affirm the efficacy and precision of SAMs in the proposed model.
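Note: as a rough illustration of the channel-affinity idea described above (aggregate the feature maps into fewer channels, then compute pairwise attention between the aggregated channels and all channels), here is a minimal PyTorch sketch; the module name, reduction size, and residual connection are assumptions, not the authors' exact SCAM design.

```python
import torch
import torch.nn as nn

class SelectiveChannelAttention(nn.Module):
    """Sketch of a channel-affinity module: features are aggregated into
    fewer channels, then pairwise attention is computed between the
    aggregated channels and all original channels."""
    def __init__(self, channels, reduced):
        super().__init__()
        self.aggregate = nn.Conv2d(channels, reduced, kernel_size=1)

    def forward(self, x):                     # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2)                      # (B, C, H*W)
        k = self.aggregate(x).flatten(2)      # (B, C', H*W)
        affinity = torch.softmax(q @ k.transpose(1, 2), dim=-1)  # (B, C, C')
        out = (affinity @ k).view(b, c, h, w)
        return x + out                        # residual connection (assumed)
```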
Affiliation(s)
- Xin Li: College of Computer and Information, Hohai University, Nanjing, Jiangsu, China.
- Hejing Zhao: Water History Department, China Institute of Water Resources and Hydropower Research, Beijing, China; Research Center on Flood and Drought Disaster Reduction of the Ministry of Water Resources, China Institute of Water Resources and Hydropower Research, Beijing, China.
- Dan Wu: Information Engineering Center, Yellow River Institute of Hydraulic Research, Yellow River Conservancy Commission of the Ministry of Water Resources, Zhengzhou, Henan, China; Key Laboratory of Yellow River Sediment Research, Ministry of Water Resources, Zhengzhou, Henan, China; Henan Engineering Research Center of Smart Water Conservancy, Yellow River Institute of Hydraulic Research, Zhengzhou, Henan, China.
- Qixing Liu: Information Engineering Center, Yellow River Institute of Hydraulic Research, Yellow River Conservancy Commission of the Ministry of Water Resources, Zhengzhou, Henan, China; Key Laboratory of Yellow River Sediment Research, Ministry of Water Resources, Zhengzhou, Henan, China; Henan Engineering Research Center of Smart Water Conservancy, Yellow River Institute of Hydraulic Research, Zhengzhou, Henan, China.
- Rui Tang: Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China.
- Linyang Li: School of Geodesy and Geomatics, Wuhan University, Wuhan, Hubei, China.
- Zhennan Xu: College of Computer and Information, Hohai University, Nanjing, Jiangsu, China.
- Xin Lyu: College of Computer and Information, Hohai University, Nanjing, Jiangsu, China.
3. Liu X, Wang L. MSRMNet: Multi-scale skip residual and multi-mixed features network for salient object detection. Neural Netw 2024; 173:106144. [PMID: 38335792] [DOI: 10.1016/j.neunet.2024.106144]
Abstract
Current models for salient object detection (SOD) have made remarkable progress through multi-scale feature fusion strategies. However, existing models deviate substantially when detecting objects at different scales, and the object boundaries in the predicted maps remain blurred. In this paper, we propose a new model addressing these issues, using a transformer backbone to capture multiple feature layers. The model uses multi-scale skip residual connections during encoding to improve the accuracy of the predicted object positions and edge pixel information. Furthermore, to extract richer multi-scale semantic information, we perform multiple mixed feature operations in the decoding stage. In addition, we add a structural similarity index measure (SSIM) term with coefficients to the loss function to enhance the accurate prediction of boundaries. Experiments demonstrate that our algorithm achieves state-of-the-art results on five public datasets and improves the performance metrics of existing SOD tasks. Codes and results are available at: https://github.com/xxwudi508/MSRMNet.
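Note: the abstract mentions adding an SSIM term with coefficients to the loss to sharpen boundaries. A minimal sketch of such a combined objective is shown below; the box-filter SSIM, the BCE base loss, and the weight value are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    """Mean SSIM over local windows (box filter instead of the usual Gaussian).
    Inputs are (B, 1, H, W) maps with values in [0, 1]."""
    mu_p = F.avg_pool2d(pred, win, 1, win // 2)
    mu_t = F.avg_pool2d(target, win, 1, win // 2)
    var_p = F.avg_pool2d(pred * pred, win, 1, win // 2) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, win, 1, win // 2) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, win, 1, win // 2) - mu_p * mu_t
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return (num / den).mean()

def saliency_loss(pred, target, ssim_weight=0.5):
    # BCE for per-pixel accuracy plus a weighted SSIM term for boundary structure.
    return F.binary_cross_entropy(pred, target) + ssim_weight * (1 - ssim(pred, target))
```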
Affiliation(s)
- Xinlong Liu: Sun Yat-Sen University, Guangzhou 510275, China.
- Luping Wang: Sun Yat-Sen University, Guangzhou 510275, China.
4. Vallée R, Gomez T, Bourreille A, Normand N, Mouchère H, Coutrot A. Influence of training and expertise on deep neural network attention and human attention during a medical image classification task. J Vis 2024; 24:6. [PMID: 38587421] [PMCID: PMC11008746] [DOI: 10.1167/jov.24.4.6]
Abstract
In many different domains, experts can make complex decisions after glancing very briefly at an image. However, the perceptual mechanisms underlying expert performance are still largely unknown. Recently, several machine learning algorithms have been shown to outperform human experts in specific tasks. But these algorithms often behave as black boxes, and their information processing pipeline remains unknown. This lack of transparency and interpretability is highly problematic in applications involving human lives, such as health care. One way to "open the black box" is to compute an artificial attention map from the model, which highlights the pixels of the input image that contributed the most to the model's decision. In this work, we directly compare human visual attention to machine visual attention on the same visual task. We designed a medical diagnosis task involving the detection of lesions in small bowel endoscopic images. We collected eye movements from novices and expert gastroenterologists while they classified medical images according to their relevance for Crohn's disease diagnosis. We trained three state-of-the-art deep learning models on our carefully labeled dataset. Both humans and machines performed the same task. We extracted artificial attention with six different post hoc methods. We show that the model attention maps are significantly closer to human expert attention maps than to those of novices, especially for pathological images. As a model is trained and its performance approaches that of the human experts, the similarity between model and human attention increases. Through understanding the similarities between the visual decision-making processes of human experts and deep neural networks, we hope to inform both the training of new doctors and the architecture of new algorithms.
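Note: comparing model attention maps with human attention maps requires a similarity measure. A standard choice in the saliency literature is the Pearson correlation coefficient (CC); the sketch below illustrates it (the paper may use different or additional metrics).

```python
import numpy as np

def correlation_coefficient(model_map, human_map):
    """Pearson correlation (the saliency 'CC' metric) between a model
    attention map and a human fixation-density map of equal size."""
    m = (model_map - model_map.mean()) / (model_map.std() + 1e-8)
    h = (human_map - human_map.mean()) / (human_map.std() + 1e-8)
    return float((m * h).mean())
```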
Affiliation(s)
- Rémi Vallée: Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France.
- Tristan Gomez: Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France.
- Arnaud Bourreille: CHU Nantes, Institut des Maladies de l'Appareil Digestif, CIC Inserm 1413, Université de Nantes, Nantes, France.
- Nicolas Normand: Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France.
- Harold Mouchère: Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France.
- Antoine Coutrot: Nantes Université, Ecole Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France; Univ Lyon, CNRS, INSA Lyon, UCBL, LIRIS, UMR 5205, Lyon, France.
5. Martinez-Cedillo AP, Foulsham T. Don't look now! Social elements are harder to avoid during scene viewing. Vision Res 2024; 216:108356. [PMID: 38184917] [DOI: 10.1016/j.visres.2023.108356]
Abstract
Regions of social importance (i.e., other people) attract attention in real-world scenes, but it is unclear how automatic this bias is and how it might interact with other guidance factors. To investigate this, we recorded eye movements while participants were explicitly instructed to avoid looking at one of two objects in a scene (either a person or a non-social object). The results showed that, while participants could follow these instructions, they still made errors (especially on the first saccade). Crucially, there were about twice as many erroneous looks towards the person as towards the other object. This indicates that it is hard to suppress the prioritization of social information during scene viewing, with implications for how quickly and automatically this information is perceived and attended to.
Affiliation(s)
- A P Martinez-Cedillo: Department of Psychology, University of York, York YO10 5DD, England; Department of Psychology, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, England.
- T Foulsham: Department of Psychology, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, England.
6. Sun L, Francis DJ, Nagai Y, Yoshida H. Early development of saliency-driven attention through object manipulation. Acta Psychol (Amst) 2024; 243:104124. [PMID: 38232506] [DOI: 10.1016/j.actpsy.2024.104124]
Abstract
In the first years of life, infants progressively develop attention selection skills to gather information from visually cluttered environments. Even as newborns, infants are sensitive to differences in color, orientation, and luminance, the components of visual saliency. However, we know little about how saliency-driven attention emerges and develops socially through everyday free-viewing experiences. The present work assessed the saliency change in infants' egocentric scenes and investigated the impact of manual engagement on infants' object looking in the interactive context of object play. Thirty parent-infant dyads, including infants in two age groups (younger: 3- to 6-month-olds; older: 9- to 12-month-olds), completed a brief session of object play. Infants' looking behaviors were recorded with head-mounted eye-tracking gear, and both parents' and infants' manual actions on objects were annotated separately for analysis. The findings reveal distinct attention mechanisms underlying the hand-eye coordination between parents and infants and within infants during object play: younger infants were predominantly biased toward the visual saliency accompanying the parent's manual actions on the objects, whereas older infants gradually allocated more attention to the object itself, regardless of the saliency in view, as they gained more self-generated manual actions. Taken together, the present work highlights the tight coordination between visual experience and sensorimotor competence and proposes a novel dyadic pathway to sustained attention, in which social sensitivity to parents' hands emerges through saliency-driven attention, preparing infants to focus on, follow, and steadily track moving targets in free-flowing viewing activities.
Affiliation(s)
- Lichao Sun: Department of Psychology, University of Houston, TX, United States.
- David J Francis: Texas Institute for Measurement, Evaluation, and Statistics, University of Houston, TX, United States.
- Yukie Nagai: International Research Center for Neurointelligence, University of Tokyo, Tokyo, Japan.
- Hanako Yoshida: Department of Psychology, University of Houston, TX, United States.
7. Azadi R, Lopez E, Taubert J, Patterson A, Afraz A. Inactivation of face-selective neurons alters eye movements when free viewing faces. Proc Natl Acad Sci U S A 2024; 121:e2309906121. [PMID: 38198528] [PMCID: PMC10801883] [DOI: 10.1073/pnas.2309906121]
Abstract
During free viewing, faces attract gaze and induce specific fixation patterns corresponding to the facial features. This suggests that neurons encoding the facial features are in the causal chain that steers the eyes. However, there is no physiological evidence to support a mechanistic link between face-encoding neurons in high-level visual areas and the oculomotor system. In this study, we targeted the middle face patches of the inferior temporal (IT) cortex in two macaque monkeys using a functional magnetic resonance imaging (fMRI) localizer. We then used muscimol microinjection to unilaterally suppress IT neural activity inside and outside the face patches and recorded eye movements while the animals freely viewed natural scenes. Inactivation of the face-selective neurons altered the pattern of eye movements on faces: the monkeys found faces in the scene but neglected the eye contralateral to the inactivated hemisphere. These findings reveal the causal contribution of the high-level visual cortex to eye movements.
Affiliation(s)
- Reza Azadi: Unit on Neurons, Circuits and Behavior, Laboratory of Neuropsychology, National Institute of Mental Health, NIH, Bethesda, MD 20892.
- Emily Lopez: Unit on Neurons, Circuits and Behavior, Laboratory of Neuropsychology, National Institute of Mental Health, NIH, Bethesda, MD 20892.
- Jessica Taubert: Section on Neurocircuitry, Laboratory of Brain and Cognition, National Institute of Mental Health, NIH, Bethesda, MD 20892; School of Psychology, The University of Queensland, Brisbane, QLD 4072, Australia.
- Amanda Patterson: Section on Neurocircuitry, Laboratory of Brain and Cognition, National Institute of Mental Health, NIH, Bethesda, MD 20892.
- Arash Afraz: Unit on Neurons, Circuits and Behavior, Laboratory of Neuropsychology, National Institute of Mental Health, NIH, Bethesda, MD 20892.
8. Stolte M, Kraus L, Ansorge U. Visual attentional guidance during smooth pursuit eye movements: Distractor interference is independent of distractor-target similarity. Psychophysiology 2023; 60:e14384. [PMID: 37431573] [DOI: 10.1111/psyp.14384]
Abstract
In the current study, we used abrupt-onset distractors similar and dissimilar in luminance to the target of a smooth-pursuit eye movement to test whether abrupt-onset distractors capture attention in a top-down or bottom-up fashion while the eyes track a moving object. Abrupt-onset distractors were presented at different positions relative to the current position of a pursuit target during the closed-loop phase of smooth pursuit. Across experiments, we varied the duration of the distractors, their motion direction, and their task-relevance. We found that abrupt-onset distractors decreased the gain of horizontally directed smooth-pursuit eye movements. This effect, however, was independent of the similarity in luminance between distractor and target. In addition, the distracting effects on horizontal gain were the same regardless of the exact duration and position of the distractors, suggesting that capture was relatively unspecific and short-lived (Experiments 1 and 2). This was different for distractors moving in a vertical direction, perpendicular to the horizontally moving target. In line with past findings, these distractors caused suppression of vertical gain (Experiment 3). Finally, making distractors task-relevant by asking observers to report distractor positions increased their effect on pursuit gain. This effect was also independent of target-distractor similarity (Experiment 4). In conclusion, the results suggest that a strong location signal exerted by the pursuit target led to very brief and largely location-unspecific interference from the abrupt onsets, and that this interference was bottom-up, implying that the control of smooth pursuit is independent of target features other than its motion signal.
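Note: pursuit gain, the dependent measure here, is conventionally the ratio of eye velocity to target velocity. The sketch below illustrates one way to compute it from position traces; in practice saccades would first be detected and removed, which this toy version omits.

```python
import numpy as np

def pursuit_gain(eye_pos_deg, target_pos_deg, dt):
    """Horizontal pursuit gain: median ratio of eye velocity to target
    velocity, computed from position traces sampled every `dt` seconds."""
    eye_vel = np.gradient(eye_pos_deg, dt)
    target_vel = np.gradient(target_pos_deg, dt)
    valid = np.abs(target_vel) > 1.0   # ignore near-stationary samples
    return float(np.median(eye_vel[valid] / target_vel[valid]))
```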
Affiliation(s)
- Moritz Stolte: Department of Cognition, Emotion, and Methods in Psychology, University of Vienna, Vienna, Austria.
- Leon Kraus: Department of Cognition, Emotion, and Methods in Psychology, University of Vienna, Vienna, Austria.
- Ulrich Ansorge: Department of Cognition, Emotion, and Methods in Psychology, University of Vienna, Vienna, Austria; Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria; Research Platform Mediatised Lifeworlds, University of Vienna, Vienna, Austria.
9. Zou J, Zhang Y, Li J, Tian X, Ding N. Human attention during goal-directed reading comprehension relies on task optimization. eLife 2023; 12:RP87197. [PMID: 38032825] [PMCID: PMC10688971] [DOI: 10.7554/elife.87197]
Abstract
The computational principles underlying attention allocation in complex goal-directed tasks remain elusive. Goal-directed reading, that is, reading a passage to answer a question in mind, is a common real-world task that strongly engages attention. Here, we investigate what computational models can explain attention distribution in this complex task. We show that the reading time on each word is predicted by the attention weights in transformer-based deep neural networks (DNNs) optimized to perform the same reading task. Eye tracking further reveals that readers separately attend to basic text features and question-relevant information during first-pass reading and rereading, respectively. Similarly, text features and question relevance separately modulate attention weights in shallow and deep DNN layers. Furthermore, when readers scan a passage without a question in mind, their reading time is predicted by DNNs optimized for a word prediction task. Therefore, we offer a computational account of how task optimization modulates attention distribution during real-world reading.
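Note: relating reading time to transformer attention requires extracting per-word attention weights. The sketch below shows a generic way to do this with the Hugging Face transformers library; the checkpoint and the column-sum aggregation are illustrative choices, not the study's pipeline (which used DNNs optimized on the reading task itself).

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def word_attention(text):
    """Total attention each token receives, per layer: a crude proxy for
    the word-level attention weights related to reading time."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.attentions: tuple of (1, heads, seq, seq) tensors, one per layer.
    # Average over heads, then sum over queries to get attention received.
    per_layer = [a.mean(dim=1)[0].sum(dim=0) for a in out.attentions]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, per_layer
```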
Affiliation(s)
- Jiajie Zou: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China; Nanhu Brain-computer Interface Institute, Hangzhou, China.
- Yuran Zhang: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China.
- Jialu Li: Division of Arts and Sciences, New York University Shanghai, Shanghai, China.
- Xing Tian: Division of Arts and Sciences, New York University Shanghai, Shanghai, China.
- Nai Ding: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China; Nanhu Brain-computer Interface Institute, Hangzhou, China.
10. Uddin A, Tao X, Yu D. Attention based dynamic graph neural network for asset pricing. Glob Finance J 2023; 58:100900. [PMID: 37908899] [PMCID: PMC10614642] [DOI: 10.1016/j.gfj.2023.100900]
Abstract
Recent studies suggest that networks among firms (sectors) play a vital role in asset pricing. This paper investigates these implications and develops a novel end-to-end graph neural network model for asset pricing by combining and modifying two state-of-the-art machine learning techniques. First, we apply the graph attention mechanism to learn dynamic network structures of the equity market over time and then use a recurrent convolutional neural network to diffuse and propagate firms' information into the learned networks. This novel approach allows us to model the implications of networks along with the characteristics of the dynamic comovement of asset prices. The results demonstrate the effectiveness of our proposed model in both predicting returns and improving portfolio performance. Our approach demonstrates persistent performance in different sensitivity tests and simulated data. We also show that the dynamic network learned from our proposed model captures major market events over time. Our model is highly effective in recognizing the network structure in the market and predicting equity returns and provides valuable market information to regulators and investors.
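Note: the abstract does not include implementation details. The sketch below is a generic single-head graph attention layer in the spirit of the mechanism described (learning edge weights between firms from their features); the dimensions and adjacency-mask convention are assumptions, and each node is assumed to have at least one neighbor (e.g., via self-loops).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention (GAT-style): attention scores between
    connected nodes act as learned, data-dependent edge weights."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):               # x: (N, F), adj: (N, N) 0/1 mask
        h = self.proj(x)                     # (N, D)
        n = h.size(0)
        # All pairwise [h_i || h_j] concatenations (fine for a small graph).
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))        # (N, N) scores
        e = e.masked_fill(adj == 0, float("-inf"))            # keep real edges
        alpha = torch.softmax(e, dim=-1)     # learned network structure
        return alpha @ h                     # aggregate neighbor information
```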
Affiliation(s)
- Ajim Uddin: Martin Tuchman School of Management, New Jersey Institute of Technology, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102, USA.
- Xinyuan Tao: Martin Tuchman School of Management, New Jersey Institute of Technology, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102, USA.
- Dantong Yu: Martin Tuchman School of Management, New Jersey Institute of Technology, 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102, USA.
11. Entzmann L, Guyader N, Kauffmann L, Peyrin C, Mermillod M. Detection of emotional faces: The role of spatial frequencies and local features. Vision Res 2023; 211:108281. [PMID: 37421829] [DOI: 10.1016/j.visres.2023.108281]
Abstract
Models of emotion processing suggest that threat-related stimuli such as fearful faces can be detected based on the rapid extraction of low spatial frequencies. However, this remains debated, as other models argue that the decoding of facial expressions occurs with a more flexible use of spatial frequencies. The purpose of this study was to clarify the role of spatial frequencies, and of differences in luminance contrast between spatial frequencies, in the detection of facial emotions. We used a saccadic choice task in which emotional-neutral face pairs were presented and participants were asked to make a saccade toward the neutral or the emotional (happy or fearful) face. Faces were displayed in either low, high, or broad spatial frequencies. Results showed that participants were better at saccading toward the emotional face. They were also better with high or broad than with low spatial frequencies, and accuracy was higher with a happy target. An analysis of the eye and mouth saliency of our stimuli revealed that the mouth saliency of the target correlates with participants' performance. Overall, this study underlines the importance of local over global information, and of the saliency of the mouth region, in the detection of emotional and neutral faces.
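Note: producing the low- and high-spatial-frequency face stimuli used in such tasks amounts to band filtering. A minimal sketch with a Gaussian filter is given below; real studies specify cutoffs in cycles per image or per degree, so the sigma here is purely illustrative.

```python
import numpy as np
from scipy import ndimage

def spatial_frequency_versions(image, cutoff_sigma=8):
    """Low- and high-spatial-frequency versions of a grayscale face image
    via Gaussian filtering (cutoff chosen per experimental design)."""
    low = ndimage.gaussian_filter(image.astype(float), sigma=cutoff_sigma)
    high = image - low + image.mean()   # keep mean luminance comparable
    return low, high
```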
Affiliation(s)
- Léa Entzmann: Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France; Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France; Icelandic Vision Lab, School of Health Sciences, University of Iceland, Reykjavík, Iceland.
- Nathalie Guyader: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France.
- Louise Kauffmann: Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France.
- Carole Peyrin: Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France.
- Martial Mermillod: Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France.
12. Roth N, Rolfs M, Hellwich O, Obermayer K. Objects guide human gaze behavior in dynamic real-world scenes. PLoS Comput Biol 2023; 19:e1011512. [PMID: 37883331] [PMCID: PMC10602265] [DOI: 10.1371/journal.pcbi.1011512]
Abstract
The complexity of natural scenes makes it challenging to experimentally study the mechanisms behind human gaze behavior when viewing dynamic environments. Historically, eye movements were believed to be driven primarily by space-based attention towards locations with salient features. Increasing evidence suggests, however, that visual attention does not select locations with high saliency but operates on attentional units given by the objects in the scene. We present a new computational framework to investigate the importance of objects for attentional guidance. This framework is designed to simulate realistic scanpaths for dynamic real-world scenes, including saccade timing and smooth pursuit behavior. Individual model components are based on psychophysically uncovered mechanisms of visual attention and saccadic decision-making. All mechanisms are implemented in a modular fashion with a small number of well-interpretable parameters. To systematically analyze the importance of objects in guiding gaze behavior, we implemented five different models within this framework: two purely spatial models, where one is based on low-level saliency and one on high-level saliency, two object-based models, with one incorporating low-level saliency for each object and the other one not using any saliency information, and a mixed model with object-based attention and selection but space-based inhibition of return. We optimized each model's parameters to reproduce the saccade amplitude and fixation duration distributions of human scanpaths using evolutionary algorithms. We compared model performance with respect to spatial and temporal fixation behavior, including the proportion of fixations exploring the background, as well as detecting, inspecting, and returning to objects. A model with object-based attention and inhibition, which uses saliency information to prioritize between objects for saccadic selection, leads to scanpath statistics with the highest similarity to the human data. This demonstrates that scanpath models benefit from object-based attention and selection, suggesting that object-level attentional units play an important role in guiding attentional processing.
Affiliation(s)
- Nicolas Roth: Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany; Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Germany.
- Martin Rolfs: Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany; Department of Psychology, Humboldt-Universität zu Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Germany.
- Olaf Hellwich: Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany; Institute of Computer Engineering and Microelectronics, Technische Universität Berlin, Germany.
- Klaus Obermayer: Cluster of Excellence Science of Intelligence, Technische Universität Berlin, Germany; Institute of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Germany.
13. Priorelli M, Pezzulo G, Stoianov IP. Active Vision in Binocular Depth Estimation: A Top-Down Perspective. Biomimetics (Basel) 2023; 8:445. [PMID: 37754196] [PMCID: PMC10526497] [DOI: 10.3390/biomimetics8050445]
Abstract
Depth estimation is an ill-posed problem: objects of different shapes or dimensions, even at different distances, may project to the same image on the retina. Our brain uses several cues for depth estimation, including monocular cues such as motion parallax and binocular cues such as diplopia. However, it remains unclear how the computations required for depth estimation are implemented in biologically plausible ways. State-of-the-art approaches to depth estimation based on deep neural networks implicitly describe the brain as a hierarchical feature detector. Instead, in this paper we propose an alternative approach that casts depth estimation as a problem of active inference. We show that depth can be inferred by inverting a hierarchical generative model that simultaneously predicts the eyes' projections from a 2D belief over an object. Model inversion consists of a series of biologically plausible homogeneous transformations based on Predictive Coding principles. Under the plausible assumption of a nonuniform fovea resolution, depth estimation favors an active vision strategy that fixates the object with the eyes, rendering the depth belief more accurate. This strategy is not realized by first fixating on a target and then estimating the depth; instead, it combines the two processes through action-perception cycles, with a mechanism similar to that of saccades during object recognition. The proposed approach requires only local (top-down and bottom-up) message passing, which can be implemented in biologically plausible neural circuits.
Affiliation(s)
- Matteo Priorelli: Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 35137 Padova, Italy.
- Giovanni Pezzulo: Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 00185 Rome, Italy.
- Ivilin Peev Stoianov: Institute of Cognitive Sciences and Technologies, National Research Council of Italy, 35137 Padova, Italy.
14. Bruckert A, Christie M, Le Meur O. Where to look at the movies: Analyzing visual attention to understand movie editing. Behav Res Methods 2023; 55:2940-2959. [PMID: 36002630] [DOI: 10.3758/s13428-022-01949-7]
Abstract
In the process of making a movie, directors constantly care about where the spectator will look on the screen. Shot composition, framing, camera movements, and editing are tools commonly used to direct attention. In order to provide a quantitative analysis of the relationship between those tools and gaze patterns, we propose a new eye-tracking database containing gaze-pattern information on movie sequences, as well as editing annotations, and we show how state-of-the-art computational saliency techniques behave on this dataset. In this work, we expose strong links between movie editing and spectators' gaze distributions, and open several leads on how knowledge of editing information could improve human visual attention modeling for cinematic content. The dataset generated and analyzed for this study is available at https://github.com/abruckert/eye_tracking_filmmaking.
15. Azadi R, Lopez E, Taubert J, Patterson A, Afraz A. Inactivation of face selective neurons alters eye movements when free viewing faces. bioRxiv [Preprint] 2023:2023.06.20.544678. [PMID: 37502993] [PMCID: PMC10370202] [DOI: 10.1101/2023.06.20.544678]
Abstract
During free viewing, faces attract gaze and induce specific fixation patterns corresponding to the facial features. This suggests that neurons encoding the facial features are in the causal chain that steers the eyes. However, there is no physiological evidence to support a mechanistic link between face-encoding neurons in high-level visual areas and the oculomotor system. In this study, we targeted the middle face patches of inferior temporal (IT) cortex in two macaque monkeys using an fMRI localizer. We then utilized muscimol microinjection to unilaterally suppress IT neural activity inside and outside the face patches and recorded eye movements while the animals freely viewed natural scenes. Inactivation of the face-selective neurons altered the pattern of eye movements on faces: the monkeys found faces in the scene but neglected the eye contralateral to the inactivated hemisphere. These findings reveal the causal contribution of the high-level visual cortex to eye movements.
Significance: It has been shown, for more than half a century, that eye movements follow distinctive patterns when free viewing faces. This suggests causal involvement of the face-encoding visual neurons in the eye movements. However, the literature offers scant evidence for this possibility and has focused mostly on the link between low-level image saliency and eye movements. Here, for the first time, we provide causal evidence showing how face-selective neurons in inferior temporal cortex inform and steer eye movements when free viewing faces.
16. Chen X, Weng J, Deng X, Luo W, Lan Y, Tian Q. Feature Distillation in Deep Attention Network Against Adversarial Examples. IEEE Trans Neural Netw Learn Syst 2023; 34:3691-3705. [PMID: 34739380] [DOI: 10.1109/tnnls.2021.3113342]
Abstract
Deep neural networks (DNNs) are easily fooled by adversarial examples. Most existing defense strategies defend against adversarial examples based on full information of whole images. In reality, one possible reason why humans are not sensitive to adversarial perturbations is that the human visual mechanism often concentrates on the most important regions of images. Deep attention mechanisms have been applied in many fields of computing and have achieved great success. Attention modules are composed of an attention branch and a trunk branch. The encoder/decoder architecture in the attention branch has the potential to compress adversarial perturbations. In this article, we theoretically prove that attention modules can compress adversarial perturbations by destroying potential linear characteristics of DNNs. Considering the distribution characteristics of adversarial perturbations in different frequency bands, we design and compare three types of attention modules based on frequency decomposition and reorganization to defend against adversarial examples. Moreover, we find that our designed attention modules can obtain high classification accuracy on clean images by locating attention regions more accurately. Experimental results on the CIFAR and ImageNet datasets demonstrate that frequency reorganization in attention modules can not only achieve good robustness to adversarial perturbations but also obtain comparable, or even higher, classification accuracy on clean images. Moreover, our proposed attention modules can be integrated with existing defense strategies as components to further improve adversarial robustness.
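Note: as a rough illustration of frequency decomposition, the sketch below splits an image into low/mid/high bands with FFT masks, the kind of representation a frequency-aware attention branch could operate on; the cutoff values are assumptions, not the authors' design.

```python
import numpy as np

def frequency_bands(image, cutoffs=(0.1, 0.3)):
    """Split a 2D image into low/mid/high frequency bands using radial
    FFT masks; the bands sum back to the original image."""
    f = np.fft.fftshift(np.fft.fft2(image.astype(float)))
    h, w = image.shape
    yy, xx = np.ogrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    bands, prev = [], 0.0
    for cut in (*cutoffs, np.inf):
        mask = (radius >= prev) & (radius < cut)
        bands.append(np.real(np.fft.ifft2(np.fft.ifftshift(f * mask))))
        prev = cut
    return bands  # [low, mid, high]
```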
17. Kv R, Prasad K, Peralam Yegneswaran P. Segmentation and Classification Approaches of Clinically Relevant Curvilinear Structures: A Review. J Med Syst 2023; 47:40. [PMID: 36971852] [PMCID: PMC10042761] [DOI: 10.1007/s10916-023-01927-2]
Abstract
Detection of curvilinear structures from microscopic images, which helps clinicians make an unambiguous diagnosis, is assuming paramount importance in recent clinical practice. The appearance and size of dermatophytic hyphae, keratitic fungi, and corneal and retinal vessels vary widely, making their automated detection cumbersome. Automated deep learning methods, endowed with superior self-learning capacity, have superseded traditional machine learning methods, especially for complex images with challenging backgrounds. Automatic feature learning from large input data, with better generalization and recognition capability but without human interference or excessive pre-processing, is highly beneficial in this context. Varied attempts have been made by researchers to overcome challenges such as thin vessels, bifurcations, and obstructive lesions in retinal vessel detection, as revealed through several publications reviewed here. Diabetic neuropathic complications such as tortuosity and changes in the density and angles of corneal fibers have also been successfully addressed in many of the publications reviewed. Since artifacts complicate images and affect the quality of analysis, methods addressing these challenges are described as well. Traditional and deep learning methods adapted and published between 2015 and 2021, covering retinal vessels, corneal nerves, and filamentous fungi, are summarized in this review. We find several novel and meritorious ideas and techniques being put to use in retinal vessel segmentation and classification which, by way of cross-domain adaptation, can also be utilized for corneal and filamentous fungi, with suitable adaptations to the challenges to be addressed.
Affiliation(s)
- Rajitha Kv: Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India.
- Keerthana Prasad: Manipal School of Information Sciences, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India.
- Prakash Peralam Yegneswaran: Department of Microbiology, Kasturba Medical College, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India.
18. Rehman T, Muhammad W, Naveed A, Naeem M, Irshad MJ, Qaiser I, Jabbar MW. Hybrid Saliency-Based Visual Perception Model for Humanoid Robots. 2023 International Conference on Energy, Power, Environment, Control, and Computing (ICEPECC) 2023. [DOI: 10.1109/icepecc57281.2023.10209501]
Affiliation(s)
- Talha Rehman: University of Gujrat, Department of Electrical Engineering, Gujrat, Pakistan.
- Wasif Muhammad: University of Gujrat, Department of Electrical Engineering, Gujrat, Pakistan.
- Anum Naveed: University of Gujrat, Department of Electrical Engineering, Gujrat, Pakistan.
- Muhammad Naeem: University of Gujrat, Department of Electrical Engineering, Gujrat, Pakistan.
- Irfan Qaiser: University of Gujrat, Department of Electrical Engineering, Gujrat, Pakistan.
19. Novin S, Fallah A, Rashidi S, Daliri MR. An improved saliency model of visual attention dependent on image content. Front Hum Neurosci 2023; 16:862588. [PMID: 36926377] [PMCID: PMC10011177] [DOI: 10.3389/fnhum.2022.862588]
Abstract
Many visual attention models have been presented to obtain the saliency of a scene, i.e., the visually significant parts of a scene. However, some mechanisms are still not taken into account in these models, and the models do not fit the human data accurately. These mechanisms include which visual features are informative enough to be incorporated into the model, how the conspicuity of different features and scales of an image may be integrated to obtain the saliency map of the image, and how the structure of an image affects the strategy of our attention system. We integrate such mechanisms into the presented model more efficiently than previous models. First, besides the low-level features commonly employed in state-of-the-art models, we also apply medium-level features, defined as combinations of orientations and colors, based on visual system behavior. Second, we use a variable number of center-surround difference maps instead of the fixed number used in other models, suggesting that human visual attention operates differently for diverse images with different structures. Third, we integrate the information of different scales and different features based on their weighted sum, defining the weights according to each component's contribution, and presenting both the local and global saliency of the image. To test the model's performance in fitting human data, we compared it to other models using the CAT2000 dataset and the Area Under Curve (AUC) metric. Our results show that the model performs well compared to the other models (AUC = 0.79 and sAUC = 0.58) and suggest that the proposed mechanisms can be applied to existing models to improve them.
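Note: the sketch below illustrates two ingredients named above, center-surround difference maps with a variable set of scale pairs and a contribution-weighted combination, in generic Itti-style form; the Gaussian sigmas and the weighting scheme are assumptions, not the paper's exact model.

```python
import numpy as np
from scipy import ndimage

def center_surround_maps(feature, scale_pairs):
    """Center-surround differences: |fine blur - coarse blur|, computed
    for a *variable* list of (center, surround) sigmas per image."""
    maps = []
    for c_sigma, s_sigma in scale_pairs:
        center = ndimage.gaussian_filter(feature.astype(float), c_sigma)
        surround = ndimage.gaussian_filter(feature.astype(float), s_sigma)
        maps.append(np.abs(center - surround))
    return maps

def weighted_saliency(maps, weights):
    """Weighted sum of conspicuity maps, with weights reflecting each
    component's contribution."""
    total = sum(w * m / (m.max() + 1e-8) for w, m in zip(weights, maps))
    return total / (sum(weights) + 1e-8)
```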
Affiliation(s)
- Shabnam Novin: Faculty of Biomedical Engineering, Amirkabir University of Technology (AUT), Tehran, Iran.
- Ali Fallah: Faculty of Biomedical Engineering, Amirkabir University of Technology (AUT), Tehran, Iran.
- Saeid Rashidi: Faculty of Medical Sciences and Technologies, Science and Research Branch, Islamic Azad University, Tehran, Iran.
- Mohammad Reza Daliri: Neuroscience and Neuroengineering Research Laboratory, Biomedical Engineering Department, School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran; School of Cognitive Sciences (SCS), Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
20. A Deep Model of Visual Attention for Saliency Detection on 3D Objects. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11180-w]
21. Zhang Z, Shang X, Li G, Wang G. Just Noticeable Difference Model for Images with Color Sensitivity. Sensors (Basel) 2023; 23:2634. [PMID: 36904837] [PMCID: PMC10007073] [DOI: 10.3390/s23052634]
Abstract
The just noticeable difference (JND) model reflects the visibility limitations of the human visual system (HVS), which plays an important role in perceptual image/video processing and is commonly applied to perceptual redundancy removal. However, existing JND models are usually constructed by treating the color components of the three channels equally, and their estimation of the masking effect is inadequate. In this paper, we introduce visual saliency and color sensitivity modulation to improve the JND model. First, we comprehensively combine contrast masking, pattern masking, and edge protection to estimate the masking effect. Then, the visual saliency of the HVS is taken into account to adaptively modulate the masking effect. Finally, we build color sensitivity modulation according to the perceptual sensitivities of the HVS, to adjust the sub-JND thresholds of the Y, Cb, and Cr components. Thus, the color-sensitivity-based JND model (CSJND) is constructed. Extensive experiments and subjective tests were conducted to verify the effectiveness of the CSJND model. We found that consistency between the CSJND model and the HVS was better than for existing state-of-the-art JND models.
22. Fan DP, Zhang J, Xu G, Cheng MM, Shao L. Salient Objects in Clutter. IEEE Trans Pattern Anal Mach Intell 2023; 45:2344-2366. [PMID: 35404809] [DOI: 10.1109/tpami.2022.3166451]
Abstract
In this paper, we identify and address a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We therefore argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
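Note: one of the dataset-enhancement tricks mentioned, label smoothing to implicitly emphasize salient boundaries, can be illustrated by softening the binary ground-truth masks; the blur-and-blend scheme and its parameters below are assumptions, not necessarily the authors' formulation.

```python
import numpy as np
from scipy import ndimage

def smooth_saliency_labels(binary_mask, sigma=2.0, alpha=0.8):
    """Label smoothing for SOD ground truth: blur the binary mask so pixels
    near object boundaries receive soft targets instead of hard 0/1 labels."""
    soft = ndimage.gaussian_filter(binary_mask.astype(float), sigma=sigma)
    # Blend with the hard mask; alpha controls how much smoothing is applied.
    return alpha * binary_mask + (1 - alpha) * soft
```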
23. Chen Z, Joseph Raj AN, Rajangam V, Li W, Mahesh VG, Zhuang Z. Twofold Dynamic Attention Guided Deep Network and Noise-Aware Mechanism for Image Denoising. J King Saud Univ Comput Inf Sci 2023. [DOI: 10.1016/j.jksuci.2023.02.003]
24. Doğan FI, Melsión GI, Leite I. Leveraging explainability for understanding object descriptions in ambiguous 3D environments. Front Robot AI 2023; 9:937772. [PMID: 36704241] [PMCID: PMC9872646] [DOI: 10.3389/frobt.2022.937772]
Abstract
For effective human-robot collaboration, it is crucial for robots to understand requests from users by perceiving the three-dimensional space and to ask reasonable follow-up questions when there are ambiguities. In comprehending users' object descriptions in such requests, existing studies have addressed this challenge only for limited object categories that can be detected or localized with existing object detection and localization modules. Further, they have mostly focused on comprehending object descriptions from flat RGB images, without considering the depth dimension. In the wild, however, it is impossible to limit the object categories that can be encountered during the interaction, and perception of three-dimensional space that includes depth information is fundamental to successful task completion. To understand described objects and resolve ambiguities in the wild, for the first time, we suggest a method that leverages explainability. Our method focuses on the active areas of an RGB scene to find the described objects, without imposing the previous constraints on object categories and natural language instructions. We further improve our method to identify described objects by considering the depth dimension. We evaluate our method on varied real-world images and observe that the regions suggested by our method can help resolve ambiguities. When we compare our method with a state-of-the-art baseline, we show that it performs better in scenes with ambiguous objects that cannot be recognized by existing object detectors. We also show that using depth features significantly improves performance in scenes where depth data are critical to disambiguating the objects, and across our evaluation dataset, which contains objects that can be specified with and without the depth dimension.
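Note: the "active areas" used to locate described objects come from a post hoc explainability method. The sketch below shows a minimal Grad-CAM, a representative method of this family (the abstract does not state whether it matches the paper's exact choice); the model is assumed to map a (1, C, H, W) image to class logits.

```python
import torch

def grad_cam(model, layer, image, class_idx):
    """Minimal Grad-CAM: a heatmap of the image regions most responsible
    for the model's score for `class_idx`."""
    feats, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image.unsqueeze(0))[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)      # GAP over gradients
    cam = torch.relu((weights * feats[0]).sum(dim=1))[0]   # (H', W') heatmap
    return cam / (cam.max() + 1e-8)
```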
25. Berlijn AM, Hildebrandt LK, Gamer M. Idiosyncratic viewing patterns of social scenes reflect individual preferences. J Vis 2022; 22:10. [PMID: 36583910] [PMCID: PMC9807181] [DOI: 10.1167/jov.22.13.10]
Abstract
In general, humans preferentially look at conspecifics in naturalistic images. However, such group-based effects might conceal systematic individual differences concerning the preference for social information. Here, we investigated to what degree fixations on social features occur consistently within observers and whether this preference generalizes to other measures of social prioritization in the laboratory as well as the real world. Participants carried out a free viewing task, a relevance taps task that required them to actively select image regions that are crucial for understanding a given scene, and they were asked to freely take photographs outside the laboratory that were later classified regarding their social content. We observed stable individual differences in the fixation and active selection of human heads and faces that were correlated across tasks and partly predicted the social content of self-taken photographs. Such relationship was not observed for human bodies indicating that different social elements need to be dissociated. These findings suggest that idiosyncrasies in the visual exploration and interpretation of social features exist and predict real-world behavior. Future studies should further characterize these preferences and elucidate how they shape perception and interpretation of social contexts in healthy participants and patients with mental disorders that affect social functioning.
Affiliation(s)
- Adam M. Berlijn: Department of Experimental Psychology, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Institute of Clinical Neuroscience and Medical Psychology, Medical Faculty, University Hospital Düsseldorf, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany; Institute of Neuroscience and Medicine (INM-1), Research Centre Jülich, Jülich, Germany; Department of Psychology, Julius-Maximilians-University Würzburg, Würzburg, Germany.
- Lea K. Hildebrandt: Department of Psychology, Julius-Maximilians-University Würzburg, Würzburg, Germany.
- Matthias Gamer: Department of Psychology, Julius-Maximilians-University Würzburg, Würzburg, Germany.
26. Nuthmann A, Thibaut M, Tran THC, Boucart M. Impact of neovascular age-related macular degeneration on eye-movement control during scene viewing: Viewing biases and guidance by visual salience. Vision Res 2022; 201:108105. [PMID: 36081228] [DOI: 10.1016/j.visres.2022.108105]
Abstract
Human vision requires us to analyze the visual periphery to decide where to fixate next. In the present study, we investigated this process in people with age-related macular degeneration (AMD). In particular, we examined viewing biases and the extent to which visual salience guides fixation selection during free-viewing of naturalistic scenes. We used an approach combining generalized linear mixed modeling (GLMM) with a-priori scene parcellation. This method allows one to investigate group differences in terms of scene coverage and observers' well-known tendency to look at the center of scene images. Moreover, it allows for testing whether image salience influences fixation probability above and beyond what can be accounted for by the central bias. Compared with age-matched normally sighted control subjects (and young subjects), AMD patients' viewing behavior was less exploratory, with a stronger central fixation bias. All three subject groups showed a salience effect on fixation selection-higher-salience scene patches were more likely to be fixated. Importantly, the salience effect for the AMD group was of similar size as the salience effect for the control group, suggesting that guidance by visual salience was still intact. The variances for by-subject random effects in the GLMM indicated substantial individual differences. A separate model exclusively considered the AMD data and included fixation stability as a covariate, with the results suggesting that reduced fixation stability was associated with a reduced impact of visual salience on fixation selection.
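As a sketch of the shape of this analysis (not the authors' exact GLMM, which includes by-subject random effects and an a-priori scene parcellation), a fixed-effects logistic model relating per-patch fixation probability to salience and central distance could look like this; the column names and simulated data are assumptions:

```python
# Simplified fixed-effects stand-in for the paper's GLMM: does salience
# predict fixation probability above and beyond the central bias (cdist)?
# The by-subject random effects emphasized in the paper are omitted here.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400                                     # scene patches, subjects pooled
salience = rng.uniform(0, 1, n)             # normalized image salience
cdist = rng.uniform(0, 1, n)                # patch distance to screen center
logit_p = -0.5 + 2.0 * salience - 1.5 * cdist     # toy ground truth
fixated = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

df = pd.DataFrame({"fixated": fixated, "salience": salience, "cdist": cdist})
fit = smf.logit("fixated ~ salience + cdist", data=df).fit(disp=False)
print(fit.params)   # positive salience term = guidance beyond central bias
```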
Collapse
Affiliation(s)
- Antje Nuthmann
- Institute of Psychology, University of Kiel, Kiel, Germany.
| | - Miguel Thibaut
- University of Lille, Lille Neuroscience & Cognition, INSERM, Lille, France
| | - Thi Ha Chau Tran
- University of Lille, Lille Neuroscience & Cognition, INSERM, Lille, France; Ophthalmology Department, Lille Catholic Hospital, Catholic University of Lille, Lille, France
| | - Muriel Boucart
- University of Lille, Lille Neuroscience & Cognition, INSERM, Lille, France.
| |
Collapse
|
27
|
Hayes TR, Henderson JM. Scene inversion reveals distinct patterns of attention to semantically interpreted and uninterpreted features. Cognition 2022; 229:105231. [DOI: 10.1016/j.cognition.2022.105231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2021] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 11/03/2022]
|
28
|
Pavlič J, Tomažič T. The (In)effectiveness of Attention Guidance Methods for Enhancing Brand Memory in 360° Video. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22228809. [PMID: 36433406 PMCID: PMC9695698 DOI: 10.3390/s22228809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/25/2022] [Revised: 11/11/2022] [Accepted: 11/13/2022] [Indexed: 05/14/2023]
Abstract
Sensing and remembering features in visual scenes are conditioned by visual attention and by methods of guiding it. This is relevant to product placement, which has become an important way of incorporating brands into different mass media formats for a commercial purpose. The approach can be challenging in 360° video, where the omnidirectional view lets consumers choose different viewing perspectives, which may result in the brands being overlooked. Accordingly, attention guidance methods should be applied. This study is the first to explore diegetic guidance, the only guiding approach suited to the unobtrusive and unconscious nature of product placement. To test the effectiveness of three different diegetic guiding methods, a between-subjects design was employed in which participants were randomly assigned to one of four videos with the same scene but different guiding methods. The findings show, and explain, a discrepancy with studies on guiding attention in other contexts: there were no significant differences between the guiding cues in terms of brand recall or brand recognition. The results also indicate a significant influence of brand familiarity on brand recall in 360° video. The article concludes with limitations, future research directions, and recommendations for audiovisual policy.
Collapse
|
29
|
Chen S, Jiang M, Yang J, Zhao Q. Attention in Reasoning: Dataset, Analysis, and Modeling. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:7310-7326. [PMID: 34550881 DOI: 10.1109/tpami.2021.3114582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
While attention has been an increasingly popular component in deep neural networks to both interpret and boost the performance of models, little work has examined how attention progresses to accomplish a task and whether it is reasonable. In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes. We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling a quantitative measurement of attention that considers the reasoning process. We then collect human eye-tracking and answer correctness data, and analyze various machine and human attention mechanisms on their reasoning capability and how they impact task performance. To improve the attention and reasoning ability of visual question answering models, we propose to supervise the learning of attention progressively along the reasoning process and to differentiate the correct and incorrect attention patterns. We demonstrate the effectiveness of the proposed framework in analyzing and modeling attention with better reasoning capability and task performance. The code and data are available at https://github.com/szzexpoi/AiR.
Collapse
|
30
|
Gonçalves RC, Louw TL, Madigan R, Quaresma M, Romano R, Merat N. The effect of information from dash-based human-machine interfaces on drivers' gaze patterns and lane-change manoeuvres after conditionally automated driving. ACCIDENT; ANALYSIS AND PREVENTION 2022; 174:106726. [PMID: 35716544 DOI: 10.1016/j.aap.2022.106726] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Revised: 04/13/2022] [Accepted: 05/28/2022] [Indexed: 06/15/2023]
Abstract
The goal of this paper was to measure the effect of Human-Machine Interface (HMI) information and guidance on drivers' gaze and takeover behaviour during transitions of control from automation. The motivation came from a gap in the literature: previous research reports improved takeover performance based on HMI information, without considering its effect on drivers' visual attention distribution, or how drivers also use information available in the environment to guide their response. This driving simulator study investigated drivers' lane-changing behaviour after resumption of control from automation. Different levels of information were provided on a dash-based HMI, prior to each lane change, to investigate how drivers distribute their attention between the surrounding environment and the HMI. The difficulty of the lane change was also manipulated by controlling the position of approaching vehicles in the drivers' offside lane. Results indicated that drivers' decision-making time (DMT) was sensitive to the presence of nearby vehicles in the offside lane, but not directly influenced by the information on the HMI. In terms of gaze behaviour, the closer the vehicles in the offside lane, the longer drivers looked in that direction. Drivers looked more at the HMI, and less towards the road centre, when the HMI presented information about automation status and included an advisory message indicating it was safe to change lane. Machine learning techniques showed a strong relationship between drivers' gaze towards the information presented on the HMI and DMT. These results contribute to our understanding of HMI design for automated vehicles by demonstrating the attentional costs of an overly informative HMI, and that drivers still rely on environmental information to perform a lane change, even when the same information can be acquired from the HMI of the vehicle.
Collapse
Affiliation(s)
| | - Tyron L Louw
- University of Leeds, Institute for Transport Studies, United Kingdom
| | - Ruth Madigan
- University of Leeds, Institute for Transport Studies, United Kingdom
| | - Manuela Quaresma
- Pontifical Catholic University of Rio de Janeiro, Brazil
| | - Richard Romano
- University of Leeds, Institute for Transport Studies, United Kingdom
| | - Natasha Merat
- University of Leeds, Institute for Transport Studies, United Kingdom
| |
Collapse
|
31
|
A Gated Fusion Network for Dynamic Saliency Prediction. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3094974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
32
|
RGB-D saliency detection via complementary and selective learning. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03612-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
|
33
|
Anil Meera A, Novicky F, Parr T, Friston K, Lanillos P, Sajid N. Reclaiming saliency: Rhythmic precision-modulated action and perception. Front Neurorobot 2022; 16:896229. [PMID: 35966370 PMCID: PMC9368584 DOI: 10.3389/fnbot.2022.896229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022] Open
Abstract
Computational models of visual attention in artificial intelligence and robotics have been inspired by the concept of a saliency map. These models account for the mutual information between the (current) visual information and its estimated causes. However, they fail to consider the circular causality between perception and action. In other words, they do not consider where to sample next, given current beliefs. Here, we reclaim salience as an active inference process that relies on two basic principles: uncertainty minimization and rhythmic scheduling. For this, we make a distinction between attention and salience. Briefly, we associate attention with precision control, i.e., the confidence with which beliefs can be updated given sampled sensory data, and salience with uncertainty minimization that underwrites the selection of future sensory data. Using this, we propose a new account of attention based on rhythmic precision-modulation and discuss its potential in robotics, providing numerical experiments that showcase its advantages for state and noise estimation, system identification and action selection for informative path planning.
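As a toy illustration of salience as uncertainty minimization (a strong simplification of the active inference scheme described above, with precision dynamics and rhythmic scheduling omitted), the sketch below chooses each next fixation to maximize the expected reduction in posterior variance of a latent map; the foveal kernel width and observation noise are assumptions:

```python
# Salience as expected uncertainty reduction over a latent scene map:
# fixating (y, x) yields a foveated observation that sharpens the posterior.
import numpy as np

H, W = 32, 32
sigma_obs = 0.2          # observation noise std (assumed)
fovea = 3.0              # foveal kernel width in pixels (assumed)
var = np.ones((H, W))    # posterior variance of the latent map
ys, xs = np.mgrid[0:H, 0:W]

def kernel(y, x):
    """Foveated observation precision profile centred on (y, x)."""
    return np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * fovea ** 2))

def expected_reduction(var):
    """Salience of each candidate fixation = expected drop in total variance."""
    red = np.zeros_like(var)
    for y in range(H):
        for x in range(W):
            post = 1.0 / (1.0 / var + kernel(y, x) / sigma_obs ** 2)
            red[y, x] = (var - post).sum()
    return red

for t in range(5):
    sal = expected_reduction(var)
    y, x = np.unravel_index(sal.argmax(), sal.shape)
    print(f"fixation {t}: ({y}, {x})")
    var = 1.0 / (1.0 / var + kernel(y, x) / sigma_obs ** 2)  # precision update
```

A small side effect of the kernel truncation at the image borders is that the first fixation lands near the centre, so a central bias emerges for free in this toy version.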
Collapse
Affiliation(s)
- Ajith Anil Meera
- Department of Cognitive Robotics, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, Netherlands
- *Correspondence: Ajith Anil Meera
| | - Filip Novicky
- Department of Neurophysiology, Donders Institute for Brain Cognition and Behavior, Radboud University, Nijmegen, Netherlands
- Filip Novicky
| | - Thomas Parr
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
| | - Karl Friston
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
| | - Pablo Lanillos
- Department of Artificial Intelligence, Donders Institute for Brain Cognition and Behavior, Radboud University, Nijmegen, Netherlands
| | - Noor Sajid
- Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
| |
Collapse
|
34
|
A novel video saliency estimation method in the compressed domain. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01081-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
35
|
Peng P, Yang KF, Liang SQ, Li YJ. Contour-guided saliency detection with long-range interactions. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
36
|
Wang W, Lai Q, Fu H, Shen J, Ling H, Yang R. Salient Object Detection in the Deep Learning Era: An In-Depth Survey. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:3239-3259. [PMID: 33434124 DOI: 10.1109/tpami.2021.3051099] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (named deep SOD). To enable in-depth understanding of deep SOD, in this paper, we provide a comprehensive survey covering various aspects, ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level of supervision, learning paradigm, and object-/instance-level detection. Following that, we summarize and analyze existing SOD datasets and evaluation metrics. Then, we benchmark a large group of representative SOD models, and provide detailed analyses of the comparison results. Moreover, we study the performance of SOD algorithms under different attribute settings, which has not been thoroughly explored previously, by constructing a novel SOD dataset with rich attribute annotations covering various salient object types, challenging factors, and scene categories. We further analyze, for the first time in the field, the robustness of SOD models to random input perturbations and adversarial attacks. We also look into the generalization and difficulty of existing SOD datasets. Finally, we discuss several open issues of SOD and outline future research directions. All the saliency prediction maps, our constructed dataset with annotations, and codes for evaluation are publicly available at https://github.com/wenguanwang/SODsurvey.
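For readers implementing such benchmarks, two of the standard SOD evaluation measures covered by the survey, MAE and the F-measure, can be computed as in the minimal sketch below; β² = 0.3 is the value conventionally used in the SOD literature, and the fixed 0.5 threshold is a simplification (adaptive and swept thresholds are also common):

```python
# Standard SOD metrics: mean absolute error and thresholded F-measure.
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0,1] saliency map and a binary mask."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def f_beta(pred, gt, thresh=0.5, beta2=0.3):
    """F-measure of the thresholded saliency map against the ground truth."""
    binary = pred >= thresh
    tp = np.logical_and(binary, gt).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

pred = np.random.default_rng(0).uniform(size=(64, 64))   # toy prediction
gt = np.zeros((64, 64), dtype=bool); gt[20:40, 20:40] = True
print(mae(pred, gt), f_beta(pred, gt))
```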
Collapse
|
37
|
Pandey S, Harit G. Handwritten Annotation Spotting in Printed Documents Using Top-Down Visual Saliency Models. ACM T ASIAN LOW-RESO 2022. [DOI: 10.1145/3485468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
In this article, we address the problem of localizing text and symbolic annotations on the scanned image of a printed document. Previous approaches have treated annotation extraction as a binary classification into printed and handwritten text. In this work, we further subcategorize the annotations as underlines, encirclements, inline text, and marginal text. We have collected a new dataset of 300 documents containing all classes of annotations marked around or in between printed text. Using the dataset as a benchmark, we report the results of two saliency formulations, CRF Saliency and Discriminant Saliency, for predicting salient patches, which can correspond to different types of annotations. We also compare our work with recent semantic segmentation techniques using deep models. Our analysis shows that Discriminant Saliency can be considered the preferred approach for fast localization of patches containing different types of annotations. The saliency models were learned on a small dataset but still give performance comparable to the deep networks for pixel-level semantic segmentation. We show that saliency-based methods give better outcomes with limited annotated data compared to more sophisticated segmentation techniques that require a large training set to learn the model.
Collapse
Affiliation(s)
- Shilpa Pandey
- Adani Institute of Infrastructure Engineering, Ahmedabad, Gujarat, India
| | - Gaurav Harit
- Indian Institute of Technology Jodhpur, Jodhpur, Rajasthan, India
| |
Collapse
|
38
|
Zhou L, Zhou T, Khan S, Sun H, Shen J, Shao L. Weakly Supervised Visual Saliency Prediction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; 31:3111-3124. [PMID: 35380961 DOI: 10.1109/tip.2022.3158064] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The success of current deep saliency models heavily depends on large amounts of annotated human fixation data to fit the highly non-linear mapping between the stimuli and visual saliency. Such fully supervised data-driven approaches are annotation-intensive and often fail to consider the underlying mechanisms of visual attention. In contrast, in this paper, we introduce a model based on various cognitive theories of visual saliency, which learns visual attention patterns in a weakly supervised manner. Our approach incorporates insights from cognitive science as differentiable submodules, resulting in a unified, end-to-end trainable framework. Specifically, our model encapsulates the following important components motivated from biological vision. (a) As scene semantics are closely related to visually attentive regions, our model encodes discriminative spatial information for scene understanding through spatial visual semantics embedding. (b) To model the objectness factors in visual attention deployment, we incorporate object-level semantics embedding and object relation information. (c) Considering the "winner-take-all" mechanism in visual stimuli processing, we model the competition mechanism among objects with softmax based neural attention. (d) Lastly, a conditional center prior is learned to mimic the spatial distribution bias of visual attention. Furthermore, we propose novel loss functions to utilize supervision cues from image-level semantics, saliency prior knowledge, and self-information compression. Experiments show that our method achieves promising results, and even outperforms many of its fully supervised counterparts. Overall, our weakly supervised saliency method makes an essential step towards reducing the annotation budget of current approaches, as well as providing a more comprehensive understanding of the visual attention mechanism. Our code is available at: https://github.com/ashleylqx/WeakFixation.git.
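Component (c), the softmax-based competition among objects, has a simple generic form; the sketch below illustrates that mechanism in isolation, not the paper's full architecture, and the feature dimensions and scoring head are assumptions:

```python
# "Winner-take-all"-style competition among object regions via softmax
# attention: each object gets a scalar score, normalized across objects.
import torch
import torch.nn as nn

n_objects, d = 6, 128
obj_feats = torch.randn(1, n_objects, d)   # pooled object-level features
score = nn.Linear(d, 1)                    # illustrative scoring head

logits = score(obj_feats).squeeze(-1)      # one scalar per object
attn = torch.softmax(logits, dim=-1)       # competition across objects
print(attn)  # sharper distributions approximate a winner-take-all outcome
```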
Collapse
|
39
|
Ndayikengurukiye D, Mignotte M. Salient Object Detection by LTP Texture Characterization on Opposing Color Pairs under SLICO Superpixel Constraint. J Imaging 2022; 8:jimaging8040110. [PMID: 35448237 PMCID: PMC9027508 DOI: 10.3390/jimaging8040110] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2022] [Revised: 03/31/2022] [Accepted: 04/05/2022] [Indexed: 02/05/2023] Open
Abstract
The effortless detection of salient objects by humans has been the subject of research in several fields, including computer vision, as it has many applications. However, salient object detection remains a challenge for many computational models dealing with color and textured images. Most of them process color and texture separately and therefore implicitly treat them as independent features, which is not the case in reality. Herein, we propose a novel and efficient strategy, through a simple model with almost no internal parameters, which generates a robust saliency map for a natural image. This strategy consists of integrating color information into local textural patterns to characterize a color micro-texture. It is the simple yet powerful LTP (Local Ternary Patterns) texture descriptor, applied to opposing color pairs of a color space, that allows us to achieve this end. Each color micro-texture is represented by a vector whose components come from a superpixel obtained by the SLICO (Simple Linear Iterative Clustering with zero parameter) algorithm, which is simple, fast, and exhibits state-of-the-art boundary adherence. The degree of dissimilarity between each pair of color micro-textures is computed by the FastMap method, a fast version of MDS (Multi-dimensional Scaling) that accounts for the color micro-textures' non-linearity while preserving their distances. These degrees of dissimilarity give us an intermediate saliency map for each of the RGB (Red-Green-Blue), HSL (Hue-Saturation-Luminance), LUV (L for luminance, U and V for chromaticity) and CMY (Cyan-Magenta-Yellow) color spaces. The final saliency map combines them to take advantage of the strengths of each. The MAE (Mean Absolute Error), MSE (Mean Squared Error) and Fβ measures of our saliency maps on the five most widely used datasets show that our model outperforms several state-of-the-art models. Being simple and efficient, our model could be combined with classic color-contrast models for better performance.
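As a minimal sketch of the LTP step on a single opposing color pair (the paper applies it across RGB, HSL, LUV, and CMY and aggregates codes per SLICO superpixel before the FastMap comparison), the following computes the conventional upper/lower binary codes; the threshold and the choice of the R - G pair are assumptions:

```python
# Local Ternary Pattern (LTP) codes on one opponent channel, split into the
# conventional upper (clearly brighter) and lower (clearly darker) patterns.
import numpy as np

def ltp_codes(channel, t=5):
    """Return (upper, lower) 8-bit LTP codes for each interior pixel."""
    c = channel.astype(np.int32)
    center = c[1:-1, 1:-1]
    upper = np.zeros_like(center)
    lower = np.zeros_like(center)
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = c[1 + dy : c.shape[0] - 1 + dy, 1 + dx : c.shape[1] - 1 + dx]
        upper |= ((nb >= center + t).astype(np.int32) << bit)
        lower |= ((nb <= center - t).astype(np.int32) << bit)
    return upper, lower

rgb = np.random.default_rng(0).integers(0, 256, (64, 64, 3))  # toy image
rg_opponent = rgb[..., 0].astype(np.int32) - rgb[..., 1].astype(np.int32)
u, l = ltp_codes(rg_opponent)
# Histograms of (u, l) within each SLICO superpixel would give the
# micro-texture vectors whose pairwise dissimilarities form the saliency map.
```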
Collapse
|
40
|
Kümmerer M, Bethge M, Wallis TSA. DeepGaze III: Modeling free-viewing human scanpaths with deep learning. J Vis 2022; 22:7. [PMID: 35472130 PMCID: PMC9055565 DOI: 10.1167/jov.22.5.7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Humans typically move their eyes in “scanpaths” of fixations linked by saccades. Here we present DeepGaze III, a new model that predicts the spatial location of consecutive fixations in a free-viewing scanpath over static images. DeepGaze III is a deep learning-based model that combines image information with the previous fixation history to predict where a participant might fixate next. As a high-capacity and flexible model, DeepGaze III captures many relevant patterns in human scanpath data, setting a new state of the art on the MIT300 dataset and thereby providing insight into how much information exists in scanpaths across observers in the first place. We use this insight to assess the importance of mechanisms implemented in simpler, interpretable models of fixation selection. Due to its architecture, DeepGaze III allows us to disentangle several factors that play an important role in fixation selection, such as the interplay of scene content and scanpath history. Its modular nature allows us to conduct ablation studies, which show that scene content has a stronger effect on fixation selection than previous scanpath history in our main dataset. In addition, we can use the model to identify the scenes for which the relative importance of these information sources differs most. Such data-driven insights would be difficult to obtain with simpler models that lack the computational capacity to capture these patterns, demonstrating how advances in deep learning can contribute to scientific understanding.
Collapse
Affiliation(s)
| | | | - Thomas S A Wallis
- Technical University of Darmstadt, Institute of Psychology and Centre for Cognitive Science, Darmstadt, Germany
| |
Collapse
|
41
|
Han Y, Chen X, Zhang S, Qi D. iNL: Implicit non-local network. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
42
|
Mairena A, Gutwin C, Cockburn A. Which emphasis technique to use? Perception of emphasis techniques with varying distractors, backgrounds, and visualization types. INFORMATION VISUALIZATION 2022; 21:95-129. [PMID: 35177955 PMCID: PMC8841630 DOI: 10.1177/14738716211045354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Emphasis effects are visual changes that make data elements distinct from their surroundings. Designers may use computational saliency models to predict how a viewer's attention will be guided by a specific effect; however, although saliency models provide a foundational understanding of emphasis perception, they only cover specific visual effects in abstract conditions. To address these limitations, we carried out crowdsourced studies that evaluate emphasis perception in a wider range of conditions than previously studied. We varied effect magnitude, distractor number and type, background, and visualization type, and measured the perceived emphasis of 12 visual effects. Our results show that there are perceptual commonalities of emphasis across a wide range of environments, but also that there are limitations on perceptibility for some effects, dependent on a visualization's background or type. We developed a model of emphasis predictability based on simple scatterplots that can be extended to other viewing conditions. Our studies provide designers with new understanding of how viewers experience emphasis in realistic visualization settings.
Collapse
Affiliation(s)
| | - Carl Gutwin
- University of Saskatchewan, Saskatoon, SK, Canada
| | | |
Collapse
|
43
|
Gromada K, Siemiątkowska B, Stecz W, Płochocki K, Woźniak K. Real-Time Object Detection and Classification by UAV Equipped With SAR. SENSORS 2022; 22:s22052068. [PMID: 35271213 PMCID: PMC8915099 DOI: 10.3390/s22052068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/27/2021] [Revised: 03/03/2022] [Accepted: 03/04/2022] [Indexed: 11/20/2022]
Abstract
The article presents real-time object detection and classification methods for unmanned aerial vehicles (UAVs) equipped with a synthetic aperture radar (SAR). Two algorithms were extensively tested: classic image analysis and convolutional neural networks (YOLOv5). The research resulted in a new method that combines YOLOv5 with post-processing based on classic image analysis. The new system is shown to improve both classification accuracy and localization of the identified object. The algorithms were implemented and tested on a mobile platform installed on a military-class UAV as the primary unit for online image analysis. Using low-computational-complexity detection algorithms on SAR scans can reduce the size of the scans sent to the ground control station.
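The general recipe, YOLOv5 detections refined by classic image analysis, can be sketched as below; the torch.hub model, file name, and the Otsu/contour refinement rule are generic stand-ins rather than the exact on-board pipeline:

```python
# Sketch: detect with YOLOv5, then tighten each box with classic image
# analysis (Otsu thresholding + contours) inside the detected region.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
img = cv2.imread("sar_scan.png")                            # placeholder scan
dets = model(img[:, :, ::-1].copy()).xyxy[0].cpu().numpy()  # BGR -> RGB

for x1, y1, x2, y2, conf, cls in dets:
    crop = cv2.cvtColor(img[int(y1):int(y2), int(x1):int(x2)],
                        cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(crop, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        # tighten the network's box to the dominant bright structure inside
        print(int(cls), float(conf), (int(x1) + x, int(y1) + y, w, h))
```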
Collapse
Affiliation(s)
- Krzysztof Gromada
- Institute of Automatic Control and Robotics, Warsaw University of Technology, 02-525 Warsaw, Poland; (B.S.); (K.P.); (K.W.)
- Correspondence:
| | - Barbara Siemiątkowska
- Institute of Automatic Control and Robotics, Warsaw University of Technology, 02-525 Warsaw, Poland; (B.S.); (K.P.); (K.W.)
| | - Wojciech Stecz
- Faculty of Cybernetics, Military University of Technology, 00-908 Warsaw, Poland;
| | - Krystian Płochocki
- Institute of Automatic Control and Robotics, Warsaw University of Technology, 02-525 Warsaw, Poland; (B.S.); (K.P.); (K.W.)
| | - Karol Woźniak
- Institute of Automatic Control and Robotics, Warsaw University of Technology, 02-525 Warsaw, Poland; (B.S.); (K.P.); (K.W.)
| |
Collapse
|
44
|
Zhang X, Chang R, Sui X, Li Y. Influences of Emotion on Driving Decisions at Different Risk Levels: An Eye Movement Study. Front Psychol 2022; 13:788712. [PMID: 35185722 PMCID: PMC8854174 DOI: 10.3389/fpsyg.2022.788712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2021] [Accepted: 01/13/2022] [Indexed: 11/13/2022] Open
Abstract
To explore the influences of traffic-related negative emotions on driving decisions, we induced three emotions in drivers (neutral, traffic-related negative, and traffic-unrelated negative) using videos. The drivers were then shown traffic pictures at different risk levels and decided whether to slow down, while their eye movements were recorded. We found that traffic-related negative emotion influenced driving decisions. Compared with the neutral emotion, traffic-related negative emotion led to more decelerations, and the higher the risk, the more decelerations there were. The visual processing time of the risk area was shorter in the traffic-related negative emotional state than in the neutral emotional state. The less time drivers spent looking at the risk area, the faster they made their driving decisions. The results suggest that traffic-related negative emotions lead drivers to make more conservative decisions. This study supports the rationality of using traffic accident materials in safety education for drivers. We also discuss the significance of traffic-related negative emotions for public safety.
Collapse
Affiliation(s)
- Xiaoying Zhang
- School of Psychology, Liaoning Normal University, Dalian, China
| | - Ruosong Chang
- School of Psychology, Liaoning Normal University, Dalian, China
| | - Xue Sui
- School of Psychology, Liaoning Normal University, Dalian, China
| | - Yutong Li
- School of Psychology, Liaoning Normal University, Dalian, China
| |
Collapse
|
45
|
Robust Segmentation Based on Salient Region Detection Coupled Gaussian Mixture Model. INFORMATION 2022. [DOI: 10.3390/info13020098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Impressive progress in image segmentation has been witnessed recently. In this paper, an improved model that introduces frequency-tuned salient region detection into the Gaussian mixture model (GMM), named FTGMM, is proposed. Frequency-tuned salient region detection is used to obtain a saliency map of the original image, and the saliency values are incorporated into the Gaussian mixture model as spatial information weights. The proposed method (FTGMM) calculates the model parameters by the expectation maximization (EM) algorithm with low computational complexity. In both qualitative and quantitative analyses of the experiments, the subjective visual quality and the evaluation metrics are found to be better than those of other methods. The proposed method (FTGMM) is therefore shown to achieve high precision and better robustness.
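A minimal sketch of the FTGMM idea follows: an Achanta-style frequency-tuned saliency map is computed and appended as a per-pixel feature for GMM clustering with EM. Stacking saliency as an extra feature is a simplification of the paper's spatial-weighting scheme, and the blur scale and component count are assumptions:

```python
# Frequency-tuned saliency (distance of a blurred Lab image to the mean Lab
# vector) fed into a Gaussian mixture segmentation as an extra feature.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import io, color
from sklearn.mixture import GaussianMixture

img = io.imread("image.png")[..., :3]       # placeholder RGB input
lab = color.rgb2lab(img)

blur = np.stack([gaussian_filter(lab[..., c], 3) for c in range(3)], axis=-1)
sal = np.linalg.norm(blur - lab.reshape(-1, 3).mean(0), axis=-1)
sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

# GMM over (L, a, b, saliency); EM fits the mixture parameters internally.
feats = np.concatenate([lab.reshape(-1, 3), sal.reshape(-1, 1)], axis=1)
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(feats)
segmentation = labels.reshape(img.shape[:2])
```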
Collapse
|
46
|
Yu Y, Qian J, Wu Q. Visual Saliency via Multiscale Analysis in Frequency Domain and Its Applications to Ship Detection in Optical Satellite Images. Front Neurorobot 2022; 15:767299. [PMID: 35095455 PMCID: PMC8793482 DOI: 10.3389/fnbot.2021.767299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 12/01/2021] [Indexed: 11/13/2022] Open
Abstract
This article proposes a bottom-up visual saliency model that uses the wavelet transform to conduct multiscale analysis and computation in the frequency domain. First, we compute the multiscale magnitude spectra by performing a wavelet transform to decompose the magnitude spectrum of the discrete cosine coefficients of an input image. Next, we obtain multiple saliency maps of different spatial scales through an inverse transformation from the frequency domain to the spatial domain, which utilizes the discrete cosine magnitude spectra after multiscale wavelet decomposition. Then, we employ an evaluation function to automatically select the two best multiscale saliency maps. A final saliency map is generated via an adaptive integration of the two selected multiscale saliency maps. The proposed model is fast, efficient, and can simultaneously detect salient regions or objects of different sizes. It outperforms state-of-the-art bottom-up saliency approaches in the experiments of psychophysical consistency, eye fixation prediction, and saliency detection for natural images. In addition, the proposed model is applied to automatic ship detection in optical satellite images. Ship detection tests on satellite data of visual optical spectrum not only demonstrate our saliency model's effectiveness in detecting small and large salient targets but also verify its robustness against various sea background disturbances.
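A much-simplified single-channel sketch of this pipeline (2-D DCT, wavelet decomposition of the magnitude spectrum, inversion per scale) might look as follows; the wavelet, level count, and smoothing are assumptions, and the paper's adaptive selection and integration of the two best maps is only indicated in the final comment:

```python
# Simplified multiscale frequency-domain saliency: wavelet-smooth the DCT
# magnitude spectrum at several scales, then invert to spatial maps.
import numpy as np
import pywt
from scipy.fftpack import dct, idct
from scipy.ndimage import gaussian_filter

def dct2(a):  return dct(dct(a, axis=0, norm="ortho"), axis=1, norm="ortho")
def idct2(a): return idct(idct(a, axis=0, norm="ortho"), axis=1, norm="ortho")

img = np.random.default_rng(0).uniform(size=(128, 128))  # placeholder image
C = dct2(img)
sign, mag = np.sign(C), np.abs(C)

coeffs = pywt.wavedec2(mag, "db4", level=3)
saliency_maps = []
for keep in range(1, len(coeffs)):
    # zero the finer detail bands to get one smoothed magnitude spectrum
    kept = [coeffs[0]] + [c if i < keep else tuple(np.zeros_like(d) for d in c)
                          for i, c in enumerate(coeffs[1:], 1)]
    mag_s = pywt.waverec2(kept, "db4")[: mag.shape[0], : mag.shape[1]]
    smap = idct2(sign * mag_s) ** 2                 # back to spatial domain
    saliency_maps.append(gaussian_filter(smap, 3))
# The full method would score these maps, pick the best two, and fuse them.
```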
Collapse
Affiliation(s)
- Ying Yu
- School of Information Science and Engineering, Yunnan University, Kunming, China
| | | | | |
Collapse
|
47
|
Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app12010309] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The human attention mechanism can be understood and simulated by closely associating the saliency prediction task to neuroscience and psychology. Furthermore, saliency prediction is widely used in computer vision and interdisciplinary subjects. In recent years, with the rapid development of deep learning, deep models have made amazing achievements in saliency prediction. Deep learning models can automatically learn features, thus solving many drawbacks of the classic models, such as handcrafted features and task settings, among others. Nevertheless, the deep models still have some limitations, for example in tasks involving multi-modality and semantic understanding. This study focuses on summarizing the relevant achievements in the field of saliency prediction, including the early neurological and psychological mechanisms and the guiding role of classic models, followed by the development process and data comparison of classic and deep saliency prediction models. This study also discusses the relationship between the model and human vision, as well as the factors that cause the semantic gaps, the influences of attention in cognitive research, the limitations of the saliency model, and the emerging applications, to provide new saliency predictions for follow-up work and the necessary help and advice.
Collapse
|
48
|
Betti A, Boccignone G, Faggi L, Gori M, Melacci S. Visual Features and Their Own Optical Flow. Front Artif Intell 2021; 4:768516. [PMID: 34927064 PMCID: PMC8672218 DOI: 10.3389/frai.2021.768516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Accepted: 10/25/2021] [Indexed: 11/14/2022] Open
Abstract
Symmetries, invariances, and conservation equations have always been an invaluable guide in science for modelling natural phenomena through simple yet effective relations. For instance, in computer vision, translation equivariance is typically a built-in property of the neural architectures used to solve visual tasks; networks with computational layers implementing such a property are known as Convolutional Neural Networks (CNNs). This kind of mathematical symmetry, like many others studied recently, is typically generated by some underlying group of transformations (translations in the case of CNNs, rotations, etc.) and is particularly suitable for processing highly structured data such as molecules or chemical compounds that are known to possess those specific symmetries. When dealing with video streams, common built-in equivariances can handle only a small fraction of the broad spectrum of transformations encoded in the visual stimulus, and the corresponding neural architectures therefore have to resort to a huge amount of supervision in order to achieve good generalization. In this paper we formulate a theory of the development of visual features based on the idea that movement itself provides trajectories on which to impose consistency. We introduce the principle of Material Point Invariance, which states that each visual feature is invariant with respect to its associated optical flow, so that features and corresponding velocities form an indissoluble pair. We then discuss the interaction of features and velocities and show that certain motion invariance traits can be regarded as a generalization of the classical concept of affordance. These analyses of feature-velocity interactions and their invariance properties lead to a visual field theory that expresses the dynamical constraints of motion coherence and might lead to the discovery of the joint evolution of visual features along with their associated optical flows.
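In differential form, the stated invariance of each feature along its optical flow is the vanishing of a material derivative (our notation, not necessarily the paper's):

```latex
\frac{D\varphi}{Dt}
  \;=\;
\frac{\partial \varphi}{\partial t} \;+\; v \cdot \nabla \varphi
  \;=\; 0
```

That is, the feature field φ(x, t) is transported unchanged along the trajectories generated by its velocity field v(x, t), in direct analogy with the classical brightness-constancy constraint of optical flow estimation.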
Collapse
Affiliation(s)
- Alessandro Betti
- Department of Information Engineering and Mathematics, Università degli Studi di Siena, Siena, Italy
| | - Giuseppe Boccignone
- PHuSe Lab, Department of Computer Science, Università degli Studi di Milano, Milan, Italy
| | - Lapo Faggi
- Department of Information Engineering and Mathematics, Università degli Studi di Siena, Siena, Italy.,Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
| | - Marco Gori
- Department of Information Engineering and Mathematics, Università degli Studi di Siena, Siena, Italy.,Universitè Côte D'Azur, Inria, CNRS, I3S, Maasai, Sophia-Antipolis, France
| | - Stefano Melacci
- Department of Information Engineering and Mathematics, Università degli Studi di Siena, Siena, Italy
| |
Collapse
|
49
|
Xia C, Han J, Zhang D. Evaluation of Saccadic Scanpath Prediction: Subjective Assessment Database and Recurrent Neural Network Based Metric. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:4378-4395. [PMID: 32750785 DOI: 10.1109/tpami.2020.3002168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In recent years, predicting the saccadic scanpaths of humans has become a new trend in the field of visual attention modeling. Given the variety of saccadic algorithms, how to evaluate their ability to model dynamic saccades has become an important yet understudied issue. To the best of our knowledge, existing metrics for evaluating saccadic prediction models are often heuristically designed, which may produce results inconsistent with human subjective assessment. To this end, we first construct a subjective database by collecting assessments of 5,000 pairs of scanpaths from ten subjects. Based on this database, we can compare different metrics according to their consistency with human visual perception. We also propose a data-driven metric that measures scanpath similarity based on human subjective comparison. To achieve this, we employ a long short-term memory (LSTM) network to learn the inference from the relationship of encoded scanpaths to a binary measurement. Experimental results demonstrate that the LSTM-based metric outperforms existing metrics. Moreover, we believe the constructed database can serve as a benchmark to inspire more insights for future metric selection.
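Schematically, the described metric encodes two scanpaths with a shared LSTM and maps the pair to a binary similarity judgment; the sketch below shows that shape, with the hidden size and comparison head as assumptions (the actual network and training details are in the paper):

```python
# Schematic LSTM-based scanpath similarity metric: encode two fixation
# sequences with a shared LSTM, classify the pair as similar/dissimilar.
import torch
import torch.nn as nn

class ScanpathMetric(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden,
                               batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, path_a, path_b):
        # each path: (batch, n_fixations, 2) normalized (x, y) coordinates
        _, (ha, _) = self.encoder(path_a)
        _, (hb, _) = self.encoder(path_b)
        pair = torch.cat([ha[-1], hb[-1]], dim=-1)
        return torch.sigmoid(self.head(pair)).squeeze(-1)  # similarity [0,1]

metric = ScanpathMetric()
a = torch.rand(4, 10, 2)   # four scanpaths of ten fixations each
b = torch.rand(4, 12, 2)   # sequences may differ in length
print(metric(a, b))        # would be trained with BCE on human judgments
```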
Collapse
|
50
|
Berga D, Otazu X. A Neurodynamic Model of Saliency Prediction in V1. Neural Comput 2021; 34:378-414. [PMID: 34915573 DOI: 10.1162/neco_a_01464] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Accepted: 09/03/2021] [Indexed: 11/04/2022]
Abstract
Lateral connections in the primary visual cortex (V1) have long been hypothesized to be responsible for several visual processing mechanisms such as brightness induction, chromatic induction, visual discomfort, and bottom-up visual attention (also named saliency). Many computational models have been developed to independently predict these and other visual processes, but no computational model has been able to reproduce all of them simultaneously. In this work, we show that a biologically plausible computational model of lateral interactions of V1 is able to simultaneously predict saliency and all the aforementioned visual processes. Our model's architecture (NSWAM) is based on Penacchio's neurodynamic model of lateral connections of V1. It is defined as a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation, and scale. We tested NSWAM saliency predictions using images from several eye tracking data sets. We show that the accuracy of predictions obtained by our architecture, using shuffled metrics, is similar to other state-of-the-art computational methods, particularly with synthetic images (CAT2000-Pattern and SID4VAM) that mainly contain low-level features. Moreover, we outperform other biologically inspired saliency models that are specifically designed to exclusively reproduce saliency. We show that our biologically plausible model of lateral connections can simultaneously explain different visual processes present in V1 (without applying any type of training or optimization and keeping the same parameterization for all the visual processes). This can be useful for the definition of a unified architecture of the primary visual cortex.
Collapse
Affiliation(s)
- David Berga
- Eurecat, Centre Tecnòlogic de Catalunya, 08005 Barcelona, Spain
| | - Xavier Otazu
- Computer Vision Center, Universitat Autònoma de Barcelona Edifici O, 08193, Bellaterra, Spain
| |
Collapse
|