1
Sun L, Francis DJ, Nagai Y, Yoshida H. Early development of saliency-driven attention through object manipulation. Acta Psychol (Amst) 2024; 243:104124. [PMID: 38232506] [DOI: 10.1016/j.actpsy.2024.104124]
Abstract
In the first years of life, infants progressively develop attention-selection skills for gathering information from visually cluttered environments. Even as newborns, infants are sensitive to differences in color, orientation, and luminance, the components of visual saliency. However, we know little about how saliency-driven attention emerges and develops socially through everyday free-viewing experiences. The present work assessed saliency change in infants' egocentric scenes and investigated the impact of manual engagement on infant object looking in the interactive context of object play. Thirty parent-infant dyads, with infants in two age groups (younger: 3- to 6-month-olds; older: 9- to 12-month-olds), completed a brief session of object play. Infants' looking behaviors were recorded with head-mounted eye-tracking gear, and parents' and infants' manual actions on objects were annotated separately for analysis. The findings revealed distinct attention mechanisms underlying hand-eye coordination between parents and infants and within infants during object play: younger infants were predominantly biased toward the visual saliency accompanying the parent's manual actions on the objects, whereas older infants allocated more attention to the object itself, regardless of the saliency in view, as they gained more self-generated manual actions. Taken together, the present work highlights the tight coordination between visual experience and sensorimotor competence and proposes a novel dyadic pathway to sustained attention in which social sensitivity to parents' hands emerges through saliency-driven attention, preparing infants to focus on, follow, and steadily track moving targets in free-flowing viewing activities.
Affiliation(s)
- Lichao Sun
- Department of Psychology, University of Houston, TX, United States.
- David J Francis
- Texas Institute for Measurement, Evaluation, and Statistics, University of Houston, TX, United States.
- Yukie Nagai
- International Research Center for Neurointelligence, University of Tokyo, Tokyo, Japan.
- Hanako Yoshida
- Department of Psychology, University of Houston, TX, United States.
2
Entzmann L, Guyader N, Kauffmann L, Peyrin C, Mermillod M. Detection of emotional faces: The role of spatial frequencies and local features. Vision Res 2023; 211:108281. [PMID: 37421829] [DOI: 10.1016/j.visres.2023.108281]
Abstract
Models of emotion processing suggest that threat-related stimuli such as fearful faces can be detected based on the rapid extraction of low spatial frequencies. However, this remains debated, as other models argue that the decoding of facial expressions occurs with a more flexible use of spatial frequencies. The purpose of this study was to clarify the role of spatial frequencies, and of differences in luminance contrast between spatial frequencies, in the detection of facial emotions. We used a saccadic choice task in which emotional-neutral face pairs were presented and participants were asked to make a saccade toward the neutral or the emotional (happy or fearful) face. Faces were displayed in either low, high, or broad spatial frequencies. Results showed that participants were more accurate when making a saccade toward the emotional face. Performance was also better for high and broad than for low spatial frequencies, and accuracy was higher with a happy target. An analysis of the eye and mouth saliency of our stimuli revealed that the mouth saliency of the target correlated with participants' performance. Overall, this study underlines the importance of local over global information, and of the saliency of the mouth region, in the detection of emotional and neutral faces.
Affiliation(s)
- Léa Entzmann
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France; Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France; Icelandic Vision Lab, School of Health Sciences, University of Iceland, Reykjavík, Iceland.
- Nathalie Guyader
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France.
- Louise Kauffmann
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France.
- Carole Peyrin
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France.
- Martial Mermillod
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France.
3
Bruckert A, Christie M, Le Meur O. Where to look at the movies: Analyzing visual attention to understand movie editing. Behav Res Methods 2023; 55:2940-2959. [PMID: 36002630] [DOI: 10.3758/s13428-022-01949-7]
Abstract
In the process of making a movie, directors constantly care about where the spectator will look on the screen. Shot composition, framing, camera movements, and editing are tools commonly used to direct attention. To provide a quantitative analysis of the relationship between those tools and gaze patterns, we propose a new eye-tracking database containing gaze-pattern information on movie sequences, together with editing annotations, and we show how state-of-the-art computational saliency techniques behave on this dataset. We expose strong links between movie editing and spectators' gaze distributions, and we identify several leads for how knowledge of editing could improve human visual attention modeling for cinematic content. The dataset generated and analyzed for this study is available at https://github.com/abruckert/eye_tracking_filmmaking.
4
Manley CE, Walter K, Micheletti S, Tietjen M, Cantillon E, Fazzi EM, Bex PJ, Merabet LB. Object identification in cerebral visual impairment characterized by gaze behavior and image saliency analysis. Brain Dev 2023; 45:432-444. [PMID: 37188548] [PMCID: PMC10524860] [DOI: 10.1016/j.braindev.2023.05.001]
Abstract
Individuals with cerebral visual impairment (CVI) have difficulty identifying common objects, especially when these are presented as cartoons or abstract images. In this study, participants were shown a series of images of ten common objects, each rendered in five possible categories ranging from abstract black & white line drawings to color photographs. Fifty individuals with CVI and 50 neurotypical controls verbally identified each object, and success rates and reaction times were collected. Visual gaze behavior was recorded with an eye tracker to quantify the extent of the visual search area explored and the number of fixations. A receiver operating characteristic (ROC) analysis was also carried out to compare the degree of alignment between the distribution of individual eye gaze patterns and the image saliency features computed by the graph-based visual saliency (GBVS) model. Compared to controls, CVI participants showed significantly lower success rates and longer reaction times when identifying objects. In the CVI group, success rates improved moving from abstract black & white images to color photographs, suggesting that object form (as defined by outlines and contours) and color are important cues for correct identification. Eye tracking data revealed that the CVI group showed significantly greater visual search areas and more fixations per image, and that the distribution of eye gaze patterns in the CVI group was less aligned with the high-saliency features of the images compared to controls. These results have important implications for understanding the complex profile of visual perceptual difficulties associated with CVI.
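The ROC analysis described here can be made concrete with a short sketch: saliency values at fixated pixels are treated as positives, values at uniformly sampled pixels as negatives, and alignment is scored as the area under the ROC curve (an AUC-Judd-style measure). This is an illustrative reconstruction, not the authors' exact pipeline; the function name, the negative-sampling scheme, and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gaze_saliency_auc(saliency_map, fixations, n_negatives=10000, seed=0):
    """ROC alignment between a saliency map and observed fixations (sketch).

    saliency_map : 2-D array of per-pixel saliency (e.g., GBVS output).
    fixations    : iterable of (row, col) fixation coordinates.
    Positives are saliency values at fixated pixels; negatives are values
    at pixels sampled uniformly over the image (illustrative choice).
    """
    rng = np.random.default_rng(seed)
    h, w = saliency_map.shape
    pos = np.array([saliency_map[r, c] for r, c in fixations])
    neg = saliency_map[rng.integers(0, h, n_negatives),
                       rng.integers(0, w, n_negatives)]
    scores = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones_like(pos), np.zeros_like(neg)])
    return roc_auc_score(labels, scores)  # 0.5 = chance-level alignment
```

Lower AUC values correspond to gaze being less aligned with high-saliency regions, which is the direction of the group difference reported for CVI participants.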
Affiliation(s)
- Claire E Manley
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, USA.
- Kerri Walter
- Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA.
- Serena Micheletti
- Unit of Child Neurology and Psychiatry, ASST Spedali Civili of Brescia, Brescia, Italy; Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy.
- Matthew Tietjen
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, USA.
- Emily Cantillon
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, USA.
- Elisa M Fazzi
- Unit of Child Neurology and Psychiatry, ASST Spedali Civili of Brescia, Brescia, Italy; Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy.
- Peter J Bex
- Translational Vision Lab, Department of Psychology, Northeastern University, Boston, MA, USA.
- Lotfi B Merabet
- The Laboratory for Visual Neuroplasticity, Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, USA.
5
Abstract
Salient object detection (SOD) is viewed as a pixel-wise saliency modeling task by traditional deep learning-based methods. A limitation of current SOD models is insufficient utilization of inter-pixel information, which usually results in imperfect segmentation near edge regions and low spatial coherence. As we demonstrate, using a saliency mask as the only label is suboptimal. To address this limitation, we propose a connectivity-based approach called bilateral connectivity network (BiconNet), which uses connectivity masks together with saliency masks as labels for effective modeling of inter-pixel relationships and object saliency. Moreover, we propose a bilateral voting module to enhance the output connectivity map, and a novel edge feature enhancement method that efficiently utilizes edge-specific features. Through comprehensive experiments on five benchmark datasets, we demonstrate that our proposed method can be plugged into any existing state-of-the-art saliency-based SOD framework to improve its performance with negligible parameter increase.
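As a rough illustration of the connectivity-label idea, the sketch below derives an 8-channel connectivity mask from a binary saliency mask: channel k marks pixel pairs that are both salient along the k-th neighbor direction, so the label encodes inter-pixel relations rather than a per-pixel class alone. This is a minimal reconstruction of the general concept under assumed conventions; BiconNet's exact channel layout, bilateral voting module, and edge feature enhancement are specified in the paper.

```python
import numpy as np

# 8-neighbor offsets (row, col); the ordering here is an assumption
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def connectivity_mask(saliency_mask):
    """Derive an 8-channel connectivity label from a binary saliency mask.

    Channel k is 1 where a pixel AND its k-th neighbor are both salient,
    giving a training target that explicitly models inter-pixel relations.
    """
    m = (saliency_mask > 0).astype(np.float32)
    h, w = m.shape
    conn = np.zeros((8, h, w), dtype=np.float32)
    padded = np.pad(m, 1)  # zero border: out-of-image neighbors count as 0
    for k, (dr, dc) in enumerate(OFFSETS):
        neighbor = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        conn[k] = m * neighbor
    return conn
```

The bilateral voting step described in the abstract then reconciles each directional channel with the opposite-direction channel of the corresponding neighbor before collapsing the channels back to a saliency map.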
Affiliation(s)
- Ziyun Yang
- Department of Biomedical Engineering, Duke University, Durham, 27708, NC, USA
- Sina Farsiu
- Department of Biomedical Engineering, Duke University, Durham, 27708, NC, USA
- Department of Ophthalmology, Duke University Medical Center, Durham, 27710, NC, USA
- Department of Electrical & Computer Engineering, Duke University, Durham, 27708, NC, USA
- Department of Computer Science, Duke University, Durham, 27708, NC, USA
- Corresponding author: (Sina Farsiu)
6
Mikhailova A, Raposo A, Sala SD, Coco MI. Eye-movements reveal semantic interference effects during the encoding of naturalistic scenes in long-term memory. Psychon Bull Rev 2021; 28:1601-1614. [PMID: 34009623] [DOI: 10.3758/s13423-021-01920-1]
Abstract
Similarity-based semantic interference (SI) hinders memory recognition. Within long-term visual memory paradigms, the more scenes (or objects) from the same semantic category are viewed, the harder it is to recognize each individual instance. A growing body of evidence shows that overt attention is intimately linked to memory. However, it is yet to be understood whether SI mediates overt attention during scene encoding, and so explains its detrimental impact on recognition memory. In the current experiment, participants viewed 372 photographs belonging to different semantic categories (e.g., a kitchen), presented at different frequencies (4, 20, 40, or 60 images per category), while being eye-tracked. After 10 minutes, they were presented with the same 372 photographs plus 372 new photographs and asked whether they recognized each photo (i.e., an old/new paradigm). We found that the greater the SI, the poorer the recognition performance, especially for old scenes for which memory representations already existed. Scenes explored more widely were better recognized, but as SI increased, participants focused on more local regions of the scene in search of potentially distinctive details. Attending to the centre of the display, or to scene regions rich in low-level saliency, was detrimental to recognition accuracy, and as SI increased participants were more likely to rely on visual saliency. The difficulty of maintaining faithful memory representations under increasing SI also manifested in longer fixation durations; in fact, more successful encoding was associated with shorter fixations. Our study highlights the interdependence between attention and memory during high-level processing of semantic information.
7
Niu X, Huang S, Yang S, Wang Z, Li Z, Shi L. Comparison of pop-out responses to luminance and motion contrasting stimuli of tectal neurons in pigeons. Brain Res 2020; 1747:147068. [PMID: 32827547] [DOI: 10.1016/j.brainres.2020.147068]
Abstract
The emergence of visual saliency has been widely studied in the primary visual cortex and the superior colliculus (SC) of mammals. Fewer studies have examined pop-out responses to motion-direction contrasting stimuli in the optic tectum (OT, homologous to the mammalian SC), and these have mainly involved owls and fish; to our knowledge, the influence of spatial luminance has not been reported. In this study, we recorded multi-units in the pigeon OT and analyzed tectal responses to spatial luminance contrasting stimuli, motion direction contrasting stimuli, and stimuli contrasting in both feature dimensions. The comparisons showed that (1) tectal responses pop out under either motion direction or spatial luminance contrasting conditions; (2) the modulation from motion direction contrast was independent of the temporal luminance variation of the visual stimuli; and (3) when both spatial luminance and motion direction were salient, the responses of tectal neurons were modulated more strongly by motion direction than by spatial luminance. This pattern is consistent with the innate instincts of birds in their natural environment. This study deepens the understanding of the mechanisms involved in bottom-up visual information processing and selective attention in birds.
Affiliation(s)
- Xiaoke Niu
- Henan Key Laboratory of Brain-Computer Interface Technology, School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China; College of Basic Medicine, Zhengzhou University, Zhengzhou 450001, China.
- Shuman Huang
- Henan Key Laboratory of Brain-Computer Interface Technology, School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China.
- Shangfei Yang
- Henan Key Laboratory of Brain-Computer Interface Technology, School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China.
- Zhizhong Wang
- Henan Key Laboratory of Brain-Computer Interface Technology, School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China.
- Zhihui Li
- Henan Key Laboratory of Brain-Computer Interface Technology, School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China.
- Li Shi
- Henan Key Laboratory of Brain-Computer Interface Technology, School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China; Department of Automation, Tsinghua University, Beijing 100000, China.
8
Vakanski A, Xian M, Freer PE. Attention-Enriched Deep Learning Model for Breast Tumor Segmentation in Ultrasound Images. Ultrasound Med Biol 2020; 46:2819-2833. [PMID: 32709519] [PMCID: PMC7483681] [DOI: 10.1016/j.ultrasmedbio.2020.06.015]
Abstract
Incorporating human domain knowledge for breast tumor diagnosis is challenging because shape, boundary, curvature, intensity or other common medical priors vary significantly across patients and cannot be employed. This work proposes a new approach to integrating visual saliency into a deep learning model for breast tumor segmentation in ultrasound images. Visual saliency refers to image maps containing regions that are more likely to attract radiologists' visual attention. The proposed approach introduces attention blocks into a U-Net architecture and learns feature representations that prioritize spatial regions with high saliency levels. The validation results indicate increased accuracy for tumor segmentation relative to models without salient attention layers. The approach achieved a Dice similarity coefficient (DSC) of 90.5% on a data set of 510 images. The salient attention model has the potential to enhance accuracy and robustness in processing medical images of other organs, by providing a means to incorporate task-specific knowledge into deep learning architectures.
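The reported DSC is the standard overlap measure DSC = 2|A∩B| / (|A| + |B|) between predicted and ground-truth tumor masks. A minimal sketch, assuming binary masks (the function name and thresholding are illustrative):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient: DSC = 2|A ∩ B| / (|A| + |B|).

    pred, target : arrays thresholded to binary tumor/background masks.
    A DSC of 90.5% means the predicted and ground-truth tumor regions
    overlap in roughly 90% of their combined extent.
    """
    pred = np.asarray(pred) > 0.5
    target = np.asarray(target) > 0.5
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)
```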
Affiliation(s)
- Aleksandar Vakanski
- Department of Computer Science, University of Idaho, Idaho Falls, Idaho, USA.
- Min Xian
- Department of Computer Science, University of Idaho, Idaho Falls, Idaho, USA.
- Phoebe E Freer
- University of Utah School of Medicine, Salt Lake City, Utah, USA.
9
Liu D, Rao N, Mei X, Jiang H, Li Q, Luo C, Li Q, Zeng C, Zeng B, Gan T. Annotating Early Esophageal Cancers Based on Two Saliency Levels of Gastroscopic Images. J Med Syst 2018; 42:237. [PMID: 30327890] [DOI: 10.1007/s10916-018-1063-x]
Abstract
Early diagnosis of esophageal cancer can greatly improve the survival rate of patients. At present, the lesion annotation of early esophageal cancers (EEC) in gastroscopic images is generally performed manually by medical personnel in the clinic. To reduce the effects of subjectivity and fatigue in manual annotation, computer-aided annotation is required. However, automated annotation of EEC lesions in images is a challenging task owing to the fine-grained variability in the appearance of EEC lesions. This study modifies the traditional EEC annotation framework and utilizes visually salient information to develop a two-saliency-level-based lesion annotation (TSL-BLA) method for EEC annotation in gastroscopic images. Unlike existing methods, the proposed framework has a strong ability to constrain false positive outputs and places additional emphasis on the annotation of small EEC lesions. A total of 871 gastroscopic images from 231 patients were used to validate TSL-BLA: 365 of these images contain 434 EEC lesions, and 506 images contain no lesions. In addition, 101 small lesion regions were extracted from the 434 lesions to further validate performance. The experimental results show that the mean detection rate and Dice similarity coefficient of TSL-BLA were 97.24% and 75.15%, respectively. Compared with other state-of-the-art methods, TSL-BLA shows better performance and is markedly superior when annotating small EEC lesions. It also produces fewer false positive outputs and runs quickly. The proposed method therefore has good application prospects for aiding clinical EEC diagnosis.
Affiliation(s)
- Dingyun Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- Nini Rao
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- Xinming Mei
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China; Institute of Electronic and Information Engineering of UESTC in Guangdong, Dongguan, China.
- Hongxiu Jiang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- Quanchi Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- ChengSi Luo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- Qian Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- Chengshi Zeng
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China.
- Bing Zeng
- School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, China.
- Tao Gan
- Digestive Endoscopic Center of West China Hospital, Sichuan University, Chengdu, China.
10
Loukas C, Varytimidis C, Rapantzikos K, Kanakis MA. Keyframe extraction from laparoscopic videos based on visual saliency detection. Comput Methods Programs Biomed 2018; 165:13-23. [PMID: 30337068] [DOI: 10.1016/j.cmpb.2018.07.004]
Abstract
BACKGROUND AND OBJECTIVE: Laparoscopic surgery offers the potential for video recording of the operation, which is important for technique evaluation, cognitive training, patient briefing and documentation. An effective way to represent video content is to extract a limited number of keyframes with semantic information. In this paper we present a novel method for keyframe extraction from individual shots of the operational video. METHODS: The laparoscopic video was first segmented into video shots using an objectness model, which was trained to capture significant changes in the endoscope's field of view. Each frame of a shot was then decomposed into three saliency maps to model the preference of human vision for regions with higher differentiation with respect to color, motion and texture. The accumulated responses from each map provided a 3D time series of saliency variation across the shot. The time series was modeled as a multivariate autoregressive process with hidden Markov states (HMMAR model). This approach allowed the temporal segmentation of the shot into a predefined number of states. A representative keyframe was extracted from each state based on the highest state-conditional probability of the corresponding saliency vector. RESULTS: Our method was tested on 168 video shots extracted from various laparoscopic cholecystectomy operations in the publicly available Cholec80 dataset. Four state-of-the-art methodologies were used for comparison. The evaluation was based on two assessment metrics: the Color Consistency Score (CCS), which measures the color distance between the ground truth (GT) and the closest keyframe, and the Temporal Consistency Score (TCS), which considers the temporal proximity between GT and extracted keyframes. About 81% of the extracted keyframes matched the color content of the GT keyframes, compared to 77% for the second-best method. The TCS of the proposed and second-best methods was close to 1.9 and 1.4, respectively. CONCLUSIONS: Our results demonstrate that the proposed method yields superior content and temporal consistency with respect to the ground truth. The extracted keyframes provide highly semantic information that may be used for various applications related to surgical video content representation, such as workflow analysis, video summarization and retrieval.
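The state-segmentation and keyframe-selection steps can be sketched as follows. As a simplification, a plain Gaussian HMM (via the hmmlearn package) stands in for the paper's multivariate autoregressive HMM (HMMAR); the function name and parameter choices are illustrative assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # pip install hmmlearn

def extract_keyframes(saliency_series, n_states=5, seed=0):
    """Segment a shot's saliency time series into states and pick keyframes.

    saliency_series : (n_frames, 3) array of accumulated color, motion,
                      and texture saliency responses per frame.
    A Gaussian HMM stands in for the paper's HMMAR model; one keyframe is
    taken per state at the frame with the highest state posterior, echoing
    the "highest state-conditional probability" criterion.
    """
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=100, random_state=seed)
    model.fit(saliency_series)
    posteriors = model.predict_proba(saliency_series)  # (n_frames, n_states)
    states = posteriors.argmax(axis=1)
    keyframes = [int(np.argmax(posteriors[:, s])) for s in np.unique(states)]
    return sorted(keyframes)
```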
Affiliation(s)
- Constantinos Loukas
- Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75 str., Athens 11527, Greece.
- Christos Varytimidis
- School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece.
- Konstantinos Rapantzikos
- School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece.
- Meletios A Kanakis
- Cardiothoracic Surgery Unit, Great Ormond Street Hospital for Children, London, UK.
11
Liang Z, Hamada Y, Oba S, Ishii S. Characterization of electroencephalography signals for estimating saliency features in videos. Neural Netw 2018; 105:52-64. [PMID: 29763744] [DOI: 10.1016/j.neunet.2018.04.013]
Abstract
Understanding the functions of the visual system has been a major target of neuroscience for many years. However, the relationship between spontaneous brain activity and visual saliency in natural stimuli has yet to be elucidated. In this study, we developed an optimized machine learning-based decoding model to explore possible relationships between electroencephalography (EEG) characteristics and visual saliency. Optimal features were extracted from the EEG signals and from saliency maps computed with an unsupervised saliency model (Tavakoli and Laaksonen, 2017). Subsequently, various unsupervised feature selection/extraction techniques were examined in combination with different supervised regression models. The robustness of the presented model was verified by means of ten-fold or nested cross-validation procedures, and promising results were achieved in reconstructing saliency features from the selected EEG characteristics. By successfully demonstrating that EEG characteristics can predict the real-time saliency distribution in natural videos, we suggest the feasibility of quantifying visual content through measured brain activity (EEG signals) in real environments, which would facilitate the understanding of cortical involvement in the processing of natural visual stimuli and application development motivated by human visual processing.
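The decoding pipeline, unsupervised feature extraction followed by supervised regression under cross-validation, can be sketched with scikit-learn. The specific choices below (standardization, PCA, ridge regression, ten-fold CV) are one example configuration in the spirit of the techniques the study compared, not the authors' final model; variable names are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# X: (n_windows, n_eeg_features) EEG characteristics per time window
# y: (n_windows,) one saliency feature of the concurrently viewed video
def evaluate_decoder(X, y):
    """Ten-fold CV of an EEG-to-saliency regressor (illustrative pipeline)."""
    decoder = make_pipeline(
        StandardScaler(),
        PCA(n_components=0.95),  # unsupervised extraction: keep 95% variance
        Ridge(alpha=1.0),        # supervised regression onto saliency
    )
    scores = cross_val_score(decoder, X, y, cv=10, scoring="r2")
    return scores.mean(), scores.std()
```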
Affiliation(s)
- Zhen Liang
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan.
- Yasuyuki Hamada
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan.
- Shigeyuki Oba
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan.
- Shin Ishii
- Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan; ATR Cognitive Mechanisms Laboratories, Kyoto 619-0288, Japan.
12
Abstract
Negative correlations between pupil size and the tendency to look at salient locations have been found in recent studies (e.g., Mathôt et al., 2015). It is hypothesized that this negative correlation might be explained by the mental effort participants put into the task, which in turn leads to pupil dilation. Here we present an exploratory study on the effect of expertise on eye-movement behavior. Experts (N = 4) and novices (N = 4) in the massively multiplayer online role-playing game World of Warcraft (WoW) viewed 24 designed video segments from the game that differed in content (i.e., informative locations) and visual complexity (i.e., salient locations). Because no standard tool is available to evaluate WoW players' expertise, we built an off-game questionnaire testing players' knowledge of WoW and skills acquired through completed raids, highest rated battlegrounds, Skill Points, etc. Consistent with previous studies, we found a negative correlation between pupil size and the tendency to look at salient locations (experts, r = -.17, p < .0001; novices, r = -.09, p < .0001). This correlation has been interpreted in terms of mental effort: people are inherently biased to look at salient locations (sharp corners, bright lights, etc.), but can overcome this bias if they invest sufficient mental effort. Crucially, we observed that this correlation was stronger for expert WoW players than for novice players (Z = -3.3, p = .0011). This suggests that experts have learned to improve control over eye-movement behavior by guiding their eyes toward informative, but potentially low-salient, areas of the screen. These findings may contribute to our understanding of what makes an expert an expert.
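One plausible operationalization of the reported correlation, sketched below, pairs each fixation's pupil size with the saliency value at the fixated location and computes a Pearson correlation. The variable names and this exact pairing are assumptions for illustration, not the study's analysis code.

```python
import numpy as np
from scipy.stats import pearsonr

def pupil_saliency_correlation(pupil_sizes, saliency_at_fixation):
    """Correlate per-fixation pupil size with saliency at the fixated point.

    A negative r (as reported, e.g., r = -.17 for experts) indicates that
    larger pupils -- a proxy for mental effort -- co-occur with fixations
    on less salient, presumably more informative, screen locations.
    """
    r, p = pearsonr(np.asarray(pupil_sizes),
                    np.asarray(saliency_at_fixation))
    return r, p
```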
Affiliation(s)
- Yousri Marzouki
- Department of Social Sciences, Qatar University, Doha, Qatar
- Laboratoire de Psychologie Cognitive, Aix-Marseille Université, Marseille, France
- Valériane Dusaucy
- Laboratoire de Psychologie Cognitive, Aix-Marseille Université, Marseille, France
- Sebastiaan Mathôt
- Laboratoire de Psychologie Cognitive, Aix-Marseille Université, Marseille, France
- Department of Experimental Psychology, University of Groningen, Groningen, Netherlands
13
Abstract
We explore the role of eye movements in a chase detection task. Unlike previous studies, which focused on overall performance as indexed by response speed and chase detection accuracy, we decompose the search process into gaze events such as smooth eye movements and use a data-driven approach to describe these gaze events separately. We measured eye movements of four human subjects engaged in a chase detection task displayed on a computer screen. The subjects were asked to detect two chasing rings among twelve other randomly moving rings. Using principal component analysis and support vector machines, we examined the template and classification images that describe various stages of the detection process. We show that the subjects mostly searched for pairs of rings that moved one after another in the same direction at a distance of 3.5-3.8 degrees. To find such pairs, the subjects first looked for regions with a high ring density and then pursued the rings in those regions; most of these groups consisted of two rings. Three subjects preferred to pursue a pair as a single object, while the remaining subject pursued the group by alternating gaze between the two individual rings. In the discussion, we argue that subjects do not compare the movement of the pursued pair to a single preformed template describing a chasing motion. Rather, subjects bring certain hypotheses about what motion may qualify as a chase and then, through feedback, learn to look for a motion pattern that maximizes their performance.
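The template/classification-image analysis can be sketched as PCA-reduced, gaze-contingent motion features fed to a linear SVM whose weight vector is back-projected into pixel space. This is a schematic reconstruction under assumed inputs (flattened motion maps around gaze); the authors' actual feature construction and gaze-event segmentation are more involved.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

def fit_template(features, labels, n_components=20):
    """Fit a linear readout over PCA-reduced gaze/motion features.

    features : (n_trials, n_pixels) flattened motion maps around gaze
               (an assumed input format for illustration).
    labels   : 1 for pursued chasing pairs, 0 for distractor pairs.
    The SVM weight vector, back-projected through the PCA basis, serves
    as a classification image of the motion pattern subjects search for.
    """
    pca = PCA(n_components=n_components).fit(features)
    clf = LinearSVC(C=1.0).fit(pca.transform(features), labels)
    template = (clf.coef_ @ pca.components_).ravel()  # back to pixel space
    return template  # visualize by reshaping to the map's 2-D extent
```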
14
Abstract
Psycholinguistic research using the visual world paradigm has shown that the processing of sentences is constrained by the visual context in which they occur. Recently, there has been growing interest in the interactions observed when both language and vision provide relevant information during sentence processing. In three visual world experiments on syntactic ambiguity resolution, we investigate how visual and linguistic information influence the interpretation of ambiguous sentences. We hypothesize that (1) visual and linguistic information both constrain which interpretation is pursued by the sentence processor, and (2) the two types of information act upon the interpretation of the sentence at different points during processing. In Experiment 1, we show that visual saliency is utilized to anticipate the upcoming arguments of a verb. In Experiment 2, we operationalize linguistic saliency using intonational breaks and demonstrate that these give prominence to linguistic referents. These results confirm prediction (1). In Experiment 3, we manipulate visual and linguistic saliency together and find that both types of information are used, but at different points in the sentence, to incrementally update its current interpretation. This finding is consistent with prediction (2). Overall, our results suggest an adaptive processing architecture in which different types of information are used when they become available, optimizing different aspects of situated language processing.
Affiliation(s)
- Moreno I Coco
- Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, UK
15
Bian P, Zhang L. Visual saliency: a biologically plausible contourlet-like frequency domain approach. Cogn Neurodyn 2010; 4:189-198. [PMID: 21886671] [DOI: 10.1007/s11571-010-9122-0]
Abstract
In this paper we propose a fast frequency domain saliency detection method that is also biologically plausible, referred to as frequency domain divisive normalization (FDN). We show that the initial feature extraction stage, common to all spatial domain approaches, can be simplified to a Fourier transform with a contourlet-like grouping of coefficients, so that saliency detection can be achieved in the frequency domain. Specifically, we show that divisive normalization, a model of cortical surround inhibition, can be conducted in the frequency domain. Since Fourier coefficients are global in space, we extend this model by conducting piecewise FDN (PFDN) on overlapping local patches to provide better biological plausibility. Not only do FDN and PFDN outperform current state-of-the-art methods in eye-fixation prediction, they are also faster. Speed and simplicity are the advantages of our frequency domain approach, and its biological plausibility is the main contribution of this paper.
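A minimal sketch of the FDN idea: transform the image to the frequency domain, divide each coefficient by the pooled energy of its spectral neighborhood (a crude stand-in for the paper's contourlet-like band grouping), and read the squared inverse transform as a saliency map. The pooling choice, smoothing scales, and function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fdn_saliency(image, eps=1e-8):
    """Simplified frequency-domain divisive normalization (sketch).

    image : 2-D grayscale float array.
    Each Fourier coefficient is divisively normalized by the pooled
    magnitude of nearby frequencies, approximating surround inhibition
    across a contourlet-like grouping of coefficients.
    """
    F = np.fft.fft2(image)
    mag = np.abs(F)
    pooled = gaussian_filter(mag, sigma=3)   # local spectral energy pool
    F_norm = F / (pooled + eps)              # divisive normalization
    sal = np.abs(np.fft.ifft2(F_norm)) ** 2  # back to space; square energy
    return gaussian_filter(sal, sigma=5)     # smooth to fixation scale
```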