1. Guo KH, Chaudhari NN, Jafar T, Chowdhury NF, Bogdan P, Irimia A. Anatomic Interpretability in Neuroimage Deep Learning: Saliency Approaches for Typical Aging and Traumatic Brain Injury. Neuroinformatics 2024;22:591-606. doi: 10.1007/s12021-024-09694-2. PMID: 39503843; PMCID: PMC11579113.
Abstract
The black-box nature of deep neural networks (DNNs) makes researchers and clinicians hesitant to rely on their findings. Saliency maps can enhance DNN explainability by suggesting the anatomic localization of relevant brain features. This study compares seven popular attribution-based saliency approaches to assign neuroanatomic interpretability to DNNs that estimate biological brain age (BA) from magnetic resonance imaging (MRI). Cognitively normal (CN) adults (N = 13,394, 5,900 males; mean age: 65.82 ± 8.89 years) are included for DNN training, testing, validation, and saliency map generation to estimate BA. To study saliency robustness to the presence of anatomic deviations from normality, saliency maps are also generated for adults with mild traumatic brain injury (mTBI, N = 214, 135 males; mean age: 55.3 ± 9.9 years). We assess saliency methods' capacities to capture known anatomic features of brain aging and compare them to a surrogate ground truth whose anatomic saliency is known a priori. Anatomic aging features are identified most reliably by the integrated gradients method, which outperforms all others through its ability to localize relevant anatomic features. Gradient Shapley additive explanations, input × gradient, and masked gradient perform less consistently but still highlight ubiquitous neuroanatomic features of aging (ventricle dilation, hippocampal atrophy, sulcal widening). Gradient saliency, guided backpropagation, and guided gradient-weighted class activation mapping localize saliency outside the brain, which is undesirable. Our research highlights the relative tradeoffs of these saliency methods for interpreting DNN findings during BA estimation in typical aging and after mTBI.
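For readers unfamiliar with attribution methods, the snippet below is a minimal sketch of integrated gradients, the approach the study found most reliable. It assumes a generic differentiable PyTorch model and a zero baseline; it is not taken from the study's code, and libraries such as Captum provide tested implementations.

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Approximate integrated gradients of a scalar model output w.r.t. input x.

    model    -- callable returning one scalar per sample (e.g., a predicted brain age)
    x        -- input tensor of shape (1, ...) for a single image or volume
    baseline -- reference input (defaults to all zeros, a common choice)
    """
    if baseline is None:
        baseline = torch.zeros_like(x)
    # Interpolate between baseline and input in `steps` increments.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * (x.dim() - 1)))
    interpolated = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)
    outputs = model(interpolated).sum()                     # sum of scalar outputs
    grads = torch.autograd.grad(outputs, interpolated)[0]   # (steps, ...)
    avg_grads = grads.mean(dim=0, keepdim=True)             # Riemann approximation of the path integral
    return (x - baseline) * avg_grads                       # per-voxel attribution
```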
Affiliation(s)
- Kevin H Guo
- Thomas Lord Department of Computer Science, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, 90089, USA
- Ethel Percy Andrus Gerontology Center, Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Nikhil N Chaudhari
- Ethel Percy Andrus Gerontology Center, Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Corwin D. Denney Research Center, Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, 90089, USA
- Tamara Jafar
- Ethel Percy Andrus Gerontology Center, Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, 90089, USA
- Nahian F Chowdhury
- Ethel Percy Andrus Gerontology Center, Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA
- Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, 90089, USA
- Paul Bogdan
- Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, 90089, USA
- Andrei Irimia
- Ethel Percy Andrus Gerontology Center, Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, 90089, USA.
- Corwin D. Denney Research Center, Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, 90089, USA.
- Department of Quantitative and Computational Biology, Dornsife College of Arts and Sciences, University of Southern California, Los Angeles, CA, 90089, USA.
- Centre for Healthy Brain Aging, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, London, SE5 8AF, UK.
2. Men Q, Teng C, Drukker L, Papageorghiou AT, Noble JA. Gaze-probe joint guidance with multi-task learning in obstetric ultrasound scanning. Med Image Anal 2023;90:102981. doi: 10.1016/j.media.2023.102981. PMID: 37863638; PMCID: PMC7615231.
Abstract
In this work, we exploit multi-task learning to jointly predict the two decision-making processes of gaze movement and probe manipulation that an experienced sonographer would perform in routine obstetric scanning. A multimodal guidance framework, Multimodal-GuideNet, is proposed to capture the causal relationship between a real-world ultrasound video signal, synchronized gaze, and probe motion. The association between the multi-modality inputs is learned and shared through a modality-aware spatial graph that leverages useful cross-modal dependencies. By estimating the probability distribution of probe and gaze movements in real scans, the predicted guidance signals also accommodate inter- and intra-sonographer variation and avoid a fixed scanning path. We validate the new multi-modality approach on three types of obstetric scanning examinations, and the results consistently outperform single-task learning under various guidance policies. To simulate the sonographer's attention on multi-structure images, we also explore multi-step estimation in gaze guidance, and its visual results show that the prediction allows multiple gaze centers that are substantially aligned with underlying anatomical structures.
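As a rough illustration of the multi-task idea described above, the sketch below shows a toy two-head network with a shared encoder and a combined loss; the architecture, head dimensions, and loss weights are placeholders and not the actual Multimodal-GuideNet, which uses a modality-aware spatial graph and probabilistic outputs.

```python
import torch
import torch.nn as nn

class TwoHeadGuidance(nn.Module):
    """Schematic two-task model: shared encoder, separate gaze and probe heads."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.gaze_head = nn.Linear(feat_dim, 2)    # hypothetical next gaze shift (dx, dy)
        self.probe_head = nn.Linear(feat_dim, 3)   # hypothetical probe rotation change (rx, ry, rz)

    def forward(self, frame):
        z = self.encoder(frame)
        return self.gaze_head(z), self.probe_head(z)

def multitask_loss(gaze_pred, probe_pred, gaze_true, probe_true, w_gaze=1.0, w_probe=1.0):
    # Both tasks back-propagate into the shared encoder, which is the point of multi-task learning.
    return w_gaze * nn.functional.mse_loss(gaze_pred, gaze_true) + \
           w_probe * nn.functional.mse_loss(probe_pred, probe_true)
```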
Affiliation(s)
- Qianhui Men
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, OX3 7DQ, United Kingdom.
- Clare Teng
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, OX3 7DQ, United Kingdom
- Lior Drukker
- Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, OX3 9DU, United Kingdom; Department of Obstetrics and Gynecology, Tel-Aviv University, Tel Aviv, Ramat Aviv, 69978, Israel
- Aris T Papageorghiou
- Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, OX3 9DU, United Kingdom
- J Alison Noble
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, OX3 7DQ, United Kingdom
3. Kümmerer M, Bethge M. Predicting Visual Fixations. Annu Rev Vis Sci 2023;9:269-291. doi: 10.1146/annurev-vision-120822-072528. PMID: 37419107.
Abstract
As we navigate and behave in the world, we are constantly deciding, a few times per second, where to look next. The outcomes of these decisions in response to visual input are comparatively easy to measure as trajectories of eye movements, offering insight into many unconscious and conscious visual and cognitive processes. In this article, we review recent advances in predicting where we look. We focus on evaluating and comparing models: How can we consistently measure how well models predict eye movements, and how can we judge the contribution of different mechanisms? Probabilistic models facilitate a unified approach to fixation prediction that allows us to use the amount of explainable information explained to compare different models across different settings, such as static and video saliency, as well as scanpath prediction. We review how the large variety of saliency maps and scanpath models can be translated into this unifying framework, how much different factors contribute, and how we can select the most informative examples for model comparison. We conclude that the universal scale of information gain offers a powerful tool for the inspection of candidate mechanisms and experimental design that helps us understand the continual decision-making process that determines where we look.
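The information-gain idea discussed above can be made concrete with a short sketch: given a probabilistic saliency map and a baseline density (both normalized to sum to one over the image), the score is the average log-likelihood advantage in bits per fixation. The normalization and baseline choice here are illustrative assumptions, not the review's exact evaluation protocol.

```python
import numpy as np

def information_gain(model_density, baseline_density, fixations):
    """Average log-likelihood advantage (bits/fixation) of a probabilistic
    saliency model over a baseline (e.g., a center-bias model).

    model_density, baseline_density -- 2D arrays that each sum to 1 over the image
    fixations -- iterable of (row, col) fixation locations
    """
    eps = 1e-12  # guard against log(0)
    gains = [np.log2(model_density[r, c] + eps) - np.log2(baseline_density[r, c] + eps)
             for r, c in fixations]
    return float(np.mean(gains))
```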
Affiliation(s)
- Matthias Bethge
- Tübingen AI Center, University of Tübingen, Tübingen, Germany
4. Wang H, Nicklaus K, Jewett E, Rehani E, Chen TA, Engelmann J, Bordes MC, Chopra D, Reece GP, Lee ZH, Markey MK. Assessing saliency models of observers' visual attention on acquired facial differences. J Med Imaging (Bellingham) 2023;10:S11908. doi: 10.1117/1.jmi.10.s1.s11908. PMID: 37091297; PMCID: PMC10118307.
Abstract
Purpose: Saliency models that predict observers' visual attention to facial differences could enable psychosocial interventions to help patients and their families anticipate staring behaviors. The purpose of this study was to assess the ability of existing saliency models to predict observers' visual attention to acquired facial differences arising from head and neck cancer and its treatment. Approach: Saliency maps predicted by graph-based visual saliency (GBVS), an artificial neural network (ANN), and a face-specific model were compared to observer fixation maps generated from eye-tracking of lay observers presented with clinical facial photographs of patients with a visible or functional impairment manifesting in the head and neck region. We used a linear mixed-effects model to investigate observer and stimulus factors associated with the saliency models' accuracy. Results: The GBVS model predicted many irrelevant regions (e.g., shirt collars) as being salient. The ANN model underestimated observers' attention to facial differences relative to the central region of the face. Compared with GBVS and ANN, the face-specific saliency model was more accurate on this task; however, the face-specific model underestimated the saliency of deviations from the typical structure of human faces. The linear mixed-effects model revealed that the location of the facial difference (midface versus periphery) was significantly associated with saliency model performance. Model performance was also significantly impacted by interobserver variability. Conclusions: Existing saliency models are not adequate for predicting observers' visual attention to facial differences. Extensions of face-specific saliency models are needed to accurately predict the saliency of acquired facial differences arising from head and neck cancer and its treatment.
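For context, agreement between a predicted saliency map and observer fixations is commonly summarized with metrics such as NSS and CC; the sketch below computes both. These generic metrics are shown only for illustration and are not the specific accuracy measure analyzed with the study's mixed-effects model.

```python
import numpy as np

def normalized_scanpath_saliency(saliency_map, fixation_points):
    """NSS: mean z-scored saliency value at observed fixation locations."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-12)
    return float(np.mean([s[r, c] for r, c in fixation_points]))

def correlation_coefficient(saliency_map, fixation_map):
    """CC: Pearson correlation between a predicted map and a smoothed fixation map."""
    a = saliency_map.ravel().astype(float)
    b = fixation_map.ravel().astype(float)
    return float(np.corrcoef(a, b)[0, 1])
```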
Affiliation(s)
- Haoqi Wang
- The University of Texas at Austin, Department of Biomedical Engineering, Austin, Texas, United States
- The University of Texas MD Anderson Cancer Center, Department of Plastic Surgery, Houston, Texas, United States
- Krista Nicklaus
- The University of Texas at Austin, Department of Biomedical Engineering, Austin, Texas, United States
- The University of Texas MD Anderson Cancer Center, Department of Plastic Surgery, Houston, Texas, United States
- Eloise Jewett
- The University of Texas at Austin, Department of Biomedical Engineering, Austin, Texas, United States
- Eeshaan Rehani
- The University of Texas at Austin, Department of Biomedical Engineering, Austin, Texas, United States
- Tzuan A. Chen
- University of Houston, HEALTH Research Institute, Houston, Texas, United States
- University of Houston, Department of Psychological, Health, and Learning Sciences, Houston, Texas, United States
- Jeff Engelmann
- Rogers Behavioral Health, Oconomowoc, Wisconsin, United States
- Mary Catherine Bordes
- The University of Texas MD Anderson Cancer Center, Department of Plastic Surgery, Houston, Texas, United States
- Deepti Chopra
- The University of Texas MD Anderson Cancer Center, Department of Psychiatry, Houston, Texas, United States
- Gregory P. Reece
- The University of Texas MD Anderson Cancer Center, Department of Plastic Surgery, Houston, Texas, United States
- Z-Hye Lee
- The University of Texas MD Anderson Cancer Center, Department of Plastic Surgery, Houston, Texas, United States
- Mia K. Markey
- The University of Texas at Austin, Department of Biomedical Engineering, Austin, Texas, United States
- The University of Texas MD Anderson Cancer Center, Department of Imaging Physics, Houston, Texas, United States
5. Lencastre P, Bhurtel S, Yazidi A, E Mello GBM, Denysov S, Lind PG. EyeT4Empathy: Dataset of foraging for visual information, gaze typing and empathy assessment. Sci Data 2022;9:752. doi: 10.1038/s41597-022-01862-w. PMID: 36463232; PMCID: PMC9719458.
Abstract
We present a dataset of eye-movement recordings collected from 60 participants, along with their empathy levels towards people with movement impairments. During each round of gaze recording, participants were divided into two groups, each completing one task. One group performed a task of free exploration of structureless images, and a second group performed a task consisting of gaze typing, i.e., writing sentences using eye-gaze movements on a cardboard. The eye-tracking data recorded from both tasks are stored in two datasets, which, besides gaze position, also include pupil diameter measurements. The empathy levels of participants towards non-verbal movement-impaired people were assessed twice through a questionnaire, before and after each task. The questionnaire comprises forty questions, extending an established questionnaire of cognitive and affective empathy. Finally, our dataset presents an opportunity for analysing and evaluating, among other things, the statistical features of eye-gaze trajectories in free viewing as well as how empathy is reflected in eye features.
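A minimal sketch of how such recordings might be loaded and summarized is shown below; the file name and column names (gaze_x, gaze_y, pupil_diameter) are hypothetical and should be replaced with the dataset's actual schema.

```python
import pandas as pd

# Hypothetical file and column names; the published dataset defines its own schema,
# so adjust these after inspecting the actual files.
recording = pd.read_csv("participant_01_free_viewing.csv")

# Basic trajectory statistics per recording: scanpath length and mean pupil diameter.
dx = recording["gaze_x"].diff()
dy = recording["gaze_y"].diff()
path_length = (dx**2 + dy**2).pow(0.5).sum()
mean_pupil = recording["pupil_diameter"].mean()
print(f"scanpath length: {path_length:.1f} px, mean pupil diameter: {mean_pupil:.2f}")
```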
Affiliation(s)
- Pedro Lencastre
- Dep. Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway.
- OsloMet Artificial Intelligence lab, OsloMet, Pilestredet 52, N-0166, Oslo, Norway.
- NordSTAR - Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166, Oslo, Norway.
- Samip Bhurtel
- Dep. Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway
- OsloMet Artificial Intelligence lab, OsloMet, Pilestredet 52, N-0166, Oslo, Norway
- Anis Yazidi
- Dep. Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway
- OsloMet Artificial Intelligence lab, OsloMet, Pilestredet 52, N-0166, Oslo, Norway
- NordSTAR - Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166, Oslo, Norway
- Gustavo B M E Mello
- Dep. Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway
- OsloMet Artificial Intelligence lab, OsloMet, Pilestredet 52, N-0166, Oslo, Norway
- NordSTAR - Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166, Oslo, Norway
- Sergiy Denysov
- Dep. Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway
- OsloMet Artificial Intelligence lab, OsloMet, Pilestredet 52, N-0166, Oslo, Norway
- NordSTAR - Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166, Oslo, Norway
- Pedro G Lind
- Dep. Computer Science, OsloMet - Oslo Metropolitan University, P.O. Box 4 St. Olavs plass, N-0130, Oslo, Norway
- OsloMet Artificial Intelligence lab, OsloMet, Pilestredet 52, N-0166, Oslo, Norway
- NordSTAR - Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166, Oslo, Norway
6. Visual search habits and the spatial structure of scenes. Atten Percept Psychophys 2022;84:1874-1885. doi: 10.3758/s13414-022-02506-2. PMID: 35819714; PMCID: PMC9338010.
Abstract
Some spatial layouts may suit our visual search habits better than others. We compared eye movements during search across three spatial configurations. Participants searched for a line segment oriented 45° to the right. Variation in the orientation of distractor line segments determines the extent to which this target would be visible in peripheral vision: a target among homogeneous distractors is highly visible, while a target among heterogeneous distractors requires central vision. When the search array is split into homogeneous and heterogeneous left and right halves, a large proportion of fixations are “wasted” on the homogeneous half, leading to slower search times. We compared this pattern to two new configurations. In the first, the array was split into upper and lower halves. During a passive-viewing baseline condition, we observed biases both to look at the top half and to look at the heterogeneous region first. Both of these biases were weaker during active search, despite the fact that the heterogeneous bias would have led to improvements in efficiency if it had been retained. In the second experiment, patches of more or less heterogeneous line segments were scattered across the search space. This configuration allows for more natural, spatially distributed scanpaths. Participants were more efficient and less variable relative to the left/right configuration. The results are consistent with the idea that visual search is associated with a distributed sequence of fixations, guided only loosely by the potential visibility of the target in different regions of the scene.
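The kind of region-wise fixation analysis described above can be expressed compactly; the sketch below computes the fraction of fixations on each side of a split display. The coordinates and boundary are made-up examples, not the study's stimuli.

```python
import numpy as np

def fixation_split(fixations, boundary, axis=1):
    """Fraction of fixations falling on each side of a split search display.

    fixations -- array-like of shape (n, 2) holding (row, col) positions
    boundary  -- pixel coordinate of the split (e.g., the vertical midline)
    axis      -- 0 to split top/bottom by row, 1 to split left/right by column
    """
    fixations = np.asarray(fixations)
    first_half = float(np.mean(fixations[:, axis] < boundary))
    return first_half, 1.0 - first_half

# Example: fixations "wasted" on the homogeneous left half of a 1024-pixel-wide display.
left, right = fixation_split([(300, 120), (310, 600), (280, 840)], boundary=512, axis=1)
```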
7. Ghosh S, D'Angelo G, Glover A, Iacono M, Niebur E, Bartolozzi C. Event-driven proto-object based saliency in 3D space to attract a robot's attention. Sci Rep 2022;12:7645. doi: 10.1038/s41598-022-11723-6. PMID: 35538154; PMCID: PMC9090933.
Abstract
To interact with its environment, a robot working in 3D space needs to organise its visual input in terms of objects or their perceptual precursors, proto-objects. Among other visual cues, depth is a submodality used to direct attention to visual features and objects. Current depth-based proto-object attention models have been implemented for standard RGB-D cameras that produce synchronous frames. In contrast, event cameras are neuromorphic sensors that loosely mimic the function of the human retina by asynchronously encoding per-pixel brightness changes at very high temporal resolution, thereby providing advantages like high dynamic range, efficiency (thanks to their high degree of signal compression), and low latency. We propose a bio-inspired bottom-up attention model that exploits event-driven sensing to generate depth-based saliency maps that allow a robot to interact with complex visual input. We use event cameras mounted in the eyes of the iCub humanoid robot to directly extract edge, disparity and motion information. Real-world experiments demonstrate that our system robustly selects salient objects near the robot in the presence of clutter and dynamic scene changes, for the benefit of downstream applications like object segmentation, tracking and robot interaction with external objects.
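As a simplified illustration of depth-based weighting (not the iCub pipeline's actual proto-object grouping stage), the sketch below boosts a feature-based saliency map by a proximity term derived from a disparity map, so that nearer objects are favored.

```python
import numpy as np

def depth_weighted_saliency(feature_saliency, disparity, gain=1.0):
    """Boost bottom-up saliency for objects closer to the robot.

    feature_saliency -- 2D map built from edge/motion cues, arbitrary positive units
    disparity        -- 2D disparity map (larger disparity = closer object)
    The weighting scheme here is a simple illustration, not the published model.
    """
    proximity = disparity / (disparity.max() + 1e-12)       # 0 (far) .. 1 (near)
    combined = feature_saliency * (1.0 + gain * proximity)  # emphasize near objects
    return combined / (combined.max() + 1e-12)              # normalize to [0, 1]
```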
Affiliation(s)
- Suman Ghosh
- Event Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163, Genoa, Italy
- Electrical Engineering and Computer Science, Technische Universität Berlin, 10623, Berlin, Germany
- Giulia D'Angelo
- Event Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163, Genoa, Italy
- Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Arren Glover
- Event Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163, Genoa, Italy
- Massimiliano Iacono
- Event Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163, Genoa, Italy
- Ernst Niebur
- Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Chiara Bartolozzi
- Event Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163, Genoa, Italy.
8. Fan DP, Lin Z, Zhang Z, Zhu M, Cheng MM. Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks. IEEE Trans Neural Netw Learn Syst 2021;32:2075-2089. doi: 10.1109/tnnls.2020.2996406. PMID: 32491986.
Abstract
The use of RGB-D information for salient object detection (SOD) has been extensively explored in recent years. However, relatively few efforts have been put toward modeling SOD in real-world human activity scenes with RGB-D. In this article, we fill the gap by making the following contributions to RGB-D SOD: 1) we carefully collect a new Salient Person (SIP) data set that consists of ~1K high-resolution images covering diverse real-world scenes from various viewpoints, poses, occlusions, illuminations, and backgrounds; 2) we conduct a large-scale (and, so far, the most comprehensive) benchmark comparing contemporary methods, which has long been missing in the field and can serve as a baseline for future research, and we systematically summarize 32 popular models and evaluate 18 of the 32 models on seven data sets containing a total of about 97k images; and 3) we propose a simple general architecture, called deep depth-depurator network (D3Net). It consists of a depth depurator unit (DDU) and a three-stream feature learning module (FLM), which perform low-quality depth map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of any prior contenders across all five metrics under consideration, thus serving as a strong model to advance research in this field. We also demonstrate that D3Net can be used to efficiently extract salient object masks from real scenes, enabling effective background-changing applications at a speed of 65 frames/s on a single GPU. All the saliency maps, our new SIP data set, the D3Net model, and the evaluation tools are publicly available at https://github.com/DengPingFan/D3NetBenchmark.
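To illustrate the role of the depth depurator unit, the sketch below uses a crude hand-crafted contrast heuristic to decide whether a depth map is reliable enough to fuse; the real DDU in D3Net learns this gating jointly with the rest of the network, so treat this as an assumption-laden stand-in.

```python
import numpy as np

def gate_depth_map(depth, contrast_threshold=0.05):
    """Crude stand-in for a depth depurator: drop depth maps with too little structure.

    Returns the depth map if it appears informative, otherwise None (i.e., fall back
    to RGB-only features). The threshold is illustrative, not a tuned value.
    """
    d = depth.astype(float)
    spread = (d.max() - d.min()) + 1e-12
    contrast = d.std() / spread          # near 0 for flat, noisy, or saturated maps
    return depth if contrast > contrast_threshold else None
```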
9. Borji A. Saliency Prediction in the Deep Learning Era: Successes and Limitations. IEEE Trans Pattern Anal Mach Intell 2021;43:679-700. doi: 10.1109/tpami.2019.2935715. PMID: 31425064.
Abstract
Visual saliency models have enjoyed a big leap in performance in recent years, thanks to advances in deep learning and large-scale annotated data. Despite enormous effort and huge breakthroughs, however, models still fall short of reaching human-level accuracy. In this work, I explore the landscape of the field with an emphasis on new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large-scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss the remaining issues that need to be addressed to build the next generation of more powerful saliency models. Some specific questions that are addressed include: in what ways current models fail, how to remedy them, what can be learned from cognitive studies of attention, how explicit saliency judgments relate to fixations, how to conduct fair model comparison, and what are the emerging applications of saliency models.
10. Chen CY, Matrov D, Veale R, Onoe H, Yoshida M, Miura K, Isa T. Properties of visually guided saccadic behavior and bottom-up attention in marmoset, macaque, and human. J Neurophysiol 2020;125:437-457. doi: 10.1152/jn.00312.2020. PMID: 33356912.
Abstract
Saccades are stereotypic behaviors whose investigation improves our understanding of how primate brains implement precise motor control. Furthermore, saccades offer an important window into the cognitive and attentional state of the brain. Historically, saccade studies have largely relied on macaques. However, the cortical network giving rise to the saccadic command is difficult to study in macaques because relevant cortical areas lie in deep sulci and are difficult to access. Recently, a New World monkey, the marmoset, has garnered attention as an alternative to macaques because of advantages including its smooth cortical surface. However, adoption of the marmoset for oculomotor research has been limited by a lack of in-depth descriptions of marmoset saccade kinematics and of their ability to perform psychophysical tasks. Here, we directly compare free-viewing and visually guided behavior of marmosets, macaques, and humans engaged in identical tasks under similar conditions. In the video free-viewing task, all species exhibited qualitatively similar saccade kinematics up to 25° in amplitude, although with different parameters. Furthermore, the conventional bottom-up saliency model predicted gaze targets at similar rates for all species. We further verified their visually guided behavior by training them with step and gap saccade tasks. In the step paradigm, marmosets did not show shorter saccade reaction times for upward saccades, whereas macaques and humans did. In the gap paradigm, all species showed a similar gap effect and express saccades. Our results suggest that the marmoset can serve as a model for oculomotor, attentional, and cognitive research, although we must remain aware of its differences from the macaque and human. NEW & NOTEWORTHY: We directly compared the results of a video free-viewing task and visually guided saccade tasks (step and gap) among three different species: marmoset, macaque, and human. We found that all species exhibit qualitatively similar saccadic kinematics and saliency-driven saccadic behavior, albeit with different parameters. Our results suggest that the marmoset possesses neural mechanisms for saccadic control similar to those of the macaque and human, and that it is an appropriate model to study neural mechanisms for active vision and attention.
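A common first step in such comparisons is velocity-threshold saccade detection followed by extraction of the amplitude/peak-velocity (main-sequence) relationship; the sketch below shows one conventional way to do this. The 30 deg/s threshold and the simple segmentation are illustrative choices, not the paper's criteria.

```python
import numpy as np

def detect_saccades(x_deg, y_deg, fs, vel_thresh=30.0):
    """Velocity-threshold saccade detection on a gaze trace sampled at fs Hz.

    Returns a list of (amplitude_deg, peak_velocity_deg_per_s) pairs, the two
    quantities used to plot the classic main-sequence relationship.
    """
    x_deg = np.asarray(x_deg, float)
    y_deg = np.asarray(y_deg, float)
    vx = np.gradient(x_deg) * fs
    vy = np.gradient(y_deg) * fs
    speed = np.hypot(vx, vy)
    moving = speed > vel_thresh
    saccades, start = [], None
    for i, m in enumerate(moving):
        if m and start is None:
            start = i                                  # saccade onset
        elif not m and start is not None:
            amp = np.hypot(x_deg[i] - x_deg[start], y_deg[i] - y_deg[start])
            saccades.append((amp, speed[start:i].max()))
            start = None                               # saccade offset
    return saccades
```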
Affiliation(s)
- Chih-Yang Chen
- Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
- Denis Matrov
- Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Division of Neuropsychopharmacology, Department of Psychology, University of Tartu, Tartu, Estonia
- Richard Veale
- Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Hirotaka Onoe
- Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Masatoshi Yoshida
- Center for Human Nature, Artificial Intelligence, and Neuroscience, Hokkaido University, Sapporo, Japan
- Kenichiro Miura
- Department of Integrative Brain Science, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Department of Pathology of Mental Diseases, National Institute of Mental Health, National Center of Neurology and Psychiatry, Tokyo, Japan
- Tadashi Isa
- Department of Neuroscience, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto, Japan
- Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto, Japan
11. Kim H, Ohmura Y, Kuniyoshi Y. Using Human Gaze to Improve Robustness Against Irrelevant Objects in Robot Manipulation Tasks. IEEE Robot Autom Lett 2020. doi: 10.1109/lra.2020.2998410.
12. Kroner A, Senden M, Driessens K, Goebel R. Contextual encoder-decoder network for visual saliency prediction. Neural Netw 2020;129:261-270. doi: 10.1016/j.neunet.2020.05.004. PMID: 32563023.
Abstract
Predicting salient regions in natural images requires the detection of objects that are present in a scene. To develop robust representations for this challenging task, high-level visual features at multiple spatial scales must be extracted and augmented with contextual information. However, existing models aimed at explaining human fixation maps do not incorporate such a mechanism explicitly. Here we propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The architecture forms an encoder-decoder structure and includes a module with multiple convolutional layers at different dilation rates to capture multi-scale features in parallel. Moreover, we combine the resulting representations with global scene information for accurately predicting visual saliency. Our model achieves competitive and consistent results across multiple evaluation metrics on two public saliency benchmarks, and we demonstrate the effectiveness of the suggested approach on five datasets and selected examples. Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone and hence presents a suitable choice for applications with limited computational resources, such as (virtual) robotic systems, to estimate human fixations across complex natural scenes. Our TensorFlow implementation is openly available at https://github.com/alexanderkroner/saliency.
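The multi-scale module described above resembles parallel dilated convolutions whose outputs are concatenated and fused; the sketch below shows the general pattern in PyTorch. The authors' released implementation is in TensorFlow at the URL above, and the channel counts and dilation rates here are illustrative rather than their configuration.

```python
import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    """Parallel 3x3 convolutions at several dilation rates, concatenated and fused."""
    def __init__(self, in_ch, branch_ch=64, rates=(1, 4, 8, 16)):
        super().__init__()
        # padding = dilation keeps the spatial size constant for 3x3 kernels
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(branch_ch * len(rates), branch_ch, kernel_size=1)

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]      # same spatial size per branch
        return torch.relu(self.fuse(torch.cat(feats, dim=1)))  # multi-scale fusion
```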
Affiliation(s)
- Alexander Kroner
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Maastricht Brain Imaging Centre, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands.
- Mario Senden
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Maastricht Brain Imaging Centre, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands.
- Kurt Driessens
- Department of Data Science and Knowledge Engineering, Faculty of Science and Engineering, Maastricht University, Maastricht, The Netherlands.
- Rainer Goebel
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Maastricht Brain Imaging Centre, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands; Department of Neuroimaging and Neuromodeling, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, The Netherlands.
13. Berens P, Freeman J, Deneux T, Chenkov N, McColgan T, Speiser A, Macke JH, Turaga SC, Mineault P, Rupprecht P, Gerhard S, Friedrich RW, Friedrich J, Paninski L, Pachitariu M, Harris KD, Bolte B, Machado TA, Ringach D, Stone J, Rogerson LE, Sofroniew NJ, Reimer J, Froudarakis E, Euler T, Román Rosón M, Theis L, Tolias AS, Bethge M. Community-based benchmarking improves spike rate inference from two-photon calcium imaging data. PLoS Comput Biol 2018;14:e1006157. doi: 10.1371/journal.pcbi.1006157. PMID: 29782491; PMCID: PMC5997358.
Abstract
In recent years, two-photon calcium imaging has become a standard tool to probe the function of neural circuits and to study computations in neuronal populations. However, the acquired signal is only an indirect measurement of neural activity due to the comparatively slow dynamics of fluorescent calcium indicators. Different algorithms for estimating spike rates from noisy calcium measurements have been proposed in the past, but it is an open question how far performance can be improved. Here, we report the results of the spikefinder challenge, launched to catalyze the development of new spike rate inference algorithms through crowd-sourcing. We present ten of the submitted algorithms which show improved performance compared to previously evaluated methods. Interestingly, the top-performing algorithms are based on a wide range of principles from deep neural networks to generative models, yet provide highly correlated estimates of the neural activity. The competition shows that benchmark challenges can drive algorithmic developments in neuroscience.
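Challenge submissions of this kind are typically scored by correlating inferred spike rates with ground-truth spike trains after temporal binning; the sketch below shows such a correlation score. The bin width and resampling details are illustrative assumptions, not the official spikefinder protocol.

```python
import numpy as np

def binned_correlation(predicted_rate, true_spikes, bin_width=4):
    """Pearson correlation between an inferred spike rate and the true spike train,
    after down-sampling both signals to coarser time bins.
    """
    predicted_rate = np.asarray(predicted_rate, float)
    true_spikes = np.asarray(true_spikes, float)
    n = (len(true_spikes) // bin_width) * bin_width          # trim to a whole number of bins
    p = predicted_rate[:n].reshape(-1, bin_width).sum(axis=1)
    t = true_spikes[:n].reshape(-1, bin_width).sum(axis=1)
    return float(np.corrcoef(p, t)[0, 1])
```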
Affiliation(s)
- Philipp Berens
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
- Jeremy Freeman
- Chan Zuckerberg Initiative, San Francisco, California, United States of America
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Thomas Deneux
- Unit of Neuroscience Information and Complexity, Centre National de la Recherche Scientifique, Gif-sur-Yvette, France
- Nikolay Chenkov
- Bernstein Center for Computational Neuroscience and Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Thomas McColgan
- Bernstein Center for Computational Neuroscience and Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Artur Speiser
- Research Center Caesar, an associate of the Max Planck Society, Bonn, Germany
- Jakob H. Macke
- Research Center Caesar, an associate of the Max Planck Society, Bonn, Germany
- Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
- Srinivas C. Turaga
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Patrick Mineault
- Independent Researcher, San Francisco, California, United States of America
- Peter Rupprecht
- Friedrich Miescher Institute of Biomedical Research, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Stephan Gerhard
- Friedrich Miescher Institute of Biomedical Research, Basel, Switzerland
- Rainer W. Friedrich
- Friedrich Miescher Institute of Biomedical Research, Basel, Switzerland
- University of Basel, Basel, Switzerland
- Johannes Friedrich
- Departments of Statistics and Neuroscience, Grossman Center for the Statistics of Mind, and Center for Theoretical Neuroscience, Columbia University, New York, New York, United States of America
- Liam Paninski
- Departments of Statistics and Neuroscience, Grossman Center for the Statistics of Mind, and Center for Theoretical Neuroscience, Columbia University, New York, New York, United States of America
- Marius Pachitariu
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Institute of Neurology, University College, London, United Kingdom
- Ben Bolte
- Departments of Mathematics and Computer Science, Emory University, Atlanta, United States of America
- Timothy A. Machado
- Departments of Statistics and Neuroscience, Grossman Center for the Statistics of Mind, and Center for Theoretical Neuroscience, Columbia University, New York, New York, United States of America
- Dario Ringach
- Neurobiology and Psychology, Jules Stein Eye Institute, Biomedical Engineering Program, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
- Jasmine Stone
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Departement of Computer Science, Yale University, New Haven, Connecticut, United States of America
- Luke E. Rogerson
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
- Nicolas J. Sofroniew
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, United States of America
- Jacob Reimer
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Emmanouil Froudarakis
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Thomas Euler
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
- Miroslav Román Rosón
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Division of Neurobiology, Department Biology II, LMU Munich, Munich, Germany
- Andreas S. Tolias
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Electrical and Computer Engineering, Rice University, Houston, Texas, United States of America
- Matthias Bethge
- Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, Texas, United States of America
- Institute of Theoretical Physics, University of Tübingen, Tübingen, Germany