1. Sztuka IM, Kühn S. Blue skies: Does the visual composition of sky guide subjective judgments of naturalness in the environment? Environmental Research 2024; 262:119845. PMID: 39208970. DOI: 10.1016/j.envres.2024.119845.
Abstract
Expanding on previous findings that highlighted the significance of the sky in environmental perception, our analysis investigated whether the visual composition of the sky shapes perceptions of environmental naturalness. The study employed a novel free-selection task in which participants viewed a series of environmental images with varying levels of natural and urban elements, as well as different sky visibility conditions, and were asked to identify "nature" within these images. The procedure also involved subjective ratings of each scene. Using previously gathered data, we reassessed 105 participants' selections of the sky as "nature" across 96 photos of diverse outdoor scenes to understand which visuospatial features influence these perceptions. Using the Boruta feature selection algorithm, we identified key characteristics (fractal dimension, brightness, and entropy in brightness, hue, and saturation) that significantly predicted the selection of the sky as "nature", irrespective of environment type (urban or natural). Results indicated that lower fractal dimensions were preferred for sky selected as "nature", inversely affecting naturalness judgments of scenes, with an additional effect of brightness. These findings enhance our understanding of how visuospatial features influence environmental perception, offering implications for future research directions and theoretical advancement.
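Fractal dimension, the key image predictor named in this abstract, is commonly estimated by box counting. The sketch below is our own illustration, not the authors' code; the function name and box sizes are arbitrary choices, and the input is assumed to be a binarized image:

```python
import numpy as np

def box_counting_dimension(img, sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal (box-counting) dimension of a binary image.

    For each box size s, count the s x s boxes containing at least one
    'on' pixel, then fit log(count) against log(1/s); the slope of that
    line is the box-counting dimension.
    """
    img = np.asarray(img, dtype=bool)
    counts = []
    for s in sizes:
        # Trim so the image tiles evenly into s x s boxes.
        h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
        boxes = img[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(boxes.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```

A filled region yields a dimension near 2 and a thin contour near 1; cloud edges and foliage typically fall in between, which is what makes the measure informative for sky regions.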
Affiliation(s)
- Simone Kühn
- Max Planck Institute for Human Development, Berlin, Germany; Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Germany
2. Děchtěrenko F, Bainbridge WA, Lukavský J. Visual free recall and recognition in art students and laypeople. Mem Cognit 2024. PMID: 39078592. DOI: 10.3758/s13421-024-01607-7.
Abstract
Artists and laypeople differ in their ability to create drawings. Previous research has shown that artists have improved memory performance during drawing; however, it is unclear whether they have better visual memory after the drawing is finished. In this paper, we focused on differences in visual memory between art students and the general population across two studies. In Study 1, both groups studied a set of images and later drew them in a surprise visual recall test. In Study 2, the drawings from Study 1 were evaluated by a different set of raters for their quality and similarity to the original images, linking drawing evaluations with memory performance for both groups. We found that the two groups showed comparable visual recognition memory; however, the artist group showed better recall memory. Moreover, artists produced drawings that were both higher in quality and more similar to the original images. At the individual level, participants whose drawings were rated as better also showed higher recognition accuracy. Results from Study 2 also have practical implications for the use of drawing as a tool for measuring free recall: the majority of the drawings were recognizable, and raters were highly consistent in their evaluations. Taken together, we found that artists have better visual recall memory than laypeople.
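Recognition comparisons like the one above are conventionally summarized with signal-detection sensitivity. A minimal, stdlib-only d' computation, shown as a standard textbook formula rather than the authors' analysis script (the log-linear correction is one common choice among several):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Recognition sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps the inverse
    normal transform finite when a raw rate would be 0 or 1.
    """
    hr = (hits + 0.5) / (hits + misses + 1.0)
    far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hr) - z(far)
```

With equal hit and false-alarm rates the measure is zero; it grows as old items are endorsed more often than new ones.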
Affiliation(s)
- Filip Děchtěrenko
- Institute of Psychology, Czech Academy of Sciences, Pod Vodárenskou věží 4, Prague, 18200, Czech Republic.
- Wilma A Bainbridge
- Department of Psychology, University of Chicago, 5848 S University Ave, Beecher Hall 303, Chicago, IL, 60637, USA
- Neuroscience Institute, University of Chicago, 5812 S Ellis Ave, Chicago, IL, 60637, USA
- Jiří Lukavský
- Institute of Psychology, Czech Academy of Sciences, Pod Vodárenskou věží 4, Prague, 18200, Czech Republic

3. Kallmayer A, Võ MLH. Anchor objects drive realism while diagnostic objects drive categorization in GAN generated scenes. Communications Psychology 2024; 2:68. PMID: 39242968. PMCID: PMC11332195. DOI: 10.1038/s44271-024-00119-z.
Abstract
Our visual surroundings are highly complex. Despite this, we understand and navigate them effortlessly. This requires transforming incoming sensory information into representations that not only span low- to high-level visual features (e.g., edges, object parts, objects) but likely also reflect the co-occurrence statistics of objects in real-world scenes. Here, so-called anchor objects, derived from object clustering statistics in real-world scenes, are defined as being highly predictive of the location and identity of frequently co-occurring (usually smaller) objects, while so-called diagnostic objects are predictive of the larger semantic context (i.e., scene category). Across two studies (N1 = 50, N2 = 44), we investigated which of these properties underlie scene understanding across two dimensions, realism and categorisation, using scenes generated from Generative Adversarial Networks (GANs), which naturally vary along these dimensions. We show that anchor objects, and mainly high-level features extracted from a range of pre-trained deep neural networks (DNNs), drove realism both at first glance and after initial processing. Categorisation performance was mainly determined by diagnostic objects, regardless of realism, at first glance and after initial processing. Our results attest to the visual system's ability to pick up on reliable, category-specific sources of information that are robust to disturbances across the visual feature hierarchy.
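The basic analysis shape here, relating object presence to ratings, can be illustrated with an ordinary least-squares regression. Everything below is hypothetical: the ratings and binary object codes are invented for the sketch and are not the study's data:

```python
import numpy as np

# Hypothetical data: per-scene realism ratings plus binary codes
# (1 = an anchor / diagnostic object is visible in the generated scene).
anchor = np.array([1, 1, 0, 0, 1, 0, 1, 0])
diagnostic = np.array([1, 0, 1, 0, 0, 1, 1, 0])
realism = np.array([4.6, 4.1, 2.9, 2.4, 4.2, 3.0, 4.8, 2.2])

# Design matrix with an intercept column; ordinary least squares via lstsq.
X = np.column_stack([np.ones_like(anchor), anchor, diagnostic])
beta, *_ = np.linalg.lstsq(X, realism, rcond=None)
intercept, b_anchor, b_diagnostic = beta
```

Because the toy design is balanced, the coefficients equal the marginal mean differences; in this invented example the anchor coefficient dominates, mirroring the direction of the reported realism result.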
Affiliation(s)
- Aylin Kallmayer
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany.
- Melissa L-H Võ
- Goethe University Frankfurt, Department of Psychology, Frankfurt am Main, Germany
4. Stecher R, Kaiser D. Representations of imaginary scenes and their properties in cortical alpha activity. Sci Rep 2024; 14:12796. PMID: 38834699. DOI: 10.1038/s41598-024-63320-4.
Abstract
Imagining natural scenes enables us to engage with a myriad of simulated environments. How do our brains generate such complex mental images? Recent research suggests that cortical alpha activity carries information about individual objects during visual imagery. However, it remains unclear whether more complex imagined contents, such as natural scenes, are similarly represented in alpha activity. Here, we answer this question by decoding the contents of imagined scenes from rhythmic cortical activity patterns. In an EEG experiment, participants imagined natural scenes based on detailed written descriptions that conveyed four complementary scene properties: openness, naturalness, clutter level, and brightness. By conducting classification analyses on EEG power patterns across neural frequencies, we were able to decode both individual imagined scenes and their properties from the alpha band, showing that the contents of complex visual images are also represented in alpha rhythms. A cross-classification analysis between alpha power patterns during the imagery task and during a perception task, in which participants were presented with images of the described scenes, showed that scene representations in the alpha band are partly shared between imagery and late stages of perception. This suggests that alpha activity mediates the top-down re-activation of scene-related visual contents during imagery.
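Decoding from band-limited power of the kind described can be sketched in a few lines. This is an illustrative simulation only, not the study's EEG pipeline: the scene labels, signal amplitudes, and the nearest-centroid decoder are all our own assumptions:

```python
import numpy as np

def alpha_power(signal, srate):
    """Mean spectral power in the 8-12 Hz alpha band (via the FFT)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / srate)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 8) & (freqs <= 12)
    return spectrum[band].mean()

rng = np.random.default_rng(0)
srate = 250
t = np.arange(500) / srate  # 2 s of simulated EEG

# Two simulated "scenes" that differ in alpha amplitude.
def trial(amp):
    return amp * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 0.5, t.size)

train = {"forest": [alpha_power(trial(2.0), srate) for _ in range(20)],
         "street": [alpha_power(trial(0.5), srate) for _ in range(20)]}
centroids = {k: np.mean(v) for k, v in train.items()}

def classify(x):
    # Nearest-centroid decoder on a single feature: alpha power.
    return min(centroids, key=lambda k: abs(alpha_power(x, srate) - centroids[k]))
```

Real analyses use many channels, cross-validation, and per-frequency features, but the train-centroids-then-assign logic is the same.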
Affiliation(s)
- Rico Stecher
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392, Gießen, Germany.
- Daniel Kaiser
- Mathematical Institute, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Gießen, 35392, Gießen, Germany
- Center for Mind, Brain and Behavior (CMBB), Philipps-University Marburg and Justus Liebig University Gießen, 35032, Marburg, Germany
5. Lande KJ. Compositionality in perception: A framework. Wiley Interdisciplinary Reviews: Cognitive Science 2024:e1691. PMID: 38807187. DOI: 10.1002/wcs.1691.
Abstract
Perception involves the processing of content or information about the world. In what form is this content represented? I argue that perception is widely compositional. The perceptual system represents many stimulus features (including shape, orientation, and motion) in terms of combinations of other features (such as shape parts, slant and tilt, or common and residual motion vectors). But compositionality can take a variety of forms. The ways in which perceptual representations compose are markedly different from the ways in which sentences or thoughts are thought to be composed. I suggest that the thesis that perception is compositional is not itself a concrete hypothesis with specific predictions; rather, it affords a productive framework for developing and evaluating specific empirical hypotheses about the form and content of perceptual representations. The question is not just whether perception is compositional, but how. Answering this latter question can provide fundamental insights into perception. This article is categorized under: Philosophy > Representation; Philosophy > Foundations of Cognitive Science; Psychology > Perception and Psychophysics.
Affiliation(s)
- Kevin J Lande
- Department of Philosophy and Centre for Vision Research, York University, Toronto, Canada
6. Shao Z, Beck DM. Is attention necessary for the representational advantage of good exemplars over bad exemplars? Eur J Neurosci 2024; 59:2353-2372. PMID: 38403361. DOI: 10.1111/ejn.16291.
Abstract
Real-world (rw-) statistical regularities, or expectations about the visual world learned over a lifetime, have been found to be associated with scene perception efficiency. For example, good (i.e., highly representative) exemplars of basic scene categories, one example of an rw-statistical regularity, are detected more readily than bad exemplars of the category. Similarly, good exemplars achieve higher multivariate pattern analysis (MVPA) classification accuracy than bad exemplars in scene-responsive regions of interest, particularly in the parahippocampal place area (PPA). However, it is unclear whether these good-exemplar advantages depend on, or are even confounded by, selective attention. Here, we ask whether the observed neural advantage of good scene exemplars requires full attention. We used a dual-task paradigm to manipulate attention and exemplar representativeness while recording neural responses with functional magnetic resonance imaging (fMRI). Both univariate analysis and MVPA were adopted to examine the effect of representativeness. In the attend-to-scenes condition, our results replicated an earlier study showing that good exemplars evoke less activity but a clearer category representation than bad exemplars. Importantly, similar advantages of the good exemplars were also observed when participants were distracted by a serial visual search task demanding a high attention load. In addition, cross-decoding between attended and distracted representations revealed that attention produced a quantitative (increased activation) rather than qualitative (altered activity patterns) improvement of the category representation, particularly for good exemplars. We therefore conclude that the effect of category representativeness on neural representations does not require full attention.
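The cross-decoding logic (train a classifier on attended trials, test it on distracted trials) can be illustrated with simulated voxel patterns. Everything below is hypothetical: the templates, gain values, trial counts, and the correlation-based decoder are our own choices, used only to show why a pure gain change leaves decoding intact:

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels = 50

# Hypothetical voxel templates for two exemplar categories. The
# "distracted" condition keeps each pattern but scales it down, i.e.,
# a quantitative rather than qualitative change.
templates = {"good": rng.normal(0, 1, n_voxels),
             "bad": rng.normal(0, 1, n_voxels)}

def trials(cat, gain, n=30):
    """Noisy single-trial patterns for one category at a given gain."""
    return [gain * templates[cat] + rng.normal(0, 1.0, n_voxels)
            for _ in range(n)]

# Category centroids estimated from fully attended trials (gain = 1).
attended = {c: np.mean(trials(c, gain=1.0), axis=0) for c in templates}

def cross_decode(pattern):
    # Correlation-based classifier: trained on attended, applied anywhere.
    return max(attended, key=lambda c: np.corrcoef(pattern, attended[c])[0, 1])

# Test transfer to distracted trials with reduced gain.
acc = np.mean([cross_decode(p) == "good" for p in trials("good", gain=0.4)])
```

Because correlation ignores overall amplitude, accuracy survives the gain reduction; a qualitatively altered pattern, by contrast, would break the transfer.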
Affiliation(s)
- Zhenan Shao
- Department of Psychology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Beckman Institute, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Diane M Beck
- Department of Psychology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Beckman Institute, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
7. McMullin MA, Kumar R, Higgins NC, Gygi B, Elhilali M, Snyder JS. Preliminary Evidence for Global Properties in Human Listeners During Natural Auditory Scene Perception. Open Mind (Camb) 2024; 8:333-365. PMID: 38571530. PMCID: PMC10990578. DOI: 10.1162/opmi_a_00131.
Abstract
Theories of auditory and visual scene analysis suggest that the perception of scenes relies on the identification and segregation of objects within them, resembling a detail-oriented processing style. However, a more global process may also occur while analyzing scenes, as has been evidenced in the visual domain. To our knowledge, a similar line of research has not been explored in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using, and making available, a new collection of high-quality auditory scenes. Participants rated scenes on eight global properties (e.g., open vs. enclosed), and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and the average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64% of the variance. Regression analyses revealed that each global property was predicted by at least one acoustic variable (R2 = 0.33-0.87). These findings were extended using deep neural network models, where we examined correlations between human ratings of global properties and deep embeddings of two computational models: an object-based model and a scene-based model. The results support the idea that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns evident between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective. Some of the acoustic measures predicted ratings of global scene perception, suggesting that representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed for the ventral visual stream. These findings, and the open availability of our scene collection, will make future studies on perception, attention, and memory for natural auditory scenes possible.
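The "variance explained by k factors" figures come from the factor-analysis step. A rough PCA-style approximation is sketched below with simulated ratings; a full EFA additionally involves rotation and communality estimation, and the data here are invented, not the study's:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ratings: 100 scenes x 8 global properties, generated
# from two latent factors to mimic the reported two-factor structure.
f = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 8))
ratings = f @ loadings + 0.6 * rng.normal(size=(100, 8))

# Eigenvalues of the correlation matrix give the variance attributable
# to each component (the starting point of an exploratory factor analysis).
corr = np.corrcoef(ratings, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
explained = eigvals[:2].sum() / eigvals.sum()
```

The eigenvalues of an 8-variable correlation matrix sum to 8, so the top-two share directly reads out as "proportion of variance explained by two factors".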
Affiliation(s)
- Rohit Kumar
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Nathan C. Higgins
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, FL, USA
- Brian Gygi
- East Bay Institute for Research and Education, Martinez, CA, USA
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV, USA
8. Son G, Walther DB, Mack ML. Brief category learning distorts perceptual space for complex scenes. Psychon Bull Rev 2024. PMID: 38438711. DOI: 10.3758/s13423-024-02484-6.
Abstract
The formation of categories is known to distort perceptual space: representations are pushed away from category boundaries and pulled toward categorical prototypes. This phenomenon has been studied with artificially constructed objects, whose feature dimensions are easily defined and manipulated. How such category-induced perceptual distortions arise for complex, real-world scenes, however, remains largely unknown due to the technical challenge of measuring and controlling scene features. We address this question by generating realistic scene images from a high-dimensional continuous space with generative adversarial networks and presenting the images as stimuli in a novel learning task. Participants learned to categorize the scene images along arbitrary category boundaries and later reconstructed the same scenes from memory. Systematic biases in reconstruction errors closely tracked each participant's subjective category boundaries. These findings suggest that the perception of global scene properties is warped to align with a newly learned category structure after only a brief learning experience.
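The key measurement, reconstruction error signed relative to a learned category boundary, can be written down directly. The function below is our own illustration: positions are hypothetical values on a 0-1 scene-feature axis, not the study's stimulus space:

```python
def repulsion_bias(true_pos, recalled_pos, boundary):
    """Mean signed reconstruction error, coded so that positive values
    mean memories were pushed away from the category boundary
    (repulsion) and negative values mean they were pulled toward it.
    """
    signed = []
    for t, r in zip(true_pos, recalled_pos):
        err = r - t
        away = 1 if t >= boundary else -1  # unit direction away from boundary
        signed.append(err * away)
    return sum(signed) / len(signed)
```

Scoring errors this way folds the two sides of the boundary together, so a single mean captures the repulsion predicted by categorical-perception accounts.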
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
- Dirk B Walther
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
- Michael L Mack
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
9. Westebbe L, Liang Y, Blaser E. The Accuracy and Precision of Memory for Natural Scenes: A Walk in the Park. Open Mind (Camb) 2024; 8:131-147. PMID: 38435706. PMCID: PMC10898787. DOI: 10.1162/opmi_a_00122.
Abstract
It is challenging to quantify the accuracy and precision of scene memory because it is unclear what 'space' scenes occupy (how can we quantify error when misremembering a natural scene?). To address this, we exploited the ecologically valid, metric space in which scenes occur and are represented: routes. In a delayed estimation task, participants briefly saw a target scene drawn from a video of an outdoor 'route loop', then used a continuous report wheel of the route to pinpoint the scene. Accuracy was high and unbiased, indicating there was no net boundary extension/contraction. Interestingly, precision was higher for routes that were more self-similar (as characterized by the half-life, in meters, of a route's Multiscale Structural Similarity index), consistent with previous work finding a 'similarity advantage' where memory precision is regulated according to task demands. Overall, scenes were remembered to within a few meters of their actual location.
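The half-life characterization of route self-similarity (the distance over which a similarity index falls by half) can be recovered from a log-linear fit. This small implementation is our own sketch: it assumes a clean exponential decay of an MS-SSIM-like score with distance, which real route data will only approximate:

```python
import math

def similarity_half_life(distances, similarities):
    """Fit an exponential decay s(d) = s0 * 2**(-d / h) by least squares
    on log-similarity; return the half-life h in the units of `distances`.
    """
    xs = distances
    ys = [math.log(s) for s in similarities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -math.log(2) / slope  # slope = -ln(2) / h
```

A longer half-life means the route stays self-similar over more meters, the condition under which the abstract reports higher memory precision.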
Affiliation(s)
- Leo Westebbe
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Yibiao Liang
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
- Erik Blaser
- Department of Psychology, University of Massachusetts Boston, Boston, MA, USA

10. Mikhailova A, Lightfoot S, Santos-Victor J, Coco MI. Differential effects of intrinsic properties of natural scenes and interference mechanisms on recognition processes in long-term visual memory. Cogn Process 2024; 25:173-187. PMID: 37831320. DOI: 10.1007/s10339-023-01164-y.
Abstract
Humans display remarkable long-term visual memory (LTVM) processes. Even though images may be intrinsically memorable, the fidelity of their visual representations, and consequently the likelihood of successfully retrieving them, hinges on their similarity to other images concurrently held in LTVM. In this debate, it is still unclear whether the intrinsic (perceptual and semantic) features of images are mediated by interference mechanisms generated at encoding or during retrieval, and how these factors impinge on recognition processes. In the current study, 32 participants studied a stream of 120 natural scenes from 8 semantic categories, which varied in frequency (4, 8, 16, or 32 exemplars per category) to generate different levels of category interference, in preparation for a recognition test. They were then asked to indicate which of two images, presented side by side (i.e., two-alternative forced choice), they remembered. The two images belonged to the same semantic category but varied in their perceptual similarity (similar or dissimilar). Participants also expressed their confidence (sure/not sure) in their recognition responses, enabling us to tap into their metacognitive efficacy (meta-d'). Additionally, we extracted the activation of perceptual and semantic features in images (i.e., their informational richness) through deep neural network modelling and examined their impact on recognition processes. Corroborating previous literature, we found that category interference and perceptual similarity negatively impact recognition processes, as well as response times and metacognitive efficacy. Moreover, semantically rich images were less likely to be remembered, an effect that outweighed a positive memorability boost from perceptual information. Critically, we did not observe any significant interaction between the intrinsic features of images and interference generated either at encoding or during retrieval. All in all, our study calls for a more integrative understanding of the representational dynamics at encoding and recognition that enable us to form, maintain, and access visual information.
Affiliation(s)
- Anastasiia Mikhailova
- Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- José Santos-Victor
- Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Moreno I Coco
- Sapienza University of Rome, Rome, Italy
- IRCCS Fondazione Santa Lucia, Rome, Italy

11. Lee J, Park S. Multi-modal Representation of the Size of Space in the Human Brain. J Cogn Neurosci 2024; 36:340-361. PMID: 38010320. DOI: 10.1162/jocn_a_02092.
Abstract
To estimate the size of an indoor space, we must analyze the visual boundaries that limit the spatial extent and acoustic cues from reflected interior surfaces. We used fMRI to examine how the brain processes the geometric size of indoor scenes when various types of sensory cues are presented individually or together. Specifically, we asked whether the size of space is represented in a modality-specific way or in an integrative way that combines multimodal cues. In a block-design study, images or sounds that depict small- and large-sized indoor spaces were presented. Visual stimuli were real-world pictures of empty spaces that were small or large. Auditory stimuli were sounds convolved with different reverberations. By using a multivoxel pattern classifier, we asked whether the two sizes of space can be classified in visual, auditory, and visual-auditory combined conditions. We identified both sensory-specific and multimodal representations of the size of space. To further investigate the nature of the multimodal region, we specifically examined whether it contained multimodal information in a coexistent or integrated form. We found that angular gyrus and the right medial frontal gyrus had modality-integrated representation, displaying sensitivity to the match in the spatial size information conveyed through image and sound. Background functional connectivity analysis further demonstrated that the connection between sensory-specific regions and modality-integrated regions increases in the multimodal condition compared with single modality conditions. Our results suggest that spatial size perception relies on both sensory-specific and multimodal representations, as well as their interplay during multimodal perception.
12. Park J, Josephs E, Konkle T. Systematic transition from boundary extension to contraction along an object-to-scene continuum. J Vis 2024; 24(1):9. PMID: 38252521. PMCID: PMC10810016. DOI: 10.1167/jov.24.1.9.
Abstract
After viewing a picture of an environment, our memory of it typically extends beyond what was presented, a phenomenon referred to as boundary extension. But sometimes memory errors show the opposite pattern, boundary contraction, and the relationship between these phenomena is controversial. We constructed virtual three-dimensional environments and created a series of views at different distances, from object close-ups to wide-angle indoor views, and tested for memory errors along this object-to-scene continuum. Boundary extension was evident for close-scale views and transitioned parametrically to boundary contraction for far-scale views. However, this transition point was not tied to a specific position in the environment (e.g., the point of reachability). Instead, it tracked with judgments of the best-looking view of the environment, in both rich-object and low-object environments. We offer a dynamic-tension account, in which competition between object-based and scene-based affordances determines whether a view will extend or contract in memory. This study demonstrates that boundary extension and boundary contraction are not two separate phenomena but two parts of a continuum, suggesting a common underlying mechanism whose transition point is not fixed but depends on the observer's judgment of the best-looking view. These findings provide new insights into how we perceive and remember a view of an environment.
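The extension-to-contraction transition point can be located as the zero-crossing of signed memory error plotted against viewing distance. The sketch below is hypothetical: the linear-fit approach and the numbers in the test are our own, not the study's analysis:

```python
def transition_point(distances, errors):
    """Estimate where boundary extension (positive signed error) flips
    to contraction (negative signed error): the zero-crossing of a
    least-squares line fit to error as a function of viewing distance.
    """
    n = len(distances)
    mx = sum(distances) / n
    my = sum(errors) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(distances, errors))
             / sum((x - mx) ** 2 for x in distances))
    intercept = my - slope * mx
    return -intercept / slope
```

Comparing this estimate against landmarks such as the reach boundary or the rated best-looking view is exactly the kind of test the abstract describes.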
Affiliation(s)
- Jeongho Park
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Emilie Josephs
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Talia Konkle
- Department of Psychology, Harvard University, Cambridge, MA, USA

13. Biassoni F, Gandola M, Gnerre M. Grounding the Restorative Effect of the Environment in Tertiary Qualities: An Integration of Embodied and Phenomenological Perspectives. J Intell 2023; 11:208. PMID: 37998707. PMCID: PMC10672635. DOI: 10.3390/jintelligence11110208.
Abstract
This paper proposes an integration of embodied and phenomenological perspectives to understand the restorative capacity of natural environments. It emphasizes the role of embodied simulation mechanisms in evoking positive affects and cognitive functioning. Perceptual symbols play a crucial role in generating the restorative potential in environments, highlighting the significance of the encounter between the embodied individual and the environment. This study reviews Stress Reduction Theory (SRT) and Attention Restoration Theory (ART), finding commonalities in perceptual fluency and connectedness to nature. It also explores a potential model based on physiognomic perception, where the environment's pervasive qualities elicit an affective response. Restorativeness arises from a direct encounter between the environment's phenomenal structure and the embodied perceptual processes of individuals. Overall, this integrative approach sheds light on the intrinsic affective value of environmental elements and their influence on human well-being.
Affiliation(s)
- Federica Biassoni
- Traffic Psychology Research Unit, Department of Psychology, Università Cattolica del Sacro Cuore, 20123 Milan, Italy
- Research Center in Communication Psychology, Università Cattolica del Sacro Cuore, 20123 Milan, Italy
- Michela Gandola
- Traffic Psychology Research Unit, Department of Psychology, Università Cattolica del Sacro Cuore, 20123 Milan, Italy
- Martina Gnerre
- Traffic Psychology Research Unit, Department of Psychology, Università Cattolica del Sacro Cuore, 20123 Milan, Italy

14. Cavanagh P, Caplovitz GP, Lytchenko TK, Maechler MR, Tse PU, Sheinberg DL. The Architecture of Object-Based Attention. Psychon Bull Rev 2023; 30:1643-1667. PMID: 37081283. DOI: 10.3758/s13423-023-02281-7.
Abstract
The allocation of attention to objects raises several intriguing questions: What are objects? How does attention access them? Which anatomical regions are involved? Here, we review recent progress in the field to determine the mechanisms underlying object-based attention. First, findings from unconscious priming and cueing suggest that the preattentive targets of object-based attention can be fully developed object representations that have reached the level of identity. Next, the control of object-based attention appears to come from ventral visual areas specialized in object analysis that project downward to early visual areas. How feedback from object areas can accurately target an object's specific locations and features is unknown, but recent work on autoencoding has made this plausible. Finally, we suggest that the three classic modes of attention may not be as independent as is commonly assumed and instead could all rely on object-based attention. Specifically, studies show that attention can be allocated to the separated members of a group, without affecting the space between them, matching the defining property of feature-based attention. At the same time, object-based attention directed to a single small item has the properties of space-based attention. We outline the architecture of object-based attention and the novel predictions it brings, and discuss how it works in parallel with other attention pathways.
Affiliation(s)
- Patrick Cavanagh
- Department of Psychology, Glendon College, 2275 Bayview Avenue, North York, ON, M4N 3M6, Canada.
- CVR, York University, Toronto, ON, Canada.
- David L Sheinberg
- Department of Neuroscience, Brown University, Providence, RI, USA
- Carney Institute for Brain Science, Brown University, Providence, RI, USA
15
Lee J, Park S. Multi-modal representation of the size of space in the human brain. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.24.550343. [PMID: 37546991 PMCID: PMC10402083 DOI: 10.1101/2023.07.24.550343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
To estimate the size of an indoor space, we must analyze the visual boundaries that limit the spatial extent and acoustic cues from reflected interior surfaces. We used fMRI to examine how the brain processes the geometric size of indoor scenes when various types of sensory cues are presented individually or together. Specifically, we asked whether the size of space is represented in a modality-specific way or in an integrative way that combines multimodal cues. In a block-design study, images or sounds that depict small- and large-sized indoor spaces were presented. Visual stimuli were real-world pictures of empty spaces that were small or large. Auditory stimuli were sounds convolved with different reverberation. By using a multi-voxel pattern classifier, we asked whether the two sizes of space can be classified in visual, auditory, and visual-auditory combined conditions. We identified both sensory-specific and multimodal representations of the size of space. To further investigate the nature of the multimodal region, we specifically examined whether it contained multimodal information in a coexistent or integrated form. We found that the angular gyrus (AG) and the right inferior frontal gyrus (IFG) pars opercularis had modality-integrated representation, displaying sensitivity to the match in the spatial size information conveyed through image and sound. Background functional connectivity analysis further demonstrated that the connection between sensory-specific regions and modality-integrated regions increases in the multimodal condition compared to single-modality conditions. Our results suggest that spatial size perception relies on both sensory-specific and multimodal representations, as well as their interplay during multimodal perception.
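The multi-voxel pattern classification described above can be illustrated with a deliberately simplified sketch: a nearest-centroid classifier over toy "voxel" patterns. The data, labels, and classifier choice below are hypothetical and are not those used in the study.

```python
# Minimal nearest-centroid MVPA sketch (illustrative only): classify a
# voxel activity pattern as "small" or "large" room by its Euclidean
# distance to the mean training pattern of each class.
from math import dist  # Python 3.8+: Euclidean distance between two points

def centroid(patterns):
    """Element-wise mean of a list of equal-length voxel patterns."""
    n = len(patterns)
    return [sum(p[i] for p in patterns) / n for i in range(len(patterns[0]))]

def classify(pattern, train):
    """train maps class label -> list of training patterns."""
    cents = {label: centroid(ps) for label, ps in train.items()}
    return min(cents, key=lambda label: dist(pattern, cents[label]))

# Toy "voxel" patterns: small rooms tend to drive voxel 0, large rooms voxel 2.
train = {
    "small": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0]],
    "large": [[0.1, 0.2, 1.0], [0.0, 0.4, 0.9]],
}
print(classify([0.8, 0.25, 0.05], train))  # a small-room-like test pattern
```

Real MVPA would cross-validate over fMRI runs and typically use a linear classifier; the nearest-centroid rule here only conveys the logic of decoding a category from a distributed pattern.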
Affiliation(s)
- Jaeeun Lee
- Department of Psychology, University of Minnesota, Minneapolis, MN
- Soojin Park
- Department of Psychology, Yonsei University, Seoul, South Korea
16
Josephs EL, Hebart MN, Konkle T. Dimensions underlying human understanding of the reachable world. Cognition 2023; 234:105368. [PMID: 36641868 DOI: 10.1016/j.cognition.2023.105368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 12/20/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023]
Abstract
Near-scale environments, like work desks, restaurant place settings or lab benches, are the interface of our hand-based interactions with the world. How are our conceptual representations of these environments organized? What properties distinguish among reachspaces, and why? We obtained 1.25 million similarity judgments on 990 reachspace images, and generated a 30-dimensional embedding which accurately predicts these judgments. Examination of the embedding dimensions revealed key properties underlying these judgments, such as reachspace layout, affordance, and visual appearance. Clustering performed over the embedding revealed four distinct interpretable classes of reachspaces, distinguishing among spaces related to food, electronics, analog activities, and storage or display. Finally, we found that reachspace similarity ratings were better predicted by the function of the spaces than their locations, suggesting that reachspaces are largely conceptualized in terms of the actions they support. Altogether, these results reveal the behaviorally-relevant principles that structure our internal representations of reach-relevant environments.
Affiliation(s)
- Emilie L Josephs
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA; Psychology Department, Harvard University, Cambridge, USA.
- Martin N Hebart
- Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
- Talia Konkle
- Psychology Department, Harvard University, Cambridge, USA.
17
Wiesmann SL, Võ MLH. Disentangling diagnostic object properties for human scene categorization. Sci Rep 2023; 13:5912. [PMID: 37041222 PMCID: PMC10090043 DOI: 10.1038/s41598-023-32385-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 03/27/2023] [Indexed: 04/13/2023] Open
Abstract
It usually only takes a single glance to categorize our environment into different scene categories (e.g. a kitchen or a highway). Object information has been suggested to play a crucial role in this process, and some proposals even claim that the recognition of a single object can be sufficient to categorize the scene around it. Here, we tested this claim in four behavioural experiments by having participants categorize real-world scene photographs that were reduced to a single, cut-out object. We show that single objects can indeed be sufficient for correct scene categorization and that scene category information can be extracted within 50 ms of object presentation. Furthermore, we identified object frequency and specificity for the target scene category as the most important object properties for human scene categorization. Interestingly, despite the statistical definition of specificity and frequency, human ratings of these properties were better predictors of scene categorization behaviour than more objective statistics derived from databases of labelled real-world images. Taken together, our findings support a central role of object information during human scene categorization, showing that single objects can be indicative of a scene category if they are assumed to frequently and exclusively occur in a certain environment.
Affiliation(s)
- Sandro L Wiesmann
- Department of Psychology, Johann Wolfgang Goethe-Universität, Theodor-W.-Adorno-Platz 6, 60323, Frankfurt Am Main, Germany.
- Melissa L-H Võ
- Department of Psychology, Johann Wolfgang Goethe-Universität, Theodor-W.-Adorno-Platz 6, 60323, Frankfurt Am Main, Germany
18
Raat EM, Evans KK. Early signs of cancer present in the fine detail of mammograms. PLoS One 2023; 18:e0282872. [PMID: 37018164 PMCID: PMC10075467 DOI: 10.1371/journal.pone.0282872] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Accepted: 02/25/2023] [Indexed: 04/06/2023] Open
Abstract
The gist of abnormality can be rapidly extracted by medical experts from global information in medical images, such as mammograms, to identify abnormal mammograms with above-chance accuracy, even before any abnormalities are localizable. The current study evaluated the effect of different high-pass filters on expert radiologists' performance in detecting the gist of abnormality in mammograms, especially those acquired prior to any visibly actionable lesions. Thirty-four expert radiologists viewed unaltered and high-pass filtered versions of normal and abnormal mammograms. Abnormal mammograms consisted of obvious abnormalities, subtle abnormalities, and currently normal mammograms from women who would go on to develop cancer in 2-3 years. Four levels of high-pass filtering were tested (0.5, 1, 1.5, and 2 cycles per degree (cpd)), after brightness and contrast normalization to the unfiltered mammograms. Overall performance for 0.5 and 1.5 cpd did not change compared to unfiltered mammograms but was reduced for 1 and 2 cpd. Critically, filtering that eliminated frequencies below 0.5 and 1.5 cpd significantly boosted performance on mammograms acquired years prior to the appearance of localizable abnormalities. Filtering at 0.5 cpd did not change the radiologists' decision criteria compared to unfiltered mammograms, whereas other filters resulted in more conservative ratings. The findings bring us closer to identifying the characteristics of the gist of the abnormal that afford radiologists detection of the earliest signs of cancer. A 0.5 cpd high-pass filter significantly boosts subtle, global signals of future cancerous abnormalities, potentially providing an image enhancement strategy for rapid assessment of impending cancer risk.
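The filter cutoffs above are specified in cycles per degree, a unit tied to viewing geometry rather than to the image itself. A minimal sketch of the conversion to image units, assuming a hypothetical viewing distance and pixel pitch (not the study's actual display setup):

```python
# Convert a high-pass cutoff in cycles per degree (cpd) to cycles per pixel,
# as needed to build a frequency-domain filter for an image. The viewing
# distance and pixel pitch below are invented example values.
import math

def pixels_per_degree(viewing_distance_cm, pixel_pitch_cm):
    """Number of pixels subtending one degree of visual angle."""
    deg_per_pixel = math.degrees(2 * math.atan(pixel_pitch_cm / (2 * viewing_distance_cm)))
    return 1.0 / deg_per_pixel

def cpd_to_cycles_per_pixel(cutoff_cpd, viewing_distance_cm, pixel_pitch_cm):
    return cutoff_cpd / pixels_per_degree(viewing_distance_cm, pixel_pitch_cm)

ppd = pixels_per_degree(60.0, 0.025)          # 60 cm distance, 0.25 mm pixels
f = cpd_to_cycles_per_pixel(0.5, 60.0, 0.025)  # spatial frequencies below f
print(round(ppd, 1))                           # ≈ 41.9 pixels per degree
print(f)                                       # would be removed by a 0.5 cpd high-pass
```

A high-pass filter at 0.5 cpd would then zero (or attenuate) all Fourier components of the image below `f` cycles per pixel.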
Affiliation(s)
- Emma M. Raat
- Department of Psychology, University of York, York, United Kingdom
- Karla K. Evans
- Department of Psychology, University of York, York, United Kingdom
19
DiGirolamo GJ, DiDominica M, Qadri MAJ, Kellman PJ, Krasne S, Massey C, Rosen MP. Multiple expressions of "expert" abnormality gist in novices following perceptual learning. Cogn Res Princ Implic 2023; 8:10. [PMID: 36723822 PMCID: PMC9892374 DOI: 10.1186/s41235-023-00462-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2021] [Accepted: 01/07/2023] [Indexed: 02/02/2023] Open
Abstract
With a brief half-second presentation, a medical expert can determine at above-chance levels whether a medical scan she sees is abnormal based on a first impression arising from an initial global image process, termed "gist." The nature of gist processing is debated, but this debate stems from results in medical experts who have years of perceptual experience. The aim of the present study was to determine whether gist processing of medical images occurs in naïve (non-medically trained) participants who received a brief perceptual training, and to tease apart the nature of that gist signal. We trained 20 naïve participants on a brief perceptual-adaptive training of histology images. After training, naïve observers were able to perform abnormality detection and abnormality categorization above chance from a brief 500 ms masked presentation of a histology image, hence showing "gist." The global signal in perceptually trained naïve participants comprised multiple dissociable components, with some of these components relating to how rapidly naïve participants learned a normal template during perceptual learning. We suggest that multiple gist signals are present when experts view medical images, derived from the tens of thousands of images that they are exposed to throughout their training and careers. We also suggest that directed learning of a normal template may produce better abnormality detection and identification in radiologists and pathologists.
Affiliation(s)
- Gregory J. DiGirolamo
- Department of Psychology, College of the Holy Cross, 1 College Street, Worcester, MA 01610, USA; Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA; Department of Psychiatry, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Megan DiDominica
- Department of Psychology, College of the Holy Cross, 1 College Street, Worcester, MA 01610, USA
- Muhammad A. J. Qadri
- Department of Psychology, College of the Holy Cross, 1 College Street, Worcester, MA 01610, USA
- Philip J. Kellman
- Department of Psychology, UCLA, Los Angeles, CA, USA; Department of Physiology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Sally Krasne
- Department of Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Christine Massey
- Department of Psychology, UCLA, Los Angeles, CA, USA
- Max P. Rosen
- Department of Radiology, University of Massachusetts Chan Medical School, Worcester, MA, USA
20
Park HB, Azer L, Ahn S, Dinh TD, Macias G, Zhang G, Chen BB, Ma H, Botejue M, Choi EH, Zhang W. Contributions of global and local processing on medical image perception. J Med Imaging (Bellingham) 2023; 10:S11911. [PMID: 37168693 PMCID: PMC10166588 DOI: 10.1117/1.jmi.10.s1.s11911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2022] [Revised: 04/18/2023] [Accepted: 04/24/2023] [Indexed: 05/13/2023] Open
Abstract
Purpose: The influential holistic processing hypothesis attributes expertise in medical image perception to cognitive processing of global gist information. However, it has remained unclear whether or how experts use a rapid global impression of images for their subsequent diagnostic decisions based on the focal sign of cancer. We hypothesized that continuous-global and discrete-local processes jointly contribute to radiological experts' detection of abnormalities in mammograms, with different weights and temporal dynamics. Approach: We examined experienced versus inexperienced observers' performance at first (500 ms) versus second (2500 ms) mammogram image presentation in an abnormality detection task. We applied a dual-trace signal detection (DTSD) model of the receiver operating characteristic (ROC) to assess the time-varying contributions of global and focal cancer signals to mammogram reading and medical expertise. Results: The hierarchical Bayesian DTSD modeling of empirical ROCs revealed that mammogram expertise (experienced versus inexperienced observers) manifests largely in a continuous-global component for the detection of the gist of abnormality at the early phase of mammogram reading. For the second presentation of the same mammogram images, the experienced participants showed increased task performance that was largely driven by better processing of discrete-local information, whereas the global processing of abnormality remained saturated from the first exposure. Modeling of the mouse trajectories of the confidence rating responses further revealed the temporal dynamics of global and focal processing. Conclusions: These results suggest a joint contribution of continuous-global and discrete-local processes to medical expertise, and that these processes can be analytically dissociated.
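The study's ROC analysis uses a hierarchical Bayesian dual-trace model; as background only, the elementary equal-variance signal detection quantities that such ROC analyses build on can be computed from hit and false-alarm rates. The rates below are invented for illustration and are not the study's data.

```python
# Equal-variance signal detection sketch: sensitivity (d') and criterion (c)
# from hit and false-alarm rates. This is the textbook model, not the
# dual-trace model used in the study; the rates are made up.
from statistics import NormalDist

z = NormalDist().inv_cdf  # probit: inverse standard-normal CDF

def dprime(hit_rate, fa_rate):
    return z(hit_rate) - z(fa_rate)

def criterion(hit_rate, fa_rate):
    return -0.5 * (z(hit_rate) + z(fa_rate))

print(round(dprime(0.84, 0.16), 2))     # ≈ 1.99
print(round(criterion(0.84, 0.16), 2))  # ≈ 0.0: symmetric rates, unbiased observer
```

A full ROC analysis sweeps the criterion across confidence levels; the dual-trace model additionally posits separate continuous and discrete evidence traces, which this sketch does not attempt.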
Affiliation(s)
- Hyung-Bum Park
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- University of California, Riverside, Department of Psychology, Riverside, California, United States
- Lilian Azer
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- Shinhae Ahn
- Washington University in St. Louis, Department of Psychological and Brain Sciences, St. Louis, Missouri, United States
- Tam-Dan Dinh
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- Gabriela Macias
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- Gavin Zhang
- University of California, Berkeley, Department of Computer Science, Berkeley, California, United States
- Bihong Beth Chen
- City of Hope National Medical Center, Department of Diagnostic Radiology, Duarte, California, United States
- Huiyan Ma
- City of Hope National Medical Center, Beckman Research Institute, Department of Population Sciences, Duarte, California, United States
- Mahesh Botejue
- University of California Riverside School of Medicine, Riverside Community Hospital, Internal Medicine, Riverside, California, United States
- Eric H. Choi
- University of California Riverside School of Medicine, Riverside Community Hospital, Internal Medicine, Riverside, California, United States
- Weiwei Zhang
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
21
Odic D, Oppenheimer DM. Visual numerosity perception shows no advantage in real-world scenes compared to artificial displays. Cognition 2023; 230:105291. [PMID: 36183630 DOI: 10.1016/j.cognition.2022.105291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Revised: 09/15/2022] [Accepted: 09/23/2022] [Indexed: 10/14/2022]
Abstract
While the human visual system is sensitive to numerosity, the mechanisms that allow perception to extract and represent the number of objects in a scene remain unknown. Prominent theoretical approaches posit that numerosity perception emerges from passive experience with visual scenes throughout development, and that unsupervised deep neural network models mirror all characteristic behavioral features observed in participants. Here, we derive and test a novel prediction: if the visual number sense emerges from exposure to real-world scenes, then the closer a stimulus aligns with the natural statistics of the real world, the better number perception should be. But, in contrast to this prediction, we observe no such advantage (and sometimes even a notable impairment) in number perception for natural scenes compared to artificial dot displays in college-aged adults. These findings are not accounted for by the difficulty of object identification, visual clutter, the parsability of objects from the rest of the scene, or increased occlusion. This pattern of results represents a fundamental challenge to recent models of numerosity perception based on experiential learning of statistical regularities, and instead suggests that the visual number sense is attuned to the abstract number of objects, independent of their underlying correlation with non-numeric features. We discuss our results in the context of recent proposals that object complexity and entropy may play a role in number perception.
22
Anderson MD, Elder JH, Graf EW, Adams WJ. The time-course of real-world scene perception: Spatial and semantic processing. iScience 2022; 25:105633. [PMID: 36505927 PMCID: PMC9732406 DOI: 10.1016/j.isci.2022.105633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Revised: 09/16/2022] [Accepted: 11/16/2022] [Indexed: 11/21/2022] Open
Abstract
Real-world scene perception unfolds remarkably quickly, yet the underlying visual processes are poorly understood. Space-centered theory maintains that a scene's spatial structure (e.g., openness, mean depth) can be rapidly recovered from low-level image statistics. In turn, the statistical relationship between a scene's spatial properties and semantic content allows for semantic identity to be inferred from its layout. We tested this theory by investigating (1) the temporal dynamics of spatial and semantic perception in real-world scenes, and (2) dependencies between spatial and semantic judgments. Participants viewed backward-masked images for 13.3 to 106.7 ms, and identified the semantic (e.g., beach, road) or spatial structure (e.g., open, closed-off) category. We found no temporal precedence of spatial discrimination relative to semantic discrimination. Computational analyses further suggest that, instead of using spatial layout to infer semantic categories, humans exploit semantic information to discriminate spatial structure categories. These findings challenge traditional 'bottom-up' views of scene perception.
Affiliation(s)
- Matt D. Anderson
- Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
- James H. Elder
- Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Canada
- Erich W. Graf
- Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
- Wendy J. Adams
- Centre for Perception and Cognition, Psychology, University of Southampton, Southampton, UK
23
Parmentier FBR, Gallego L, Micucci A, Leiva A, Andrés P, Maybery MT. Distraction by deviant sounds is modulated by the environmental context. Sci Rep 2022; 12:21447. [PMID: 36509791 PMCID: PMC9744899 DOI: 10.1038/s41598-022-25500-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Accepted: 11/30/2022] [Indexed: 12/14/2022] Open
Abstract
Evidence shows that participants performing a continuous visual categorization task respond slower following the presentation of a task-irrelevant sound deviating from an otherwise repetitive or predictable auditory context (deviant sound among standard sounds). Here, for the first time, we explored the role of the environmental context (instrumentalized as a task-irrelevant background picture) in this effect. In two experiments, participants categorized left/right arrows while ignoring irrelevant sounds and background pictures of forest and city scenes. While equiprobable across the task, sounds A and B were presented with probabilities of .882 and .118 in the forest context, respectively, and with the reversed probabilities in the city context. Hence, neither sound constituted a deviant sound at task-level, but each did within a specific context. In Experiment 1, where each environmental context (forest and city scene) consisted of a single picture each, participants were significantly slower in the visual task following the presentation of the sound that was unexpected within the current context (context-dependent distraction). Further analysis showed that the cognitive system reset its sensory predictions even for the first trial of a change in environmental context. In Experiment 2, the two contexts (forest and city) were implemented using sets of 32 pictures each, with the background picture changing on every trial. Here too, context-dependent deviance distraction was observed. However, participants took a trial to fully reset their sensory predictions upon a change in context. We conclude that irrelevant sounds are incidentally processed in association with the environmental context (even though these stimuli belong to different sensory modalities) and that sensory predictions are context-dependent.
Affiliation(s)
- Fabrice B. R. Parmentier
- Department of Psychology and Research Institute of Health Sciences, University of the Balearic Islands, Ctra. de Valldemossa, Km 7.5, Palma de Mallorca, Balearic Islands, Spain; School of Psychological Science, University of Western Australia, Perth, Australia
- Laura Gallego
- Department of Psychology and Research Institute of Health Sciences, University of the Balearic Islands, Ctra. de Valldemossa, Km 7.5, Palma de Mallorca, Balearic Islands, Spain
- Antonia Micucci
- Department of Psychology, University of Bologna, Bologna, Italy
- Alicia Leiva
- Department of Psychology, Universitat de Vic-Universitat Central de Catalunya, Vic, Spain
- Pilar Andrés
- Department of Psychology and Research Institute of Health Sciences, University of the Balearic Islands, Ctra. de Valldemossa, Km 7.5, Palma de Mallorca, Balearic Islands, Spain
- Murray T. Maybery
- School of Psychological Science, University of Western Australia, Perth, Australia
24
Aminoff EM, Durham T. Scene-selective brain regions respond to embedded objects of a scene. Cereb Cortex 2022; 33:5066-5074. [PMID: 36305640 DOI: 10.1093/cercor/bhac399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 11/14/2022] Open
Abstract
Objects are fundamental to scene understanding. Scenes are defined by embedded objects and how we interact with them. Paradoxically, scene processing in the brain is typically discussed in contrast to object processing. Using the BOLD5000 dataset (Chang et al., 2019), we examined whether objects within a scene predicted the neural representation of scenes, as measured by functional magnetic resonance imaging in humans. Stimuli included 1,179 unique scenes across 18 semantic categories. The object composition of scenes was compared across scene exemplars in different semantic scene categories and, separately, in exemplars of the same scene category. Neural representations in scene- and object-preferring brain regions were significantly related to which objects were in a scene, with the effect at times stronger in the scene-preferring regions. The object model accounted for more variance when comparing scenes within the same semantic category than when comparing scenes from different categories. Here, we demonstrate that the function of scene-preferring regions includes the processing of objects. This suggests that visual processing regions may be better characterized by the processes engaged when interacting with a kind of stimulus, such as processing groups of objects in scenes or processing a single object in our foreground, rather than by the stimulus kind itself.
Affiliation(s)
- Elissa M Aminoff
- Department of Psychology, Fordham University, 226 Dealy Hall, 441 E. Fordham Rd, Bronx, NY 10458, United States
- Tess Durham
- Department of Psychology, Fordham University, 226 Dealy Hall, 441 E. Fordham Rd, Bronx, NY 10458, United States
25
Jiang Z, Sanders DMW, Cowell RA. Visual and semantic similarity norms for a photographic image stimulus set containing recognizable objects, animals and scenes. Behav Res Methods 2022; 54:2364-2380. [PMID: 35088365 PMCID: PMC9325926 DOI: 10.3758/s13428-021-01732-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/21/2021] [Indexed: 11/08/2022]
Abstract
We collected visual and semantic similarity norms for a set of photographic images comprising 120 recognizable objects/animals and 120 indoor/outdoor scenes. Human observers rated the similarity of pairs of images within four categories of stimuli (inanimate objects, animals, indoor scenes, and outdoor scenes) via Amazon's Mechanical Turk. We performed multidimensional scaling (MDS) on the collected similarity ratings to visualize the perceived similarity for each image category, for both visual and semantic ratings. The MDS solutions revealed the expected similarity relationships between images within each category, along with intuitively sensible differences between visual and semantic similarity relationships for each category. Stress tests performed on the MDS solutions indicated that the MDS analyses captured meaningful levels of variance in the similarity data. These stimuli, associated norms, and naming data are made available to all researchers, and should provide a useful resource for researchers of vision, memory, and conceptual knowledge wishing to run experiments using well-parameterized stimulus sets.
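As a minimal sketch of the preprocessing step such norms typically feed into: pairwise similarity ratings are converted into the symmetric, zero-diagonal dissimilarity matrix that MDS expects. The rating scale and values below are hypothetical, not the published norms.

```python
# Convert pairwise similarity ratings (here a hypothetical 1-7 scale,
# 7 = most similar) into the dissimilarity matrix expected by MDS.
def to_dissimilarity(sim, max_rating=7):
    n = len(sim)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # average the two rating directions, then invert the scale
                avg = (sim[i][j] + sim[j][i]) / 2
                d[i][j] = max_rating - avg
    return d

sim = [[7, 6, 2],
       [6, 7, 3],
       [2, 3, 7]]  # items 0 and 1 rated alike; item 2 rated unlike both
d = to_dissimilarity(sim)
print(d[0][1], d[0][2])  # 1.0 5.0 -- small distance for the similar pair
```

The resulting matrix `d` can be passed directly to an MDS routine (e.g. one that minimizes stress over a 2D configuration); averaging the two rating directions guarantees symmetry.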
Affiliation(s)
- Zhuohan Jiang
- Neuroscience Program, Smith College, Northampton, MA, USA
- Integrated Program in Neuroscience, McGill University, Montreal, Quebec, Canada
- D Merika W Sanders
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, 135 Hicks Way, Amherst, MA, 01003, USA
- Department of Psychology, Harvard University, Cambridge, MA, USA
- Rosemary A Cowell
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, 135 Hicks Way, Amherst, MA, 01003, USA.
26
De Cesarei A, Marzocchi M, Codispoti M. Luminance and timing control during visual presentation of natural scenes. HARDWAREX 2022; 12:e00376. [PMID: 36437839 PMCID: PMC9682347 DOI: 10.1016/j.ohx.2022.e00376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Revised: 11/02/2022] [Accepted: 11/12/2022] [Indexed: 06/16/2023]
Abstract
In the study of visual cognition, accurate control of stimulus presentation is of primary importance, yet it is complicated by hardware malfunctions, software variability, and the visual materials used. Here, we describe VISTO 2.0, a low-cost, open-source device capable of measuring the timing and temporal luminance profile of visual stimuli. This device represents a major improvement over VISTO (De Cesarei, Marzocchi, & Loftus, 2021), as it is only sensitive to the light spectrum in the visible range, is easier to assemble, and has a modular design that can be extended to other sensory modalities.
27
Ellmore TM, Reichert Plaska C, Ng K, Mei N. Visual continuous recognition reveals behavioral and neural differences for short- and long-term scene memory. Front Behav Neurosci 2022; 16:958609. [PMID: 36187377 PMCID: PMC9520405 DOI: 10.3389/fnbeh.2022.958609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 08/24/2022] [Indexed: 11/23/2022] Open
Abstract
Humans have a remarkably high capacity and long duration memory for complex scenes. Previous research documents the neural substrates that allow for efficient categorization of scenes from other complex stimuli like objects and faces, but the spatiotemporal neural dynamics underlying scene memory at timescales relevant to working and longer-term memory are less well understood. In the present study, we used high-density EEG during a visual continuous recognition task in which new, old, and scrambled scenes consisting of color outdoor photographs were presented at an average rate of 0.26 Hz. Old scenes were single repeated presentations occurring after either a short-term interval (< 20 s) or longer-term intervals of 30 s to 3 min or 4 to 10 min. Overall recognition was far above chance, with better performance at shorter- than longer-term intervals. Sensor-level ANOVA and post hoc pairwise comparisons of event-related potentials (ERPs) revealed three main findings: (1) occipital and parietal amplitudes distinguishing new and old from scrambled scenes; (2) frontal amplitudes distinguishing old from new scenes, with a central positivity highest for hits compared to misses, false alarms, and correct rejections; and (3) frontal and parietal changes from ∼300 to ∼600 ms distinguishing among old scenes previously encountered at short- and longer-term retention intervals. These findings reveal how distributed spatiotemporal neural changes evolve to support short- and longer-term recognition of complex scenes.
Affiliation(s)
- Timothy M. Ellmore
- Department of Psychology, The City College of the City University of New York, New York, NY, United States
- Behavioral and Cognitive Neuroscience, The Graduate Center of the City University of New York, New York, NY, United States
- Chelsea Reichert Plaska
- Behavioral and Cognitive Neuroscience, The Graduate Center of the City University of New York, New York, NY, United States
- Kenneth Ng
- Department of Psychology, The City College of the City University of New York, New York, NY, United States
- Ning Mei
- Department of Psychology, The City College of the City University of New York, New York, NY, United States

28
Tsang KY, Mannion DJ. Relating Sound and Sight in Simulated Environments. Multisens Res 2022; 35:589-622. [PMID: 36084933 DOI: 10.1163/22134808-bja10082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Accepted: 08/17/2022] [Indexed: 02/07/2023]
Abstract
The auditory signals at the ear can be affected by components arriving both directly from a sound source and indirectly via environmental reverberation. Previous studies have suggested that the perceptual separation of these contributions can be aided by expectations of likely reverberant qualities. Here, we investigated whether vision can provide information about the auditory properties of physical locations that could also be used to develop such expectations. We presented participants with audiovisual stimuli derived from 10 simulated real-world locations via a head-mounted display (HMD; n = 44) or a web-based (n = 60) delivery method. On each trial, participants viewed a first-person perspective rendering of a location before hearing a spoken utterance that was convolved with an impulse response from a location either the same as (congruent) or different from (incongruent) the visually depicted location. We find that audiovisual congruence was associated with an increase of about 0.22 (95% credible interval: [0.17, 0.27]) in the probability of participants reporting an audiovisual match, and that participants were more likely to confuse audiovisual pairs as matching if their locations had similar reverberation times. Overall, this study suggests that human perceivers have a capacity to form expectations of reverberation from visual information. Such expectations may be useful for the perceptual challenge of separating sound sources and reverberation from within the signal available at the ear.
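The congruence manipulation above rests on convolution reverb: imposing a location's acoustics on a dry recording by convolving it with that location's impulse response. A minimal sketch of this operation, using NumPy and a synthetic decaying-noise impulse response rather than the study's measured ones:

```python
import numpy as np

def simulate_reverb(dry, ir):
    """Convolve a dry (anechoic) signal with a room impulse response.

    Convolution with a measured impulse response is the standard way to
    impose a location's reverberant character on a recording; the output
    is peak-normalized to avoid clipping. (Illustrative sketch only, not
    the study's stimulus pipeline.)
    """
    wet = np.convolve(dry, ir, mode="full")
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

# Toy example with a synthetic, exponentially decaying noise IR
rng = np.random.default_rng(0)
sr = 8_000  # assumed sample rate for the sketch
ir = rng.standard_normal(sr // 2) * np.exp(-np.linspace(0.0, 8.0, sr // 2))
dry = rng.standard_normal(sr)  # stand-in for a spoken utterance
wet = simulate_reverb(dry, ir)
```

On this scheme, a congruent trial pairs a location's rendering with speech convolved by that same location's impulse response, while an incongruent trial swaps in another location's impulse response.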
Affiliation(s)
- Kevin Y Tsang
- School of Psychology, UNSW Sydney, Sydney, 2052, Australia

29
Eckmann JP, Gunaratne GH, Shulman J, Wood LT. Imaging in reflecting spheres. CHAOS (WOODBURY, N.Y.) 2022; 32:093136. [PMID: 36182354 DOI: 10.1063/5.0110865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2022] [Accepted: 08/29/2022] [Indexed: 06/16/2023]
Abstract
We study the formation of images in a reflective sphere in three configurations using caustics on the field of light rays. The optical wavefront emerging from a source point reaching a subject following passage through the optical system is, in general, a Gaussian surface with partial focus along the two principal directions of the Gaussian surface; i.e., there are two images of the source point, each with partial focus. As the source point moves, the images move on two surfaces, referred to as viewable surfaces. In our systems, one viewable surface consists of points with radial focus and the other consists of points with azimuthal focus. The problems we study are (1) imaging of a parallel beam of light, (2) imaging of the infinite viewed from a location outside the sphere, and (3) imaging of a planar object viewed through the point of its intersection with the radial line normal to the plane. We verify the existence of two images experimentally and show that the distance between them agrees with the computations.
Affiliation(s)
- Jean-Pierre Eckmann
- Département de Physique Théorique and Section de Mathématiques, Université de Genève, 1211 Geneva 4, Switzerland
- Jason Shulman
- Department of Physics, University of Houston, Houston, Texas 77204, USA
- Lowell T Wood
- Department of Physics, University of Houston, Houston, Texas 77204, USA

30
Helbing J, Draschkow D, L-H Võ M. Auxiliary Scene-Context Information Provided by Anchor Objects Guides Attention and Locomotion in Natural Search Behavior. Psychol Sci 2022; 33:1463-1476. [PMID: 35942922 DOI: 10.1177/09567976221091838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Successful adaptive behavior requires efficient attentional and locomotive systems. Previous research has thoroughly investigated how we achieve this efficiency during natural behavior by exploiting prior knowledge related to targets of our actions (e.g., attending to metallic targets when looking for a pot) and to the environmental context (e.g., looking for the pot in the kitchen). Less is known about whether and how individual nontarget components of the environment support natural behavior. In our immersive virtual reality task, 24 adult participants searched for objects in naturalistic scenes in which we manipulated the presence and arrangement of large, static objects that anchor predictions about targets (e.g., the sink provides a prediction for the location of the soap). Our results show that gaze and body movements in this naturalistic setting are strongly guided by these anchors. These findings demonstrate that objects auxiliary to the target are incorporated into the representations guiding attention and locomotion.
Affiliation(s)
- Jason Helbing
- Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt
- Dejan Draschkow
- Brain and Cognition Laboratory, Department of Experimental Psychology, University of Oxford
- Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford
- Melissa L-H Võ
- Scene Grammar Lab, Department of Psychology, Goethe University Frankfurt

31
Wolfe B, Sawyer BD, Rosenholtz R. Toward a Theory of Visual Information Acquisition in Driving. HUMAN FACTORS 2022; 64:694-713. [PMID: 32678682 PMCID: PMC9136385 DOI: 10.1177/0018720820939693] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/22/2019] [Accepted: 06/09/2020] [Indexed: 06/01/2023]
Abstract
OBJECTIVE The aim of this study is to describe information acquisition theory, explaining how drivers acquire and represent the information they need. BACKGROUND While questions of what drivers are aware of underlie many questions in driver behavior, existing theories do not directly address how drivers in particular and observers in general acquire visual information. Understanding the mechanisms of information acquisition is necessary to build predictive models of drivers' representation of the world and can be applied beyond driving to a wide variety of visual tasks. METHOD We describe our theory of information acquisition, looking to questions in driver behavior and results from vision science research that speak to its constituent elements. We focus on the intersection of peripheral vision, visual attention, and eye movement planning and identify how an understanding of these visual mechanisms and processes in the context of information acquisition can inform more complete models of driver knowledge and state. RESULTS We set forth our theory of information acquisition, describing the gap in understanding that it fills and how existing questions in this space can be better understood using it. CONCLUSION Information acquisition theory provides a new and powerful way to study, model, and predict what drivers know about the world, reflecting our current understanding of visual mechanisms and enabling new theories, models, and applications. APPLICATION Using information acquisition theory to understand how drivers acquire, lose, and update their representation of the environment will aid development of driver assistance systems, semiautonomous vehicles, and road safety overall.
32
Lin Z, Gong M, Li X. On the relation between crowding and ensemble perception: Examining the role of attention. Psych J 2022; 11:804-813. [PMID: 35557502 DOI: 10.1002/pchj.559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2022] [Accepted: 04/10/2022] [Indexed: 11/06/2022]
Abstract
Ensemble perception of a crowd of stimuli is very accurate, even when individual stimuli are invisible due to crowding. High-precision ensemble perception may be an evolved compensatory mechanism for the limited attentional resolution caused by crowding. Thus, crowding and ensemble coding are like two sides of the same coin, and attention may be a critical factor in their coexistence. The present study investigated whether crowding and ensemble coding are similarly modulated by attention, which can advance our understanding of their relation. Experiment 1 showed that diverting attention away from the target harmed performance in both crowding and ensemble perception tasks regardless of stimulus density, but crowding was more severely harmed. Experiment 2 showed that directing attention toward the target bar enhanced crowding performance regardless of stimulus density. Ensemble perception of high-density bars was also enhanced, but to a lesser extent, while ensemble perception of low-density bars was harmed. Together, our results indicate that crowding is strongly modulated by attention, whereas ensemble perception is only moderately modulated by attention, which conforms to the adaptive view.
Affiliation(s)
- Zhen Lin
- School of Psychology, Jiangxi Normal University, Nanchang, China
- Mingliang Gong
- School of Psychology, Jiangxi Normal University, Nanchang, China
- Xiang Li
- School of Psychology, Jiangxi Normal University, Nanchang, China

33
Han X, Wang L, Seo SH, He J, Jung T. Measuring Perceived Psychological Stress in Urban Built Environments Using Google Street View and Deep Learning. Front Public Health 2022; 10:891736. [PMID: 35646775 PMCID: PMC9131010 DOI: 10.3389/fpubh.2022.891736] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Accepted: 04/19/2022] [Indexed: 12/18/2022] Open
Abstract
An urban built environment is an important part of the daily lives of urban residents. Correspondingly, a poor design can lead to psychological stress, which can be harmful to their psychological and physical well-being. The relationship between the urban built environment and the perceived psychological stress of residents is a significant topic in many disciplines. Further research is needed to determine the stress level experienced by residents in the built environment on a large scale and to identify the relationship between the visual components of the built environment and perceived psychological stress. Recent developments in big data and deep learning technology mean that the technical support required to measure the perceived psychological stress of residents has now become available. In this context, this study explored a method for rapid, large-scale determination of the perceived psychological stress among urban residents through a deep learning approach. An empirical study was conducted in Gangnam District, Seoul, South Korea, and the SegNet deep learning algorithm was used to segment and classify the visual elements of street views. In addition, a human-machine adversarial model using random forest as a framework was employed to score the perception of psychological stress in the built environment. Consequently, we found a strong spatial autocorrelation in perceived psychological stress, with more low-low clusters along the urban traffic arteries and riverine areas of Gangnam District and more high-high clusters in the commercial and residential areas. We also analyzed the street view images for three levels of stress perception (i.e., low, medium, and high) and obtained the percentage of each street view element combination under each stress level. Using multiple linear regression, we found that walls and buildings cause psychological stress, whereas sky, trees, and roads relieve it.
Our analytical study integrates street view big data with deep learning and proposes an innovative method for measuring the perceived psychological stress of residents in the built environment. The research methodology and results can be a reference for urban planning and design from a human centered perspective.
Affiliation(s)
- Xin Han
- Department of Landscape Architecture, Kyungpook National University, Daegu, South Korea
- Lei Wang
- School of Architecture, Tianjin University, Tianjin, China
- Seong Hyeok Seo
- Department of Landscape Architecture, Kyungpook National University, Daegu, South Korea
- Jie He
- School of Architecture, Tianjin University, Tianjin, China
- School of Architecture, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Taeyeol Jung
- Department of Landscape Architecture, Kyungpook National University, Daegu, South Korea

34
Capparini C, To MPS, Reid VM. Identifying the limits of peripheral visual processing in 9-month-old infants. Dev Psychobiol 2022; 64:e22274. [DOI: 10.1002/dev.22274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 02/22/2022] [Accepted: 03/13/2022] [Indexed: 11/07/2022]
Affiliation(s)
- Chiara Capparini
- Department of Psychology, Lancaster University, Lancaster, United Kingdom
- Michelle P. S. To
- Department of Psychology, Lancaster University, Lancaster, United Kingdom
- Vincent M. Reid
- School of Psychology, University of Waikato, Hamilton, New Zealand

35
Brau JM, Sugarman A, Rothlein D, DeGutis J, Esterman M, Fortenbaugh FC. The impact of image degradation and temporal dynamics on sustained attention. J Vis 2022; 22:8. [PMID: 35297998 PMCID: PMC8944397 DOI: 10.1167/jov.22.4.8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Many clinical populations that have sustained attention deficits also have visual deficits. It is therefore necessary to understand how the quality of visual input and different forms of image degradation contribute to worse performance on sustained attention tasks, particularly those with dynamic and complex visual stimuli. This study investigated the impact of image degradation on an adapted version of the gradual-onset continuous performance task (gradCPT), in which participants must discriminate between gradually fading city and mountain scenes. Thirty-six normal-vision participants completed the task, which featured two blocks of six resolution and contrast levels. Subjects completed a version with either gradually fading or static image presentations. The results show that decreases in image resolution impair performance under both types of temporal dynamics, whereas decreases in image contrast impair performance only under gradual temporal dynamics. Image similarity analyses showed that performance is more strongly associated with an observer's ability to gather an image's global spatial layout (i.e., gist) than with local variations in pixel luminance, particularly under gradual image presentation. This work suggests that gradually fading attention paradigms are sensitive to deficits in primary visual function, potentially leading to these issues being misinterpreted as attentional failures.
Affiliation(s)
- Julia M Brau
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- Alexander Sugarman
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- David Rothlein
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- Boston Attention and Learning Lab (BALLAB), VA Boston Healthcare System, Boston, MA, USA
- National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA
- Joseph DeGutis
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- Boston Attention and Learning Lab (BALLAB), VA Boston Healthcare System, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Cambridge, MA, USA
- Michael Esterman
- National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- Boston Attention and Learning Lab (BALLAB), VA Boston Healthcare System, Boston, MA, USA
- Department of Psychiatry, Boston University School of Medicine, Boston, MA, USA
- Francesca C Fortenbaugh
- Translational Research Center for TBI and Stress Disorders (TRACTS), VA Boston Healthcare System, Boston, MA, USA
- Department of Psychiatry, Harvard Medical School, Cambridge, MA, USA

36
Environment width robustly influences egocentric distance judgments. PLoS One 2022; 17:e0263497. [PMID: 35143537 PMCID: PMC8830710 DOI: 10.1371/journal.pone.0263497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Accepted: 01/20/2022] [Indexed: 11/19/2022] Open
Abstract
Past work has suggested that perception of object distances in natural scenes depends on the environmental surroundings, even when the physical object distance remains constant. The cue bases for such effects remain unclear and are difficult to study systematically in real-world settings, given the challenges in manipulating large environmental features reliably and efficiently. Here, we used rendered scenes and crowdsourced data collection to address these challenges. In 4 experiments involving 452 participants, we investigated the effect of room width and depth on egocentric distance judgments. Targets were placed at distances of 2–37 meters in rendered rooms that varied in width (1.5–40 meters) and depth (6–40 meters). We found large and reliable effects of room width: Average judgments for the farthest targets in a 40-meter-wide room were 16–33% larger than for the same target distances seen in a 1.5-meter-wide hallway. Egocentric distance cues and focal length were constant across room widths, highlighting the role of environmental context in judging distances in natural scenes. Obscuring the fine-grained ground texture, per se, is not primarily responsible for the width effect, nor does linear perspective play a strong role. However, distance judgments tended to decrease when doors and/or walls obscured more distant regions of the scene. We discuss how environmental features may be used to calibrate relative distance cues for egocentric distance judgments.
37
Three cortical scene systems and their development. Trends Cogn Sci 2022; 26:117-127. [PMID: 34857468 PMCID: PMC8770598 DOI: 10.1016/j.tics.2021.11.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Revised: 10/14/2021] [Accepted: 11/06/2021] [Indexed: 02/03/2023]
Abstract
Since the discovery of three scene-selective regions in the human brain, a central assumption has been that all three regions directly support navigation. We propose instead that cortical scene-processing regions support three distinct computational goals, one of which is not for navigation at all: (i) the parahippocampal place area supports scene categorization, which involves recognizing the kind of place we are in; (ii) the occipital place area supports visually guided navigation, which involves finding our way through the immediately visible environment, avoiding boundaries and obstacles; and (iii) the retrosplenial complex supports map-based navigation, which involves finding our way from a specific place to some distant, out-of-sight place. We further hypothesize that these systems develop along different timelines, with both navigation systems developing more slowly than the scene categorization system.
38
Son G, Walther DB, Mack ML. Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behav Res Methods 2022; 54:444-456. [PMID: 34244986 DOI: 10.3758/s13428-021-01630-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/19/2021] [Indexed: 11/08/2022]
Abstract
Precisely characterizing mental representations of visual experiences requires careful control of experimental stimuli. Recent work leveraging such stimulus control has led to important insights; however, these findings are constrained to simple visual properties like color and line orientation. There remains a critical methodological barrier to characterizing perceptual and mnemonic representations of realistic visual experiences. Here, we introduce a novel method to systematically control visual properties of natural scene stimuli. Using generative adversarial networks (GANs), a state-of-the-art deep learning technique for creating highly realistic synthetic images, we generated scene wheels in which continuously changing visual properties smoothly transition between meaningful realistic scenes. To validate the efficacy of scene wheels, we conducted two behavioral experiments that assess perceptual and mnemonic representations attained from the scene wheels. In the perceptual validation experiment, we tested whether the continuous transition of scene images along the wheel is reflected in human perceptual similarity judgment. The perceived similarity of the scene images correspondingly decreased as distances between the images increased on the wheel. In the memory experiment, participants reconstructed to-be-remembered scenes from the scene wheels. Reconstruction errors for these scenes resembled error distributions observed in prior studies using simple stimulus properties. Importantly, perceptual similarity judgment and memory precision varied systematically with scene wheel radius. These findings suggest that our novel approach offers a window into the mental representations of naturalistic visual experiences.
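Because scene-wheel stimuli lie on a circle, reconstruction error is naturally scored as a signed angular difference wrapped into [-180, 180), as in continuous-report color-wheel paradigms. A minimal sketch of that scoring step (a generic circular-error computation, not the authors' analysis code):

```python
def wheel_error(reported_deg, target_deg):
    """Signed reconstruction error on a 360-degree stimulus wheel.

    Raw subtraction overstates errors that cross the 0/360 boundary,
    so the difference is wrapped into the half-open range [-180, 180).
    """
    return (reported_deg - target_deg + 180.0) % 360.0 - 180.0

# Errors wrap correctly across the 0/360 boundary:
errors = [wheel_error(r, t) for r, t in [(350, 10), (10, 350), (90, 80)]]
# -> [-20.0, 20.0, 10.0]
```

Error distributions built from such wrapped differences are what make wheel paradigms comparable across simple features (color, orientation) and, here, whole scenes.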
Affiliation(s)
- Gaeun Son
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
- Dirk B Walther
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada
- Michael L Mack
- Department of Psychology, University of Toronto, Sidney Smith Hall, 100 St George St, Toronto, ON, Canada

39
Comparable prediction of breast cancer risk from a glimpse or a first impression of a mammogram. COGNITIVE RESEARCH-PRINCIPLES AND IMPLICATIONS 2021; 6:72. [PMID: 34743266 PMCID: PMC8572261 DOI: 10.1186/s41235-021-00339-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/26/2021] [Accepted: 10/18/2021] [Indexed: 12/02/2022]
Abstract
Expert radiologists can discern normal from abnormal mammograms with above-chance accuracy after brief (e.g. 500 ms) exposure. They can even predict cancer risk when viewing currently normal images (priors) from women who will later develop cancer. This involves a rapid, global, non-selective process called “gist extraction”. It is not yet known whether prolonged exposure can strengthen the gist signal, or whether it is available solely in the early exposure. This is of particular interest for the priors, which do not contain any localizable signal of abnormality. The current study compared performance with brief (500 ms) or unlimited exposure for four types of mammograms (normal, abnormal, contralateral, priors). Groups of expert radiologists and untrained observers were tested. As expected, radiologists outperformed naïve participants. Replicating prior work, they exceeded chance performance, though the gist signal was weak. However, we found no consistent performance differences between timing conditions in either radiologists or naïve participants. Exposure time neither increased nor decreased the ability to identify the gist of abnormality or predict cancer risk. If gist signals are to have a place in cancer risk assessments, more effort should be made to strengthen the signal.
40
van Dyck LE, Kwitt R, Denzler SJ, Gruber WR. Comparing Object Recognition in Humans and Deep Convolutional Neural Networks-An Eye Tracking Study. Front Neurosci 2021; 15:750639. [PMID: 34690686 PMCID: PMC8526843 DOI: 10.3389/fnins.2021.750639] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Accepted: 09/16/2021] [Indexed: 11/30/2022] Open
Abstract
Deep convolutional neural networks (DCNNs) and the ventral visual pathway share vast architectural and functional similarities in visual challenges such as object recognition. Recent insights have demonstrated that both hierarchical cascades can be compared in terms of both exerted behavior and underlying activation. However, these approaches ignore key differences in spatial priorities of information processing. In this proof-of-concept study, we demonstrate a comparison of human observers (N = 45) and three feedforward DCNNs through eye tracking and saliency maps. The results reveal fundamentally different resolutions in both visualization methods that need to be considered for an insightful comparison. Moreover, we provide evidence that a DCNN with biologically plausible receptive field sizes called vNet reveals higher agreement with human viewing behavior as contrasted with a standard ResNet architecture. We find that image-specific factors such as category, animacy, arousal, and valence have a direct link to the agreement of spatial object recognition priorities in humans and DCNNs, while other measures such as difficulty and general image properties do not. With this approach, we try to open up new perspectives at the intersection of biological and computer vision research.
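Agreement between human viewing behavior and model saliency, as discussed above, can be quantified by correlating a human fixation-density map with a model saliency map once both are resampled to a common resolution. A hedged sketch using plain Pearson correlation (a common choice for such comparisons; the paper's exact metric may differ):

```python
import numpy as np

def map_agreement(human_map, model_map):
    """Pearson correlation between two same-shaped spatial maps.

    Both maps are flattened and z-scored, so the mean of their product
    equals the Pearson r. Inputs are assumed non-constant and already
    resampled to a common resolution. (Illustrative sketch only.)
    """
    h = np.asarray(human_map, dtype=float).ravel()
    m = np.asarray(model_map, dtype=float).ravel()
    h = (h - h.mean()) / h.std()
    m = (m - m.mean()) / m.std()
    return float(np.mean(h * m))

# Toy example: a perfectly linearly related map correlates at r = 1
rng = np.random.default_rng(1)
fixations = rng.random((8, 8))    # stand-in fixation-density map
saliency = 2.0 * fixations + 0.5  # linearly rescaled copy
```

The resolution caveat matters here: as the abstract notes, fixation maps and network saliency maps have fundamentally different native resolutions, so the resampling step is not a formality.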
Affiliation(s)
- Leonard Elia van Dyck
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Roland Kwitt
- Department of Computer Science, University of Salzburg, Salzburg, Austria
- Walter Roland Gruber
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria

41
Jung Y, Walther DB. Neural Representations in the Prefrontal Cortex Are Task Dependent for Scene Attributes But Not for Scene Categories. J Neurosci 2021; 41:7234-7245. [PMID: 34103357 PMCID: PMC8387120 DOI: 10.1523/jneurosci.2816-20.2021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Revised: 05/28/2021] [Accepted: 06/03/2021] [Indexed: 11/21/2022] Open
Abstract
Natural scenes deliver rich sensory information about the world. Decades of research have shown that the scene-selective network in the visual cortex represents various aspects of scenes. However, less is known about how such complex scene information is processed beyond the visual cortex, such as in the prefrontal cortex. It is also unknown how task context impacts the process of scene perception, modulating which scene content is represented in the brain. In this study, we investigate these questions using scene images from four natural scene categories, which also depict two types of scene attributes: temperature (warm or cold) and sound level (noisy or quiet). A group of healthy human subjects of both sexes participated in the present study using fMRI. In the study, participants viewed scene images under two different task conditions: temperature judgment and sound-level judgment. We analyzed how these scene attributes and categories are represented across the brain under these task conditions. Our findings show that scene attributes (temperature and sound level) are only represented in the brain when they are task relevant. However, scene categories are represented in the brain, in both the parahippocampal place area and the prefrontal cortex, regardless of task context. These findings suggest that the prefrontal cortex selectively represents scene content according to task demands, but this task selectivity depends on the type of scene content: task modulates neural representations of scene attributes but not of scene categories.
SIGNIFICANCE STATEMENT Research has shown that visual scene information is processed in scene-selective regions in the occipital and temporal cortices. Here, we ask how scene content is processed and represented beyond the visual brain, in the prefrontal cortex (PFC).
We show that both scene categories and scene attributes are represented in PFC, with interesting differences in task dependency: scene attributes are only represented in PFC when they are task relevant, but scene categories are represented in PFC regardless of the task context. Together, our work shows that scene information is processed beyond the visual cortex, and scene representation in PFC reflects how adaptively our minds extract relevant information from a scene.
Affiliation(s)
- Yaelan Jung
- Department of Psychology, University of Toronto, Toronto, Ontario M5S 3G3, Canada
- Dirk B Walther
- Department of Psychology, University of Toronto, Toronto, Ontario M5S 3G3, Canada

42
Çelik E, Keles U, Kiremitçi İ, Gallant JL, Çukur T. Cortical networks of dynamic scene category representation in the human brain. Cortex 2021; 143:127-147. [PMID: 34411847 DOI: 10.1016/j.cortex.2021.07.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2020] [Revised: 06/28/2021] [Accepted: 07/14/2021] [Indexed: 10/20/2022]
Abstract
Humans have an impressive ability to rapidly process global information in natural scenes to infer their category. Yet, it remains unclear whether and how scene categories observed dynamically in the natural world are represented in cerebral cortex beyond few canonical scene-selective areas. To address this question, here we examined the representation of dynamic visual scenes by recording whole-brain blood oxygenation level-dependent (BOLD) responses while subjects viewed natural movies. We fit voxelwise encoding models to estimate tuning for scene categories that reflect statistical ensembles of objects and actions in the natural world. We find that this scene-category model explains a significant portion of the response variance broadly across cerebral cortex. Cluster analysis of scene-category tuning profiles across cortex reveals nine spatially-segregated networks of brain regions consistently across subjects. These networks show heterogeneous tuning for a diverse set of dynamic scene categories related to navigation, human activity, social interaction, civilization, natural environment, non-human animals, motion-energy, and texture, suggesting that the organization of scene category representation is quite complex.
Affiliation(s)
- Emin Çelik
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey
- Umit Keles
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey; Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
- İbrahim Kiremitçi
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey
- Jack L Gallant
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA; Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA; Department of Psychology, University of California, Berkeley, CA, USA
- Tolga Çukur
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey; National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara, Turkey; Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey

43
Isik AI, Vessel EA. From Visual Perception to Aesthetic Appeal: Brain Responses to Aesthetically Appealing Natural Landscape Movies. Front Hum Neurosci 2021; 15:676032. [PMID: 34366810 PMCID: PMC8336692 DOI: 10.3389/fnhum.2021.676032] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 06/29/2021] [Indexed: 11/13/2022] Open
Abstract
During aesthetically appealing visual experiences, visual content provides a basis for computation of affectively tinged representations of aesthetic value. How this happens in the brain is largely unexplored. Using engaging video clips of natural landscapes, we tested whether cortical regions that respond to perceptual aspects of an environment (e.g., spatial layout, object content and motion) were directly modulated by rated aesthetic appeal. Twenty-four participants watched a series of videos of natural landscapes while being scanned using functional magnetic resonance imaging (fMRI) and reported both continuous ratings of enjoyment (during the videos) and overall aesthetic judgments (after each video). Although landscape videos engaged a greater expanse of high-level visual cortex compared to that observed for images of landscapes, independently localized category-selective visual regions (e.g., scene-selective parahippocampal place area and motion-selective hMT+) were not significantly modulated by aesthetic appeal. Rather, a whole-brain analysis revealed modulations by aesthetic appeal in ventral (collateral sulcus) and lateral (middle occipital sulcus, posterior middle temporal gyrus) clusters that were adjacent to scene and motion selective regions. These findings suggest that aesthetic appeal per se is not represented in well-characterized feature- and category-selective regions of visual cortex. Rather, we propose that the observed activations reflect a local transformation from a feature-based visual representation to a representation of "elemental affect," computed through information-processing mechanisms that detect deviations from an observer's expectations. Furthermore, we found modulation by aesthetic appeal in subcortical reward structures but not in regions of the default-mode network (DMN) nor orbitofrontal cortex, and only weak evidence for associated changes in functional connectivity. 
In contrast to other visual aesthetic domains, aesthetically appealing interactions with natural landscapes may rely more heavily on comparisons between ongoing stimulation and well-formed representations of the natural world, and less on top-down processes for resolving ambiguities or assessing self-relevance.
Affiliation(s)
- Ayse Ilkay Isik
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany

44
Williams LH, Carrigan AJ, Mills M, Auffermann WF, Rich AN, Drew T. Characteristics of expert search behavior in volumetric medical image interpretation. J Med Imaging (Bellingham) 2021; 8:041208. [PMID: 34277889 DOI: 10.1117/1.jmi.8.4.041208] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Accepted: 06/28/2021] [Indexed: 11/14/2022] Open
Abstract
Purpose: Experienced radiologists have enhanced global processing ability relative to novices, allowing experts to rapidly detect medical abnormalities without performing an exhaustive search. However, evidence for global processing models is primarily limited to two-dimensional image interpretation, and it is unclear whether these findings generalize to volumetric images, which are widely used in clinical practice. We examined whether radiologists searching volumetric images use methods consistent with global processing models of expertise. In addition, we investigated whether search strategy (scanning/drilling) differs with experience level. Approach: Fifty radiologists with a wide range of experience evaluated chest computed tomography (CT) scans for lung nodules while their eye movements and scrolling behaviors were tracked. Multiple linear regressions were used to determine: (1) how search behaviors differed with years of experience and the number of chest CTs evaluated per week and (2) which search behaviors predicted better performance. Results: Contrary to global processing models based on 2D images, experience was unrelated to measures of global processing (saccadic amplitude, coverage, time to first fixation, search time, and depth passes) in this task. Drilling behavior was associated with better accuracy than scanning behavior when controlling for observer experience. Greater image coverage was a strong predictor of task accuracy. Conclusions: Global processing ability may play a relatively small role in volumetric image interpretation, where global scene statistics are not available to radiologists in a single glance. Rather, in volumetric images, it may be more important to engage in search strategies that support a more thorough search of the image.
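The regression analysis described in the Approach section amounts to predicting performance from several search behaviors at once. The sketch below uses entirely synthetic data constructed to mimic the reported pattern (coverage predicts accuracy, years of experience does not); variable names and effect sizes are hypothetical, not taken from the study.

```python
import numpy as np

def ols(predictors, outcome):
    """Ordinary least squares with an intercept column.

    predictors : (n_readers, n_behaviors); outcome : (n_readers,)
    Returns coefficients, intercept first."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta

# Hypothetical data for 50 readers
rng = np.random.default_rng(1)
n = 50
coverage = rng.uniform(0.3, 0.9, n)        # fraction of the lung fixated
experience = rng.uniform(0.0, 25.0, n)     # years of experience
accuracy = 0.4 + 0.5 * coverage + 0.02 * rng.normal(size=n)

beta = ols(np.column_stack([coverage, experience]), accuracy)
# beta[1] (coverage) recovers a sizeable effect; beta[2] (experience) is near zero
```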
Affiliation(s)
- Lauren H Williams
- University of California, San Diego, Department of Psychology, San Diego, California, United States
- Ann J Carrigan
- Macquarie University, Department of Psychology, Sydney, New South Wales, Australia; Macquarie University, Perception in Action Research Centre, Sydney, New South Wales, Australia; Macquarie University, Centre for Elite Performance, Expertise, and Training, Sydney, New South Wales, Australia
- Megan Mills
- University of Utah, School of Medicine, Department of Radiology and Imaging Sciences, Salt Lake City, Utah, United States
- William F Auffermann
- University of Utah, School of Medicine, Department of Radiology and Imaging Sciences, Salt Lake City, Utah, United States
- Anina N Rich
- Macquarie University, Perception in Action Research Centre, Sydney, New South Wales, Australia; Macquarie University, Centre for Elite Performance, Expertise, and Training, Sydney, New South Wales, Australia; Macquarie University, Department of Cognitive Science, Sydney, New South Wales, Australia
- Trafton Drew
- University of Utah, Department of Psychology, Salt Lake City, Utah, United States

45
Kaiser D, Häberle G, Cichy RM. Coherent natural scene structure facilitates the extraction of task-relevant object information in visual cortex. Neuroimage 2021; 240:118365. [PMID: 34233220 PMCID: PMC8456750 DOI: 10.1016/j.neuroimage.2021.118365] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Revised: 04/22/2021] [Accepted: 07/03/2021] [Indexed: 11/24/2022] Open
Abstract
Looking for objects within complex natural environments is a task everybody performs multiple times each day. In this study, we explore how the brain uses the typical composition of real-world environments to efficiently solve this task. We recorded fMRI activity while participants performed two different categorization tasks on natural scenes. In the object task, they indicated whether the scene contained a person or a car, while in the scene task, they indicated whether the scene depicted an urban or a rural environment. Critically, each scene was presented in an "intact" way, preserving its coherent structure, or in a "jumbled" way, with information swapped across quadrants. In both tasks, participants' categorization was more accurate and faster for intact scenes. These behavioral benefits were accompanied by stronger responses to intact than to jumbled scenes across high-level visual cortex. To track the amount of object information in visual cortex, we correlated multi-voxel response patterns during the two categorization tasks with response patterns evoked by people and cars in isolation. We found that object information in object- and body-selective cortex was enhanced when the object was embedded in an intact, rather than a jumbled scene. However, this enhancement was only found in the object task: When participants instead categorized the scenes, object information did not differ between intact and jumbled scenes. Together, these results indicate that coherent scene structure facilitates the extraction of object information in a task-dependent way, suggesting that interactions between the object and scene processing pathways adaptively support behavioral goals.
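The pattern-correlation step described above (correlating scene-evoked multi-voxel patterns with templates evoked by objects in isolation) can be illustrated with a toy example. Everything below is synthetic: the 100-"voxel" patterns, noise levels, and condition labels are stand-ins, not the study's data or analysis code.

```python
import numpy as np

def object_information(scene_patterns, isolated_template):
    """Mean Pearson correlation between scene-evoked multi-voxel
    patterns and the pattern evoked by the object in isolation."""
    return float(np.mean([np.corrcoef(p, isolated_template)[0, 1]
                          for p in scene_patterns]))

# Simulated patterns: "intact" scenes carry a stronger trace of the
# isolated-object template than "jumbled" scenes.
rng = np.random.default_rng(2)
template = rng.normal(size=100)
intact = [template + 0.8 * rng.normal(size=100) for _ in range(20)]
jumbled = [0.4 * template + 1.2 * rng.normal(size=100) for _ in range(20)]

info_intact = object_information(intact, template)
info_jumbled = object_information(jumbled, template)  # lower than intact
```

Comparing such correlation scores between intact and jumbled conditions, separately per task, is the logic behind the task-dependent enhancement the abstract reports.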
Affiliation(s)
- Daniel Kaiser
- Department of Psychology, University of York, York, UK
- Greta Häberle
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany; Humboldt-Universität zu Berlin, Faculty of Philosophy, Berlin School of Mind and Brain, Berlin, Germany
- Radoslaw M Cichy
- Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany; Charité - Universitätsmedizin Berlin, Einstein Center for Neurosciences Berlin, Berlin, Germany; Humboldt-Universität zu Berlin, Faculty of Philosophy, Berlin School of Mind and Brain, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany

46
Brunyé TT, Drew T, Saikia MJ, Kerr KF, Eguchi MM, Lee AC, May C, Elder DE, Elmore JG. Melanoma in the Blink of an Eye: Pathologists' Rapid Detection, Classification, and Localization of Skin Abnormalities. VISUAL COGNITION 2021; 29:386-400. [PMID: 35197796 PMCID: PMC8863358 DOI: 10.1080/13506285.2021.1943093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2021] [Accepted: 06/09/2021] [Indexed: 10/21/2022]
Abstract
Expert radiologists can quickly extract a basic "gist" understanding of a medical image after less than a second of exposure, leading to above-chance diagnostic classification of images. Most of this work has focused on radiology tasks (such as screening mammography), and it is currently unclear whether this pattern of results and the nature of visual expertise underlying this ability are applicable to pathology, another medical imaging domain demanding visual diagnostic interpretation. To further characterize the detection, localization, and diagnosis of medical images, this study examined eye movements and diagnostic decision-making when pathologists were briefly exposed to digital whole slide images of melanocytic skin biopsies. Twelve resident (N = 5), fellow (N = 5), and attending pathologists (N = 2) with experience interpreting dermatopathology briefly viewed 48 cases presented for 500 ms each, and we tracked their eye movements towards histological abnormalities, their ability to classify images as containing or not containing invasive melanoma, and their ability to localize critical image regions. Results demonstrated rapid shifts of the eyes towards critical abnormalities during image viewing, high diagnostic sensitivity and specificity, and a surprisingly accurate ability to localize critical diagnostic image regions. Furthermore, when pathologists fixated critical regions, they were subsequently much more likely to successfully localize that region on an outline of the image. Results are discussed relative to models of medical image interpretation and innovative methods for monitoring and assessing expertise development during medical education and training.
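Sensitivity and specificity in such rapid classification tasks are often summarized with the signal-detection index d'. A minimal sketch, using made-up counts (the 48/48 case split and hit/false-alarm numbers here are hypothetical, not the study's results):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity: d' = z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical counts for 48 melanoma and 48 benign cases
hits, misses = 40, 8
false_alarms, correct_rejections = 10, 38

sensitivity = hits / (hits + misses)
specificity = correct_rejections / (false_alarms + correct_rejections)
dp = d_prime(sensitivity, 1.0 - specificity)  # well above chance (d' = 0)
```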
Affiliation(s)
- Tad T. Brunyé
- Center for Applied Brain and Cognitive Sciences, Tufts University, Medford, MA, USA
- Trafton Drew
- Department of Psychology, University of Utah, Salt Lake City, UT, USA
- Manob Jyoti Saikia
- Center for Applied Brain and Cognitive Sciences, Tufts University, Medford, MA, USA
- Kathleen F. Kerr
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Megan M. Eguchi
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Annie C. Lee
- Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, CA, USA
- Caitlin May
- Dermatopathology Northwest, Bellevue, WA, USA
- David E. Elder
- Division of Anatomic Pathology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Joann G. Elmore
- Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, CA, USA

47
Abstract
Every aspect of vision, from the opsin proteins to the eyes and the ways that they serve animal behavior, is incredibly diverse. It is only with an evolutionary perspective that this diversity can be understood and fully appreciated. In this review, I describe and explain the diversity at each level and try to convey an understanding of how the origin of the first opsin some 800 million years ago could initiate the avalanche that produced the astonishing diversity of eyes and vision that we see today. Despite the diversity, many types of photoreceptors, eyes, and visual roles have evolved multiple times independently in different animals, revealing a pattern of eye evolution strictly guided by functional constraints and driven by the evolution of gradually more demanding behaviors. I conclude the review by introducing a novel distinction between active and passive vision that points to uncharted territories in vision research. Expected final online publication date for the Annual Review of Vision Science, Volume 7 is September 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Dan-E Nilsson
- Lund Vision Group, Department of Biology, Lund University, 22362 Lund, Sweden

48
Abstract
Categorization performance is a popular metric of scene recognition and understanding in behavioral and computational research. However, categorical constructs and their labels can be somewhat arbitrary. Derived from exhaustive vocabularies of place names (e.g., Deng et al., 2009), or the judgements of small groups of researchers (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007), these categories may not correspond with human-preferred taxonomies. Here, we propose clustering by increasing the Rand index via coordinate ascent (CIRCA): an unsupervised, data-driven clustering method for deriving ground-truth scene categories. In Experiment 1, human participants organized 80 stereoscopic images of outdoor scenes from the Southampton-York Natural Scenes (SYNS) dataset (Adams et al., 2016) into discrete categories. In separate tasks, images were grouped according to i) semantic content, ii) three-dimensional spatial structure, or iii) two-dimensional image appearance. Participants provided text labels for each group. Using the CIRCA method, we determined the most representative category structure and then derived category labels for each task/dimension. In Experiment 2, we found that these categories generalized well to a larger set of SYNS images, and new observers. In Experiment 3, we tested the relationship between our category systems and the spatial envelope model (Oliva & Torralba, 2001). Finally, in Experiment 4, we validated CIRCA on a larger, independent dataset of same-different category judgements. The derived category systems outperformed the SUN taxonomy (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010) and an alternative clustering method (Greene, 2019). In summary, we believe this novel categorization method can be applied to a wide range of datasets to derive optimal categorical groupings and labels from psychophysical judgements of stimulus similarity.
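The core idea of the method named in this abstract — maximizing the mean Rand index against observers' groupings by coordinate ascent — can be sketched as follows. This is a simplified toy reconstruction from the abstract's description only (six items, two clusters, a naive greedy sweep), not the authors' implementation.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of item pairs on which two partitions agree (the pair
    is grouped together in both, or apart in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def mean_rand(labels, partitions):
    return sum(rand_index(labels, p) for p in partitions) / len(partitions)

def consensus_partition(partitions, k, sweeps=20):
    """Greedy coordinate ascent: repeatedly reassign each item to the
    cluster that maximizes the mean Rand index against all observers."""
    n = len(partitions[0])
    labels = [i % k for i in range(n)]  # arbitrary starting partition
    for _ in range(sweeps):
        for i in range(n):
            labels[i] = max(range(k), key=lambda c: mean_rand(
                labels[:i] + [c] + labels[i + 1:], partitions))
    return labels

# Three observers sort six images; one observer disagrees on image 2.
observers = [[0, 0, 0, 1, 1, 1],
             [0, 0, 0, 1, 1, 1],
             [0, 0, 1, 1, 1, 1]]
consensus = consensus_partition(observers, k=2)
# The consensus recovers the majority partition (up to label permutation)
```

Because the Rand index is invariant to cluster relabeling, the consensus is compared to a reference partition via the index itself rather than by equality of labels.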
Affiliation(s)
- Matt D Anderson
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
- Erich W Graf
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
- James H Elder
- Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada
- Krista A Ehinger
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Wendy J Adams
- Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK

49
De Cesarei A, Cavicchi S, Cristadoro G, Lippi M. Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes? Cogn Sci 2021; 45:e13009. [PMID: 34170027 PMCID: PMC8365760 DOI: 10.1111/cogs.13009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2020] [Revised: 05/19/2021] [Accepted: 05/31/2021] [Indexed: 11/28/2022]
Abstract
The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Even if the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low or high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization varied between humans and CNNs. More remarkable differences were observed for HSF information compared to the other two types of degradation, both in terms of overall accuracy and image-level agreement between humans and CNNs. The CNNs' difficulty in categorizing high-pass-filtered natural scenes was reduced by picture whitening, a procedure inspired by how visual systems process natural images. The results are discussed concerning the adaptation to regularities in the visual environment (scene statistics); if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend only on a subset of the visual information on which humans rely, for example, on low spatial frequency information.
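The two image degradations described above — spatial-frequency filtering and phase scrambling — are standard Fourier-domain manipulations. A minimal sketch on a synthetic grayscale image (the cutoff value and image size are arbitrary; taking the real part after phase randomization is a common shortcut that only approximately preserves the amplitude spectrum):

```python
import numpy as np

def radial_freq(shape):
    """Radial spatial frequency (cycles/image) for each FFT bin."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None] * h
    fx = np.fft.fftfreq(w)[None, :] * w
    return np.sqrt(fx ** 2 + fy ** 2)

def high_pass(img, cutoff):
    """Keep only spatial frequencies above `cutoff` cycles/image."""
    F = np.fft.fft2(img)
    return np.real(np.fft.ifft2(F * (radial_freq(img.shape) > cutoff)))

def phase_scramble(img, rng):
    """Randomize the phase spectrum while keeping the amplitudes,
    destroying spatial structure but retaining the power spectrum
    (approximately, after taking the real part)."""
    F = np.fft.fft2(img)
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, img.shape))
    return np.real(np.fft.ifft2(np.abs(F) * phases))

rng = np.random.default_rng(3)
img = rng.normal(size=(64, 64))      # stand-in for a grayscale photograph

hsf = high_pass(img, cutoff=8.0)     # only fine detail survives
scrambled = phase_scramble(img, rng)
```

Note that a cutoff of 0 removes only the DC component, so `high_pass(img, 0.0)` simply mean-centers the image.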
Affiliation(s)
- Marco Lippi
- Department of Sciences and Methods for Engineering, University of Modena and Reggio Emilia

50
Kurtz KJ, Silliman DC. Object understanding: Investigating the path from percept to meaning. Acta Psychol (Amst) 2021; 216:103307. [PMID: 33894533 DOI: 10.1016/j.actpsy.2021.103307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2020] [Revised: 03/31/2021] [Accepted: 04/01/2021] [Indexed: 11/29/2022] Open
Abstract
Researchers tend to follow two paths when investigating categorization: 1) artificial classification learning tasks and 2) studies of natural conceptual organization involving reasoning from prior category knowledge. A largely separate body of research addresses the process of object recognition, i.e., how people identify what they are looking at strictly in terms of visual as opposed to semantic properties. The present work brings together elements from each of these approaches in order to address object understanding: the ubiquitous natural process of accessing meaning based on a realistic image of an everyday object. According to a widely held features-first framework, a stimulus is initially encoded as a set of features that is compared to stored category representations to find the best match. This approach has been successful for explaining artificial classification learning, but it bypasses how items are encoded and fails to include a role for top-down processing in constructing item representations. We used a speeded verification task to evaluate the features-first account using realistic stimuli. Participants saw photographic images of everyday objects and judged as quickly as possible whether a provided verbal description matched the picture. Category descriptions (basic-level labels) were verified significantly faster than descriptions of physical or functional properties. This suggests that people access the category of the stimulus prior to accessing its parsed features. We outline a construal account whereby the category is accessed first to construct a featural item interpretation rather than features being the basis for determining the category.