51
Brunyé TT, Drew T, Saikia MJ, Kerr KF, Eguchi MM, Lee AC, May C, Elder DE, Elmore JG. Melanoma in the Blink of an Eye: Pathologists' Rapid Detection, Classification, and Localization of Skin Abnormalities. Visual Cognition 2021; 29:386-400. PMID: 35197796; PMCID: PMC8863358; DOI: 10.1080/13506285.2021.1943093.
Abstract
Expert radiologists can quickly extract a basic "gist" understanding of a medical image following less than one second of exposure, leading to above-chance diagnostic classification of images. Most of this work has focused on radiology tasks (such as screening mammography), and it is currently unclear whether this pattern of results, and the nature of the visual expertise underlying this ability, applies to pathology, another medical imaging domain demanding visual diagnostic interpretation. To further characterize the detection, localization, and diagnosis of medical images, this study examined eye movements and diagnostic decision-making when pathologists were briefly exposed to digital whole slide images of melanocytic skin biopsies. Twelve resident (N = 5), fellow (N = 5), and attending (N = 2) pathologists with experience interpreting dermatopathology briefly viewed 48 cases presented for 500 ms each, and we tracked their eye movements towards histological abnormalities, their ability to classify images as containing or not containing invasive melanoma, and their ability to localize critical image regions. Results demonstrated rapid shifts of the eyes towards critical abnormalities during image viewing, high diagnostic sensitivity and specificity, and a surprisingly accurate ability to localize critical diagnostic image regions. Furthermore, when pathologists fixated critical regions with their eyes, they were subsequently much more likely to successfully localize those regions on an outline of the image. Results are discussed relative to models of medical image interpretation and to innovative methods for monitoring and assessing expertise development during medical education and training.
Affiliation(s)
- Tad T. Brunyé: Center for Applied Brain and Cognitive Sciences, Tufts University, Medford, MA, USA
- Trafton Drew: Department of Psychology, University of Utah, Salt Lake City, UT, USA
- Manob Jyoti Saikia: Center for Applied Brain and Cognitive Sciences, Tufts University, Medford, MA, USA
- Kathleen F. Kerr: Department of Biostatistics, University of Washington, Seattle, WA, USA
- Megan M. Eguchi: Department of Biostatistics, University of Washington, Seattle, WA, USA
- Annie C. Lee: Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, CA, USA
- Caitlin May: Dermatopathology Northwest, Bellevue, WA, USA
- David E. Elder: Division of Anatomic Pathology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Joann G. Elmore: Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, CA, USA
52
Abstract
Every aspect of vision, from the opsin proteins to the eyes and the ways that they serve animal behavior, is incredibly diverse. It is only with an evolutionary perspective that this diversity can be understood and fully appreciated. In this review, I describe and explain the diversity at each level and try to convey an understanding of how the origin of the first opsin some 800 million years ago could initiate the avalanche that produced the astonishing diversity of eyes and vision that we see today. Despite the diversity, many types of photoreceptors, eyes, and visual roles have evolved multiple times independently in different animals, revealing a pattern of eye evolution strictly guided by functional constraints and driven by the evolution of gradually more demanding behaviors. I conclude the review by introducing a novel distinction between active and passive vision that points to uncharted territories in vision research.
Affiliation(s)
- Dan-E Nilsson: Lund Vision Group, Department of Biology, Lund University, 22362 Lund, Sweden
53
Abstract
Categorization performance is a popular metric of scene recognition and understanding in behavioral and computational research. However, categorical constructs and their labels can be somewhat arbitrary. Derived from exhaustive vocabularies of place names (e.g., Deng et al., 2009), or the judgements of small groups of researchers (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007), these categories may not correspond with human-preferred taxonomies. Here, we propose clustering by increasing the Rand index via coordinate ascent (CIRCA): an unsupervised, data-driven clustering method for deriving ground-truth scene categories. In Experiment 1, human participants organized 80 stereoscopic images of outdoor scenes from the Southampton-York Natural Scenes (SYNS) dataset (Adams et al., 2016) into discrete categories. In separate tasks, images were grouped according to i) semantic content, ii) three-dimensional spatial structure, or iii) two-dimensional image appearance. Participants provided text labels for each group. Using the CIRCA method, we determined the most representative category structure and then derived category labels for each task/dimension. In Experiment 2, we found that these categories generalized well to a larger set of SYNS images, and new observers. In Experiment 3, we tested the relationship between our category systems and the spatial envelope model (Oliva & Torralba, 2001). Finally, in Experiment 4, we validated CIRCA on a larger, independent dataset of same-different category judgements. The derived category systems outperformed the SUN taxonomy (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010) and an alternative clustering method (Greene, 2019). In summary, we believe this novel categorization method can be applied to a wide range of datasets to derive optimal categorical groupings and labels from psychophysical judgements of stimulus similarity.
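The core idea named by the CIRCA acronym is a coordinate-ascent search for a consensus partition that maximizes Rand-index agreement with participants' groupings. A minimal sketch of that idea, assuming a plain (unadjusted) Rand index and a greedy item-by-item reassignment loop; the function names, the fixed cluster count, and the single-start loop are illustrative assumptions, not the authors' published implementation:

```python
from itertools import combinations

def rand_index(a, b):
    """Rand index: fraction of item pairs on which two partitions
    agree (grouped together in both, or separated in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def circa_consensus(partitions, n_clusters, max_iter=50):
    """Coordinate ascent toward a consensus partition: repeatedly move
    each item to whichever cluster maximizes the mean Rand index with
    every participant's partition, until no reassignment helps."""
    n = len(partitions[0])
    labels = [i % n_clusters for i in range(n)]  # arbitrary starting partition
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            old = labels[i]
            best_k, best_score = old, -1.0
            for k in range(n_clusters):
                labels[i] = k  # tentatively place item i in cluster k
                score = sum(rand_index(labels, p) for p in partitions) / len(partitions)
                if score > best_score:
                    best_k, best_score = k, score
            labels[i] = best_k
            changed = changed or best_k != old
        if not changed:
            break
    return labels
```

With real data, `partitions` would hold each participant's grouping of the same image set; the published method presumably adds refinements (label derivation, restarts, choice of cluster count) that this toy loop omits.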
Affiliation(s)
- Matt D Anderson: Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
- Erich W Graf: Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
- James H Elder: Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada
- Krista A Ehinger: School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Wendy J Adams: Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK
54
De Cesarei A, Cavicchi S, Cristadoro G, Lippi M. Do Humans and Deep Convolutional Neural Networks Use Visual Information Similarly for the Categorization of Natural Scenes? Cogn Sci 2021; 45:e13009. PMID: 34170027; PMCID: PMC8365760; DOI: 10.1111/cogs.13009.
Abstract
The investigation of visual categorization has recently been aided by the introduction of deep convolutional neural networks (CNNs), which achieve unprecedented accuracy in picture classification after extensive training. Although the architecture of CNNs is inspired by the organization of the visual brain, the similarity between CNN and human visual processing remains unclear. Here, we investigated this issue by engaging humans and CNNs in a two-class visual categorization task. To this end, pictures containing animals or vehicles were modified to contain only low or only high spatial frequency (HSF) information, or were scrambled in the phase of the spatial frequency spectrum. For all types of degradation, accuracy increased as degradation was reduced for both humans and CNNs; however, the thresholds for accurate categorization varied between humans and CNNs. More remarkable differences were observed for HSF information than for the other two types of degradation, both in terms of overall accuracy and image-level agreement between humans and CNNs. The CNNs' difficulty in categorizing high-pass-filtered natural scenes was reduced by picture whitening, a procedure inspired by how visual systems process natural images. The results are discussed in relation to adaptation to regularities in the visual environment (scene statistics): if the visual characteristics of the environment are not learned by CNNs, their visual categorization may depend on only a subset of the visual information on which humans rely, for example, on low spatial frequency information.
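Phase scrambling of the kind described here, destroying the phase spectrum while preserving the amplitude (spatial frequency) spectrum, can be sketched in a few lines of NumPy. This is a generic sketch of the technique, not the authors' exact stimulus pipeline; taking the random phases from the FFT of a real-valued noise image keeps them conjugate-symmetric, so the inverse transform stays real:

```python
import numpy as np

def phase_scramble(img, rng=None):
    """Replace a grayscale image's phase spectrum with random phases
    while preserving its amplitude spectrum (the energy carried at
    each spatial frequency)."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.fft.fft2(img)
    amplitude = np.abs(f)
    # Phases of a real noise image are conjugate-symmetric by construction.
    random_phase = np.angle(np.fft.fft2(rng.random(img.shape)))
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * random_phase))
    return np.real(scrambled)
```

Because only the phase changes, the scrambled image has exactly the same power at every spatial frequency as the original, which is what makes it a useful control for low-level image statistics.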
Affiliation(s)
- Marco Lippi: Department of Sciences and Methods for Engineering, University of Modena and Reggio Emilia
55
Kurtz KJ, Silliman DC. Object understanding: Investigating the path from percept to meaning. Acta Psychol (Amst) 2021; 216:103307. PMID: 33894533; DOI: 10.1016/j.actpsy.2021.103307.
Abstract
Researchers tend to follow two paths when investigating categorization: 1) artificial classification learning tasks and 2) studies of natural conceptual organization involving reasoning from prior category knowledge. A largely separate body of research addresses the process of object recognition, i.e., how people identify what they are looking at strictly in terms of visual as opposed to semantic properties. The present work brings together elements from each of these approaches in order to address object understanding: the ubiquitous natural process of accessing meaning based on a realistic image of an everyday object. According to a widely held features-first framework, a stimulus is initially encoded as a set of features that is compared to stored category representations to find the best match. This approach has been successful in explaining artificial classification learning, but it bypasses how items are encoded and fails to include a role for top-down processing in constructing item representations. We used a speeded verification task to evaluate the features-first account with realistic stimuli. Participants saw photographic images of everyday objects and judged as quickly as possible whether a provided verbal description matched the picture. Category descriptions (basic-level labels) were verified significantly faster than descriptions of physical or functional properties. This suggests that people access the category of the stimulus prior to accessing its parsed features. We outline a construal account whereby the category is accessed first to construct a featural item interpretation, rather than features being the basis for determining the category.
56
Van Rinsveld A, Wens V, Guillaume M, Beuel A, Gevers W, De Tiège X, Content A. Automatic Processing of Numerosity in Human Neocortex Evidenced by Occipital and Parietal Neuromagnetic Responses. Cereb Cortex Commun 2021; 2:tgab028. PMID: 34296173; PMCID: PMC8152830; DOI: 10.1093/texcom/tgab028.
Abstract
Humans and other animal species are endowed with the ability to sense, represent, and mentally manipulate the number of items in a set without needing to count them. One central hypothesis is that this ability relies on an automated functional system dedicated to numerosity, the perception of the discrete numerical magnitude of a set of items. This system has classically been associated with intraparietal regions; however, accumulating evidence in favor of an early visual number sense calls into question the functional role of parietal regions in numerosity processing. Targeting numerosity specifically, among other visual features, in the earliest stages of processing requires high temporal and spatial resolution. We used frequency-tagged magnetoencephalography to investigate the early automatic processing of numerical magnitudes and measured the steady-state brain responses specifically evoked by numerical and other visual changes in the visual scene. The neuromagnetic responses showed implicit discrimination of numerosity, total occupied area, and convex hull. The source reconstruction corresponding to the implicit discrimination responses showed common and separate sources along the ventral and dorsal visual pathways. Occipital sources attested to the perceptual salience of numerosity, similarly to the two other implicitly discriminable visual features. Crucially, we found parietal responses uniquely associated with numerosity discrimination, demonstrating automatic processing of numerosity in the parietal cortex even when it is not relevant to the task. Taken together, these results provide further insights into the functional roles of parietal and occipital regions in numerosity encoding along the visual hierarchy.
Affiliation(s)
- Amandine Van Rinsveld: Center for Research in Cognition and Neurosciences (CRCN), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1050, Belgium
- Vincent Wens: Laboratoire de Cartographie fonctionnelle du Cerveau (LCFC), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1070, Belgium; Magnetoencephalography Unit, Department of Functional Neuroimaging, Service of Nuclear Medicine, CUB – Hôpital Erasme, Brussels 1070, Belgium
- Mathieu Guillaume: Center for Research in Cognition and Neurosciences (CRCN), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1050, Belgium
- Anthony Beuel: Center for Research in Cognition and Neurosciences (CRCN), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1050, Belgium
- Wim Gevers: Center for Research in Cognition and Neurosciences (CRCN), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1050, Belgium
- Xavier De Tiège: Laboratoire de Cartographie fonctionnelle du Cerveau (LCFC), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1070, Belgium; Magnetoencephalography Unit, Department of Functional Neuroimaging, Service of Nuclear Medicine, CUB – Hôpital Erasme, Brussels 1070, Belgium
- Alain Content: Center for Research in Cognition and Neurosciences (CRCN), UNI – ULB Neuroscience Institute, Université libre de Bruxelles (ULB), Brussels 1050, Belgium
57
Melcher D, Huber-Huber C, Wutz A. Enumerating the forest before the trees: The time courses of estimation-based and individuation-based numerical processing. Atten Percept Psychophys 2021; 83:1215-1229. PMID: 33000437; PMCID: PMC8049909; DOI: 10.3758/s13414-020-02137-5.
Abstract
Ensemble perception refers to the ability to report attributes of a group of objects, rather than focusing on only one or a few individuals. An everyday example of ensemble perception is the ability to estimate the numerosity of a large number of items. The time course of ensemble processing, including that of numerical estimation, remains a matter of debate, with some studies arguing for rapid, "preattentive" processing and other studies suggesting that ensemble perception improves with longer presentation durations. We used a forward-simultaneous masking procedure that effectively controls stimulus durations to directly measure the temporal dynamics of ensemble estimation and compared it with more precise enumeration of individual objects. Our main finding was that object individuation within the subitizing range (one to four items) took about 100-150 ms to reach its typical capacity limits, whereas estimation (six or more items) showed a temporal resolution of 50 ms or less. Estimation accuracy did not improve over time. Instead, there was an increasing tendency, with longer effective durations, to underestimate the number of targets for larger set sizes (11-35 items). Overall, the time course of enumeration for one or a few single items was dramatically different from that of estimating the numerosity of six or more items. These results are consistent with the idea that the temporal resolution of ensemble processing may be as rapid as, or even faster than, individuation of individual items, and support a basic distinction between the mechanisms underlying exact enumeration of small sets (one to four items) and those underlying estimation.
Affiliation(s)
- David Melcher: Center for Mind/Brain Sciences and Department of Psychology and Cognitive Sciences, University of Trento, Corso Bettini 31, 38068 Rovereto, Italy; Psychology Program, Division of Science, New York University Abu Dhabi, Abu Dhabi, UAE
- Christoph Huber-Huber: Center for Mind/Brain Sciences and Department of Psychology and Cognitive Sciences, University of Trento, Corso Bettini 31, 38068 Rovereto, Italy; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Andreas Wutz: Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria; Picower Institute for Learning and Memory, MIT, Cambridge, MA, USA
58
Abstract
In our exploratory study, we ask how naive observers without a distinct religious background approach biblical art that combines image and text. For this purpose, we chose the book 'New biblical figures of the Old and New Testament', published in 1569, as the source of the stimuli. This book belongs to the genre of illustrated Bibles, which were very popular during the Reformation. Since there is no empirical knowledge regarding the interaction between image and text during the reception of such biblical art, we selected four relevant images from the book and measured the eye movements of participants in order to characterize and quantify their scanning behavior in terms of i) looking at text (text usage), ii) text vs. image interaction measures (semantic or contextual relevance of text), and iii) narration. We show that texts capture attention early in the process of inspection and that text and image interact. Moreover, the semantics of texts are later used to guide eye movements through the image, supporting the formation of the narrative.
59
Abstract
Rapid visual perception is often viewed as a bottom-up process. Category-preferred neural regions are often characterized as automatic, default processing mechanisms for visual inputs of their categorical preference. To explore the sensitivity of such regions to top-down information, we examined three scene-preferring brain regions, the occipital place area (OPA), the parahippocampal place area (PPA), and the retrosplenial complex (RSC), and tested whether the processing of outdoor scenes is influenced by the functional contexts in which they are seen. Context was manipulated by presenting real-world landscape images as if viewed through a window or within a picture frame, manipulations that do not affect scene content but do affect one's functional knowledge regarding the scene. This manipulation influenced neural scene processing (as measured by fMRI): the OPA and the PPA exhibited greater neural activity when participants viewed images as if through a window than within a picture frame, whereas the RSC did not show this difference. In a separate behavioral experiment, functional context affected scene memory in predictable directions (boundary extension). Our interpretation is that the window context denotes three-dimensionality, rendering the perceptual experience of viewing landscapes more realistic, whereas the frame context denotes a 2-D image. As such, the more spatially biased scene representations in the OPA and the PPA are influenced by differences in top-down perceptual expectations generated from context, while the more semantically biased scene representations in the RSC are likely less affected by top-down signals that carry information about the physical layout of a scene.
60
Leroy A, Spotorno S, Faure S. Traitements sémantiques et émotionnels des scènes visuelles complexes : une synthèse critique de l'état actuel des connaissances [Semantic and emotional processing of complex visual scenes: a critical review of the current state of knowledge]. Année Psychologique 2021. DOI: 10.3917/anpsy1.211.0101.
61
Hunt C, Meinhardt G. Synergy of spatial frequency and orientation bandwidth in texture segregation. J Vis 2021; 21:5. PMID: 33560290; PMCID: PMC7873498; DOI: 10.1167/jov.21.2.5.
Abstract
Defining target textures by increased bandwidths in spatial frequency and orientation, we observed strong cue combination effects in a combined texture figure detection and discrimination task. Performance for double-cue targets was better than predicted by independent processing of either cue, and even better than predicted from linear cue integration. Application of a texture-processing model revealed that the oversummative cue combination effect is captured by calculating a low-level summary statistic, ΔCE_m, which describes the differential contrast energy to target and reference textures, from multiple scales and orientations, and integrating this statistic across channels with a winner-take-all rule. Modeling detection performance within a signal detection theory framework showed that the observers' sensitivity to single-cue and double-cue texture targets, measured in d' units, could be reproduced with plausible settings for filter and noise parameters. These results challenge models assuming separate channeling of elementary features and their later integration, since oversummative cue combination effects appear to be an inherent property of local energy mechanisms, at least for spatial frequency and orientation bandwidth-modulated textures.
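The d' sensitivity measure referenced here is standard signal detection theory: the z-transformed hit rate minus the z-transformed false-alarm rate. A minimal sketch from response counts (the log-linear 0.5 correction is one common convention for avoiding infinite z-scores, not necessarily the one used in the paper):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate), with a
    log-linear correction (add 0.5 to each cell) so that perfect or
    empty cells do not produce infinite z-scores."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return z(hit_rate) - z(fa_rate)
```

With equal hit and false-alarm rates d' is zero (chance performance); higher hit rates and lower false-alarm rates push it upward.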
Affiliation(s)
- Cordula Hunt: Department of Psychology, Methods Section, Johannes Gutenberg-Universität, Mainz, Germany
- Günter Meinhardt: Department of Psychology, Methods Section, Johannes Gutenberg-Universität, Mainz, Germany
62
Global and local interference effects in ensemble encoding are best explained by interactions between summary representations of the mean and the range. Atten Percept Psychophys 2021; 83:1106-1128. PMID: 33506350; PMCID: PMC8049940; DOI: 10.3758/s13414-020-02224-7.
Abstract
Through ensemble encoding, the visual system compresses redundant statistical properties from multiple items into a single summary metric (e.g., average size). Numerous studies have shown that global summary information is extracted quickly, does not require access to single-item representations, and often interferes with reports of single items from the set. Yet a thorough understanding of ensemble processing would benefit from a more extensive investigation at the local level. The purpose of this study was therefore to provide a more critical inspection of global-local processing in ensemble perception. Taking inspiration from Navon (Cognitive Psychology, 9(3), 353-383, 1977), we employed a novel paradigm that independently manipulates the degree of interference at the global (mean) or local (single item) level of the ensemble. Initial results were consistent with reciprocal interference between global and local ensemble processing. However, further testing revealed that local interference effects were better explained by interference from another summary statistic: the range of the set. Furthermore, participants were unable to distinguish single items that had appeared in the ensemble display from other items that fell within the ensemble range but, critically, had not actually been presented. It thus appears that local item values are inferred from their relationship to higher-order summary statistics such as the range and the mean. These results conflict with claims that local information is captured alongside global information in summary representations; in the studies making such claims, successful identification of set members was never compared against misidentification of unpresented items falling within the range of the set.
63
Gu J, Liu B, Yan W, Miao Q, Wei J. Investigating the Impact of the Missing Significant Objects in Scene Recognition Using Multivariate Pattern Analysis. Front Neurorobot 2021; 14:597471. PMID: 33390924; PMCID: PMC7773817; DOI: 10.3389/fnbot.2020.597471.
Abstract
Significant objects in a scene can make a great contribution to scene recognition. Besides the three scene-selective regions (the parahippocampal place area (PPA), retrosplenial complex (RSC), and occipital place area (OPA)), some neuroimaging studies have shown that the lateral occipital complex (LOC) is also engaged in scene recognition processing. In this study, multivariate pattern analysis was adopted to explore the object-scene association in scene recognition when different numbers of significant objects were masked. Scene classification succeeded in the ROIs only for intact scenes. In addition, the average signal intensity in the LOC (including the lateral occipital cortex (LO) and the posterior fusiform area (pF)) decreased when objects were masked, but no such decrease was observed in the scene-selective regions. These results suggest that the LOC is sensitive to the loss of significant objects and contributes to scene recognition mainly through object-scene semantic associations. The performance of the scene-selective areas may instead reflect their response to changes in a scene's global attributes, such as spatial information, when they are engaged in scene recognition processing. These findings further enrich our knowledge of the influence of significant objects on activation patterns during scene recognition.
Affiliation(s)
- Jin Gu: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Baolin Liu: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Weiran Yan: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Qiaomu Miao: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jianguo Wei: College of Intelligence and Computing, Tianjin University, Tianjin, China
64
Lee SM, Jin SW, Park SB, Park EH, Lee CH, Lee HW, Lim HY, Yoo SW, Ahn JR, Shin J, Lee SA, Lee I. Goal-directed interaction of stimulus and task demand in the parahippocampal region. Hippocampus 2021; 31:717-736. PMID: 33394547; PMCID: PMC8359334; DOI: 10.1002/hipo.23295.
Abstract
The hippocampus and parahippocampal region are essential for representing episodic memories involving various spatial locations and objects, and for using those memories for future adaptive behavior. The "dual-stream model" was initially formulated based on anatomical characteristics of the medial temporal lobe, dividing the parahippocampal region into two streams that separately process and relay spatial and nonspatial information to the hippocampus. Despite its significance, the dual-stream model in its original form cannot explain recent experimental results, and many researchers have recognized the need for a modification of the model. Here, we argue that dividing the parahippocampal region into spatial and nonspatial streams a priori may be too simplistic, particularly in light of ambiguous situations in which a sensory cue alone (e.g., a visual scene) may not allow such a definitive categorization. Upon reviewing evidence, including our own, that reveals the importance of goal-directed behavioral responses in determining the relative involvement of the parahippocampal processing streams, we propose the Goal-directed Interaction of Stimulus and Task-demand (GIST) model. In the GIST model, input stimuli such as visual scenes and objects are first processed by both the postrhinal and perirhinal cortices, the postrhinal cortex being more heavily involved with visual scenes and the perirhinal cortex with objects, with relatively little dependence on behavioral task demand. However, once perceptual ambiguities are resolved and the scenes and objects are identified and recognized, the information is then processed through the medial or lateral entorhinal cortex, depending on whether it is used to fulfill navigational or non-navigational goals, respectively. As complex sensory stimuli are utilized for both navigational and non-navigational purposes in an intermixed fashion in naturalistic settings, the hippocampus may then be required to put together these experiences into a coherent map, allowing the flexible cognitive operations needed for adaptive behavior.
Collapse
Affiliation(s)
- Su-Min Lee
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Seung-Woo Jin
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Seong-Beom Park
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Eun-Hye Park
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Choong-Hee Lee
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Hyun-Woo Lee
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Heung-Yeol Lim
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Seung-Woo Yoo
- Department of Biomedical Science, Charles E. Schmidt College of Medicine, Brain Institute, Florida Atlantic University, Jupiter, Florida, USA
- Jae Rong Ahn
- Department of Biology, Tufts University, Medford, Massachusetts, USA
- Jhoseph Shin
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
- Sang Ah Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Inah Lee
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, South Korea
65
Beitner J, Helbing J, Draschkow D, Võ MLH. Get Your Guidance Going: Investigating the Activation of Spatial Priors for Efficient Search in Virtual Reality. Brain Sci 2021; 11:44. [PMID: 33406655 PMCID: PMC7823740 DOI: 10.3390/brainsci11010044] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/06/2020] [Revised: 12/21/2020] [Accepted: 12/22/2020] [Indexed: 11/21/2022]
Abstract
Repeated search studies are a hallmark in the investigation of the interplay between memory and attention. Because results are usually averaged across searches, the substantial decrease in response times between the first and second search through the same environment is rarely discussed. This search initiation effect is often the most dramatic decrease in search times in a series of sequential searches, yet its nature has thus far remained unexplored. We tested the hypothesis that the activation of spatial priors produces this search efficiency profile. Before searching repeatedly through scenes in VR, participants either (1) previewed the scene, (2) saw an interrupted preview, or (3) started searching immediately. The search initiation effect was present in the last condition but in neither of the preview conditions. Eye movement metrics revealed that the locus of this effect lies in search guidance rather than search initiation or decision time, and that it goes beyond effects of object learning or incidental memory. Our study suggests that upon visual processing of an environment, a process of activating spatial priors to enable orientation is initiated; this takes a toll on search time at first, but once the priors are activated they can be used to guide subsequent searches.
Affiliation(s)
- Julia Beitner
- Scene Grammar Lab, Institute of Psychology, Goethe University, 60323 Frankfurt am Main, Germany
- Jason Helbing
- Scene Grammar Lab, Institute of Psychology, Goethe University, 60323 Frankfurt am Main, Germany
- Dejan Draschkow
- Brain and Cognition Laboratory, Department of Psychiatry, University of Oxford, Oxford OX3 7JX, UK
- Melissa L.-H. Võ
- Scene Grammar Lab, Institute of Psychology, Goethe University, 60323 Frankfurt am Main, Germany
66
Episodic and semantic memory processes in the boundary extension effect: An investigation using the remember/know paradigm. Acta Psychol (Amst) 2020; 211:103190. [PMID: 33130488 DOI: 10.1016/j.actpsy.2020.103190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/21/2020] [Revised: 08/31/2020] [Accepted: 09/24/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Boundary extension (BE) is a phenomenon in which participants report from memory that they experienced more of a scene than was initially presented. The goal of the current study was to investigate whether BE is fully based on episodic memory or also involves semantic schema knowledge. METHODS The study incorporated the remember/know paradigm into a BE task. Scenes were first learned incidentally, with participants later indicating whether they remembered or knew that they had seen the scene before. Next, they rated three views of the original picture (zoomed in, zoomed out, or unchanged) on similarity in closeness in order to measure BE. RESULTS The results showed a systematic BE pattern, but no difference in the amount of BE for episodic ('remember') and semantic ('know') memory. Additionally, the remember/know paradigm used in this study showed good sensitivity for both remember and know responses. DISCUSSION The results suggest that BE might not critically depend on the contextual information provided by episodic memory, but rather on schematic knowledge shared by episodic and semantic memory. Schematic knowledge might contribute to BE by providing an expectation, based on semantic guidance, of what likely lies beyond the boundaries of the scene.
67
Děchtěrenko F, Lukavský J, Štipl J. False memories for scenes using the DRM paradigm. Vision Res 2020; 178:48-59. [PMID: 33113436 DOI: 10.1016/j.visres.2020.09.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/10/2019] [Revised: 09/25/2020] [Accepted: 09/29/2020] [Indexed: 11/24/2022]
Abstract
People are remarkably good at remembering photographs. To further investigate the nature of the stored representations and the fidelity of human memories, it would be useful to evaluate the visual similarity of stimuli presented in experiments. Here, we explored the possible use of convolutional neural networks (CNN) as a measure of perceptual or representational similarity of visual scenes with respect to visual memory research. In Experiment 1, we presented participants with sets of nine images from the same scene category and tested whether they were able to detect the most distant scene in the image space defined by CNN. Experiment 2 was a visual variant of the Deese-Roediger-McDermott paradigm. We asked participants to remember a set of photographs from the same scene category. The photographs were preselected based on their distance to a particular visual prototype (defined as centroid of the image space). In the recognition test, we observed higher false alarm rates for scenes closer to this visual prototype. Our findings show that the similarity measured by CNN is reflected in human behavior: people can detect odd-one-out scenes or be lured to false alarms with similar stimuli. This method can be used for further studies regarding visual memory for complex scenes.
Affiliation(s)
- Filip Děchtěrenko
- Institute of Psychology, Czech Academy of Sciences, Hybernská 8, 110 00 Prague, Czech Republic; Faculty of Arts, Charles University, Celetná 20, 110 00 Prague, Czech Republic
- Jiří Lukavský
- Institute of Psychology, Czech Academy of Sciences, Hybernská 8, 110 00 Prague, Czech Republic; Faculty of Arts, Charles University, Celetná 20, 110 00 Prague, Czech Republic
- Jiří Štipl
- Faculty of Arts, Charles University, Celetná 20, 110 00 Prague, Czech Republic
68
Henderson JM, Goold JE, Choi W, Hayes TR. Neural Correlates of Fixated Low- and High-level Scene Properties during Active Scene Viewing. J Cogn Neurosci 2020; 32:2013-2023. [PMID: 32573384 PMCID: PMC11164273 DOI: 10.1162/jocn_a_01599] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
Abstract
During real-world scene perception, viewers actively direct their attention through a scene in a controlled sequence of eye fixations. During each fixation, local scene properties are attended, analyzed, and interpreted. What is the relationship between fixated scene properties and neural activity in the visual cortex? Participants inspected photographs of real-world scenes in an MRI scanner while their eye movements were recorded. Fixation-related fMRI was used to measure activation as a function of lower- and higher-level scene properties at fixation, operationalized as edge density and meaning maps, respectively. We found that edge density at fixation was most associated with activation in early visual areas, whereas semantic content at fixation was most associated with activation along the ventral visual stream including core object and scene-selective areas (lateral occipital complex, parahippocampal place area, occipital place area, and retrosplenial cortex). The observed activation from semantic content was not accounted for by differences in edge density. The results are consistent with active vision models in which fixation gates detailed visual analysis for fixated scene regions, and this gating influences both lower and higher levels of scene analysis.
Affiliation(s)
- Wonil Choi
- Gwangju Institute of Science and Technology
69
Egner LE, Sütterlin S, Calogiuri G. Proposing a Framework for the Restorative Effects of Nature through Conditioning: Conditioned Restoration Theory. Int J Environ Res Public Health 2020; 17:E6792. [PMID: 32957693 PMCID: PMC7558998 DOI: 10.3390/ijerph17186792] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/31/2020] [Revised: 09/09/2020] [Accepted: 09/15/2020] [Indexed: 12/02/2022]
Abstract
Natural environments have been shown to trigger psychological and physiological restoration in humans. A new framework for the restorative properties of natural environments is proposed. Conditioned restoration theory builds on a classical conditioning paradigm, postulating four stages: (i) unconditioned restoration, in which unconditioned positive affective responses reliably occur in a given environment (such as a natural setting); (ii) restorative conditioning, in which the positive affective responses become conditioned to the environment; (iii) conditioned restoration, in which subsequent exposure to the environment, in the absence of the unconditioned stimulus, retrieves the same positive affective responses; and (iv) stimulus generalization, in which subsequent exposure to associated environmental cues retrieves the same positive affective responses. The process, hypothetically not unique to natural environments, involves the well-documented phenomena of conditioning, retrieval, and association, and relies on evaluative conditioning, classical conditioning, core affect, and conscious expectancy. Empirical findings showing that restoration can occur in non-natural environments and through various sensory stimuli, as well as findings demonstrating that previous negative experiences with nature can subsequently lower restorative effects, are also presented in support of the theory. In integration with other existing theories, the theory should prove a valuable framework for future research.
Affiliation(s)
- Lars Even Egner
- Citizens, Environment and Safety, Institute of Psychology, Norwegian University of Science and Technology, 7048 Trondheim, Norway
- Stefan Sütterlin
- Faculty of Health and Welfare Sciences, Østfold University College, 1757 Halden, Norway
- Division of Clinical Neuroscience, Oslo University Hospital, 0450 Oslo, Norway
- Giovanna Calogiuri
- Faculty of Health and Social Sciences, University of South-Eastern Norway, 3045 Drammen, Norway
- Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, 2411 Elverum, Norway
70
Castelhano MS, Krzyś K. Rethinking Space: A Review of Perception, Attention, and Memory in Scene Processing. Annu Rev Vis Sci 2020; 6:563-586. [PMID: 32491961 DOI: 10.1146/annurev-vision-121219-081745] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/09/2022]
Abstract
Scene processing is fundamentally influenced and constrained by spatial layout and spatial associations with objects. However, semantic information has played a vital role in propelling our understanding of real-world scene perception forward. In this article, we review recent advances in assessing how spatial layout and spatial relations influence scene processing. We examine the organization of the larger environment and how we take full advantage of spatial configurations independently of semantic information. We demonstrate that a clear differentiation of spatial from semantic information is necessary to advance research in the field of scene processing.
Affiliation(s)
- Monica S Castelhano
- Department of Psychology, Queen's University, Kingston, Ontario K7L 3N6, Canada
- Karolina Krzyś
- Department of Psychology, Queen's University, Kingston, Ontario K7L 3N6, Canada
71
Haskins AJ, Mentch J, Botch TL, Robertson CE. Active vision in immersive, 360° real-world environments. Sci Rep 2020; 10:14304. [PMID: 32868788 PMCID: PMC7459302 DOI: 10.1038/s41598-020-71125-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/30/2020] [Accepted: 08/06/2020] [Indexed: 11/30/2022]
Abstract
How do we construct a sense of place in a real-world environment? Real-world environments are actively explored via saccades, head turns, and body movements. Yet, little is known about how humans process real-world scene information during active viewing conditions. Here, we exploited recent developments in virtual reality (VR) and in-headset eye-tracking to test the impact of active vs. passive viewing conditions on gaze behavior while participants explored novel, real-world, 360° scenes. In one condition, participants actively explored 360° photospheres from a first-person perspective via self-directed motion (saccades and head turns). In another condition, photospheres were passively displayed to participants while they were head-restricted. We found that, relative to passive viewers, active viewers displayed increased attention to semantically meaningful scene regions, suggesting more exploratory, information-seeking gaze behavior. We also observed signatures of exploratory behavior in eye movements, such as quicker, more entropic fixations during active as compared with passive viewing conditions. These results show that active viewing influences every aspect of gaze behavior, from the way we move our eyes to what we choose to attend to. Moreover, these results offer key benchmark measurements of gaze behavior in 360°, naturalistic environments.
Affiliation(s)
- Amanda J Haskins
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, 03755, USA
- Jeff Mentch
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, 03755, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Thomas L Botch
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, 03755, USA
- Caroline E Robertson
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, 03755, USA
72
Feriol F, Vivet D, Watanabe Y. A Review of Environmental Context Detection for Navigation Based on Multiple Sensors. Sensors 2020; 20:4532. [PMID: 32823560 PMCID: PMC7472608 DOI: 10.3390/s20164532] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/09/2020] [Revised: 08/04/2020] [Accepted: 08/06/2020] [Indexed: 11/29/2022]
Abstract
Current navigation systems use multi-sensor data to improve localization accuracy, but often without certainty about the quality of those measurements in certain situations. Context detection would enable an adaptive navigation system that improves the precision and robustness of its localization solution by anticipating possible degradation in sensor signal quality (for instance, GNSS in urban canyons, or camera-based navigation in a non-textured environment). That is why context detection is considered the future of navigation systems. It is therefore important first to define this concept of context for navigation and to find a way to extract it from available information. This paper reviews existing GNSS- and on-board-vision-based solutions for environmental context detection. The review shows that most state-of-the-art research works focus on only one type of data, and it confirms that the main perspective on this problem is to combine indicators from multiple sensors.
Affiliation(s)
- Florent Feriol
- Optronics and Signal Research Group, ISAE-SUPAERO, 31055 Toulouse, France
- Damien Vivet
- Optronics and Signal Research Group, ISAE-SUPAERO, 31055 Toulouse, France
- Yoko Watanabe
- Department of Information Processing and Systems, ONERA, 31055 Toulouse, France
73
Oscillatory Bursts in Parietal Cortex Reflect Dynamic Attention between Multiple Objects and Ensembles. J Neurosci 2020; 40:6927-6937. [PMID: 32753515 PMCID: PMC7470925 DOI: 10.1523/jneurosci.0231-20.2020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/30/2020] [Revised: 06/24/2020] [Accepted: 06/29/2020] [Indexed: 11/21/2022]
Abstract
The visual system uses two complementary strategies to process multiple objects simultaneously within a scene and update their spatial positions in real time. It either uses selective attention to individuate a complex, dynamic scene into a few focal objects (i.e., object individuation), or it represents multiple objects as an ensemble by distributing attention more globally across the scene (i.e., ensemble grouping). Neural oscillations may be a key signature for focal object individuation versus distributed ensemble grouping, because they are thought to regulate neural excitability over visual areas through inhibitory control mechanisms. We recorded whole-head MEG data during a multiple-object tracking paradigm in which human participants (13 female, 11 male) switched between instructions for object individuation and ensemble grouping on different trials. The stimuli, responses, and the demand to keep track of multiple spatial locations over time were held constant between the two conditions. We observed increased α-band power (9-13 Hz), packed into oscillatory bursts, in bilateral inferior parietal cortex during multiple-object processing. Single-trial analysis revealed greater burst occurrence on object individuation versus ensemble grouping trials. By contrast, we found no differences using standard analyses of α-band power averaged across trials. Moreover, the bursting effects occurred only below or at, but not above, the typical capacity limit for multiple-object processing (∼4 objects). Our findings reveal the real-time neural correlates underlying the dynamic processing of multiple-object scenarios, which are modulated by grouping strategy and capacity. They support a rhythmic, α-pulsed organization of dynamic attention to multiple objects and ensembles. SIGNIFICANCE STATEMENT Dynamic multiple-object scenarios are an important problem in real-world and computer vision.
They require keeping track of multiple objects as they move through space and time. Such problems can be solved in two ways: One can individuate a scene object by object, or alternatively group objects into ensembles. We observed greater occurrences of α-oscillatory burst events in parietal cortex for processing objects versus ensembles and below/at versus above processing capacity. These results demonstrate a unique top-down mechanism by which the brain dynamically adjusts its computational level between objects and ensembles. They help to explain how the brain copes with its capacity limitations in real-time environments and may lead the way to technological innovations for time-critical video analysis in computer vision.
74
What can an echocardiographer see in briefly presented stimuli? Perceptual expertise in dynamic search. Cogn Res Princ Implic 2020; 5:30. [PMID: 32696181 PMCID: PMC7374494 DOI: 10.1186/s41235-020-00232-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/17/2019] [Accepted: 05/26/2020] [Indexed: 11/10/2022]
Abstract
Background Experts in medical image perception are able to detect abnormalities rapidly in medical images, an ability likely due to enhanced pattern recognition on a global scale. However, the bulk of research in this domain has focused on static rather than dynamic images, so it remains unclear what level of information can be extracted from dynamic displays. This study was designed to examine the visual capabilities of echocardiographers, practitioners who provide information regarding cardiac integrity and functionality. In three experiments, echocardiographers and naïve participants completed an abnormality detection task comprising movies presented at a range of durations, half of which were abnormal. This was followed by an abnormality categorization task. Results Across all durations, performance was high for detection but lower for categorization, indicating that categorization was the more challenging task. Not surprisingly, echocardiographers outperformed naïve participants. Conclusions Together, this suggests that echocardiographers have a finely tuned capability for detecting cardiac dysfunction, and that a great deal of visual information can be extracted during a global assessment, within a brief glance. No relationship was evident between experience and performance, which suggests that other factors, such as individual differences, need to be considered in future studies.
75
Zhang X, Sun Y, Liu W, Zhang Z, Wu B. Twin mechanisms: Rapid scene recognition involves both feedforward and feedback processing. Acta Psychol (Amst) 2020; 208:103101. [PMID: 32485339 DOI: 10.1016/j.actpsy.2020.103101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2019] [Revised: 05/07/2020] [Accepted: 05/20/2020] [Indexed: 11/25/2022]
Abstract
The low spatial frequency (LSF) component of visual information rapidly conveys coarse information for global perception, while the high spatial frequency (HSF) component delivers fine-grained information for detailed analysis. Feedforward theorists hold that a coarse-to-fine process is sufficient for rapid scene recognition. Based on the response priming paradigm, the present study aimed to explore how different spatial frequencies interact during rapid scene recognition. The response priming paradigm posits that, as long as the prime slide can be rapidly recognized, the prime-target system is behaviorally equivalent to a feedforward system. Adopting broad spatial frequency images, Experiment 1 revealed a typical response priming effect. In Experiment 2, however, when the HSF and LSF components of the same pictures were presented separately, neither the LSF-to-HSF sequence nor the HSF-to-LSF sequence reproduced the response priming effect. These results demonstrate that neither the LSF nor the HSF component alone is sufficient for rapid scene recognition and, further, that the integration of different spatial frequencies requires early feedback loops. These findings support the view that local recurrent processing loops within early visual cortex are involved in rapid scene recognition.
76
Seijdel N, Jahfari S, Groen IIA, Scholte HS. Low-level image statistics in natural scenes influence perceptual decision-making. Sci Rep 2020; 10:10573. [PMID: 32601499 PMCID: PMC7324621 DOI: 10.1038/s41598-020-67661-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/02/2020] [Accepted: 06/08/2020] [Indexed: 11/10/2022]
Abstract
A fundamental component of interacting with our environment is the gathering and interpretation of sensory information. When investigating how perceptual information influences decision-making, most researchers have relied on manipulated or unnatural information as perceptual input, resulting in findings that may not generalize to real-world scenes. Unlike simplified, artificial stimuli, real-world scenes contain low-level regularities that are informative about their structural complexity, which the brain could exploit. In this study, participants performed an animal detection task on low, medium, or high complexity scenes, as determined by two biologically plausible natural scene statistics: contrast energy (CE) and spatial coherence (SC). In Experiment 1, stimuli were sampled such that CE and SC both influenced scene complexity. Diffusion modelling showed that the speed of information processing was affected by low-level scene complexity. Experiments 2a/b refined these observations by showing that isolated manipulation of SC resulted in weaker but comparable effects, with an additional change in response boundary, whereas manipulation of CE alone had no effect. Overall, performance was best for scenes of intermediate complexity. Our systematic definition quantifies how natural scene complexity interacts with decision-making. We speculate that CE and SC serve as indications for adjusting perceptual decision-making to the complexity of the input.
Affiliation(s)
- Noor Seijdel
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Brain and Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
- Sara Jahfari
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Spinoza Centre for Neuroimaging, Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam, The Netherlands
- Iris I A Groen
- Department of Psychology, New York University, New York, USA
- H Steven Scholte
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Brain and Cognition (ABC) Center, University of Amsterdam, Amsterdam, The Netherlands
77
Mahamane S, Wan N, Porter A, Hancock AS, Campbell J, Lyon TE, Jordan KE. Natural Categorization: Electrophysiological Responses to Viewing Natural Versus Built Environments. Front Psychol 2020; 11:990. [PMID: 32587543 PMCID: PMC7298107 DOI: 10.3389/fpsyg.2020.00990] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/14/2019] [Accepted: 04/21/2020] [Indexed: 01/14/2023]
Abstract
Environments are unique in terms of structural composition and evoked human experience. Previous studies suggest that natural compared to built environments may increase positive emotions. Humans in natural environments also demonstrate greater performance on attention-based tasks. Few studies have investigated cortical mechanisms underlying these phenomena or probed these differences from a neural perspective. Using a temporally sensitive electrophysiological approach, we employ an event-related, implicit passive viewing task to demonstrate that in humans, a greater late positive potential (LPP) occurs with exposure to built than natural environments, resulting in a faster return of activation to pre-stimulus baseline levels when viewing natural environments. Our research thus provides new evidence suggesting natural environments are perceived differently from built environments, converging with previous behavioral findings and theoretical assumptions from environmental psychology.
Affiliation(s)
- Salif Mahamane
- Department of Behavioral & Social Sciences, Western Colorado University, Gunnison, CO, United States
- Nick Wan
- Department of Psychology, Utah State University, Logan, UT, United States
- Alexis Porter
- Department of Psychology, Northwestern University, Evanston, IL, United States
- Allison S Hancock
- Department of Psychology, Utah State University, Logan, UT, United States
- Justin Campbell
- MD-PhD Program, School of Medicine, The University of Utah, Salt Lake City, UT, United States
- Thomas E Lyon
- Department of Psychology, Utah State University, Logan, UT, United States
- Kerry E Jordan
- Department of Psychology, Utah State University, Logan, UT, United States
78
Alwis Y, Haberman JM. Emotional judgments of scenes are influenced by unintentional averaging. Cogn Res Princ Implic 2020; 5:28. [PMID: 32529469 PMCID: PMC7290017 DOI: 10.1186/s41235-020-00228-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Accepted: 05/09/2020] [Indexed: 11/28/2022]
Abstract
Background The visual system uses ensemble perception to summarize visual input across a variety of domains. This heuristic operates at multiple levels of vision, compressing information as basic as oriented lines or as complex as emotional faces. Given its pervasiveness, the ensemble unsurprisingly can influence how an individual item is perceived, and vice versa. Methods In the current experiments, we tested whether the perceived emotional valence of a single scene could be influenced by surrounding, simultaneously presented scenes. Observers first rated the emotional valence of a series of individual scenes. They then saw ensembles of the original images, presented in sets of four, and were cued to rate one of the four for a second time. Results Results confirmed that the perceived emotional valence of the cued image was pulled toward the mean emotion of the surrounding ensemble on the majority of trials, even though the ensemble was task-irrelevant. Control experiments and analyses confirmed that the pull was driven by high-level ensemble information. Conclusion We conclude that high-level ensemble information can influence how we perceive individual items in a crowd, even when working memory demands are low and the ensemble information is not directly task-relevant.
Affiliation(s)
- Yavin Alwis
- The Department of Psychology, Rhodes College, 2000 N Parkway, Memphis, TN, USA
- Jason M Haberman
- The Department of Psychology, Rhodes College, 2000 N Parkway, Memphis, TN, USA
79
Rosenholtz R. Demystifying visual awareness: Peripheral encoding plus limited decision complexity resolve the paradox of rich visual experience and curious perceptual failures. Atten Percept Psychophys 2020; 82:901-925. [PMID: 31970709 PMCID: PMC7303063 DOI: 10.3758/s13414-019-01968-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/04/2022]
Abstract
Human beings subjectively experience a rich visual percept. However, when behavioral experiments probe the details of that percept, observers perform poorly, suggesting that vision is impoverished. What can explain this awareness puzzle? Is the rich percept a mere illusion? How does vision work as well as it does? This paper argues for two important pieces of the solution. First, peripheral vision encodes its inputs using a scheme that preserves a great deal of useful information, while losing the information necessary to perform certain tasks. The tasks rendered difficult by the peripheral encoding include many of those used to probe the details of visual experience. Second, many tasks used to probe attentional and working memory limits are, arguably, inherently difficult, and poor performance on these tasks may indicate limits on decision complexity. Two assumptions are critical to making sense of this hypothesis: (1) All visual perception, conscious or not, results from performing some visual task; and (2) all visual tasks face the same limit on decision complexity. Together, peripheral encoding plus decision complexity can explain a wide variety of phenomena, including vision's marvelous successes, its quirky failures, and our rich subjective impression of the visual world.
Affiliation(s)
- Ruth Rosenholtz
- MIT Department of Brain & Cognitive Sciences, CSAIL, Cambridge, MA, 02139, USA.
80
Disentangling the Independent Contributions of Visual and Conceptual Features to the Spatiotemporal Dynamics of Scene Categorization. J Neurosci 2020; 40:5283-5299. [PMID: 32467356 DOI: 10.1523/jneurosci.2088-19.2020] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 04/18/2020] [Accepted: 04/23/2020] [Indexed: 11/21/2022] Open
Abstract
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features.SIGNIFICANCE STATEMENT In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties, such as colors and contours, to high-level properties, such as objects and attributes. 
Because these properties are correlated, it is difficult to understand each property's unique contributions to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.
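The whitening step described above can be illustrated in a few lines: a ZCA-style transform rescales the feature space along its covariance eigenvectors so that the decorrelated features have approximately identity covariance. This is only a generic sketch of the technique, not the paper's actual encoding-model pipeline:

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """Decorrelate the columns of X (stimuli x features) via ZCA whitening."""
    Xc = X - X.mean(axis=0)                       # center each feature
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rescale along eigenvectors, then rotate back (ZCA keeps features
    # maximally similar to the originals while removing correlations).
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[:, 1] += 0.8 * X[:, 0]          # introduce a correlation between features
Xw = zca_whiten(X)
print(np.round(np.corrcoef(Xw.T), 2))  # approximately the identity matrix
```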
81
Harel A, Mzozoyana MW, Al Zoubi H, Nador JD, Noesen BT, Lowe MX, Cant JS. Artificially-generated scenes demonstrate the importance of global scene properties for scene perception. Neuropsychologia 2020; 141:107434. [PMID: 32179102 DOI: 10.1016/j.neuropsychologia.2020.107434] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2019] [Revised: 03/04/2020] [Accepted: 03/09/2020] [Indexed: 10/24/2022]
Abstract
Recent electrophysiological research highlights the significance of global scene properties (GSPs) for scene perception. However, since real-world scenes span a range of low-level stimulus properties and high-level contextual semantics, GSP effects may also reflect additional processing of such non-global factors. We examined this question by asking whether Event-Related Potentials (ERPs) to GSPs will still be observed when specific low- and high-level scene properties are absent from the scene. We presented participants with computer-based artificially-manipulated scenes varying in two GSPs (spatial expanse and naturalness) which minimized other sources of scene information (color and semantic object detail). We found that the peak amplitude of the P2 component was sensitive to the spatial expanse and naturalness of the artificially-generated scenes: P2 amplitude was higher to closed than open scenes, and in response to manmade than natural scenes. A control experiment showed that the effect of Naturalness on the P2 is not driven by local texture information, while earlier effects of naturalness, expressed as a modulation of the P1 and N1 amplitudes, are sensitive to texture information. Our results demonstrate that GSPs are processed robustly around 220 ms and that P2 can be used as an index of global scene perception.
Affiliation(s)
- Assaf Harel
- Department of Psychology, Wright State University, Dayton, OH, USA
- Mavuso W Mzozoyana
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Hamada Al Zoubi
- Department of Neuroscience, Cell Biology and Physiology, Wright State University, Dayton, OH, USA
- Jeffrey D Nador
- Department of Psychology, Wright State University, Dayton, OH, USA
- Birken T Noesen
- Department of Psychology, Wright State University, Dayton, OH, USA
- Matthew X Lowe
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan S Cant
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
82
Park HS, Shi J. Force from Motion: Decoding Control Force of Activity in a First-Person Video. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:622-635. [PMID: 30489262 DOI: 10.1109/tpami.2018.2883327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
A first-person video delivers what the camera wearer (actor) experiences through physical interactions with the surroundings. In this paper, we focus on the problem of Force from Motion: estimating, from a first-person video, the active force and torque exerted by the actor to drive her/his activity. We use two physical cues inherent in the first-person video. (1) Ego-motion: the camera motion is generated by a resultant of force interactions, which allows us to understand the effect of the active force using Newtonian mechanics. (2) Visual semantics: the first-person visual scene is deployed to afford the actor's activity, which is indicative of the physical context of the activity. We estimate the active force and torque using a dynamical system that describes the transition (dynamics) of the actor's physical state (position, orientation, and linear/angular momentum), where the latent physical state is indirectly observed through the first-person video. We approximate the physical state with the 3D camera trajectory, reconstructed up to scale and orientation. The absolute scale factor and gravitation field are learned from the ego-motion and visual semantics of the first-person video. Inspired by optimal control theory, we solve the dynamical system by minimizing reprojection error. Our method yields reconstructions quantitatively equivalent to IMU measurements in terms of gravity and scale recovery, and it outperforms methods based on 2D optical flow on an active action recognition task. We apply our method to first-person videos of mountain biking, urban bike racing, skiing, speedflying with a parachute, and wingsuit flying, where inertial measurements are not accessible.
83
Goetschalckx L, Wagemans J. MemCat: a new category-based image set quantified on memorability. PeerJ 2019; 7:e8169. [PMID: 31844575 PMCID: PMC6911686 DOI: 10.7717/peerj.8169] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2019] [Accepted: 11/06/2019] [Indexed: 01/21/2023] Open
Abstract
Images differ in their memorability in consistent ways across observers. What makes an image memorable is not fully understood to date. Most of the current insight is in terms of high-level semantic aspects, related to the content. However, research still shows consistent differences within semantic categories, suggesting a role for factors at other levels of processing in the visual hierarchy. To aid investigations into this role as well as contributions to the understanding of image memorability more generally, we present MemCat. MemCat is a category-based image set, consisting of 10K images representing five broader, memorability-relevant categories (animal, food, landscape, sports, and vehicle) and further divided into subcategories (e.g., bear). They were sampled from existing source image sets that offer bounding box annotations or more detailed segmentation masks. We collected memorability scores for all 10 K images, each score based on the responses of on average 99 participants in a repeat-detection memory task. Replicating previous research, the collected memorability scores show high levels of consistency across observers. Currently, MemCat is the second largest memorability image set and the largest offering a category-based structure. MemCat can be used to study the factors underlying the variability in image memorability, including the variability within semantic categories. In addition, it offers a new benchmark dataset for the automatic prediction of memorability scores (e.g., with convolutional neural networks). Finally, MemCat allows the study of neural and behavioral correlates of memorability while controlling for semantic category.
84
Azer L, Zhang W. Composite Face Effect Predicts Configural Encoding in Visual Short-Term Memory. Front Psychol 2019; 10:2753. [PMID: 31920808 PMCID: PMC6917589 DOI: 10.3389/fpsyg.2019.02753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Accepted: 11/22/2019] [Indexed: 11/13/2022] Open
Abstract
In natural vision, visual scenes consist of individual items (e.g., trees) and global properties of items as a whole (e.g., forest). These different levels of representations can all contribute to perception, natural scene understanding, sensory memory, working memory, and long-term memory. Despite these various hierarchical representations across perception and cognition, the nature of the global representations has received considerably less attention in empirical research on working memory than item representations. The present study aimed to understand the perceptual root of the configural information retained in Visual Short-term Memory (VSTM). Specifically, we assessed whether configural VSTM was related to holistic face processing across participants using an individual differences approach. Configural versus item encoding in VSTM was assessed using Xie and Zhang's (2017) dual-trace Signal Detection Theory model in a change detection task for orientation. Configural face processing was assessed using Le Grand composite face effect (CFE). In addition, overall face recognition was assessed using Glasgow Face Matching Test (GFMT). Across participants, holistic face encoding, but not face recognition accuracy, predicted configural information, but not item information, retained in VSTM. Together, these findings suggest that configural encoding in VSTM may have a perceptual root.
Affiliation(s)
- Weiwei Zhang
- Department of Psychology, University of California, Riverside, Riverside, CA, United States
85
Owens JW, Chaparro BS, Palmer EM. Exploring website gist through rapid serial visual presentation. COGNITIVE RESEARCH-PRINCIPLES AND IMPLICATIONS 2019; 4:44. [PMID: 31748970 PMCID: PMC6868081 DOI: 10.1186/s41235-019-0192-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/08/2018] [Accepted: 08/05/2019] [Indexed: 11/24/2022]
Abstract
Background Users can make judgments about web pages in a glance. Little research has explored what semantic information can be extracted from a web page within a single fixation or what mental representations users have of web pages, but the scene perception literature provides a framework for understanding how viewers can extract and represent diverse semantic information from scenes in a glance. The purpose of this research was (1) to explore whether semantic information about a web page could be extracted within a single fixation and (2) to explore the effects of size and resolution on extracting this information. Using a rapid serial visual presentation (RSVP) paradigm, Experiment 1 explored whether certain semantic categories of websites (i.e., news, search, shopping, and social networks/blogs) could be detected within a RSVP stream of web page stimuli. Natural scenes, which have been shown to be detectable within a single fixation in the literature, served as a baseline for comparison. Experiment 2 examined the effects of stimulus size and resolution on observers’ ability to detect the presence of website categories using similar methods. Results Findings from this research demonstrate that users have conceptual models of websites that allow detection of web pages from a fixation’s worth of stimulus exposure, when provided additional time for processing. For website categories other than search, detection performance decreased significantly when web elements were no longer discernible due to decreases in size and/or resolution. The implications of this research are that website conceptual models rely more on page elements and less on the spatial relationship between these elements. Conclusions Participants can detect websites accurately when they were displayed for less than a fixation and when the participants were allowed additional processing time. 
Subjective comments and stimulus onset asynchrony data suggested that participants likely relied on local features for the detection of website targets for several website categories. This notion was supported when the size and/or resolution of stimuli were decreased to the extent that web elements were indistinguishable. This demonstrates that schemas or conceptualizations of websites provided information sufficient to detect websites from approximately 140 ms of stimulus exposure.
Affiliation(s)
- Justin W Owens
- Department of Psychology, Wichita State University, Wichita, KS, USA; Google, Inc., Mountain View, CA, USA
- Barbara S Chaparro
- Department of Psychology, Wichita State University, Wichita, KS, USA; Department of Human Factors and Behavioral Neurobiology, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA
- Evan M Palmer
- Department of Psychology, Wichita State University, Wichita, KS, USA; Department of Psychology, San José State University, San Jose, CA, USA
86
Reorganization of spatial configurations in visual working memory: A matter of set size? PLoS One 2019; 14:e0225068. [PMID: 31721792 PMCID: PMC6853316 DOI: 10.1371/journal.pone.0225068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2019] [Accepted: 10/28/2019] [Indexed: 11/19/2022] Open
Abstract
Humans process single objects in relation to other simultaneously maintained objects in visual working memory. This interdependence is called spatial configuration. Humans are able to reorganize global spatial configurations into relevant partial configurations. We conducted three experiments investigating the process underlying reorganization by manipulating memory set size and the presence of configurations at retrieval. Participants performed a location change detection task for a single object probed at retrieval. At the beginning of each trial, participants memorized the locations of all objects (set size: 4, 8, 12, or 16). During maintenance, a valid retro cue highlighted the side containing the object probed at retrieval, thus enabling participants to reorganize the memorized global spatial configuration to the partial cued configuration. At retrieval, the object probed was shown together with either all objects (complete configuration; Experiment 1a), the cued objects only (congruent configuration; all Experiments), the non-cued objects only (incongruent configuration, all Experiments) or alone (no configuration; Experiment 1b). We observed reorganization of spatial configurations as indicated by a superior location change detection performance with a congruent partial configuration than an incongruent partial configuration across all three experiments. We also observed an overall decrease in accuracy with increasing set size. Most importantly, however, we did not find evidence for a reliable impairment of reorganization with increasing set size. We discuss these findings with regard to the memory representation underlying spatial configurations.
87
Laubrock J, Dunst A. Computational Approaches to Comics Analysis. Top Cogn Sci 2019; 12:274-310. [PMID: 31705626 DOI: 10.1111/tops.12476] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2019] [Revised: 08/17/2019] [Accepted: 08/27/2019] [Indexed: 11/29/2022]
Abstract
Comics are complex documents whose reception engages cognitive processes such as scene perception, language processing, and narrative understanding. Possibly because of their complexity, they have rarely been studied in cognitive science. Modeling the stimulus ideally requires a formal description, which can be provided by feature descriptors from computer vision and computational linguistics. With a focus on document analysis, here we review work on the computational modeling of comics. We argue that the development of modern feature descriptors based on deep learning techniques has made sufficient progress to allow the investigation of complex material such as comics for reception studies, including experimentation and computational modeling of cognitive processes.
Affiliation(s)
- Alexander Dunst
- Department of English and American Studies, University of Paderborn
88
Kvasova D, Garcia-Vernet L, Soto-Faraco S. Characteristic Sounds Facilitate Object Search in Real-Life Scenes. Front Psychol 2019; 10:2511. [PMID: 31749751 PMCID: PMC6848886 DOI: 10.3389/fpsyg.2019.02511] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2019] [Accepted: 10/23/2019] [Indexed: 12/02/2022] Open
Abstract
Real-world events do not only provide temporally and spatially correlated information across the senses, but also semantic correspondences about object identity. Prior research has shown that object sounds can enhance detection, identification, and search performance of semantically consistent visual targets. However, these effects are always demonstrated in simple and stereotyped displays that lack ecological validity. In order to address identity-based cross-modal relationships in real-world scenarios, we designed a visual search task using complex, dynamic scenes. Participants searched for objects in video clips recorded from real-life scenes. Auditory cues, embedded in the background sounds, could be target-consistent, distracter-consistent, neutral, or just absent. We found that, in these naturalistic scenes, characteristic sounds improve visual search for task-relevant objects but fail to increase the salience of irrelevant distracters. Our findings generalize previous results on object-based cross-modal interactions with simple stimuli and shed light upon how audio-visual semantically congruent relationships play out in real-life contexts.
Affiliation(s)
- Daria Kvasova
- Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Laia Garcia-Vernet
- Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Salvador Soto-Faraco
- Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- ICREA – Catalan Institution for Research and Advanced Studies, Barcelona, Spain
89
Võ MLH, Boettcher SEP, Draschkow D. Reading scenes: how scene grammar guides attention and aids perception in real-world environments. Curr Opin Psychol 2019; 29:205-210. [DOI: 10.1016/j.copsyc.2019.03.009] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2018] [Revised: 03/07/2019] [Accepted: 03/13/2019] [Indexed: 11/30/2022]
90
Wolfe JM, Utochkin IS. What is a preattentive feature? Curr Opin Psychol 2019; 29:19-26. [PMID: 30472539 PMCID: PMC6513732 DOI: 10.1016/j.copsyc.2018.11.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2018] [Revised: 11/01/2018] [Accepted: 11/08/2018] [Indexed: 11/30/2022]
Abstract
The concept of a preattentive feature has been central to vision and attention research for about half a century. A preattentive feature is a feature that guides attention in visual search and that cannot be decomposed into simpler features. While that definition seems straightforward, there is no simple diagnostic test that infallibly identifies a preattentive feature. This paper briefly reviews the criteria that have been proposed and illustrates some of the difficulties of definition.
Affiliation(s)
- Jeremy M Wolfe (corresponding author)
- Visual Attention Lab, Department of Surgery, Brigham & Women's Hospital; Departments of Ophthalmology and Radiology, Harvard Medical School, 64 Sidney St., Suite 170, Cambridge, MA 02139-4170, USA
- Igor S Utochkin
- National Research University Higher School of Economics, Armyansky per. 4, 101000 Moscow, Russian Federation
91
Abstract
Anne Treisman investigated many aspects of perception, and in particular the roles of different forms of attention. Four aspects of her work are reviewed here, including visual search, set mean perception, perception in special populations, and binocular rivalry. The importance of the breakthrough in each case is demonstrated. Search is easy or slow depending on whether it depends on the application of global or focused attention. Mean perception depends on global attention and affords simultaneous representation of the means of at least two sets of elements, and then of comparing them. Deficits exhibited in Balint's or unilateral neglect patients identify basic sensory system mechanisms. And, the ability to integrate binocular information for stereopsis despite simultaneous binocular rivalry for color, demonstrates the division of labor underlying visual system computations. All these studies are related to an appreciation of the difference between perceiving the gist of a scene, its elements or objects, versus perceiving the details of the scene and its components. This relationship between Anne Treisman's revolutionary discoveries and the concept of gist perception is the core of the current review.
92
Change deafness can be reduced, but not eliminated, using brief training interventions. PSYCHOLOGICAL RESEARCH 2019; 85:423-438. [PMID: 31493050 DOI: 10.1007/s00426-019-01239-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Accepted: 08/06/2019] [Indexed: 10/26/2022]
Abstract
Research on change deafness indicates there are substantial limitations to listeners' perception of which objects are present in complex auditory scenes, an ability that is important for many everyday situations. Experiment 1 examined the extent to which change deafness could be reduced by training with performance feedback compared to no training. Experiment 2 compared the efficacy of training with detailed feedback that identified the change and provided performance feedback on each trial, training without feedback, and no training. We further examined the timescale over which improvement unfolded by examining performance using an immediate post-test and a second post-test 12 h later. We were able to reduce, but not eliminate, change deafness for all groups, and determined that the practice content strongly impacted bias and response strategy. Training with simple performance feedback reduced change deafness but increased bias and false alarm rates, while providing more detailed feedback improved change detection without affecting bias. Together, these findings suggest that change deafness can be reduced if a relatively small amount of practice is completed. When bias did not impede performance during the first post-test, the majority of the learning following training occurred immediately, suggesting that fast within-session learning primarily supported improvement on the task.
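The bias and false-alarm effects described here are naturally expressed in signal-detection terms. Below is a generic sketch of computing sensitivity (d') and criterion (c) from a change-detection confusion table, using a standard log-linear correction to avoid infinite z-scores; the counts are hypothetical and this is not the study's analysis code:

```python
from statistics import NormalDist

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Sensitivity (d') and response bias (criterion c) from raw counts.

    Applies a log-linear correction (add 0.5 to each cell) so that
    hit/false-alarm rates of 0 or 1 remain z-transformable."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate), -0.5 * (z(hit_rate) + z(fa_rate))

# Hypothetical listener: 40 hits, 10 misses, 15 false alarms, 35 correct rejections
d, c = dprime_and_criterion(40, 10, 15, 35)
print(round(d, 2), round(c, 2))
```

A negative criterion here indicates a liberal bias (a tendency to report a change), which is the kind of shift the simple-feedback training produced.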
93
The default-mode network represents aesthetic appeal that generalizes across visual domains. Proc Natl Acad Sci U S A 2019; 116:19155-19164. [PMID: 31484756 DOI: 10.1073/pnas.1902650116] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
Visual aesthetic evaluations, which impact decision-making and well-being, recruit the ventral visual pathway, subcortical reward circuitry, and parts of the medial prefrontal cortex overlapping with the default-mode network (DMN). However, it is unknown whether these networks represent aesthetic appeal in a domain-general fashion, independent of domain-specific representations of stimulus content (artworks versus architecture or natural landscapes). Using a classification approach, we tested whether the DMN or ventral occipitotemporal cortex (VOT) contains a domain-general representation of aesthetic appeal. Classifiers were trained on multivoxel functional MRI response patterns collected while observers made aesthetic judgments about images from one aesthetic domain. Classifier performance (high vs. low aesthetic appeal) was then tested on response patterns from held-out trials from the same domain to derive a measure of domain-specific coding, or from a different domain to derive a measure of domain-general coding. Activity patterns in category-selective VOT contained a degree of domain-specific information about aesthetic appeal, but did not generalize across domains. Activity patterns from the DMN, however, were predictive of aesthetic appeal across domains. Importantly, the ability to predict aesthetic appeal varied systematically; predictions were better for observers who gave more extreme ratings to images subsequently labeled as "high" or "low." These findings support a model of aesthetic appreciation whereby domain-specific representations of the content of visual experiences in VOT feed in to a "core" domain-general representation of visual aesthetic appeal in the DMN. Whole-brain "searchlight" analyses identified additional prefrontal regions containing information relevant for appreciation of cultural artifacts (artwork and architecture) but not landscapes.
94
Williams CC, Castelhano MS. The Changing Landscape: High-Level Influences on Eye Movement Guidance in Scenes. Vision (Basel) 2019; 3:E33. [PMID: 31735834 PMCID: PMC6802790 DOI: 10.3390/vision3030033] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2019] [Revised: 06/20/2019] [Accepted: 06/24/2019] [Indexed: 11/16/2022] Open
Abstract
The use of eye movements to explore scene processing has exploded over the last decade. Eye movements provide distinct advantages when examining scene processing because they are both fast and spatially measurable. By using eye movements, researchers have investigated many questions about scene processing. Our review will focus on research performed in the last decade examining: (1) attention and eye movements; (2) where you look; (3) influence of task; (4) memory and scene representations; and (5) dynamic scenes and eye movements. Although typically addressed as separate issues, we argue that these distinctions are now holding back research progress. Instead, it is time to examine the intersections of these seemingly separate influences and examine the intersectionality of how these influences interact to more completely understand what eye movements can tell us about scene processing.
Affiliation(s)
- Carrick C. Williams
- Department of Psychology, California State University San Marcos, San Marcos, CA 92069, USA
95
Do target detection and target localization always go together? Extracting information from briefly presented displays. Atten Percept Psychophys 2019; 81:2685-2699. [PMID: 31218599 DOI: 10.3758/s13414-019-01782-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
The human visual system is capable of processing an enormous amount of information in a short time. Although rapid target detection has been explored extensively, less is known about target localization. Here we used natural scenes and explored the relationship between being able to detect a target (present vs. absent) and being able to localize it. Across four presentation durations (~ 33-199 ms), participants viewed scenes taken from two superordinate categories (natural and manmade), each containing exemplars from four basic scene categories. In a two-interval forced choice task, observers were asked to detect a Gabor target inserted in one of the two scenes. This was followed by one of two different localization tasks. Participants were asked either to discriminate whether the target was on the left or the right side of the display or to click on the exact location where they had seen the target. Targets could be detected and localized at our shortest exposure duration (~ 33 ms), with a predictable improvement in performance with increasing exposure duration. We saw some evidence at this shortest duration of detection without localization, but further analyses demonstrated that these trials typically reflected coarse or imprecise localization information, rather than its complete absence. Experiment 2 replicated our main findings while exploring the effect of the level of "openness" in the scene. Our results are consistent with the notion that when we are able to extract what objects are present in a scene, we also have information about where each object is, which provides crucial guidance for our goal-directed actions.
|
96
|
Epstein RA, Baker CI. Scene Perception in the Human Brain. Annu Rev Vis Sci 2019.
Abstract
Humans are remarkably adept at perceiving and understanding complex real-world scenes. Uncovering the neural basis of this ability is an important goal of vision science. Neuroimaging studies have identified three cortical regions that respond selectively to scenes: parahippocampal place area, retrosplenial complex/medial place area, and occipital place area. Here, we review what is known about the visual and functional properties of these brain areas. Scene-selective regions exhibit retinotopic properties and sensitivity to low-level visual features that are characteristic of scenes. They also mediate higher-level representations of layout, objects, and surface properties that allow individual scenes to be recognized and their spatial structure ascertained. Challenges for the future include developing computational models of information processing in scene regions, investigating how these regions support scene perception under ecologically realistic conditions, and understanding how they operate in the context of larger brain networks.
Affiliation(s)
- Russell A Epstein
- Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Chris I Baker
- Section on Learning and Plasticity, Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, Maryland 20892, USA
|
97
|
Evans KK, Culpan AM, Wolfe JM. Detecting the "gist" of breast cancer in mammograms three years before localized signs of cancer are visible. Br J Radiol 2019; 92:20190136. [PMID: 31166769 DOI: 10.1259/bjr.20190136] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
Abstract
OBJECTIVES: After a 500 ms presentation, experts can distinguish abnormal mammograms at above-chance levels even when only the breast contralateral to the lesion is shown. Here, we show that this signal of abnormality is detectable 3 years before localized signs of cancer become visible. METHODS: In 4 prospective studies, 59 expert observers from 3 groups viewed 116-200 bilateral mammograms for 500 ms each. Half of the images were prior exams acquired 3 years before the onset of visible, actionable cancer, and half were normal. Exp. 1D also included cases with visible abnormalities. Observers rated the likelihood of abnormality on a 0-100 scale and categorized breast density. Performance was measured using receiver operating characteristic analysis. RESULTS: In all three groups, observers detected abnormal images at above-chance levels 3 years prior to visible signs of breast cancer (p < 0.001). The results were not attributable to specific salient cases or to breast density. Performance correlated with expertise, quantified as the number of mammographic cases read within a year. In Exp. 1D, with cases having visible actionable pathology included, the full group of readers failed to reliably detect abnormal priors, with the exception of a subgroup of the six most experienced observers. CONCLUSIONS: Imaging specialists can detect signals of abnormality in mammograms acquired years before lesions become visible. Detection may depend on expertise acquired by reading large numbers of cases. ADVANCES IN KNOWLEDGE: The global gist signal can serve as an imaging risk factor with the potential to identify patients at elevated risk of developing cancer, improving early cancer diagnosis rates and prognosis for women with breast cancer.
Affiliation(s)
- Karla K Evans
- Psychology Department, University of York, York, United Kingdom
- Jeremy M Wolfe
- Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA
|
98
|
Lewandowska OP, Schmuckler MA. Tonal and textural influences on musical sight-reading. PSYCHOLOGICAL RESEARCH 2019; 84:1920-1945. [PMID: 31073771 DOI: 10.1007/s00426-019-01187-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2018] [Accepted: 04/15/2019] [Indexed: 10/26/2022]
Abstract
Two experiments investigated the impact of two structural factors-musical tonality and musical texture-on pianists' ability to play by sight without prior preparation, known as musical sight-reading. Tonality refers to the cognitive organization of tones around a central reference pitch, whereas texture refers to the organization of music in terms of the simultaneous versus successive onsets of tones as well as the number of hands (unimanual versus bimanual) involved in performance. Both experiments demonstrated that tonality and texture influenced sight-reading. For tonality, both studies found that errors in performance increased for passages with lesser perceived psychological stability (i.e., minor and atonal passages) relative to greater perceived stability (i.e., major passages). For texture, both studies found that errors in performance increased for passages that were more texturally complex, requiring two-handed versus one-handed performance, with some additional evidence that the relative simultaneity of note onsets (primarily simultaneous versus primarily successive) also influenced errors. These experiments are interpreted within a perception-action framework of music performance, highlighting influences of both top-down cognitive factors and bottom-up motoric processes on sight-reading behavior.
Affiliation(s)
- Olivia Podolak Lewandowska
- Department of Psychology, University of Toronto Scarborough, 1265 Military Trail Drive, Toronto, ON, M1C 1A4, Canada
- Mark A Schmuckler
- Department of Psychology, University of Toronto Scarborough, 1265 Military Trail Drive, Toronto, ON, M1C 1A4, Canada
|
99
|
Hicklin RA, Ulery BT, Busey TA, Roberts MA, Buscaglia J. Gaze behavior and cognitive states during fingerprint target group localization. COGNITIVE RESEARCH-PRINCIPLES AND IMPLICATIONS 2019; 4:12. [PMID: 30953242 PMCID: PMC6450991 DOI: 10.1186/s41235-019-0160-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/08/2018] [Accepted: 02/20/2019] [Indexed: 11/24/2022]
Abstract
Background: The comparison of fingerprints by expert latent print examiners generally involves repeating a process in which the examiner selects a small area of distinctive features in one print (a target group) and searches for it in the other print. To isolate this key element of fingerprint comparison, we used eye-tracking data to describe the behavior of latent fingerprint examiners on a narrowly defined "find the target" task. Participants were shown a fingerprint image with a target group indicated, asked to find the corresponding area of ridge detail in a second impression of the same finger, and asked to state when they had found the target location. Target groups were presented on latent and plain exemplar fingerprint images, and as small areas cropped from the plain exemplars, to assess how image quality and the lack of surrounding visual context affected task performance and eye behavior. One hundred and seventeen participants completed a total of 675 trials. Results: The presence or absence of context notably affected the areas viewed and the time spent in comparison; differences between latent and plain exemplar tasks were much smaller. In virtually all trials, examiners repeatedly looked back and forth between the images, suggesting constraints on the capacity of visual working memory. On most trials where context was provided, examiners looked immediately at the corresponding location: with context, the median time to find the corresponding location was less than 0.3 s (second fixation); without context, it was 1.9 s (five fixations). A few trials resulted in errors in which the examiner did not find the correct target location. Basic gaze measures of overt behaviors, such as speed, areas visited, and back-and-forth behavior, were used in conjunction with the known target area to infer the underlying cognitive state of the examiner. Conclusions: Visual context has a significant effect on the eye behavior of latent print examiners. Localization errors suggest how errors may occur in real comparisons: examiners sometimes compare an incorrect but similar target group and do not continue to search for a better candidate. The analytic methods and predictive models developed here can be used to describe the more complex behavior involved in actual fingerprint comparisons.
Affiliation(s)
- Thomas A Busey
- Department of Psychology, Indiana University, Bloomington, IN, 47405, USA
- Maria Antonia Roberts
- Latent Print Support Unit, Federal Bureau of Investigation Laboratory Division, 2501 Investigation Parkway, Quantico, VA, 22135, USA
- JoAnn Buscaglia
- Counterterrorism and Forensic Science Research Unit, Federal Bureau of Investigation Laboratory Division, 2501 Investigation Parkway, Quantico, VA, 22135, USA
|
100
|
Perceiving blurry scenes with translational optic flow, rotational optic flow or combined optic flow. Vision Res 2019; 158:49-57. [PMID: 30796993 DOI: 10.1016/j.visres.2018.11.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2018] [Revised: 11/28/2018] [Accepted: 11/29/2018] [Indexed: 11/21/2022]
Abstract
Perceiving the spatial layout of objects is crucial in visual scene perception. Optic flow provides information about spatial layout, and this information is not affected by image blur because motion detection relies on low spatial frequencies in the image structure. Perceiving scenes with blurry vision should therefore be effective when optic flow is available. Furthermore, when blurry images and optic flow interact, optic flow specifies spatial relations and calibrates the blurry images; the calibrated image structure then preserves the spatial relations specified by optic flow after motion stops. Thus, perceiving blurry scenes should be stable when both optic flow and blurry images are available. We investigated which types of optic flow facilitate recognition of blurry scenes and evaluated the stability of performance. Participants identified scenes in blurry videos, viewing both single frames and entire videos that contained translational flow (Experiment 1), rotational flow (Experiment 2), or both (Experiment 3). When first viewing the blurry images, participants identified only a few scenes. When viewing the blurry video clips, their performance improved with translational flow, whether it was available alone or in combination with rotational flow. Participants were still able to perceive the scenes from static blurry images one week later. Therefore, translational flow interacts with blurry image structure to yield effective and stable scene perception. These results imply that observers with blurry vision may be able to identify their surroundings when they locomote.
|