201. Xiao J, Hays J, Russell BC, Patterson G, Ehinger KA, Torralba A, Oliva A. Basic level scene understanding: categories, attributes and structures. Front Psychol 2013; 4:506. PMID: 24009590; PMCID: PMC3756302; DOI: 10.3389/fpsyg.2013.00506.
Abstract
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database which is a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image.
Affiliation(s)
- Jianxiong Xiao
- Computer Science, Princeton University, Princeton, NJ, USA
- James Hays
- Computer Science, Brown University, Providence, RI, USA
- Bryan C. Russell
- Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Krista A. Ehinger
- Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Antonio Torralba
- Department of EECS, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aude Oliva
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

202. Temporal components in the parahippocampal place area revealed by human intracerebral recordings. J Neurosci 2013; 33:10123-31. PMID: 23761907; DOI: 10.1523/jneurosci.4646-12.2013.
Abstract
Many high-level visual regions exhibit complex patterns of stimulus selectivity that make their responses difficult to explain in terms of a single cognitive mechanism. For example, the parahippocampal place area (PPA) responds maximally to environmental scenes during fMRI studies but also responds strongly to nonscene landmark objects, such as buildings, which have a quite different geometric structure. We hypothesized that PPA responses to scenes and buildings might be driven by different underlying mechanisms with different temporal profiles. To test this, we examined broadband γ (50-150 Hz) responses from human intracerebral electroencephalography recordings, a measure that is closely related to population spiking activity. We found that the PPA distinguished scene from nonscene stimuli in ∼80 ms, suggesting the operation of a bottom-up process that encodes scene-specific visual or geometric features. In contrast, the differential PPA response to buildings versus nonbuildings occurred later (∼170 ms) and may reflect a delayed processing of spatial or semantic features definable for both scenes and objects, perhaps incorporating signals from other cortical regions. Although the response preferences of high-level visual regions are usually interpreted in terms of the operation of a single cognitive mechanism, these results suggest that a more complex picture emerges when the dynamics of recognition are considered.

203. Munneke J, Brentari V, Peelen MV. The influence of scene context on object recognition is independent of attentional focus. Front Psychol 2013; 4:552. PMID: 23970878; PMCID: PMC3748376; DOI: 10.3389/fpsyg.2013.00552.
Abstract
Humans can quickly and accurately recognize objects within briefly presented natural scenes. Previous work has provided evidence that scene context contributes to this process, demonstrating improved naming of objects that were presented in semantically consistent scenes (e.g., a sandcastle on a beach) relative to semantically inconsistent scenes (e.g., a sandcastle on a football field). The current study was aimed at investigating which processes underlie the scene consistency effect. Specifically, we tested: (1) whether the effect is due to increased visual feature and/or shape overlap for consistent relative to inconsistent scene-object pairs; and (2) whether the effect is mediated by attention to the background scene. Experiment 1 replicated the scene consistency effect of a previous report (Davenport and Potter, 2004). Using a new, carefully controlled stimulus set, Experiment 2 showed that the scene consistency effect could not be explained by low-level feature or shape overlap between scenes and target objects. Experiments 3a and 3b investigated whether focused attention modulates the scene consistency effect. By using a location cueing manipulation, participants were correctly informed about the location of the target object on a proportion of trials, allowing focused attention to be deployed toward the target object. Importantly, the effect of scene consistency on target object recognition was independent of spatial attention, and was observed both when attention was focused on the target object and when attention was focused on the background scene. These results indicate that a semantically consistent scene context benefits object recognition independently of the focus of attention. We suggest that the scene consistency effect is primarily driven by global scene properties, or "scene gist", that can be processed with minimal attentional resources.
Affiliation(s)
- Jaap Munneke
- Center for Mind/Brain Sciences, University of Trento, Trento, Italy; Department of Cognitive Psychology, Vrije Universiteit Amsterdam, Netherlands

204. Stansbury DE, Naselaris T, Gallant JL. Natural scene statistics account for the representation of scene categories in human visual cortex. Neuron 2013; 79:1025-34. PMID: 23932491; DOI: 10.1016/j.neuron.2013.06.034.
Abstract
During natural vision, humans categorize the scenes they encounter: an office, the beach, and so on. These categories are informed by knowledge of the way that objects co-occur in natural scenes. How does the human brain aggregate information about objects to represent scene categories? To explore this issue, we used statistical learning methods to learn categories that objectively capture the co-occurrence statistics of objects in a large collection of natural scenes. Using the learned categories, we modeled fMRI brain signals evoked in human subjects when viewing images of scenes. We find that evoked activity across much of anterior visual cortex is explained by the learned categories. Furthermore, a decoder based on these scene categories accurately predicts the categories and objects comprising novel scenes from brain activity evoked by those scenes. These results suggest that the human brain represents scene categories that capture the co-occurrence statistics of objects in the world.
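The object co-occurrence statistics at the heart of this approach can be illustrated with a minimal sketch. The toy scenes and object labels below are hypothetical, and the counting shown here is only the raw ingredient of the study's full statistical learning model:

```python
import numpy as np

# Toy "scenes" described by the objects they contain (hypothetical data).
scenes = [
    ["bed", "lamp", "pillow"],
    ["bed", "lamp", "dresser"],
    ["car", "road", "sign"],
    ["car", "road", "tree"],
]
objects = sorted({obj for scene in scenes for obj in scene})
index = {obj: i for i, obj in enumerate(objects)}

# Scene-by-object count matrix (rows: scenes, columns: objects).
X = np.zeros((len(scenes), len(objects)), dtype=int)
for row, scene in enumerate(scenes):
    for obj in scene:
        X[row, index[obj]] += 1

# Object co-occurrence matrix: cooc[i, j] = number of scenes containing both i and j.
present = (X > 0).astype(int)
cooc = present.T @ present

# "bed" co-occurs with "lamp" (indoor scenes) but never with "road".
print(cooc[index["bed"], index["lamp"]], cooc[index["bed"], index["road"]])  # prints: 2 0
```

Categories that "capture the co-occurrence statistics of objects" then fall out of grouping objects (or scenes) with similar rows of such a matrix, which the paper does with a learned probabilistic model rather than raw counts.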

205. Hafri A, Papafragou A, Trueswell JC. Getting the gist of events: recognition of two-participant actions from brief displays. J Exp Psychol Gen 2013; 142:880-905. PMID: 22984951; PMCID: PMC3657301; DOI: 10.1037/a0030045.
Abstract
Unlike rapid scene and object recognition from brief displays, little is known about recognition of event categories and event roles from minimal visual information. In 3 experiments, we displayed naturalistic photographs of a wide range of 2-participant event scenes for 37 ms and 73 ms followed by a mask, and found that event categories (the event gist; e.g., "kicking," "pushing") and event roles (i.e., Agent and Patient) can be recognized rapidly, even with various actor pairs and backgrounds. Norming ratings from a subsequent experiment revealed that certain physical features (e.g., outstretched extremities) that correlate with Agent-hood could have contributed to rapid role recognition. In a final experiment, using identical twin actors, we then varied these features in 2 sets of stimuli, in which Patients had Agent-like features or not. Subjects recognized the roles of event participants less accurately when Patients possessed Agent-like features, with this difference being eliminated with 2-s durations. Thus, given minimal visual input, typical Agent-like physical features are used in role recognition, but with sufficient input from multiple fixations, people categorically determine the relationship between event participants.
Affiliation(s)
- Alon Hafri
- Department of Psychology, University of Pennsylvania, Philadelphia, PA 19104-6241, USA

206. Ganaden RE, Mullin CR, Steeves JKE. Transcranial Magnetic Stimulation to the Transverse Occipital Sulcus Affects Scene but Not Object Processing. J Cogn Neurosci 2013; 25:961-8. DOI: 10.1162/jocn_a_00372.
Abstract
Traditionally, it has been theorized that the human visual system identifies and classifies scenes in an object-centered approach, such that scene recognition can only occur once key objects within a scene are identified. Recent research points toward an alternative approach, suggesting that the global image features of a scene are sufficient for the recognition and categorization of a scene. We have previously shown that disrupting object processing with repetitive TMS to object-selective cortex enhances scene processing possibly through a release of inhibitory mechanisms between object and scene pathways [Mullin, C. R., & Steeves, J. K. E. TMS to the lateral occipital cortex disrupts object processing but facilitates scene processing. Journal of Cognitive Neuroscience, 23, 4174–4184, 2011]. Here we show the effects of TMS to the transverse occipital sulcus (TOS), an area implicated in scene perception, on scene and object processing. TMS was delivered to the TOS or the vertex (control site) while participants performed an object and scene natural/nonnatural categorization task. Transiently interrupting the TOS resulted in significantly lower accuracies for scene categorization compared with control conditions. This demonstrates a causal role of the TOS in scene processing and indicates its importance, in addition to the parahippocampal place area and retrosplenial cortex, in the scene processing network. Unlike TMS to object-selective cortex, which facilitates scene categorization, disrupting scene processing through stimulation of the TOS did not affect object categorization. Further analysis revealed a higher proportion of errors for nonnatural scenes that led us to speculate that the TOS may be involved in processing the higher spatial frequency content of a scene. This supports a nonhierarchical model of scene recognition.

207. Miwa K, Libben G, Dijkstra T, Baayen H. The time-course of lexical activation in Japanese morphographic word recognition: evidence for a character-driven processing model. Q J Exp Psychol (Hove) 2013; 67:79-113. PMID: 23713954; DOI: 10.1080/17470218.2013.790910.
Abstract
This lexical decision study with eye tracking of Japanese two-kanji-character words investigated the order in which a whole two-character word and its morphographic constituents are activated in the course of lexical access, the relative contributions of the left and the right characters in lexical decision, the depth to which semantic radicals are processed, and how nonlinguistic factors affect lexical processes. Mixed-effects regression analyses of response times and subgaze durations (i.e., first-pass fixation time spent on each of the two characters) revealed joint contributions of morphographic units at all levels of the linguistic structure with the magnitude and the direction of the lexical effects modulated by readers' locus of attention in a left-to-right preferred processing path. During the early time frame, character effects were larger in magnitude and more robust than radical and whole-word effects, regardless of the font size and the type of nonwords. Extending previous radical-based and character-based models, we propose a task/decision-sensitive character-driven processing model with a level-skipping assumption: Connections from the feature level bypass the lower radical level and link up directly to the higher character level.
Affiliation(s)
- Koji Miwa
- Department of Linguistics, University of Alberta, Edmonton, AB, Canada

208. Valsecchi M, Caziot B, Backus BT, Gegenfurtner KR. The role of binocular disparity in rapid scene and pattern recognition. Iperception 2013; 4:122-36. PMID: 23755357; PMCID: PMC3677332; DOI: 10.1068/i0587.
Abstract
We investigated the contribution of binocular disparity to the rapid recognition of scenes and simpler spatial patterns using a paradigm combining backward masked stimulus presentation and short-term match-to-sample recognition. First, we showed that binocular disparity did not contribute significantly to the recognition of briefly presented natural and artificial scenes, even when the availability of monocular cues was reduced. Subsequently, using dense random dot stereograms as stimuli, we showed that observers were in principle able to extract spatial patterns defined only by disparity under brief, masked presentations. Comparing our results with the predictions from a cue-summation model, we showed that combining disparity with luminance did not per se disrupt the processing of disparity. Our results suggest that the rapid recognition of scenes is mediated mostly by a monocular comparison of the images, although we can rely on stereo in fast pattern recognition.
Affiliation(s)
- Matteo Valsecchi
- Abteilung Allgemeine Psychologie, Justus-Liebig-Universität, Otto-Behaghel-Str. 10F, D-35394 Giessen, Germany

209. Boucart M, Moroni C, Thibaut M, Szaffarczyk S, Greene M. Scene categorization at large visual eccentricities. Vision Res 2013; 86:35-42. PMID: 23597581; DOI: 10.1016/j.visres.2013.04.006.
Abstract
Studies of scene perception have shown that the visual system is particularly sensitive to global properties such as the overall layout of a scene. Such global properties cannot be computed locally, but rather require relational analysis over multiple regions. To what extent is observers' perception of scenes impaired in the far periphery? We examined the perception of global scene properties (Experiment 1) and basic-level categories (Experiment 2) presented in the periphery from 10° to 70°. Pairs of scene photographs were simultaneously presented left and right of fixation for 80 ms on a panoramic screen (5 m diameter) covering the whole visual field while central fixation was controlled. Observers were instructed to press a key corresponding to the spatial location left/right of a pre-defined target property or category. The results show that classification of global scene properties (e.g., naturalness, openness) as well as basic-level categorization (e.g., forests, highways), while better near the center, were accomplished with performance highly above chance (around 70% correct) in the far periphery, even at 70° eccentricity. The perception of some global properties (e.g., naturalness) was more robust in peripheral vision than others (e.g., indoor/outdoor) that required a more local analysis. The results are consistent with studies suggesting that scene gist recognition can be accomplished by the low resolution of peripheral vision.
Affiliation(s)
- Muriel Boucart
- Lab. Neurosciences Fonctionnelles & Pathologies, Université Lille-Nord de France, CHU Lille, CNRS, France

210. Good exemplars of natural scene categories elicit clearer patterns than bad exemplars but not greater BOLD activity. PLoS One 2013; 8:e58594. PMID: 23555588; PMCID: PMC3608650; DOI: 10.1371/journal.pone.0058594.
Abstract
Within the range of images that we might categorize as a “beach”, for example, some will be more representative of that category than others. Here we first confirmed that humans could categorize “good” exemplars better than “bad” exemplars of six scene categories and then explored whether brain regions previously implicated in natural scene categorization showed a similar sensitivity to how well an image exemplifies a category. In a behavioral experiment participants were more accurate and faster at categorizing good than bad exemplars of natural scenes. In an fMRI experiment participants passively viewed blocks of good or bad exemplars from the same six categories. A multi-voxel pattern classifier trained to discriminate among category blocks showed higher decoding accuracy for good than bad exemplars in the PPA, RSC and V1. This difference in decoding accuracy cannot be explained by differences in overall BOLD signal, as average BOLD activity was either equivalent or higher for bad than good scenes in these areas. These results provide further evidence that V1, RSC and the PPA not only contain information relevant for natural scene categorization, but their activity patterns mirror the fundamentally graded nature of human categories. Analysis of the image statistics of our good and bad exemplars shows that variability in low-level features and image structure is higher among bad than good exemplars. A simulation of our neuroimaging experiment suggests that such a difference in variance could account for the observed differences in decoding accuracy. These results are consistent with both low-level models of scene categorization and models that build categories around a prototype.
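The variance account in the final sentences can be sketched numerically. A nearest-centroid decoder stands in here for the study's multi-voxel pattern classifier, and all numbers are synthetic rather than the study's data; the point is only that higher exemplar variability by itself lowers decoding accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_per_cat, n_categories = 50, 20, 6

# Hypothetical category prototypes in "voxel" space.
prototypes = rng.normal(size=(n_categories, n_voxels))

def decoding_accuracy(noise_sd):
    """Nearest-centroid decoding; noise_sd controls exemplar variability."""
    centroids, test_sets = [], []
    for cat in range(n_categories):
        pats = prototypes[cat] + rng.normal(scale=noise_sd,
                                            size=(n_per_cat, n_voxels))
        centroids.append(pats[: n_per_cat // 2].mean(axis=0))  # "train" half
        test_sets.append(pats[n_per_cat // 2 :])               # held-out half
    centroids = np.stack(centroids)
    correct = total = 0
    for cat, pats in enumerate(test_sets):
        # Squared distance of each test pattern to every category centroid.
        dists = ((pats[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        correct += int((dists.argmin(axis=1) == cat).sum())
        total += len(pats)
    return correct / total

acc_good = decoding_accuracy(noise_sd=0.5)  # low-variability ("good") exemplars
acc_bad = decoding_accuracy(noise_sd=2.0)   # high-variability ("bad") exemplars
print(acc_good, acc_bad)
```

With these toy settings the low-variability class decodes at least as accurately as the high-variability one even though both share the same prototypes, mirroring the paper's argument that the good/bad decoding difference need not reflect greater BOLD signal.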

211. Eitel A, Scheiter K, Schüler A. How Inspecting a Picture Affects Processing of Text in Multimedia Learning. Appl Cogn Psychol 2013. DOI: 10.1002/acp.2922.

212. Chan LKH, Hayward WG. Visual search. Wiley Interdiscip Rev Cogn Sci 2013; 4:415-429. PMID: 26304227; DOI: 10.1002/wcs.1235.
Abstract
Visual search is the act of looking for a predefined target among other objects. This task has been widely used as an experimental paradigm to study visual attention, and because of its influence has also become a subject of research itself. When used as a paradigm, visual search studies address questions including the nature, function, and limits of preattentive processing and focused attention. As a subject of research, visual search studies address the role of memory in search, the procedures involved in search, and factors that affect search performance. In this article, we review major theories of visual search, the ways in which preattentive information is used to guide attentional allocation, the role of memory, and the processes and decisions involved in its successful completion. We conclude by summarizing the current state of knowledge about visual search and highlight some unresolved issues.
Affiliation(s)
- Louis K H Chan
- Psychology Unit, Hong Kong Baptist University, Shatin, Hong Kong
- William G Hayward
- Department of Psychology, University of Hong Kong, Pokfulam, Hong Kong

213. Enhanced Processing of Emotional Gist in Peripheral Vision. Span J Psychol 2013; 12:414-23. DOI: 10.1017/s1138741600001803.
Abstract
Emotional (pleasant or unpleasant) and neutral scenes were presented foveally (at fixation) or peripherally (5.2° away from fixation) as primes for 150 ms. The prime was followed by a mask and a centrally presented probe scene for recognition. The probe was either identical in specific content (i.e., same people and objects) to the prime, or it was related to the prime in general content and affective valence. The probe was always different from the prime in color, size, and spatial orientation. Results showed an interaction between prime location and emotional valence for the recognition hit rate, but also for the false alarm rate and correct rejection times. There were no differences as a function of emotional valence in the foveal display condition. In contrast, in the peripheral display condition both hit and false alarm rates were higher and correct rejection times were longer for emotional than for neutral scenes. It is concluded that emotional gist, or a coarse affective impression, is extracted from emotional scenes in peripheral vision, which then leads them to be confused with others of related affective valence. The underlying neurophysiological mechanisms are discussed. An alternative explanation based on the physical characteristics of the scene images was ruled out.

214. Crouzet SM, Joubert OR, Thorpe SJ, Fabre-Thorpe M. Animal detection precedes access to scene category. PLoS One 2012; 7:e51471. PMID: 23251545; PMCID: PMC3518465; DOI: 10.1371/journal.pone.0051471.
Abstract
The processes underlying object recognition are fundamental for the understanding of visual perception. Humans can recognize many objects rapidly even in complex scenes, a task that still presents major challenges for computer vision systems. A common experimental demonstration of this ability is the rapid animal detection protocol, where human participants' earliest responses to report the presence/absence of animals in natural scenes are observed at 250–270 ms latencies. One of the hypotheses to account for such speed is that people would not actually recognize an animal per se, but rather base their decision on global scene statistics. These global statistics (also referred to as spatial envelope or gist) have been shown to be computationally easy to process and could thus be used as a proxy for coarse object recognition. Here, using a saccadic choice task, which allows us to investigate a previously inaccessible temporal window of visual processing, we showed that animal – but not vehicle – detection clearly precedes scene categorization. This asynchrony is additionally validated by a late contextual modulation of animal detection, starting simultaneously with the availability of scene category. Interestingly, the advantage for animal over scene categorization is in opposition to the results of simulations using standard computational models. Taken together, these results challenge the idea that rapid animal detection might be based on early access to global scene statistics, and rather suggest a process based on the extraction of specific local complex features that might be hardwired in the visual system.
Affiliation(s)
- Sébastien M. Crouzet
- Université de Toulouse, UPS, CerCo, Toulouse, France
- CNRS, UMR 5549, Toulouse, France
- Cognitive, Linguistic and Psychological Science, Brown University, Providence, Rhode Island, United States of America
- Olivier R. Joubert
- Université de Toulouse, UPS, CerCo, Toulouse, France
- CNRS, UMR 5549, Toulouse, France
- Simon J. Thorpe
- Université de Toulouse, UPS, CerCo, Toulouse, France
- CNRS, UMR 5549, Toulouse, France
- Michèle Fabre-Thorpe
- Université de Toulouse, UPS, CerCo, Toulouse, France
- CNRS, UMR 5549, Toulouse, France

215. Chen P, Hartman AJ, Priscilla Galarza C, DeLuca J. Global processing training to improve visuospatial memory deficits after right-brain stroke. Arch Clin Neuropsychol 2012; 27:891-905. PMID: 23070314; PMCID: PMC3589919; DOI: 10.1093/arclin/acs089.
Abstract
Visuospatial stimuli are normally perceived from the global structure to local details. A right-brain stroke often disrupts this perceptual organization, resulting in piecemeal encoding and thus poor visuospatial memory. Using a randomized controlled design, the present study examined whether promoting global-to-local encoding improves retrieval accuracy in right-brain-damaged stroke survivors with visuospatial memory deficits. Eleven participants received a single session of the Global Processing Training (global-to-local encoding) or the Rote Repetition Training (no encoding strategy) to learn the Rey-Osterrieth Complex Figure. The results demonstrated that the Global Processing Training significantly improved visuospatial memory deficits after a right-brain stroke. On the other hand, rote practice without step-by-step guidance limited the degree of memory improvement. The treatment effect was observed both immediately after the training procedure and 24 h post-training. Overall, the present findings are consistent with the long-standing principle in cognitive rehabilitation that an effective treatment is based on specific training aimed at improving specific neurocognitive deficits. Importantly, visuospatial memory deficits after a right-brain stroke may improve with treatments that promote global processing at encoding.
Affiliation(s)
- Peii Chen
- Kessler Foundation Research Center, West Orange, NJ 07052, USA

216. Eitel A, Scheiter K, Schüler A. The Time Course of Information Extraction from Instructional Diagrams. Percept Mot Skills 2012; 115:677-701. PMID: 23409583; DOI: 10.2466/22.23.pms.115.6.677-701.
Abstract
This study investigated which information is extracted from a brief glance at an instructional diagram to assess its possible contribution to learning with text and diagrams. An experimental paradigm from scene perception research was used to study diagrams. University students (N = 20) saw pictures showing a scene or instructional diagrams for four different presentation times (50 msec vs. 250 msec vs. 1,000 msec vs. 3,000 msec). Following presentation of a picture or diagram, respectively, participants were asked to verify a statement about its gist, details, and the functioning (for diagrams only). Repeated-measures analyses of variance (ANOVAs) were used to analyze verification accuracy for statements about gist, details, and the functioning, as well as the eye movements (i.e., fixation durations and saccade amplitudes) during picture inspection. In both scenes and instructional diagrams, gist but not details were accurately identified from a first glance at the picture (i.e., at 50 msec and 250 msec). In contrast, verification accuracy for gist and details increased at a slower rate in instructional diagrams than in scene pictures over presentation times. Moreover, the characteristic function of increasing fixation durations with increasing inspection time was found in scenes, but not in instructional diagrams. Taken together, results suggest that both types of illustrations are processed differently at longer inspection times; however, patterns of early information extraction are similar, namely that the gist but far less information about details is extracted. Results imply people are able to extract an instructional diagram's global spatial structure from a first glance, which may be helpful for learning from text.
Affiliation(s)
- Anne Schüler
- Knowledge Media Research Center, Tuebingen, Germany

217. On the contribution of binocular disparity to the long-term memory for natural scenes. PLoS One 2012; 7:e49947. PMID: 23166799; PMCID: PMC3499513; DOI: 10.1371/journal.pone.0049947.
Abstract
Binocular disparity is a fundamental dimension defining the input we receive from the visual world, along with luminance and chromaticity. In a memory task involving images of natural scenes, we investigated whether binocular disparity enhances long-term visual memory. We found that forest images studied in the presence of disparity for relatively long times (7 s) were remembered better as compared to 2D presentation. This enhancement was not evident for other categories of pictures, such as images containing cars and houses, which are mostly identified by the presence of distinctive artifacts rather than by their spatial layout. Evidence from a further experiment indicates that observers do not retain a trace of stereo presentation in long-term memory.

218. Drew T, Evans K, Võ MLH, Jacobson FL, Wolfe JM. Informatics in radiology: what can you see in a single glance and how might this guide visual search in medical images? Radiographics 2012; 33:263-74. PMID: 23104971; DOI: 10.1148/rg.331125023.
Abstract
Diagnostic accuracy for radiologists is above that expected by chance when they are exposed to a chest radiograph for only one-fifth of a second, a period too brief for more than a single voluntary eye movement. How do radiologists glean information from a first glance at an image? It is thought that this expert impression of the gestalt of an image is related to the everyday, immediate visual understanding of the gist of a scene. Several high-speed mechanisms guide our search of complex images. Guidance by basic features (such as color) requires no learning, whereas guidance by complex scene properties is learned. It is probable that both hardwired guidance by basic features and learned guidance by scene structure become part of radiologists' expertise. Search in scenes may be best explained by a two-pathway model: Object recognition is performed via a selective pathway in which candidate targets must be individually selected for recognition. A second, nonselective pathway extracts information from global or statistical information without selecting specific objects. An appreciation of the role of nonselective processing may be particularly useful for understanding what separates novice from expert radiologists and could help establish new methods of physician training based on medical image perception.
Affiliation(s)
- Trafton Drew
- Visual Attention Laboratory, Department of Surgery, Brigham and Women's Hospital, 64 Sidney St, Suite 170, Cambridge, MA 02139-4170, USA.
219
Gregg MK, Snyder JS. Enhanced sensory processing accompanies successful detection of change for real-world sounds. Neuroimage 2012; 62:113-9. [DOI: 10.1016/j.neuroimage.2012.04.057] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2011] [Revised: 04/23/2012] [Accepted: 04/29/2012] [Indexed: 11/29/2022] Open
220
Gagnier KM, Intraub H. When less is more: Line-drawings lead to greater boundary extension than color photographs. VISUAL COGNITION 2012; 20:815-824. [PMID: 22997485 DOI: 10.1080/13506285.2012.703705] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Is boundary extension (false memory beyond the edges of the view; Intraub & Richardson, 1989) determined solely by the schematic structure of the view, or does the quality of the pictorial information affect this error? To examine this, color-photograph or line-drawing versions of 12 multi-object scenes (Experiment 1: N = 64) and 16 single-object scenes (Experiment 2: N = 64) were presented for 14 s each. At test, the same pictures were each rated as being the "same", "closer-up", or "farther away" (5-point scale). Although the layout, the scope of the view, the distance of the main objects to the edges, the background space, and the gist of the scenes were held constant, line-drawings yielded greater boundary extension than did their photographic counterparts for both multi-object (Experiment 1) and single-object (Experiment 2) scenes. Results are discussed in the context of the multisource model and its implications for the study of scene perception and memory.
221
222
Abstract
An area of research that has experienced recent growth is the study of memory during perception of simple and complex auditory scenes. These studies have provided important information about how well auditory objects are encoded in memory and how well listeners can notice changes in auditory scenes. These are significant developments because they present an opportunity to better understand how we hear in realistic situations, how higher-level aspects of hearing such as semantics and prior exposure affect perception, and the similarities and differences between auditory perception and perception in other modalities, such as vision and touch. The research also poses exciting challenges for behavioral and neural models of how auditory perception and memory work.
223
WU L, MO L. Re-examining the Classifying Advantage and Basic Level Effect. ACTA PSYCHOLOGICA SINICA 2012. [DOI: 10.3724/sp.j.1041.2011.00143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
224
Snyder JS, Gregg MK, Weintraub DM, Alain C. Attention, awareness, and the perception of auditory scenes. Front Psychol 2012; 3:15. [PMID: 22347201 PMCID: PMC3273855 DOI: 10.3389/fpsyg.2012.00015] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2011] [Accepted: 01/11/2012] [Indexed: 11/25/2022] Open
Abstract
Auditory perception and cognition entail both low-level and high-level processes, which are likely to interact with each other to create our rich conscious experience of soundscapes. Recent research that we review has revealed numerous influences of high-level factors, such as attention, intention, and prior experience, on conscious auditory perception. Recently, studies have shown that auditory scene analysis tasks can exhibit multistability in a manner very similar to ambiguous visual stimuli, presenting a unique opportunity to study the neural correlates of auditory awareness and the extent to which mechanisms of perception are shared across sensory modalities. Research has also led to a growing number of techniques through which auditory perception can be manipulated and even completely suppressed. Such findings have important consequences for our understanding of the mechanisms of perception and should also allow scientists to distinguish precisely the contributions of different higher-level factors.
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, USA
- Melissa K. Gregg
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, USA
- David M. Weintraub
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, USA
- Claude Alain
- The Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, ON, Canada
225
Rosenholtz R, Huang J, Ehinger KA. Rethinking the role of top-down attention in vision: effects attributable to a lossy representation in peripheral vision. Front Psychol 2012; 3:13. [PMID: 22347200 PMCID: PMC3272623 DOI: 10.3389/fpsyg.2012.00013] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2011] [Accepted: 01/11/2012] [Indexed: 11/21/2022] Open
Abstract
According to common wisdom in the field of visual perception, top-down selective attention is required in order to bind features into objects. In this view, even simple tasks, such as distinguishing a rotated T from a rotated L, require selective attention since they require feature binding. Selective attention, in turn, is commonly conceived as involving volition, intention, and at least implicitly, awareness. There is something non-intuitive about the notion that we might need so expensive (and possibly human) a resource as conscious awareness in order to perform so basic a function as perception. In fact, we can carry out complex sensorimotor tasks, seemingly in the near absence of awareness or volitional shifts of attention ("zombie behaviors"). More generally, the tight association between attention and awareness, and the presumed role of attention on perception, is problematic. We propose that under normal viewing conditions, the main processes of feature binding and perception proceed largely independently of top-down selective attention. Recent work suggests that there is a significant loss of information in early stages of visual processing, especially in the periphery. In particular, our texture tiling model (TTM) represents images in terms of a fixed set of "texture" statistics computed over local pooling regions that tile the visual input. We argue that this lossy representation produces the perceptual ambiguities that have previously been ascribed to a lack of feature binding in the absence of selective attention. At the same time, the TTM representation is sufficiently rich to explain performance in such complex tasks as scene gist recognition, pop-out target search, and navigation. A number of phenomena that have previously been explained in terms of voluntary attention can be explained more parsimoniously with the TTM.
In this model, peripheral vision introduces a specific kind of information loss, and the information available to an observer varies greatly depending upon shifts of the point of gaze (which usually occur without awareness). The available information, in turn, provides a key determinant of the visual system's capabilities and deficiencies. This scheme dissociates basic perceptual operations, such as feature binding, from both top-down attention and conscious awareness.
Affiliation(s)
- Ruth Rosenholtz
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Jie Huang
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Krista A. Ehinger
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
226
227
[Perception of objects and scenes in age-related macular degeneration]. J Fr Ophtalmol 2012; 35:58-68. [PMID: 22221712 DOI: 10.1016/j.jfo.2011.08.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2011] [Revised: 07/31/2011] [Accepted: 08/02/2011] [Indexed: 11/22/2022]
Abstract
Vision-related quality-of-life questionnaires suggest that patients with AMD exhibit difficulties in finding objects and in mobility. In the natural environment, objects seldom appear in isolation. They appear in a spatial context which may obscure them in part or place obstacles in the patient's path. Furthermore, the luminance of a natural scene varies as a function of the hour of the day and the light source, which can alter perception. This study aims to evaluate recognition of objects and natural scenes by patients with AMD, by using photographs of such scenes. Studies demonstrate that AMD patients are able to categorize scenes as nature scenes or urban scenes and to discriminate indoor from outdoor scenes with a high degree of precision. They detect objects better in isolation, in color, or against a white background than in their natural contexts. These patients encounter more difficulties than normally sighted individuals in detecting objects in a low-contrast, black-and-white scene. These results may have implications for rehabilitation, for the layout of texts and magazines for the reading-impaired, and for the rearrangement of the spatial environment of older AMD patients in order to facilitate mobility, help in finding objects, and reduce the risk of falls.
228
Abstract
How efficient is visual search in real scenes? In searches for targets among arrays of randomly placed distractors, efficiency is often indexed by the slope of the reaction time (RT) × Set Size function. However, it may be impossible to define set size for real scenes. As an approximation, we hand-labeled 100 indoor scenes and used the number of labeled regions as a surrogate for set size. In Experiment 1, observers searched for named objects (a chair, bowl, etc.). With set size defined as the number of labeled regions, search was very efficient (~5 ms/item). When we controlled for a possible guessing strategy in Experiment 2, slopes increased somewhat (~15 ms/item), but they were much shallower than search for a random object among other distinctive objects outside of a scene setting (Exp. 3: ~40 ms/item). In Experiments 4-6, observers searched repeatedly through the same scene for different objects. Increased familiarity with scenes had modest effects on RTs, while repetition of target items had large effects (>500 ms). We propose that visual search in scenes is efficient because scene-specific forms of attentional guidance can eliminate most regions from the "functional set size" of items that could possibly be the target.
229
Calvo MG, Avero P, Nummenmaa L. Primacy of emotional vs. semantic scene recognition in peripheral vision. Cogn Emot 2011; 25:1358-75. [DOI: 10.1080/02699931.2010.544448] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
230
Epstein RA, Morgan LK. Neural responses to visual scenes reveals inconsistencies between fMRI adaptation and multivoxel pattern analysis. Neuropsychologia 2011; 50:530-43. [PMID: 22001314 DOI: 10.1016/j.neuropsychologia.2011.09.042] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2011] [Revised: 09/25/2011] [Accepted: 09/27/2011] [Indexed: 11/24/2022]
Abstract
Human observers can recognize real-world visual scenes with great efficiency. Cortical regions such as the parahippocampal place area (PPA) and retrosplenial complex (RSC) have been implicated in scene recognition, but the specific representations supported by these regions are largely unknown. We used functional magnetic resonance imaging adaptation (fMRIa) and multi-voxel pattern analysis (MVPA) to explore this issue, focusing on whether the PPA and RSC represent scenes in terms of general categories, or as specific scenic exemplars. Subjects were scanned while viewing images drawn from 10 outdoor scene categories in two scan runs and images of 10 familiar landmarks from their home college campus in two scan runs. Analyses of multi-voxel patterns revealed that the PPA and RSC encoded both category and landmark information, with a slight advantage for landmark coding in RSC. fMRIa, on the other hand, revealed a very different picture: both PPA and RSC adapted when landmark information was repeated, but category adaptation was only observed in a small subregion of the left PPA. These inconsistencies between the MVPA and fMRIa data suggest that these two techniques interrogate different aspects of the neuronal code. We propose three hypotheses about the mechanisms that might underlie adaptation and multi-voxel signals.
Affiliation(s)
- Russell A Epstein
- Department of Psychology, University of Pennsylvania, 3720 Walnut St., Philadelphia, PA 19104, USA.
231
Constructing scenes from objects in human occipitotemporal cortex. Nat Neurosci 2011; 14:1323-9. [PMID: 21892156 DOI: 10.1038/nn.2903] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2011] [Accepted: 07/06/2011] [Indexed: 11/09/2022]
Abstract
We used functional magnetic resonance imaging (fMRI) to demonstrate the existence of a mechanism in the human lateral occipital (LO) cortex that supports recognition of real-world visual scenes through parallel analysis of within-scene objects. Neural activity was recorded while subjects viewed four categories of scenes and eight categories of 'signature' objects strongly associated with the scenes in three experiments. Multivoxel patterns evoked by scenes in the LO cortex were well predicted by the average of the patterns elicited by their signature objects. By contrast, there was no relationship between scene and object patterns in the parahippocampal place area (PPA), even though this region responds strongly to scenes and is believed to be crucial for scene identification. By combining information about multiple objects within a scene, the LO cortex may support an object-based channel for scene recognition that complements the processing of global scene properties in the PPA.
232
233
Mills M, Hollingworth A, Van der Stigchel S, Hoffman L, Dodd MD. Examining the influence of task set on eye movements and fixations. J Vis 2011; 11:17. [PMID: 21799023 DOI: 10.1167/11.8.17] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
The purpose of the present study was to examine the influence of task set on the spatial and temporal characteristics of eye movements during scene perception. In previous work, when strong control was exerted over the viewing task via specification of a target object (as in visual search), task set biased spatial, rather than temporal, parameters of eye movements. Here, we find that more participant-directed tasks (in which the task establishes general goals of viewing rather than specific objects to fixate) affect not only spatial (e.g., saccade amplitude) but also temporal parameters (e.g., fixation duration). Further, task set influenced the rate of change in fixation duration over the course of viewing but not saccade amplitude, suggesting independent mechanisms for control of these parameters.
Affiliation(s)
- Mark Mills
- Department of Psychology, University of Nebraska, Lincoln, NE 68588, USA.
234
Real-world scene representations in high-level visual cortex: it's the spaces more than the places. J Neurosci 2011; 31:7322-33. [PMID: 21593316 DOI: 10.1523/jneurosci.4588-10.2011] [Citation(s) in RCA: 198] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Real-world scenes are incredibly complex and heterogeneous, yet we are able to identify and categorize them effortlessly. In humans, the ventral temporal parahippocampal place area (PPA) has been implicated in scene processing, but scene information is contained in many visual areas, leaving their specific contributions unclear. Although early theories of PPA emphasized its role in spatial processing, more recent reports of its function have emphasized semantic or contextual processing. Here, using functional imaging, we reconstructed the organization of scene representations across human ventral visual cortex by analyzing the distributed response to 96 diverse real-world scenes. We found that, although individual scenes could be decoded in both PPA and early visual cortex (EVC), the structure of representations in these regions was vastly different. In both regions, spatial rather than semantic factors defined the structure of representations. However, in PPA, representations were defined primarily by the spatial factor of expanse (open, closed) and in EVC primarily by distance (near, far). Furthermore, independent behavioral ratings of expanse and distance correlated strongly with representations in PPA and peripheral EVC, respectively. In neither region was content (manmade, natural) a major contributor to the overall organization. Furthermore, the response of PPA could not be used to decode the high-level semantic category of scenes even when spatial factors were held constant, nor could category be decoded across different distances. These findings demonstrate, contrary to recent reports, that the response of the PPA primarily reflects spatial, not categorical or contextual, aspects of real-world scenes.
235
Evans KK, Horowitz TS, Wolfe JM. When categories collide: accumulation of information about multiple categories in rapid scene perception. Psychol Sci 2011; 22:739-46. [PMID: 21555522 PMCID: PMC3140830 DOI: 10.1177/0956797611407930] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Experiments have shown that people can rapidly determine if categories such as "animal" or "beach" are present in scenes that are presented for only a few milliseconds. Typically, observers in these experiments report on one prespecified category. For the first time, we show that observers can rapidly extract information about multiple categories. Moreover, we demonstrate task-dependent interactions between accumulating information about different categories in a scene. This interaction can be constructive or destructive, depending on whether the presence of one category can be taken as evidence for or against the presence of the other.
236
Spotorno S, Faure S. Change detection in complex scenes: hemispheric contribution and the role of perceptual and semantic factors. Perception 2011; 40:5-22. [PMID: 21513180 DOI: 10.1068/p6524] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
The perceptual salience and semantic relevance of objects for the meaning of a scene were evaluated with multiple criteria and then manipulated in a change-detection experiment that used an original combination of one-shot and tachistoscopic divided-visual-field paradigms to study behavioural hemispheric asymmetry. Coloured drawings that depicted meaningful situations were presented centrally and very briefly (120 ms) and only the changes were lateralised by adding an object in the right or in the left visual hemifield. High salience and high relevance improved both response times (RTs) and accuracy, although the overall contribution of salience was greater than that of relevance. Moreover, only for low-salience changes did relevance affect speed. RTs were shorter when a change occurred in the left visual hemifield, suggesting a right-hemisphere advantage for detection of visual change. Also, men responded faster than women. The theoretical and methodological implications are discussed.
Affiliation(s)
- Sara Spotorno
- Dipartimento di Scienze Antropologiche, University of Genoa, Genoa, Italy.
237
Brady TF, Konkle T, Alvarez GA. A review of visual memory capacity: Beyond individual items and toward structured representations. J Vis 2011; 11:4. [PMID: 21617025 PMCID: PMC3405498 DOI: 10.1167/11.5.4] [Citation(s) in RCA: 264] [Impact Index Per Article: 18.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Traditional memory research has focused on identifying separate memory systems and exploring different stages of memory processing. This approach has been valuable for establishing a taxonomy of memory systems and characterizing their function but has been less informative about the nature of stored memory representations. Recent research on visual memory has shifted toward a representation-based emphasis, focusing on the contents of memory and attempting to determine the format and structure of remembered information. The main thesis of this review will be that one cannot fully understand memory systems or memory processes without also determining the nature of memory representations. Nowhere is this connection more obvious than in research that attempts to measure the capacity of visual memory. We will review research on the capacity of visual working memory and visual long-term memory, highlighting recent work that emphasizes the contents of memory. This focus impacts not only how we estimate the capacity of the system--going beyond quantifying how many items can be remembered and moving toward structured representations--but how we model memory systems and memory processes.
Affiliation(s)
- Timothy F. Brady
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Talia Konkle
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- George A. Alvarez
- Vision Sciences Laboratory, Department of Psychology, Harvard University
238
He X, Yang Z, Tsien JZ. A hierarchical probabilistic model for rapid object categorization in natural scenes. PLoS One 2011; 6:e20002. [PMID: 21647443 PMCID: PMC3102072 DOI: 10.1371/journal.pone.0020002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2010] [Accepted: 04/19/2011] [Indexed: 11/19/2022] Open
Abstract
Humans can categorize objects in complex natural scenes within 100–150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ∼6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization. To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (∼100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with a near human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.
Affiliation(s)
- Xiaofu He
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Computer Science and Technology, East China Normal University, Shanghai, China
- Zhiyong Yang
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Ophthalmology, Georgia Health Sciences University, Augusta, Georgia, United States of America
- * E-mail: (ZY); jtsien@georgiahealth.edu (JT)
- Joe Z. Tsien
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Neurology, Georgia Health Sciences University, Augusta, Georgia, United States of America
- * E-mail: (ZY); jtsien@georgiahealth.edu (JT)
239
Abstract
While basic visual features such as color, motion, and orientation can guide attention, it is likely that additional features guide search for objects in real-world scenes. Recent work has shown that human observers efficiently extract global scene properties such as mean depth or navigability from a brief glance at a single scene (M. R. Greene & A. Oliva, 2009a, 2009b). Can human observers also efficiently search for an image possessing a particular global scene property among other images lacking that property? Observers searched for scene image targets defined by global properties of naturalness, transience, navigability, and mean depth. All produced inefficient search. Search efficiency for a property was not correlated with its classification threshold time from M. R. Greene and A. Oliva (2009b). Differences in search efficiency between properties can be partially explained by low-level visual features that are correlated with the global property. Overall, while global scene properties can be rapidly classified from a single image, it does not appear to be possible to use those properties to guide attention to one of several images.
240
Intraub H. Rethinking visual scene perception. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2011; 3:117-127. [PMID: 26302476 DOI: 10.1002/wcs.149] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
A classic puzzle in understanding visual scene perception is how to reconcile the physiological constraints of vision with the phenomenology of seeing. Vision captures information via discrete eye fixations, interrupted by saccadic suppression, and limited by retinal inhomogeneity. Yet scenes are effortlessly perceived as coherent, continuous, and meaningful. Two conceptualizations of scene representation will be contrasted. The traditional visual-cognitive model casts visual scene representation as an imperfect reflection of the visual sensory input alone. By contrast, a new multisource model casts visual scene representation in terms of an egocentric spatial framework that is 'filled in' not only by visual sensory input, but also by amodal perception, by expectations, and by constraints derived from rapid scene classification and object-to-context associations. Together, these nonvisual sources serve to 'simulate' a likely surrounding scene that the visual input only partially reveals. Pros and cons of these alternative views will be discussed. WIREs Cogn Sci 2012, 3:117-127. doi: 10.1002/wcs.149
Affiliation(s)
- Helene Intraub
- Department of Psychology, University of Delaware, Newark, DE, USA
241
Disentangling scene content from spatial boundary: complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. J Neurosci 2011; 31:1333-40. [PMID: 21273418 DOI: 10.1523/jneurosci.3885-10.2011] [Citation(s) in RCA: 176] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Behavioral and computational studies suggest that visual scene analysis rapidly produces a rich description of both the objects and the spatial layout of surfaces in a scene. However, there is still a large gap in our understanding of how the human brain accomplishes these diverse functions of scene understanding. Here we probe the nature of real-world scene representations using multivoxel functional magnetic resonance imaging pattern analysis. We show that natural scenes are analyzed in a distributed and complementary manner by the parahippocampal place area (PPA) and the lateral occipital complex (LOC) in particular, as well as other regions in the ventral stream. Specifically, we study the classification performance of different scene-selective regions using images that vary in spatial boundary and naturalness content. We discover that, whereas both the PPA and LOC can accurately classify scenes, they make different errors: the PPA more often confuses scenes that have the same spatial boundaries, whereas the LOC more often confuses scenes that have the same content. By demonstrating that visual scene analysis recruits distinct and complementary high-level representations, our results testify to distinct neural pathways for representing the spatial boundaries and content of a visual scene.
|
242
|
Wolfe JM, Võ MLH, Evans KK, Greene MR. Visual search in scenes involves selective and nonselective pathways. Trends Cogn Sci 2011; 15:77-84. [PMID: 21227734 DOI: 10.1016/j.tics.2010.12.001]
Abstract
How does one find objects in scenes? For decades, visual search models have been built on experiments in which observers search for targets, presented among distractor items, isolated and randomly arranged on blank backgrounds. Are these models relevant to search in continuous scenes? This article argues that the mechanisms that govern artificial, laboratory search tasks do play a role in visual search in scenes. However, scene-based information is used to guide search in ways that had no place in earlier models. Search in scenes might be best explained by a dual-path model: a 'selective' path in which candidate objects must be individually selected for recognition and a 'nonselective' path in which information can be extracted from global and/or statistical information.
Affiliation(s)
- Jeremy M Wolfe
- Brigham & Women's Hospital, Harvard Medical School, 64 Sidney St. Suite 170, Cambridge, MA 02139, USA.
|
244
|
Gagnier KM, Intraub H, Oliva A, Wolfe JM. Why does vantage point affect boundary extension? Visual Cognition 2010. [DOI: 10.1080/13506285.2010.520680]
Affiliation(s)
- Aude Oliva
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Jeremy M. Wolfe
- Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
|
245
|
Konkle T, Brady TF, Alvarez GA, Oliva A. Scene memory is more detailed than you think: the role of categories in visual long-term memory. Psychol Sci 2010; 21:1551-6. [PMID: 20921574 DOI: 10.1177/0956797610385359]
Abstract
Observers can store thousands of object images in visual long-term memory with high fidelity, but the fidelity of scene representations in long-term memory is not known. Here, we probed scene-representation fidelity by varying the number of studied exemplars in different scene categories and testing memory using exemplar-level foils. Observers viewed thousands of scenes over 5.5 hr and then completed a series of forced-choice tests. Memory performance was high, even with up to 64 scenes from the same category in memory. Moreover, there was only a 2% decrease in accuracy for each doubling of the number of studied scene exemplars. Surprisingly, this degree of categorical interference was similar to the degree previously demonstrated for object memory. Thus, although scenes have often been defined as a superset of objects, our results suggest that scenes and objects may be entities at a similar level of abstraction in visual long-term memory.
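The reported ~2% accuracy cost per doubling of studied exemplars implies a logarithmic interference curve; a minimal sketch (the linear-in-log form and 2% slope come from the abstract, while the single-exemplar baseline accuracy is a hypothetical placeholder):

```python
import math

def predicted_accuracy(n_exemplars, base_acc=0.96, drop_per_doubling=0.02):
    """Forced-choice accuracy after studying n exemplars of one category,
    assuming a fixed drop per doubling, i.e. linear in log2(n)."""
    return base_acc - drop_per_doubling * math.log2(n_exemplars)

# With 64 exemplars per category (six doublings), the predicted drop
# from the single-exemplar baseline is 6 * 2% = 12 percentage points.
drop = predicted_accuracy(1) - predicted_accuracy(64)
```

Under this toy model, even the heaviest category load in the study leaves accuracy well above chance, consistent with the claim that categorical interference is modest.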
Affiliation(s)
- Talia Konkle
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA.
|
246
|
Gajewski DA, Philbeck JW, Pothier S, Chichka D. From the most fleeting of glimpses: on the time course for the extraction of distance information. Psychol Sci 2010; 21:1446-53. [PMID: 20732904 DOI: 10.1177/0956797610381508]
Abstract
An observer's visual perception of the absolute distance between his or her position and an object is based on multiple sources of information that must be extracted during scene viewing. Research has not yet discovered the viewing duration observers need to fully extract distance information, particularly in navigable real-world environments. In a visually directed walking task, participants showed a sensitive response to distance when they were given 9-ms glimpses of floor- and eye-level targets. However, sensitivity to distance decreased markedly when targets were presented at eye level and angular size was rendered uninformative. Performance after brief viewing durations was characterized by underestimation of distance, unless the brief-viewing trials were preceded by a block of extended-viewing trials. The results indicate that experience plays a role in the extraction of information during brief glimpses. Even without prior experience, the extraction of useful information is virtually immediate when the cues of angular size or angular declination are informative for the observer.
Affiliation(s)
- Daniel A Gajewski
- Department of Psychology, George Washington University, Washington, DC 20052, USA.
|
247
|
Kouider S, de Gardelle V, Sackur J, Dupoux E. How rich is consciousness? The partial awareness hypothesis. Trends Cogn Sci 2010; 14:301-7. [PMID: 20605514 DOI: 10.1016/j.tics.2010.04.006]
Abstract
Current theories of consciousness posit a dissociation between 'phenomenal' consciousness (rich) and 'access' consciousness (limited). Here, we argue that the empirical evidence for phenomenal consciousness without access is equivocal, resulting either from a confusion between phenomenal and unconscious contents, or from an impression of phenomenally rich experiences arising from illusory contents. We propose a refined account of access that relies on a hierarchy of representational levels and on the notion of partial awareness, whereby lower and higher levels are accessed independently. Reframing of the issue of dissociable forms of consciousness into dissociable levels of access provides a more parsimonious account of the existing evidence. In addition, the rich phenomenology illusion can be studied and described in terms of testable cognitive mechanisms.
Affiliation(s)
- Sid Kouider
- Laboratoire de Sciences Cognitives & Psycholinguistique, Ecole Normale Supérieure-CNRS, 29 rue d'Ulm, 75005, Paris, France.
|
248
|
On the mental representations originating during the interaction between language and vision. Cogn Process 2010; 11:295-305. [PMID: 20446103 DOI: 10.1007/s10339-010-0363-y]
Abstract
The interaction between vision and language processing is clearly of interest to both cognitive psychologists and psycholinguists. Recent research has begun to create understanding of the interaction between vision and language in terms of the representational issues involved. In this paper, we first review some of the theoretical and methodological issues in the current vision-language interaction debate. Later, we develop a model that attempts to account for effects of affordances and visual context on language-scene interaction as well as the role of sensorimotor simulation. The paper addresses theoretical issues related to the mental representations that arise when visual and linguistic systems interact.
|
249
|
Status and Development of Natural Scene Understanding for Vision-based Outdoor Mobile Robot. ACTA ACUST UNITED AC 2010. [DOI: 10.3724/sp.j.1004.2010.00001]
|