401
Potter MC. Recognition and memory for briefly presented scenes. Front Psychol 2012; 3:32. [PMID: 22371707] [PMCID: PMC3284209] [DOI: 10.3389/fpsyg.2012.00032]
Abstract
Three times per second, our eyes make a new fixation that generates a new bottom-up analysis in the visual system. How much is extracted from each glimpse? For how long and in what form is that information remembered? To answer these questions, investigators have mimicked the effect of continual shifts of fixation by using rapid serial visual presentation of sequences of unrelated pictures. Experiments in which viewers detect specified target pictures show that detection on the basis of meaning is possible at presentation durations as brief as 13 ms, suggesting that understanding may be based on feedforward processing, without feedback. In contrast, memory for what was just seen is poor unless the viewer has about 500 ms to think about the scene: the scene does not need to remain in view. Initial memory loss after brief presentations occurs over several seconds, suggesting that at least some of the information from the previous few fixations persists long enough to support a coherent representation of the current environment. In contrast to marked memory loss shortly after brief presentations, memory for pictures viewed for 1 s or more is excellent. Although some specific visual information persists, the form and content of the perceptual and memory representations of pictures over time indicate that conceptual information is extracted early and determines most of what remains in longer-term memory.
Affiliation(s)
- Mary C Potter
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology Cambridge, MA, USA
402
Abstract
Mounting evidence suggests that 'core object recognition,' the ability to rapidly recognize objects despite substantial appearance variation, is solved in the brain via a cascade of reflexive, largely feedforward computations that culminate in a powerful neuronal representation in the inferior temporal cortex. However, the algorithm that produces this solution remains poorly understood. Here we review evidence ranging from individual neurons and neuronal populations to behavior and computational models. We propose that understanding this algorithm will require using neuronal and psychophysical data to sift through many computational models, each based on building blocks of small, canonical subnetworks with a common functional goal.
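The "cascade of reflexive, largely feedforward computations" built from "small, canonical subnetworks" can be caricatured in a few lines of code. The sketch below is purely illustrative (the stage structure, sizes, and normalization form are our assumptions, not the authors' model): each stage filters, rectifies, max-pools, and divisively normalizes, and stacking stages yields a lower-dimensional, more invariant code.

```python
import numpy as np

def canonical_stage(x, W, pool=2):
    """One hypothetical canonical subnetwork: linear filtering,
    rectification, local max pooling, divisive normalization."""
    a = np.maximum(W @ x, 0.0)            # filter + rectify
    a = a.reshape(-1, pool).max(axis=1)   # local max pooling halves the units
    return a / (1.0 + a.sum())            # divisive normalization

rng = np.random.default_rng(0)
x = rng.random(16)                        # toy "retinal" input
W1 = rng.standard_normal((16, 16))
W2 = rng.standard_normal((4, 8))
h = canonical_stage(x, W1)                # 16 -> 8 units
out = canonical_stage(h, W2)              # 8 -> 2 units
print(out.shape)                          # (2,)
```

Repeating one canonical operation with different weights at each level is the sense in which a common functional goal can be reused across the hierarchy.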
Affiliation(s)
- James J DiCarlo
- Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
403
Oppermann F, Hassler U, Jescheniak JD, Gruber T. The rapid extraction of gist: early neural correlates of high-level visual processing. J Cogn Neurosci 2012; 24:521-9. [DOI: 10.1162/jocn_a_00100]
Abstract
The human cognitive system is highly efficient in extracting information from our visual environment. This efficiency is based on acquired knowledge that guides our attention toward relevant events and promotes the recognition of individual objects as they appear in visual scenes. The experience-based representation of such knowledge contains not only information about the individual objects but also about relations between them, such as the typical context in which individual objects co-occur. The present EEG study aimed at exploring the availability of such relational knowledge in the time course of visual scene processing, using oscillatory evoked gamma-band responses as a neural correlate for a currently activated cortical stimulus representation. Participants decided whether two simultaneously presented objects were conceptually coherent (e.g., mouse–cheese) or not (e.g., crown–mushroom). We obtained increased evoked gamma-band responses for coherent scenes compared with incoherent scenes beginning as early as 70 msec after stimulus onset within a distributed cortical network, including the right temporal, the right frontal, and the bilateral occipital cortex. This finding provides empirical evidence for the functional importance of evoked oscillatory activity in high-level vision beyond the visual cortex and, thus, gives new insights into the functional relevance of neuronal interactions. It also indicates the very early availability of experience-based knowledge that might be regarded as a fundamental mechanism for the rapid extraction of the gist of a scene.
404
Yau JM, Pasupathy A, Brincat SL, Connor CE. Curvature processing dynamics in macaque area V4. Cereb Cortex 2012; 23:198-209. [PMID: 22298729] [DOI: 10.1093/cercor/bhs004]
Abstract
We have previously analyzed shape processing dynamics in macaque monkey posterior inferotemporal cortex (PIT). We described how early PIT responses to individual contour fragments evolve into tuning for multifragment shape configurations. Here, we analyzed curvature processing dynamics in area V4, which provides feedforward inputs to PIT. We contrasted 2 hypotheses: 1) that V4 curvature tuning evolves from tuning for simpler elements, analogous to PIT shape synthesis and 2) that V4 curvature tuning emerges immediately, based on purely feedforward mechanisms. Our results clearly supported the first hypothesis. Early V4 responses carried information about individual contour orientations. Tuning for multiorientation (curved) contours developed gradually over ∼50 ms. Together, the current and previous results suggest a partial sequence for shape synthesis in ventral pathway cortex. We propose that early orientation signals are synthesized into curved contour fragment representations in V4 and that these signals are transmitted to PIT, where they are then synthesized into multifragment shape representations. The observed dynamics might additionally or alternatively reflect influences from earlier (V1, V2) and later (central and anterior IT) processing stages in the ventral pathway. In either case, the dynamics of contour information in V4 and PIT appear to reflect a sequential hierarchical process of shape synthesis.
Affiliation(s)
- Jeffrey M Yau
- Zanvyl Krieger Mind/Brain Institute and Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, USA.
405
Yue X, Biederman I, Mangini MC, Malsburg CVD, Amir O. Predicting the psychophysical similarity of faces and non-face complex shapes by image-based measures. Vision Res 2012; 55:41-6. [PMID: 22248730] [DOI: 10.1016/j.visres.2011.12.012]
Abstract
Shape representation is accomplished by a series of cortical stages in which cells in the first stage (V1) have local receptive fields tuned to contrast at a particular scale and orientation, each well modeled as a Gabor filter. In succeeding stages, the representation becomes largely invariant to Gabor coding (Kobatake & Tanaka, 1994). Because of the non-Gabor tuning in these later stages, which must be engaged for a behavioral response (Tong, 2003; Tong et al., 1998), a V1-based measure of shape similarity based on Gabor filtering would not be expected to be highly correlated with human performance when discriminating complex shapes (faces and teeth-like blobs) that differ metrically on a two-choice, match-to-sample task. Here we show that human performance is highly correlated with Gabor-based image measures (Gabor simple and complex cells), with values often in the mid 0.90s, even without discounting the variability in the speed and accuracy of performance not associated with the similarity of the distractors. This high correlation is generally maintained through the stages of HMAX, a model that builds upon the Gabor metric and develops units for complex features and larger receptive fields. This is the first report of the psychophysical similarity of complex shapes being predictable from a biologically motivated, physical measure of similarity. As accurate as these measures were for accounting for metric variation, a simple demonstration showed that all were insensitive to viewpoint invariant (nonaccidental) differences in shape.
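A Gabor-jet style similarity of the kind correlated with performance here can be sketched with plain NumPy. The filter parameters, the circular FFT convolution, and the jet-correlation similarity below are illustrative choices, not the exact measure used in the study:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor: a cosine grating under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)         # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def gabor_jet(img, size=9, n_thetas=8):
    """Concatenated filter-response magnitudes over orientations (a 'jet')."""
    feats = []
    for k in range(n_thetas):
        g = gabor_kernel(size, size / 2, k * np.pi / n_thetas, size / 4)
        # circular 'same'-size convolution via the FFT
        resp = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(g, img.shape)))
        feats.append(np.abs(resp).ravel())
    return np.concatenate(feats)

def gabor_similarity(img_a, img_b):
    """Pearson correlation between the two images' Gabor jets."""
    return np.corrcoef(gabor_jet(img_a), gabor_jet(img_b))[0, 1]
```

An image is maximally similar to itself under this measure; metrically different distractors yield lower jet correlations, which is the quantity the study relates to discrimination performance.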
Affiliation(s)
- Xiaomin Yue
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Suite 2301, Charlestown, MA 02129, USA.
406
Abstract
How does the brain compute? Answering this question necessitates neuronal connectomes, annotated graphs of all synaptic connections within defined brain areas. Further, understanding the energetics of the brain's computations requires vascular graphs. The assembly of a connectome requires sensitive hardware tools to measure neuronal and neurovascular features in all three dimensions, as well as software and machine learning for data analysis and visualization. We present the state of the art on the reconstruction of circuits and vasculature that link brain anatomy and function. Analysis at the scale of tens of nanometers yields connections between identified neurons, while analysis at the micrometer scale yields probabilistic rules of connection between neurons and exact vascular connectivity.
407
Riesenhuber M. Getting a handle on how the brain generates complexity. Network 2012; 23:123-127. [PMID: 22897445] [DOI: 10.3109/0954898x.2012.711918]
Abstract
Sensory processing in cortex across modalities appears to rely on a "simple-to-complex" hierarchical computational strategy in which neurons at later levels in the hierarchy combine inputs from earlier levels to create more complex neuronal selectivities. The specifics of this process are still poorly understood, however. In this issue of Network, Plebe shows how computational modeling of experimental data on neuronal tuning in secondary visual cortex can help us understand how the brain increases neuronal tuning complexity across the visual cortical hierarchy.
Affiliation(s)
- Maximilian Riesenhuber
- Laboratory for Computational Cognitive Neuroscience, Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20007, USA.
408
Wu Y, Liu Y, Yuan Z, Zheng N. IAIR-CarPed: A psychophysically annotated dataset with fine-grained and layered semantic labels for object recognition. Pattern Recognit Lett 2012. [DOI: 10.1016/j.patrec.2011.10.003]
409
410
Haxby JV, Guntupalli JS, Connolly AC, Halchenko YO, Conroy BR, Gobbini MI, Hanke M, Ramadge PJ. A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron 2011; 72:404-16. [PMID: 22017997] [DOI: 10.1016/j.neuron.2011.08.026]
Abstract
We present a high-dimensional model of the representational space in human ventral temporal (VT) cortex in which dimensions are response-tuning functions that are common across individuals and patterns of response are modeled as weighted sums of basis patterns associated with these response tunings. We map response-pattern vectors, measured with fMRI, from individual subjects' voxel spaces into this common model space using a new method, "hyperalignment." Hyperalignment parameters based on responses during one experiment (movie viewing) identified 35 common response-tuning functions that captured fine-grained distinctions among a wide range of stimuli in the movie and in two category perception experiments. Between-subject classification (BSC, multivariate pattern classification based on other subjects' data) of response-pattern vectors in common model space greatly exceeded BSC of anatomically aligned responses and matched within-subject classification. Results indicate that population codes for complex visual stimuli in VT cortex are based on response-tuning functions that are common across individuals.
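The core computational step of hyperalignment is an orthogonal (Procrustes) mapping between subjects' time-by-voxel response matrices; the published procedure iterates this across many subjects and derives common model dimensions, which this minimal two-subject sketch omits:

```python
import numpy as np

def procrustes_align(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target||_F,
    mapping one subject's voxel space onto another's."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# Toy demo: subject B sees the same "movie" as subject A, but the shared
# response-tuning functions are mixed differently across B's voxels.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))                # 100 timepoints x 20 voxels
q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
B = A @ q                                         # rotated copy of A's responses
R = procrustes_align(B, A)
print(np.allclose(B @ R, A, atol=1e-8))           # True: common space recovered
```

Because R is orthogonal, the transform re-expresses each subject's voxel pattern in the common space without distorting the geometry of the response-pattern vectors.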
Affiliation(s)
- James V Haxby
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA.
411
Rothenstein AL, Rodríguez-Sánchez AJ, Simine E, Tsotsos JK. Visual feature binding within the Selective Tuning attention framework. Int J Pattern Recogn 2011. [DOI: 10.1142/s0218001408006648]
Abstract
We present a biologically plausible computational model for solving the visual feature binding problem, based on recent results regarding the time course and processing sequence in the primate visual system. The feature binding problem appears due to the distributed nature of visual processing in the primate brain, and the gradual loss of spatial information along the processing hierarchy. This paper puts forward the proposal that by using multiple passes of the visual processing hierarchy, both bottom-up and top-down, and using task information to tune the processing prior to each pass, we can explain the different recognition behaviors that primate vision exhibits. To accomplish this, four different kinds of binding processes are introduced and are tied directly to specific recognition tasks and their time course. The model relies on the reentrant connections so ubiquitous in the primate brain to recover spatial information, and thus allow features represented in different parts of the brain to be integrated in a unitary conscious percept. We show how different tasks and stimuli have different binding requirements, and present a unified framework within the Selective Tuning model of visual attention.
Affiliation(s)
- Albert L. Rothenstein
- Department of Computer Science & Engineering and Centre for Vision Research, York University, Toronto, Ontario, Canada
- Antonio J. Rodríguez-Sánchez
- Department of Computer Science & Engineering and Centre for Vision Research, York University, Toronto, Ontario, Canada
- Evgueni Simine
- Department of Computer Science & Engineering and Centre for Vision Research, York University, Toronto, Ontario, Canada
- John K. Tsotsos
- Department of Computer Science & Engineering and Centre for Vision Research, York University, Toronto, Ontario, Canada
412
Gopych P. Biologically plausible BSDT recognition of complex images: the case of human faces. Int J Neural Syst 2011; 18:527-45. [DOI: 10.1142/s0129065708001762]
Abstract
On the basis of the recent binary signal detection theory (BSDT), optimal recognition algorithms for complex images are constructed and their optimal performance is calculated. A methodology for comparing BSDT predictions with measured human performance is developed and applied to explaining a particular face-recognition experiment. The BSDT makes possible computer code whose recognition performance exceeds that of humans, and its fundamental discreteness is consistent with the experiment. Related neurobiological and behavioral effects are briefly discussed.
Affiliation(s)
- PETRO GOPYCH
- Universal Power Systems USA-Ukraine LLC, 3 Kotsarskaya Street, Kharkiv, 61012, Ukraine
413
Crouzet SM, Serre T. What are the visual features underlying rapid object recognition? Front Psychol 2011; 2:326. [PMID: 22110461] [PMCID: PMC3216029] [DOI: 10.3389/fpsyg.2011.00326]
Abstract
Research progress in machine vision has been very significant in recent years. Robust face detection and identification algorithms are already readily available to consumers, and modern computer vision algorithms for generic object recognition are now coping with the richness and complexity of natural visual scenes. Unlike early vision models of object recognition that emphasized the role of figure-ground segmentation and spatial information between parts, recent successful approaches are based on the computation of loose collections of image features without prior segmentation or any explicit encoding of spatial relations. While these models remain simplistic models of visual processing, they suggest that, in principle, bottom-up activation of a loose collection of image features could support the rapid recognition of natural object categories and provide an initial coarse visual representation before more complex visual routines and attentional mechanisms take place. Focusing on biologically plausible computational models of (bottom-up) pre-attentive visual recognition, we review some of the key visual features that have been described in the literature. We discuss the consistency of these feature-based representations with classical theories from visual psychology and test their ability to account for human performance on a rapid object categorization task.
Affiliation(s)
- Sébastien M Crouzet
- Cognitive, Linguistic, and Psychological Sciences Department, Institute for Brain Sciences, Brown University Providence, RI, USA
414
Spratling MW. Predictive coding as a model of the V1 saliency map hypothesis. Neural Netw 2011; 26:7-28. [PMID: 22047778] [DOI: 10.1016/j.neunet.2011.10.002]
Abstract
The predictive coding/biased competition (PC/BC) model is a specific implementation of the predictive coding theory that has previously been shown to provide a detailed account of the response properties of orientation tuned cells in primary visual cortex (V1). Here it is shown that the same model can successfully simulate psychophysical data relating to the saliency of unique items in search arrays, of contours embedded in random texture, and of borders between textured regions. This model thus provides a possible implementation of the hypothesis that V1 generates a bottom-up saliency map. However, PC/BC is very different from previous models of visual salience, in that it proposes that saliency results from the failure of an internal model of simple elementary image components to accurately predict the visual input. Saliency can therefore be interpreted as a mechanism by which prediction errors attract attention in an attempt to improve the accuracy of the brain's internal representation of the world.
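The claim that saliency is a prediction error can be illustrated with a toy divisive-input-modulation loop. The dictionary, the constants, and the update rules below are a simplified caricature for illustration only, not Spratling's PC/BC model:

```python
import numpy as np

def residual_saliency(x, W, iters=50, eps1=1e-6, eps2=1e-3):
    """Explain input x with dictionary W (rows = elementary image
    components); elements the model cannot predict keep a large error."""
    y = np.zeros(W.shape[0])
    for _ in range(iters):
        e = x / (eps2 + W.T @ y)         # element-wise prediction error
        y = (eps1 + y) * (W @ e)         # units grow on unexplained input
    return x / (eps2 + W.T @ y)          # final error = saliency

W = np.array([[1., 1., 0., 0.],          # "feature" covering elements 0-1
              [0., 0., 1., 0.]])         # "feature" covering element 2
x = np.array([1., 1., 0., 1.])           # element 3 matches no feature
sal = residual_saliency(x, W)
print(int(np.argmax(sal)))               # 3: the unpredicted element pops out
```

The well-modeled elements end up with a small residual while the element no dictionary feature predicts retains a large one, which is the sense in which prediction errors attract attention.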
Affiliation(s)
- M W Spratling
- King’s College London, Department of Informatics and Division of Engineering, London, UK.
415
Gintautas V, Ham MI, Kunsberg B, Barr S, Brumby SP, Rasmussen C, George JS, Nemenman I, Bettencourt LMA, Kenyon GT. Model cortical association fields account for the time course and dependence on target complexity of human contour perception. PLoS Comput Biol 2011; 7:e1002162. [PMID: 21998562] [PMCID: PMC3188484] [DOI: 10.1371/journal.pcbi.1002162]
Abstract
Can lateral connectivity in the primary visual cortex account for the time dependence and intrinsic task difficulty of human contour detection? To answer this question, we created a synthetic image set that prevents sole reliance on either low-level visual features or high-level context for the detection of target objects. Rendered images consist of smoothly varying, globally aligned contour fragments (amoebas) distributed among groups of randomly rotated fragments (clutter). The time course and accuracy of amoeba detection by humans was measured using a two-alternative forced choice protocol with self-reported confidence and variable image presentation time (20-200 ms), followed by an image mask optimized so as to interrupt visual processing. Measured psychometric functions were well fit by sigmoidal functions with exponential time constants of 30-91 ms, depending on amoeba complexity. Key aspects of the psychophysical experiments were accounted for by a computational network model, in which simulated responses across retinotopic arrays of orientation-selective elements were modulated by cortical association fields, represented as multiplicative kernels computed from the differences in pairwise edge statistics between target and distractor images. Comparing the experimental and the computational results suggests that each iteration of the lateral interactions takes at least ms of cortical processing time. Our results provide evidence that cortical association fields between orientation selective elements in early visual areas can account for important temporal and task-dependent aspects of the psychometric curves characterizing human contour perception, with the remaining discrepancies postulated to arise from the influence of higher cortical areas.
Current computer vision algorithms reproducing the feed-forward features of the primate visual pathway still fall far behind the capabilities of human subjects in detecting objects in cluttered backgrounds. Here we investigate the possibility that recurrent lateral interactions, long hypothesized to form cortical association fields, can account for the dependence of object detection accuracy on shape complexity and image exposure time. Cortical association fields are thought to aid object detection by reinforcing global image features that cannot easily be detected by single neurons in feed-forward models. Our implementation uses the spatial arrangement, relative orientation, and continuity of putative contour elements to compute the lateral contextual support. We designed synthetic images that allowed us to control object shape and background clutter while eliminating unintentional cues to the presence of an otherwise hidden target. In contrast, real objects can vary uncontrollably in shape, are camouflaged to different degrees by background clutter, and are often associated with non-shape cues, making results using natural image sets difficult to interpret. Our computational model of cortical association fields matches many aspects of the time course and object detection accuracy of human subjects on statistically identical synthetic image sets. This implies that lateral interactions may selectively reinforce smooth object global boundaries.
Affiliation(s)
- Vadas Gintautas
- Center for Nonlinear Studies and T-5, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Physics Department, Chatham University, Pittsburgh, Pennsylvania, United States of America
- Michael I. Ham
- P-21 Applied Modern Physics (Biological and Quantum Physics), Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Benjamin Kunsberg
- New Mexico Consortium, Los Alamos, New Mexico, United States of America
- Shawn Barr
- New Mexico Consortium, Los Alamos, New Mexico, United States of America
- Steven P. Brumby
- Space and Remote Sensing Sciences, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Craig Rasmussen
- New Mexico Consortium, Los Alamos, New Mexico, United States of America
- John S. George
- P-21 Applied Modern Physics (Biological and Quantum Physics), Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Ilya Nemenman
- Departments of Physics and Biology and Computational and Life Sciences Initiative, Emory University, Atlanta, Georgia, United States of America
- Luís M. A. Bettencourt
- Center for Nonlinear Studies and T-5, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Garret T. Kenyon
- P-21 Applied Modern Physics (Biological and Quantum Physics), Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- New Mexico Consortium, Los Alamos, New Mexico, United States of America
416
Masquelier T. Relative spike time coding and STDP-based orientation selectivity in the early visual system in natural continuous and saccadic vision: a computational model. J Comput Neurosci 2011; 32:425-41. [PMID: 21938439] [DOI: 10.1007/s10827-011-0361-9]
Abstract
We have built a phenomenological spiking model of the cat early visual system comprising the retina, the Lateral Geniculate Nucleus (LGN) and V1's layer 4, and established four main results: (1) When exposed to videos that reproduce with high fidelity what a cat experiences under natural conditions, adjacent Retinal Ganglion Cells (RGCs) have spike-time correlations at a short timescale (~30 ms), despite neuronal noise and possible jitter accumulation. (2) In accordance with recent experimental findings, the LGN filters out some noise. It thus increases the spike reliability and temporal precision, the sparsity, and, importantly, further decreases adjacent cells' correlation timescale down to ~15 ms. (3) Downstream simple cells in V1's layer 4, if equipped with Spike Timing-Dependent Plasticity (STDP), may detect these fine-scale cross-correlations, and thus connect principally to ON- and OFF-centre cells with Receptive Fields (RF) aligned in the visual space, thereby becoming orientation selective, in accordance with Hubel and Wiesel's classic model (Journal of Physiology 160:106-154, 1962). Up to this point we dealt with continuous vision, and there was no absolute time reference such as a stimulus onset, yet information was encoded and decoded in the relative spike times. (4) We then simulated saccades to a static image and benchmarked relative spike time coding and time-to-first-spike coding with respect to saccade landing in the context of orientation representation. In both the retina and the LGN, relative spike times are more precise, less affected by pre-landing history and global contrast than absolute ones, and lead to robust contrast-invariant orientation representations in V1.
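Point (3) relies on standard pair-based STDP. A minimal additive implementation (the parameter values are illustrative, not those of the paper):

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.03, a_minus=0.035, tau=20.0):
    """Pair-based STDP: potentiate when the presynaptic spike precedes
    the postsynaptic spike, depress otherwise; weight clipped to [0, 1]."""
    dt = t_post - t_pre                     # ms
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau)     # pre before post: LTP
    else:
        w -= a_minus * np.exp(dt / tau)     # post before pre: LTD
    return float(np.clip(w, 0.0, 1.0))

# An afferent that reliably fires 5 ms before each postsynaptic spike is
# progressively strengthened -- the mechanism by which simple cells could
# come to favour consistently co-active, spatially aligned ON/OFF inputs.
w = 0.5
for t_post in range(0, 1000, 100):
    w = stdp_update(w, t_pre=t_post - 5, t_post=t_post)
print(round(w, 3))                          # 0.734
```

Afferents whose spike times do not reliably lead the postsynaptic spike are, on average, depressed, so only the correlated inputs survive.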
Affiliation(s)
- Timothée Masquelier
- Unit for Brain and Cognition, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
417
418
Abstract
Is visual attention required for visual consciousness? In the past decade, many researchers have claimed that awareness can arise in the absence of attention. This claim is largely based on the notion that natural scene (or "gist") perception occurs without attention. This article presents evidence against this idea. We show that when observers perform a variety of demanding, sustained-attention tasks, inattentional blindness occurs for natural scenes. In addition, scene perception is impaired under dual-task conditions, but only when the primary task is sufficiently demanding. This finding suggests that previous studies that have been interpreted as demonstrating scene perception without attention failed to fully engage attention and that natural-scene perception does indeed require attention. Thus, natural-scene perception is not a preattentive process and cannot be used to support the idea of awareness without attention.
Affiliation(s)
- Michael A Cohen
- Department of Psychology, Harvard University, William James Hall, 33 Kirkland St., Cambridge, MA 02138, USA.
419
Clarke A, Taylor KI, Tyler LK. The evolution of meaning: spatio-temporal dynamics of visual object recognition. J Cogn Neurosci 2011; 23:1887-99. [PMID: 20617883] [DOI: 10.1162/jocn.2010.21544]
Abstract
Research on the spatio-temporal dynamics of visual object recognition suggests a recurrent, interactive model whereby an initial feedforward sweep through the ventral stream to prefrontal cortex is followed by recurrent interactions. However, critical questions remain regarding the factors that mediate the degree of recurrent interactions necessary for meaningful object recognition. The novel prediction we test here is that recurrent interactivity is driven by increasing semantic integration demands as defined by the complexity of semantic information required by the task and driven by the stimuli. To test this prediction, we recorded magnetoencephalography data while participants named living and nonliving objects during two naming tasks. We found that the spatio-temporal dynamics of neural activity were modulated by the level of semantic integration required. Specifically, source reconstructed time courses and phase synchronization measures showed increased recurrent interactions as a function of semantic integration demands. These findings demonstrate that the cortical dynamics of object processing are modulated by the complexity of semantic information required from the visual input.
Affiliation(s)
- Alex Clarke
- Centre for Speech, Language and the Brain, Department of Experimental Psychology, University of Cambridge, United Kingdom
420
Stollhoff R, Kennerknecht I, Elze T, Jost J. A computational model of dysfunctional facial encoding in congenital prosopagnosia. Neural Netw 2011; 24:652-64. [DOI: 10.1016/j.neunet.2011.03.006]
421
Mack ML, Palmeri TJ. The timing of visual object categorization. Front Psychol 2011; 2:165. [PMID: 21811480] [PMCID: PMC3139955] [DOI: 10.3389/fpsyg.2011.00165]
Abstract
An object can be categorized at different levels of abstraction: as natural or man-made, animal or plant, bird or dog, or as a Northern Cardinal or Pyrrhuloxia. There has been growing interest in understanding how quickly categorizations at different levels are made and how the timing of those perceptual decisions changes with experience. We specifically contrast two perspectives on the timing of object categorization at different levels of abstraction. By one account, the relative timing implies a relative timing of stages of visual processing that are tied to particular levels of object categorization: Fast categorizations are fast because they precede other categorizations within the visual processing hierarchy. By another account, the relative timing reflects when perceptual features are available over time and the quality of perceptual evidence used to drive a perceptual decision process: Fast simply means fast, it does not mean first. Understanding the short-term and long-term temporal dynamics of object categorizations is key to developing computational models of visual object recognition. We briefly review a number of models of object categorization and outline how they explain the timing of visual object categorization at different levels of abstraction.
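The second account ("fast simply means fast, it does not mean first") can be illustrated with a toy evidence-accumulation race; this is an illustrative sketch, not one of the reviewed models, and the drift rates, threshold, and noise level are invented. Both categorization decisions start at the same instant and differ only in the quality of their evidence (drift), so the basic-level decision tends to finish earlier without occupying an earlier stage in any processing hierarchy.

```python
import random

def first_passage_time(drift, threshold=1.0, noise=0.5, dt=0.001, rng=random):
    """Simulate one noisy evidence accumulator; return the time at which
    accumulated evidence first reaches the decision threshold."""
    x, t = 0.0, 0.0
    while x < threshold:
        # Euler step of a drift-diffusion process.
        x += drift * dt + noise * rng.gauss(0.0, dt ** 0.5)
        t += dt
    return t

rng = random.Random(42)
# Both decisions start at the same moment; only drift (evidence quality)
# differs, so "fast" reflects evidence, not processing order.
basic = [first_passage_time(3.0, rng=rng) for _ in range(200)]        # e.g., "bird"
subordinate = [first_passage_time(1.5, rng=rng) for _ in range(200)]  # e.g., "Northern Cardinal"
print(sum(basic) / len(basic) < sum(subordinate) / len(subordinate))  # True
```

The higher-drift accumulator wins the race on average even though neither decision "precedes" the other in time of onset.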
Affiliation(s)
- Michael L Mack
- Department of Psychology, The University of Texas at Austin Austin, TX, USA
422
Schmidt T, Haberkamp A, Veltkamp GM, Weber A, Seydell-Greenwald A, Schmidt F. Visual processing in rapid-chase systems: image processing, attention, and awareness. Front Psychol 2011; 2:169. [PMID: 21811484 PMCID: PMC3139957 DOI: 10.3389/fpsyg.2011.00169] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2011] [Accepted: 07/06/2011] [Indexed: 11/13/2022] Open
Abstract
Visual stimuli can be classified so rapidly that their analysis may be based on a single sweep of feedforward processing through the visuomotor system. Behavioral criteria for feedforward processing can be evaluated in response priming tasks where speeded pointing or keypress responses are performed toward target stimuli which are preceded by prime stimuli. We apply this method to several classes of complex stimuli. (1) When participants classify natural images into animals or non-animals, the time course of their pointing responses indicates that prime and target signals remain strictly sequential throughout all processing stages, meeting stringent behavioral criteria for feedforward processing (rapid-chase criteria). (2) Such priming effects are boosted by selective visual attention for positions, shapes, and colors, in a way consistent with bottom-up enhancement of visuomotor processing, even when primes cannot be consciously identified. (3) Speeded processing of phobic images is observed in participants specifically fearful of spiders or snakes, suggesting enhancement of feedforward processing by long-term perceptual learning. (4) When the perceived brightness of primes in complex displays is altered by means of illumination or transparency illusions, priming effects in speeded keypress responses can systematically contradict subjective brightness judgments, such that one prime appears brighter than the other but activates motor responses as if it was darker. We propose that response priming captures the output of the first feedforward pass of visual signals through the visuomotor system, and that this output lacks some characteristic features of more elaborate, recurrent processing. This way, visuomotor measures may become dissociated from several aspects of conscious vision. We argue that "fast" visuomotor measures predominantly driven by feedforward processing should supplement "slow" psychophysical measures predominantly based on visual awareness.
Affiliation(s)
- Thomas Schmidt
- Faculty of Social Sciences, Psychology I, University of Kaiserslautern Kaiserslautern, Germany
423
Sugase-Miyamoto Y, Matsumoto N, Kawano K. Role of temporal processing stages by inferior temporal neurons in facial recognition. Front Psychol 2011; 2:141. [PMID: 21734904 PMCID: PMC3124819 DOI: 10.3389/fpsyg.2011.00141] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2011] [Accepted: 06/12/2011] [Indexed: 11/24/2022] Open
Abstract
In this review, we focus on the role of temporal stages of encoded facial information in the visual system, which might enable the efficient determination of species, identity, and expression. Facial recognition is an important function of our brain and is known to be processed in the ventral visual pathway, where visual signals are processed through areas V1, V2, V4, and the inferior temporal (IT) cortex. In the IT cortex, neurons show selective responses to complex visual images such as faces, and at each stage along the pathway the stimulus selectivity of the neural responses becomes sharper, particularly in the later portion of the responses. In the IT cortex of the monkey, facial information is represented by different temporal stages of neural responses, as shown in our previous study: the initial transient response of face-responsive neurons represents information about global categories, i.e., human vs. monkey vs. simple shapes, whilst the later portion of these responses represents information about detailed facial categories, i.e., expression and/or identity. This suggests that the temporal stages of the neuronal firing pattern play an important role in the coding of visual stimuli, including faces. This type of coding may be a plausible mechanism underlying the temporal dynamics of recognition, including the process of detection/categorization followed by the identification of objects. Recent single-unit studies in monkeys have also provided evidence consistent with the important role of the temporal stages of encoded facial information. For example, view-invariant facial identity information is represented in the response at a later period within a region of face-selective neurons. Consistent with these findings, temporally modulated neural activity has also been observed in human studies. These results suggest a close correlation between the temporal processing stages of facial information by IT neurons and the temporal dynamics of face recognition.
Affiliation(s)
- Yasuko Sugase-Miyamoto
- Human Technology Research Institute, The National Institute of Advanced Industrial Science and Technology Tsukuba, Japan
424
Wilder J, Feldman J, Singh M. Superordinate shape classification using natural shape statistics. Cognition 2011; 119:325-40. [PMID: 21440250 PMCID: PMC3094567 DOI: 10.1016/j.cognition.2011.01.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2010] [Revised: 01/14/2011] [Accepted: 01/22/2011] [Indexed: 11/29/2022]
Abstract
This paper investigates the classification of shapes into broad natural categories such as animal or leaf. We asked whether such coarse classifications can be achieved by a simple statistical classification of the shape skeleton. We surveyed databases of natural shapes, extracting shape skeletons and tabulating their parameters within each class, seeking shape statistics that effectively discriminated the classes. We conducted two experiments in which human subjects were asked to classify novel shapes into the same natural classes. We compared subjects' classifications to those of a naive Bayesian classifier based on the natural shape statistics, and found good agreement. We conclude that human superordinate shape classifications can be well understood as involving a simple statistical classification of the shape skeleton that has been "tuned" to the natural statistics of shape.
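The kind of naive Bayesian classification over skeleton statistics that the paper describes can be sketched in a few lines; the single statistic used here (skeletal branch count) and its per-class parameters are invented for illustration, not taken from the paper's shape databases.

```python
import math

# Hypothetical per-class statistics of one skeleton parameter
# (number of skeletal branches): illustrative numbers only.
CLASS_STATS = {
    "animal": {"mean": 8.0, "sd": 2.0, "prior": 0.5},
    "leaf":   {"mean": 4.0, "sd": 1.5, "prior": 0.5},
}

def log_gaussian(x, mean, sd):
    """Log density of a Gaussian likelihood for one shape statistic."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def classify(n_branches):
    """Naive Bayes: pick the class maximizing log prior + log likelihood."""
    scores = {c: math.log(s["prior"]) + log_gaussian(n_branches, s["mean"], s["sd"])
              for c, s in CLASS_STATS.items()}
    return max(scores, key=scores.get)

print(classify(9))  # animal
print(classify(3))  # leaf
```

A fuller version would tabulate several skeleton parameters per class and sum their log likelihoods, which is the "simple statistical classification of the shape skeleton" the paper tests against human judgments.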
Affiliation(s)
- John Wilder
- Department of Psychology, Center for Cognitive Science, Rutgers University, New Brunswick, United States.
425
Evans KK, Horowitz TS, Wolfe JM. When categories collide: accumulation of information about multiple categories in rapid scene perception. Psychol Sci 2011; 22:739-46. [PMID: 21555522 PMCID: PMC3140830 DOI: 10.1177/0956797611407930] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Experiments have shown that people can rapidly determine if categories such as "animal" or "beach" are present in scenes that are presented for only a few milliseconds. Typically, observers in these experiments report on one prespecified category. For the first time, we show that observers can rapidly extract information about multiple categories. Moreover, we demonstrate task-dependent interactions between accumulating information about different categories in a scene. This interaction can be constructive or destructive, depending on whether the presence of one category can be taken as evidence for or against the presence of the other.
426
He X, Yang Z, Tsien JZ. A hierarchical probabilistic model for rapid object categorization in natural scenes. PLoS One 2011; 6:e20002. [PMID: 21647443 PMCID: PMC3102072 DOI: 10.1371/journal.pone.0020002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2010] [Accepted: 04/19/2011] [Indexed: 11/19/2022] Open
Abstract
Humans can categorize objects in complex natural scenes within 100–150 ms. This amazing ability of rapid categorization has motivated many computational models. Most of these models require extensive training to obtain a decision boundary in a very high dimensional (e.g., ∼6,000 in a leading model) feature space and often categorize objects in natural scenes by categorizing the context that co-occurs with objects when objects do not occupy large portions of the scenes. It is thus unclear how humans achieve rapid scene categorization. To address this issue, we developed a hierarchical probabilistic model for rapid object categorization in natural scenes. In this model, a natural object category is represented by a coarse hierarchical probability distribution (PD), which includes PDs of object geometry and spatial configuration of object parts. Object parts are encoded by PDs of a set of natural object structures, each of which is a concatenation of local object features. Rapid categorization is performed as statistical inference. Since the model uses a very small number (∼100) of structures for even complex object categories such as animals and cars, it requires little training and is robust in the presence of large variations within object categories and in their occurrences in natural scenes. Remarkably, we found that the model categorized animals in natural scenes and cars in street scenes with a near human-level performance. We also found that the model located animals and cars in natural scenes, thus overcoming a flaw in many other models which is to categorize objects in natural context by categorizing contextual features. These results suggest that coarse PDs of object categories based on natural object structures and statistical operations on these PDs may underlie the human ability to rapidly categorize scenes.
Collapse
Affiliation(s)
- Xiaofu He
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Computer Science and Technology, East China Normal University, Shanghai, China
- Zhiyong Yang
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Ophthalmology, Georgia Health Sciences University, Augusta, Georgia, United States of America
- * E-mail: (ZY); jtsien@georgiahealth.edu (JT)
- Joe Z. Tsien
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Neurology, Georgia Health Sciences University, Augusta, Georgia, United States of America
- * E-mail: (ZY); jtsien@georgiahealth.edu (JT)
427
Kriegeskorte N. Pattern-information analysis: from stimulus decoding to computational-model testing. Neuroimage 2011; 56:411-21. [PMID: 21281719 DOI: 10.1016/j.neuroimage.2011.01.061] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2010] [Revised: 11/29/2010] [Accepted: 01/21/2011] [Indexed: 11/28/2022] Open
Abstract
Pattern-information analysis has become an important new paradigm in functional imaging. Here I review and compare existing approaches with a focus on the question of what we can learn from them in terms of brain theory. The most popular and widespread method is stimulus decoding by response-pattern classification. This approach addresses the question whether activity patterns in a given region carry information about the stimulus category. Pattern classification uses generic models of the stimulus-response relationship that do not mimic brain information processing and treats the stimulus space as categorical, a simplification that is often helpful, but also limiting in terms of the questions that can be addressed. We can address the question whether representations are consistent across different stimulus sets or tasks by cross-decoding, where the classifier is trained with one set of stimuli (or task) and tested with another. Beyond pattern classification, a major new direction is the integration of computational models of brain information processing into pattern-information analysis. This approach enables us to address the question to what extent competing computational models are consistent with the stimulus representations in a brain region. Two methods that test computational models are voxel receptive-field modeling and representational similarity analysis. These methods sample the stimulus (or mental-state) space more richly, estimate a separate response pattern for each stimulus, and can generalize from the stimulus sample to a stimulus population. Computational models that mimic brain information processing predict responses from stimuli. The reverse transform can be modeled to reconstruct stimuli from responses. Stimulus reconstruction is a challenging feat of engineering, but the implications of the results for brain theory are not always clear. Exploratory pattern analyses complement the confirmatory approaches mentioned so far and can reveal strong, unexpected effects that might be missed when testing only a restricted set of predefined hypotheses.
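Representational similarity analysis, one of the model-testing methods reviewed, can be sketched as follows: build a representational dissimilarity matrix (RDM) from measured response patterns and from model-predicted patterns, then correlate the two RDMs. The response patterns and channel counts below are invented for illustration.

```python
import itertools
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - correlation between the
    response patterns evoked by each stimulus pair (upper triangle)."""
    return [1 - pearson(patterns[i], patterns[j])
            for i, j in itertools.combinations(range(len(patterns)), 2)]

# Hypothetical response patterns (rows: stimuli, columns: channels/voxels).
brain = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2], [0.1, 1.0, 0.9]]
model = [[2.0, 0.5, 0.0], [1.8, 0.6, 0.1], [0.2, 2.1, 1.7]]

# A model fits a region to the extent that their RDMs agree, even though
# the raw patterns live in different measurement spaces.
print(pearson(rdm(brain), rdm(model)))  # high agreement
```

Because RDMs abstract away from the measurement space, the same comparison works between a brain region, a computational model, or another subject's data.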
428
Grossberg S, Markowitz J, Cao Y. On the road to invariant recognition: explaining tradeoff and morph properties of cells in inferotemporal cortex using multiple-scale task-sensitive attentive learning. Neural Netw 2011; 24:1036-49. [PMID: 21665428 DOI: 10.1016/j.neunet.2011.04.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2010] [Revised: 03/30/2011] [Accepted: 04/05/2011] [Indexed: 11/30/2022]
Abstract
Visual object recognition is an essential accomplishment of advanced brains. Object recognition needs to be tolerant, or invariant, with respect to changes in object position, size, and view. In monkeys and humans, a key area for recognition is the anterior inferotemporal cortex (ITa). Recent neurophysiological data show that ITa cells with high object selectivity often have low position tolerance. We propose a neural model whose cells learn to simulate this tradeoff, as well as ITa responses to image morphs, while explaining how invariant recognition properties may arise in stages due to processes across multiple cortical areas. These processes include the cortical magnification factor, multiple receptive field sizes, and top-down attentive matching and learning properties that may be tuned by task requirements to attend to either concrete or abstract visual features with different levels of vigilance. The model predicts that data from the tradeoff and image morph tasks emerge from different levels of vigilance in the animals performing them. This result illustrates how different vigilance requirements of a task may change the course of category learning, notably the critical features that are attended and incorporated into learned category prototypes. The model outlines a path for developing an animal model of how defective vigilance control can lead to symptoms of various mental disorders, such as autism and amnesia.
Affiliation(s)
- Stephen Grossberg
- Department of Cognitive and Neural Systems, Center of Excellence for Learning in Education, Science and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
429
Cao Y, Grossberg S, Markowitz J. How does the brain rapidly learn and reorganize view-invariant and position-invariant object representations in the inferotemporal cortex? Neural Netw 2011; 24:1050-61. [PMID: 21596523 DOI: 10.1016/j.neunet.2011.04.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2010] [Revised: 04/10/2011] [Accepted: 04/12/2011] [Indexed: 11/18/2022]
Abstract
All primates depend for their survival on being able to rapidly learn about and recognize objects. Objects may be visually detected at multiple positions, sizes, and viewpoints. How does the brain rapidly learn and recognize objects while scanning a scene with eye movements, without causing a combinatorial explosion in the number of cells that are needed? How does the brain avoid the problem of erroneously classifying parts of different objects together at the same or different positions in a visual scene? In monkeys and humans, a key area for such invariant object category learning and recognition is the inferotemporal cortex (IT). A neural model is proposed to explain how spatial and object attention coordinate the ability of IT to learn invariant category representations of objects that are seen at multiple positions, sizes, and viewpoints. The model clarifies how interactions within a hierarchy of processing stages in the visual brain accomplish this. These stages include the retina, lateral geniculate nucleus, and cortical areas V1, V2, V4, and IT in the brain's What cortical stream, as they interact with spatial attention processes within the parietal cortex of the Where cortical stream. The model builds upon the ARTSCAN model, which proposed how view-invariant object representations are generated. The positional ARTSCAN (pARTSCAN) model proposes how the following additional processes in the What cortical processing stream also enable position-invariant object representations to be learned: IT cells with persistent activity, and a combination of normalizing object category competition and a view-to-object learning law which together ensure that unambiguous views have a larger effect on object recognition than ambiguous views. The model explains how such invariant learning can be fooled when monkeys, or other primates, are presented with an object that is swapped with another object during eye movements to foveate the original object. The swapping procedure is predicted to prevent the reset of spatial attention, which would otherwise keep the representations of multiple objects from being combined by learning. Li and DiCarlo (2008) have presented neurophysiological data from monkeys showing how unsupervised natural experience in a target swapping experiment can rapidly alter object representations in IT. The model quantitatively simulates the swapping data by showing how the swapping procedure fools the spatial attention mechanism. More generally, the model provides a unifying framework, and testable predictions in both monkeys and humans, for understanding object learning data using neurophysiological methods in monkeys, and spatial attention, episodic learning, and memory retrieval data using functional imaging methods in humans.
Affiliation(s)
- Yongqiang Cao
- Center for Adaptive Systems, Department of Cognitive and Neural Systems, Center of Excellence for Learning in Education, Science, and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
430
Sigala R, Logothetis NK, Rainer G. Own-species bias in the representations of monkey and human face categories in the primate temporal lobe. J Neurophysiol 2011; 105:2740-52. [PMID: 21430277 DOI: 10.1152/jn.00882.2010] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Face categorization is fundamental for social interactions of primates and is crucial for determining conspecific groups and mate choice. Current evidence suggests that faces are processed by a set of well-defined brain areas. What is the fine structure of this representation, and how is it affected by visual experience? Here, we investigated the neural representations of human and monkey face categories using realistic three-dimensional morphed faces that spanned the continuum between the two species. We found an "own-species" bias in the categorical representation of human and monkey faces in the monkey inferior temporal cortex at the level of single neurons as well as in the population response analyzed using a pattern classifier. For monkey and human subjects, we also found consistent psychophysical evidence indicative of an own-species bias in face perception. For both behavioural and neural data, the species boundary was shifted away from the center of the morph continuum, for each species toward their own face category. This shift may reflect visual expertise for members of one's own species and be a signature of greater brain resources assigned to the processing of privileged categories. Such boundary shifts may thus serve as sensitive and robust indicators of encoding strength for categories of interest.
Affiliation(s)
- R Sigala
- Max Planck Institute for Biological Cybernetics, Tübingen, Germany
431
Dura-Bernal S, Wennekers T, Denham SL. The Role of Feedback in a Hierarchical Model of Object Perception. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2011; 718:165-79. [DOI: 10.1007/978-1-4614-0164-3_14] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
432
Perceptual learning in Vision Research. Vision Res 2010; 51:1552-66. [PMID: 20974167 DOI: 10.1016/j.visres.2010.10.019] [Citation(s) in RCA: 315] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2010] [Revised: 10/15/2010] [Accepted: 10/15/2010] [Indexed: 12/31/2022]
Abstract
Reports published in Vision Research during the late years of the 20th century described surprising effects of long-term sensitivity improvement with some basic visual tasks as a result of training. These improvements, found in adult human observers, were highly specific to simple visual features, such as location in the visual field, spatial-frequency, local and global orientation, and in some cases even the eye of origin. The results were interpreted as arising from the plasticity of sensory brain regions that display those features of specificity within their constituting neuronal subpopulations. A new view of the visual cortex has emerged, according to which a degree of plasticity is retained at adult age, allowing flexibility in acquiring new visual skills when the need arises. Although this "sensory plasticity" interpretation is often questioned, it is commonly believed that learning has access to detailed low-level visual representations residing within the visual cortex. More recent studies during the last decade revealed the conditions needed for learning and the conditions under which learning can be generalized across stimuli and tasks. The results are consistent with an account of perceptual learning according to which visual processing is remodeled by the brain, utilizing sensory information acquired during task performance. The stability of the visual system is viewed as an adaptation to a stable environment and instances of perceptual learning as a reaction of the brain to abrupt changes in the environment. Training on a restricted stimulus set may lead to perceptual overfitting and over-specificity. The systemic methodology developed for perceptual learning, and the accumulated knowledge, allow us to explore issues related to learning and memory in general, such as learning rules, reinforcement, memory consolidation, and neural rehabilitation. A persistent open question is the neuro-anatomical substrate underlying these learning effects.
433
Soto FA, Wasserman EA. Missing the forest for the trees: object-discrimination learning blocks categorization learning. Psychol Sci 2010; 21:1510-7. [PMID: 20817911 PMCID: PMC2953592 DOI: 10.1177/0956797610382125] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Growing evidence indicates that error-driven associative learning underlies the ability of nonhuman animals to categorize natural images. This study explored whether this form of learning might also be at play when people categorize natural objects in photographs. Two groups of college students (a blocking group and a control group) were trained on a categorization task and then tested with novel photographs from each category; however, only the blocking group received pretraining on a task that required the discrimination of objects from the same category. Because of this earlier noncategorical discrimination learning, the blocking group performed well in the categorization task from the outset, and this strong initial performance reduced the likelihood of category learning driven by error. There was far less transfer of categorical responding during testing in the blocking group than in the control group; this finding suggests that learning the specific properties of each photographic image in pretraining blocked later learning of an open-ended category.
Affiliation(s)
- Fabian A Soto
- Department of Psychology, University of Iowa, Iowa City, IA 52242, USA.
434
Blumberg J, Kreiman G. How cortical neurons help us see: visual recognition in the human brain. J Clin Invest 2010; 120:3054-63. [PMID: 20811161 PMCID: PMC2929717 DOI: 10.1172/jci42161] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Through a series of complex transformations, the pixel-like input to the retina is converted into rich visual perceptions that constitute an integral part of visual recognition. Multiple visual problems arise due to damage or developmental abnormalities in the cortex of the brain. Here, we provide an overview of how visual information is processed along the ventral visual cortex in the human brain. We discuss how neurophysiological recordings in macaque monkeys and in humans can help us understand the computations performed by visual cortex.
Affiliation(s)
- Julie Blumberg
- Department of Ophthalmology, Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA.
Epilepsy Center, University Hospital Freiburg, Freiburg, Germany.
Center for Brain Science, Harvard University, Boston, Massachusetts, USA
- Gabriel Kreiman
- Department of Ophthalmology, Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA.
Epilepsy Center, University Hospital Freiburg, Freiburg, Germany.
Center for Brain Science, Harvard University, Boston, Massachusetts, USA
435
Feldman JA. Cognitive Science should be unified: comment on Griffiths et al. and McClelland et al. Trends Cogn Sci 2010; 14:341. [DOI: 10.1016/j.tics.2010.05.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2010] [Accepted: 05/25/2010] [Indexed: 11/24/2022]
436
Hu X, Zhang B. A Gaussian attractor network for memory and recognition with experience-dependent learning. Neural Comput 2010; 22:1333-57. [PMID: 20100070 DOI: 10.1162/neco.2010.02-09-957] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Attractor networks are widely believed to underlie the memory systems of animals across different species. Existing models have succeeded in qualitatively modeling properties of attractor dynamics, but their computational abilities often suffer from poor representations for realistic complex patterns, spurious attractors, low storage capacity, and difficulty in identifying attractive fields of attractors. We propose a simple two-layer architecture, gaussian attractor network, which has no spurious attractors if patterns to be stored are uncorrelated and can store as many patterns as the number of neurons in the output layer. Meanwhile the attractive fields can be precisely quantified and manipulated. Equipped with experience-dependent unsupervised learning strategies, the network can exhibit both discrete and continuous attractor dynamics. A testable prediction based on numerical simulations is that there exist neurons in the brain that can discriminate two similar stimuli at first but cannot after extensive exposure to physically intermediate stimuli. Inspired by this network, we found that adding some local feedbacks to a well-known hierarchical visual recognition model, HMAX, can enable the model to reproduce some recent experimental results related to high-level visual perception.
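The flavor of a two-layer Gaussian attractor can be sketched with a soft nearest-pattern iteration; this is an illustrative reconstruction under stated assumptions, not the authors' published equations, and the stored patterns, cue, and width parameter are made up. Output-layer activations are Gaussian similarities to stored patterns, and the input estimate is re-synthesized as their activation-weighted mean, so a noisy cue is cleaned up toward the stored pattern whose attractive field it falls in.

```python
import math

def recall(x, memories, sigma=0.3, steps=20):
    """Iterate toward a stored pattern: output units respond with Gaussian
    similarity to their stored pattern, then the input layer is re-estimated
    as the activation-weighted mean of the stored patterns."""
    for _ in range(steps):
        acts = [math.exp(-sum((xi - mi) ** 2 for xi, mi in zip(x, m))
                         / (2 * sigma ** 2)) for m in memories]
        z = sum(acts)
        x = [sum(a * m[d] for a, m in zip(acts, memories)) / z
             for d in range(len(x))]
    return x

# Two stored (uncorrelated) patterns; the noisy cue lies in the attractive
# field of the first and converges to it.
memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
noisy = [0.8, 0.1, -0.1]
print([round(v, 2) for v in recall(noisy, memories)])  # [1.0, 0.0, 0.0]
```

With uncorrelated stored patterns each one sits at its own attractor, matching the paper's claim of no spurious attractors in that regime; shrinking `sigma` narrows each attractive field.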
Affiliation(s)
- Xiaolin Hu
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China.
437
Nagel KI, McLendon HM, Doupe AJ. Differential influence of frequency, timing, and intensity cues in a complex acoustic categorization task. J Neurophysiol 2010; 104:1426-37. [PMID: 20610781 DOI: 10.1152/jn.00028.2010] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Songbirds, which, like humans, learn complex vocalizations, provide an excellent model for the study of acoustic pattern recognition. Here we examined the role of three basic acoustic parameters in an ethologically relevant categorization task. Female zebra finches were first trained to classify songs as belonging to one of two males and then asked whether they could generalize this knowledge to songs systematically altered with respect to frequency, timing, or intensity. Birds' performance on song categorization fell off rapidly when songs were altered in frequency or intensity, but they generalized well to songs that were changed in duration by >25%. Birds were not deaf to timing changes, however; they detected these tempo alterations when asked to discriminate between the same song played back at two different speeds. In addition, when birds were retrained with songs at many intensities, they could correctly categorize songs over a wide range of volumes. Thus although they can detect all these cues, birds attend less to tempo than to frequency or intensity cues during song categorization. These results are unexpected for several reasons: zebra finches normally encounter a wide range of song volumes but most failed to generalize across volumes in this task; males produce only slight variations in tempo, but females generalized widely over changes in song duration; and all three acoustic parameters are critical for auditory neurons. Thus behavioral data place surprising constraints on the relationship between previous experience, behavioral task, neural responses, and perception. We discuss implications for models of auditory pattern recognition.
Collapse
Affiliation(s)
- Katherine I Nagel
- Keck Center for Integrative Neuroscience, Department of Physiology, University of California, San Francisco, California, USA
Collapse
|
438
|
Delorme A, Richard G, Fabre-Thorpe M. Key visual features for rapid categorization of animals in natural scenes. Front Psychol 2010; 1:21. [PMID: 21607075 PMCID: PMC3095379 DOI: 10.3389/fpsyg.2010.00021] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2010] [Accepted: 05/26/2010] [Indexed: 11/13/2022] Open
Abstract
In speeded categorization tasks, decisions may be based on diagnostic target features, or they may require the activation of a complete representation of the object. Depending on task requirements, the priming of feature detectors through top-down expectation might lower the threshold of selective units or speed up the rate of information accumulation. In the present paper, 40 subjects performed a rapid go/no-go animal/non-animal categorization task with 400 briefly flashed natural scenes to study how performance depends on physical scene characteristics, target configuration, and the presence or absence of diagnostic animal features. Performance was evaluated in terms of both accuracy and speed, and d' curves were plotted as a function of reaction time (RT). Such d' curves give an estimate of the processing dynamics for the studied features and characteristics over the entire subject population. Global image characteristics such as color and brightness do not critically influence categorization speed, although they slightly influence accuracy. Critical global factors include the presence of a canonical animal posture and the animal/background size ratio, suggesting a role for coarse global form. Performance was best, in both accuracy and speed, when the animal was in a typical posture and when it occupied about 20-30% of the image. The presence of diagnostic animal features was another critical factor. Performance was significantly impaired in both accuracy (a drop of 3.3-7.5%) and speed (a median RT increase of 7-16 ms) when diagnostic animal parts (eyes, mouth, and limbs) were missing. Such animal features were shown to influence performance very early, when only 15-25% of the responses had been produced. In agreement with other experimental and modeling studies, our results support fast diagnostic recognition of animals based on key intermediate features, and priming based on the subject's expertise.
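The d'-versus-RT analysis described in this abstract can be illustrated with a short sketch. This is a hedged toy reconstruction, not the authors' actual pipeline; the function names (`d_prime`, `d_prime_curve`) and the sample data are invented for illustration:

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, fa_rate, floor=0.01):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).
    Rates are clipped away from 0 and 1 so the inverse normal CDF
    stays finite (a standard correction for extreme proportions)."""
    def clip(p):
        return min(max(p, floor), 1.0 - floor)
    return _z(clip(hit_rate)) - _z(clip(fa_rate))

def d_prime_curve(hit_rts, fa_rts, n_go, n_nogo, rt_bins):
    """Cumulative d' as a function of RT cutoff: at each cutoff, count
    the hits and false alarms already produced, convert the two
    proportions to rates over all go / no-go trials, and take d'."""
    curve = []
    for t in rt_bins:
        hr = sum(rt <= t for rt in hit_rts) / n_go
        fa = sum(rt <= t for rt in fa_rts) / n_nogo
        curve.append(d_prime(hr, fa))
    return curve

# Toy data: 4 go and 4 no-go trials, three hits and one false alarm.
curve = d_prime_curve([300, 350, 400], [450], 4, 4, [325, 375, 500])
```

Plotting `curve` against the RT cutoffs yields the kind of d'-versus-RT curve the study uses to estimate processing dynamics across the subject population.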
Collapse
Affiliation(s)
- Arnaud Delorme
- Université de Toulouse, Université Paul Sabatier, Centre de Recherche Cerveau et Cognition, Toulouse, France
- Centre National de la Recherche Scientifique, Centre de Recherche Cerveau et Cognition, Toulouse, France
- Ghislaine Richard
- Université de Toulouse, Université Paul Sabatier, Centre de Recherche Cerveau et Cognition, Toulouse, France
- Centre National de la Recherche Scientifique, Centre de Recherche Cerveau et Cognition, Toulouse, France
- Michele Fabre-Thorpe
- Université de Toulouse, Université Paul Sabatier, Centre de Recherche Cerveau et Cognition, Toulouse, France
- Centre National de la Recherche Scientifique, Centre de Recherche Cerveau et Cognition, Toulouse, France
Collapse
|
439
|
Continuous transformation learning of translation invariant representations. Exp Brain Res 2010; 204:255-70. [PMID: 20544186 DOI: 10.1007/s00221-010-2309-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2009] [Accepted: 05/21/2010] [Indexed: 01/24/2023]
Abstract
We show that spatial continuity can enable a network to learn translation invariant representations of objects by self-organization in a hierarchical model of cortical processing in the ventral visual system. During 'continuous transformation learning', the active synapses from each overlapping transform are associatively modified onto the set of postsynaptic neurons. Because other transforms of the same object overlap with previously learned exemplars, a common set of postsynaptic neurons is activated by the new transforms, and learning of the new active inputs onto the same postsynaptic neurons is facilitated. We show that the transforms must be close for this to occur; that the temporal order of presentation of each transformed image during training is not crucial for learning to occur; that relatively large numbers of transforms can be learned; and that such continuous transformation learning can be usefully combined with temporal trace training.
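A minimal sketch of the continuous-transformation idea described above, under stated assumptions: binary bar stimuli shifted one unit at a time, a winner-take-all competitive layer, and a plain Hebbian update on the winner's active synapses. The paper's actual hierarchical model of the ventral visual system is far richer; this only illustrates the chaining mechanism:

```python
import random

def bar(pos, width=3, size=12):
    """Binary input: a bar of `width` active units starting at `pos`."""
    return [1.0 if pos <= i < pos + width else 0.0 for i in range(size)]

class CTLearner:
    """Competitive layer; Hebbian updates on the winner's active synapses."""
    def __init__(self, n_in=12, n_out=4, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(0.0, 0.05) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.lr = lr

    def winner(self, x):
        acts = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        return max(range(len(acts)), key=acts.__getitem__)

    def train(self, x):
        k = self.winner(x)
        # Hebbian: strengthen the winner's synapses from active inputs.
        self.w[k] = [wi + self.lr * xi for wi, xi in zip(self.w[k], x)]

positions = range(6)  # six overlapping transforms (shift = 1, width = 3)
net = CTLearner()
for _ in range(5):
    for p in positions:
        net.train(bar(p))

winners = {p: net.winner(bar(p)) for p in positions}
```

Because each transform shares most of its active inputs with its neighbor, the neuron that wins for one position inherits strengthened synapses covering the next, so a single output neuron comes to respond to the bar at every location, a translation-invariant representation learned purely from spatial continuity.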
Collapse
|
440
|
Neuromorphic sensory systems. Curr Opin Neurobiol 2010; 20:288-95. [DOI: 10.1016/j.conb.2010.03.007] [Citation(s) in RCA: 210] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2010] [Revised: 03/22/2010] [Accepted: 03/24/2010] [Indexed: 11/17/2022]
|
441
|
Eriksson D, Valentiniene S, Papaioannou S. Relating information, encoding and adaptation: decoding the population firing rate in visual areas 17/18 in response to a stimulus transition. PLoS One 2010; 5:e10327. [PMID: 20436907 PMCID: PMC2860500 DOI: 10.1371/journal.pone.0010327] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2009] [Accepted: 03/24/2010] [Indexed: 11/18/2022] Open
Abstract
Neurons in the primary visual cortex typically reach their highest firing rate after an abrupt image transition. Since the mutual information between the firing rate and the currently presented image is largest during this early firing period, it is tempting to conclude that this early firing encodes the current image. This view is, however, complicated by the fact that the response to the current image depends on the preceding image. We therefore hypothesize that neurons encode a combination of the current and previous images, and that the weight of the current image relative to the previous image changes over time. This temporal encoding is interesting, first, because neurons are, at different time points, sensitive to different features such as luminance, edges, and textures; and second, because the temporal evolution provides temporal constraints for deciphering the instantaneous population activity. To study the temporal evolution of the encoding, we presented a sequence of 250 ms stimulus patterns during multiunit recordings in areas 17 and 18 of the anaesthetized ferret. Using a novel method, we decoded the pattern given the instantaneous population firing rate. Following a transition from stimulus A to stimulus B, the decoded stimulus during the first 90 ms was more correlated with the difference between A and B (B-A) than with B alone. After 90 ms, the decoded stimulus was more correlated with stimulus B than with B-A. Finally, we related our results to information measures for the previous (A) and current (B) stimulus. Although the initial transient conveys the majority of the stimulus-related information, we show that it actually encodes a difference image, which can be independent of the current stimulus. Only later do spikes gradually encode the stimulus more exclusively.
Collapse
Affiliation(s)
- David Eriksson
- Cortical Function and Dynamics, Max Planck Institute for Brain Research, Frankfurt, Germany.
Collapse
|
442
|
Goris RLT, de Beeck HPO. Invariance in visual object recognition requires training: a computational argument. Front Neurosci 2010; 4:71. [PMID: 20589239 PMCID: PMC2920526 DOI: 10.3389/neuro.01.012.2010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2009] [Accepted: 12/17/2009] [Indexed: 11/13/2022] Open
Abstract
Visual object recognition is remarkably accurate and robust, yet its neurophysiological underpinnings are poorly understood. Single cells in brain regions thought to underlie object recognition code for many stimulus aspects, which places a limit on their invariance. Combining the responses of multiple non-invariant neurons via weighted linear summation offers an optimal decoding strategy, which may be able to achieve invariant object recognition. However, because object identification in this model is essentially parameter optimization, the characteristics of the identification task the model is trained to perform are critically important. If this task does not require invariance, a neural population code is inherently more selective but less tolerant than the single neurons constituting the population. Nevertheless, tolerance can be learned, provided that it is trained for, at the cost of selectivity. We argue that this model is an interesting null hypothesis against which to compare behavioral results, and conclude that it may explain several experimental findings.
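The weighted-linear-summation argument can be sketched with a toy perceptron readout over units tuned to (object, position) conjunctions. The unit responses and training regimes below are invented for illustration; they are not the paper's model, only the logic of it:

```python
def perceptron(train_set, n_features, lr=0.5, epochs=20):
    """Train a linear readout (weighted summation + threshold) with the
    perceptron rule on (response_vector, label) pairs."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in train_set:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Four non-invariant units, each tuned to one (object, position)
# conjunction: index = object (A=0, B=1) * 2 + position (0 or 1).
def response(obj, pos):
    r = [0.0] * 4
    r[obj * 2 + pos] = 1.0
    return r

# Readout trained at a single position...
w1, b1 = perceptron([(response(0, 0), 1), (response(1, 0), -1)], 4)
# ...versus a readout trained across both positions.
w2, b2 = perceptron([(response(0, p), 1) for p in (0, 1)] +
                    [(response(1, p), -1) for p in (0, 1)], 4)
```

The single-position readout assigns zero weight to the units active at the untrained position, so it cannot separate the two objects there; the readout trained across positions discriminates everywhere. That is the paper's point in miniature: tolerance emerges only if the training task demands it.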
Collapse
Affiliation(s)
- Robbe L T Goris
- Laboratory of Experimental Psychology, University of Leuven, Leuven, Belgium
Collapse
|
443
|
Soto FA, Wasserman EA. Error-driven learning in visual categorization and object recognition: a common-elements model. Psychol Rev 2010; 117:349-81. [PMID: 20438230 PMCID: PMC2930356 DOI: 10.1037/a0018695] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
A wealth of empirical evidence has now accumulated concerning animals' categorization of photographs of real-world objects. Although these complex stimuli have the advantage of fostering rapid category learning, they are difficult to manipulate experimentally and to represent in formal models of behavior. We present a solution to the representation problem in modeling natural categorization by adopting a common-elements approach. A common-elements stimulus representation, in conjunction with an error-driven learning rule, can explain a wide range of experimental outcomes in animals' categorization of naturalistic images. The model also generates novel predictions that can be empirically tested. We report two experiments that show how entirely hypothetical representational elements can nevertheless be subject to experimental manipulation. The results represent the first evidence of error-driven learning in natural image categorization, and they support the idea that basic associative processes underlie this important form of animal cognition.
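A hedged sketch of the common-elements idea with a Rescorla-Wagner style error-driven update. The element names ("wings", "beak", ...) and the trial structure are hypothetical; the model's actual elements are abstract and inferred, which is precisely the representation problem the paper addresses:

```python
def train_common_elements(trials, alpha=0.2, epochs=30):
    """Error-driven (Rescorla-Wagner) learning over stimulus elements.
    Each stimulus is a set of elements; the prediction is the summed
    associative strength of the elements present, and the shared
    prediction error is distributed over those elements."""
    v = {}
    for _ in range(epochs):
        for elements, outcome in trials:
            pred = sum(v.get(e, 0.0) for e in elements)
            delta = alpha * (outcome - pred)
            for e in elements:
                v[e] = v.get(e, 0.0) + delta
    return v

def predict(v, elements):
    return sum(v.get(e, 0.0) for e in elements)

# Category exemplars share common elements plus image-specific ones;
# the outcome (1.0) follows the "bird" images, not the "mammal" image.
trials = [({"wings", "beak", "bg1"}, 1.0),
          ({"wings", "beak", "bg2"}, 1.0),
          ({"fur", "paws", "bg3"}, 0.0)]
v = train_common_elements(trials)

# A novel image containing the shared elements inherits the learned
# response: generalization via common elements.
novel = predict(v, {"wings", "beak", "bg_new"})
```

Because the shared elements are updated on every category exemplar, they accumulate most of the associative strength, so a never-seen image containing them is classified correctly, while cue-competition effects fall out of the same error term.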
Collapse
Affiliation(s)
- Fabian A Soto
- Department of Psychology, University of Iowa, Iowa City, IA 52242, USA.
Collapse
|
444
|
Abstract
Recordings from single cells in human medial temporal cortex confirm that sensory processing forms explicit neural representations of the objects and concepts needed for a causal model of the world.
Collapse
Affiliation(s)
- Peter Földiák
- School of Psychology, University of St Andrews, St Andrews, KY16 9JP, UK.
Collapse
|
445
|
Fiser J, Berkes P, Orbán G, Lengyel M. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn Sci 2010; 14:119-30. [PMID: 20153683 PMCID: PMC2939867 DOI: 10.1016/j.tics.2010.01.003] [Citation(s) in RCA: 392] [Impact Index Per Article: 26.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2009] [Revised: 01/06/2010] [Accepted: 01/08/2010] [Indexed: 10/19/2022]
Abstract
Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure, and thus that perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and re-evaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations, leading to a new, sampling-based framework of how the cortex represents information and uncertainty.
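The sampling-based framework can be caricatured in a few lines: stochastic activity draws samples from the posterior, and the spread of the samples themselves carries the uncertainty. The random-walk Metropolis chain and the Gaussian toy problem below are assumptions for illustration only, not the review's proposal about cortical dynamics:

```python
import math
import random

def log_posterior(z, x, prior_var=1.0, noise_var=1.0):
    """Unnormalized log posterior for a Gaussian prior N(0, prior_var)
    and a Gaussian likelihood N(x | z, noise_var)."""
    return -z * z / (2 * prior_var) - (x - z) ** 2 / (2 * noise_var)

def metropolis_samples(x, n=20000, step=1.0, seed=0):
    """Random-walk Metropolis chain: a stand-in for the stochastic
    activity posited by sampling-based accounts of representation."""
    rng = random.Random(seed)
    z, lp, out = 0.0, log_posterior(0.0, x), []
    for _ in range(n):
        cand = z + rng.gauss(0.0, step)
        lp_cand = log_posterior(cand, x)
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            z, lp = cand, lp_cand
        out.append(z)
    return out

samples = metropolis_samples(x=2.0)
# For this conjugate toy problem the posterior is N(1.0, 0.5), so the
# sample mean should sit near 1.0 and the sample spread near 0.707.
posterior_mean = sum(samples) / len(samples)
spread = (sum((s - posterior_mean) ** 2 for s in samples)
          / len(samples)) ** 0.5
```

The point of the sketch is that no separate "uncertainty channel" is needed: the same stream of samples encodes both the estimate (its mean) and the uncertainty (its variability), which is the core of the sampling hypothesis.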
Collapse
Affiliation(s)
- József Fiser
- National Volen Center for Complex Systems, Brandeis University, Volen 208/MS 013, Waltham, MA 02454, USA.
Collapse
|
446
|
Willmore BDB, Prenger RJ, Gallant JL. Neural representation of natural images in visual area V2. J Neurosci 2010; 30:2102-14. [PMID: 20147538 PMCID: PMC2994536 DOI: 10.1523/jneurosci.4099-09.2010] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2009] [Revised: 12/04/2009] [Accepted: 12/15/2009] [Indexed: 11/21/2022] Open
Abstract
Area V2 is a major visual processing stage in mammalian visual cortex, but little is currently known about how V2 encodes information during natural vision. To determine how V2 represents natural images, we used a novel nonlinear system identification approach to obtain quantitative estimates of spatial tuning across a large sample of V2 neurons. We compared these tuning estimates with those obtained in area V1, in which the neural code is relatively well understood. We find two subpopulations of neurons in V2. Approximately half of the V2 neurons have tuning similar to that of V1 neurons. The other half are selective for complex features such as those that occur in natural scenes. These neurons are distinguished from V1 neurons mainly by the presence of stronger suppressive tuning. Selectivity in these neurons therefore reflects a balance between excitatory and suppressive tuning for specific features. These results provide a new perspective on how complex shape selectivity arises, emphasizing the role of suppressive tuning in determining stimulus selectivity in higher visual cortex.
Collapse
Affiliation(s)
- Ryan J. Prenger
- Physics Department, University of California, Berkeley, Berkeley, California 94720-1650
- Jack L. Gallant
- Psychology Department
- Helen Wills Neuroscience Institute
Collapse
|
447
|
Abstract
Reentrant processing has been proposed as a critical mechanism in feature binding. To test this claim, participants were shown arrays of six pairs of crossed vertical and horizontal bars. In each pair, one bar was white; one was red, green, or blue. Identifying the orientation, but not the color, of the nonwhite bar in the target item required correct binding. Four dots appeared around one of the items (the target) and either disappeared with it or persisted for 300 ms after the array disappeared. This type of trailing mask is thought to interfere with target processing by disrupting reentry. Consistent with the hypothesis that binding requires reentrant processing, the trailing mask significantly reduced the accuracy of orientation but not color judgments. In a control condition, when the white bar was omitted, binding was no longer required, and both color and orientation were accurately reported.
Collapse
Affiliation(s)
- Seth Bouvier
- Department of Psychology, Princeton University, Princeton, NJ 08540, USA.
Collapse
|
448
|
Honey CJ, Thivierge JP, Sporns O. Can structure predict function in the human brain? Neuroimage 2010; 52:766-76. [PMID: 20116438 DOI: 10.1016/j.neuroimage.2010.01.071] [Citation(s) in RCA: 444] [Impact Index Per Article: 29.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2009] [Revised: 01/17/2010] [Accepted: 01/21/2010] [Indexed: 01/07/2023] Open
Abstract
Over the past decade, scientific interest in the properties of large-scale spontaneous neural dynamics has intensified. Concurrently, novel technologies have been developed for characterizing the connective anatomy of intra-regional circuits and inter-regional fiber pathways. It will soon be possible to build computational models that incorporate these newly detailed structural network measurements to make predictions of neural dynamics at multiple scales. Here, we review the practicality and the value of these efforts, while at the same time considering in which cases and to what extent structure does determine neural function. Studies of the healthy brain, of neural development, and of pathology all yield examples of direct correspondences between structural linkage and dynamical correlation. Theoretical arguments further support the notion that brain network topology and spatial embedding should strongly influence network dynamics. Although future models will need to be tested more quantitatively and against a wider range of empirical neurodynamic features, our present large-scale models can already predict the macroscopic pattern of dynamic correlation across the brain. We conclude that as neuroscience grapples with datasets of increasing completeness and complexity, and attempts to relate the structural and functional architectures discovered at different neural scales, the value of computational modeling will continue to grow.
Collapse
|
449
|
Dehaene S, Nakamura K, Jobert A, Kuroki C, Ogawa S, Cohen L. Why do children make mirror errors in reading? Neural correlates of mirror invariance in the visual word form area. Neuroimage 2010; 49:1837-48. [PMID: 19770045 DOI: 10.1016/j.neuroimage.2009.09.024] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2009] [Revised: 09/04/2009] [Accepted: 09/15/2009] [Indexed: 01/18/2023] Open
|
450
|
Tyler CW, Likova LT. An algebra for the analysis of object encoding. Neuroimage 2009; 50:1243-50. [PMID: 20025978 DOI: 10.1016/j.neuroimage.2009.10.091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2009] [Revised: 09/29/2009] [Accepted: 10/08/2009] [Indexed: 10/20/2022] Open
Abstract
The encoding of objects from the world around us is one of the major topics of cognitive psychology, yet the principles of object coding in the human brain remain unresolved. Beyond referring to the particular features commonly associated with objects, our ability to categorize and discuss objects in detailed linguistic propositions implies that we have access to generic concepts of each object category with well-specified boundaries between them. Consideration of the nature of generic object concepts reveals that they must have the structure of a probabilistic list array specifying the Bayesian prior on all possible features that the object can possess, together with mutual covariance matrices among the features. Generic object concepts must also be largely context independent for propositions to have communicable meaning. Although there is good evidence for local feature processing in the occipital lobe and specific responses for a few basic object categories in the posterior temporal lobe, the encoding of generic object concepts remains obscure. We analyze the conceptual underpinnings of the study of object encoding, draw some necessary clarifications in relation to its modality-specific and amodal aspects, and propose an analytic algebra with specific reference to functional Magnetic Resonance Imaging approaches to the issue of how generic (amodal) object concepts are encoded in the human brain.
Collapse
|