451
Kriegeskorte N. Relating Population-Code Representations between Man, Monkey, and Computational Models. Front Neurosci 2009; 3:363-73. [PMID: 20198153] [PMCID: PMC2796920] [DOI: 10.3389/neuro.01.035.2009]
Abstract
Perceptual and cognitive content is thought to be represented in the brain by patterns of activity across populations of neurons. In order to test whether a computational model can explain a given population code and whether corresponding codes in man and monkey convey the same information, we need to quantitatively relate population-code representations. Here I give a brief introduction to representational similarity analysis, a particular approach to this problem. A population code is characterized by a representational dissimilarity matrix (RDM), which contains a dissimilarity for each pair of activity patterns elicited by a given stimulus set. The RDM encapsulates which distinctions the representation emphasizes and which it deemphasizes. By analyzing correlations between RDMs we can test models and compare different species. Moreover, we can study how representations are transformed across stages of processing and how they relate to behavioral measures of object similarity. We use an example from object vision to illustrate the method's potential to bridge major divides that have hampered progress in systems neuroscience.
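The core computation described above is easy to sketch: build an RDM from correlation distances between activity patterns, then compare two RDMs by correlating their upper triangles. Below is a minimal sketch with synthetic data; the variable names, toy dimensions, and simulated "species" are illustrative assumptions, not the paper's data.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between every pair of activity patterns (rows = stimuli)."""
    return 1.0 - np.corrcoef(patterns)

def compare_rdms(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs (one entry per
    stimulus pair; the zero diagonal is excluded)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

# Toy data: 6 stimuli (3 per category) measured in two "species"
# with different measurement channels but a shared underlying code.
rng = np.random.default_rng(0)
templates = 3.0 * rng.standard_normal((2, 100))
human = np.vstack([templates[c] + rng.standard_normal(100)
                   for c in (0, 0, 0, 1, 1, 1)])   # 6 stimuli x 100 voxels
projection = rng.standard_normal((100, 50))
monkey = human @ projection                        # 6 stimuli x 50 neurons

r = compare_rdms(rdm(human), rdm(monkey))          # high when codes match
```

Because the RDM abstracts away from the measurement channels, the two representations can be compared even though one has 100 channels and the other 50.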
452
Riesenhuber M, Wolff BS. Task effects, performance levels, features, configurations, and holistic face processing: a reply to Rossion. Acta Psychol (Amst) 2009; 132:286-92. [PMID: 19665104] [PMCID: PMC2788156] [DOI: 10.1016/j.actpsy.2009.07.004]
Abstract
A recent article in Acta Psychologica ("Picture-plane inversion leads to qualitative changes of face perception" by Rossion [Rossion, B. (2008). Picture-plane inversion leads to qualitative changes of face perception. Acta Psychologica (Amst), 128(2), 274-289]) criticized several aspects of an earlier paper of ours [Riesenhuber, M., Jarudi, I., Gilad, S., & Sinha, P. (2004). Face processing in humans is compatible with a simple shape-based model of vision. Proceedings of the Royal Society of London B (Supplements), 271, S448-S450]. We here address Rossion's criticisms and correct some misunderstandings. To frame the discussion, we first review our previously presented computational model of face recognition in cortex [Jiang, X., Rosen, E., Zeffiro, T., Vanmeter, J., Blanz, V., & Riesenhuber, M. (2006). Evaluation of a shape-based model of human face discrimination using FMRI and behavioral techniques. Neuron, 50(1), 159-172] that provides a concrete biologically plausible computational substrate for holistic coding, namely a neural representation learned for upright faces, in the spirit of the original simple-to-complex hierarchical model of vision by Hubel and Wiesel. We show that Rossion's and others' data support the model, and that there is actually a convergence of views on the mechanisms underlying face recognition, in particular regarding holistic processing.
Affiliation(s)
- Maximilian Riesenhuber
- Department of Neuroscience, Georgetown University Medical Center, 3970 Reservoir Road NW, Washington, DC 20007, USA.
453
Gaspar CM, Rousselet GA. How do amplitude spectra influence rapid animal detection? Vision Res 2009; 49:3001-12. [PMID: 19818804] [DOI: 10.1016/j.visres.2009.09.021]
Abstract
Amplitude spectra might provide information for natural scene classification. Amplitude does play a role in animal detection because accuracy suffers when amplitude is normalized. However, this effect could be due to an interaction between phase and amplitude, rather than to a loss of amplitude-only information. We used an amplitude-swapping paradigm to establish that animal detection is partly based on an interaction between phase and amplitude. A difference in false alarms for two subsets of our distractor stimuli suggests that the classification of scene environment (man-made versus natural) may also be based on an interaction between phase and amplitude. Examples of interaction between amplitude and phase are discussed.
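The amplitude-swapping manipulation itself is straightforward with a Fourier transform: a hybrid image keeps one image's phase spectrum and takes its amplitude spectrum from the other. A minimal sketch follows, using random textures as stand-ins for the natural scenes used in the study.

```python
import numpy as np

def swap_amplitude(img_phase_from, img_amp_from):
    """Hybrid image: phase spectrum of the first argument combined
    with the amplitude spectrum of the second."""
    F_phase = np.fft.fft2(img_phase_from)
    F_amp = np.fft.fft2(img_amp_from)
    hybrid_spectrum = np.abs(F_amp) * np.exp(1j * np.angle(F_phase))
    # The spectrum is conjugate-symmetric, so the inverse FFT is
    # real up to numerical error.
    return np.real(np.fft.ifft2(hybrid_spectrum))

rng = np.random.default_rng(1)
a = rng.random((64, 64))
b = rng.random((64, 64))
hybrid = swap_amplitude(a, b)   # a's phase, b's amplitude
```

Swapping amplitudes between an animal image and a distractor in this way isolates whether performance depends on amplitude-only information or on its interaction with phase.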
Affiliation(s)
- Carl M Gaspar
- Centre for Cognitive Neuroimaging (CCNi), Department of Psychology, University of Glasgow, G12 8QB Glasgow, UK
454
Feldman J. Ecological expected utility and the mythical neural code. Cogn Neurodyn 2009; 4:25-35. [PMID: 19731084] [PMCID: PMC2820693] [DOI: 10.1007/s11571-009-9090-4]
Abstract
Neural spikes are an evolutionarily ancient innovation that remains nature's unique mechanism for rapid, long-distance information transfer. It is now known that neural spikes subserve a wide variety of functions and essentially all of the basic questions about the communication role of spikes have been answered. Current efforts focus on the neural communication of probabilities and utility values involved in decision making. Significant progress is being made, but many framing issues remain. One basic problem is that the metaphor of a neural code suggests a communication network rather than a recurrent computational system like the real brain. We propose studying the various manifestations of neural spike signaling as adaptations that optimize a utility function called ecological expected utility.
Affiliation(s)
- Jerome Feldman
- UC Berkeley and International Computer Science Institute, Berkeley, CA USA
455
King AJ, Nelken I. Unraveling the principles of auditory cortical processing: can we learn from the visual system? Nat Neurosci 2009; 12:698-701. [PMID: 19471268] [PMCID: PMC3657701] [DOI: 10.1038/nn.2308]
Abstract
Studies of auditory cortex are often driven by the assumption, derived from our better understanding of visual cortex, that basic physical properties of sounds are represented there before being used by higher-level areas for determining sound-source identity and location. However, we only have a limited appreciation of what the cortex adds to the extensive subcortical processing of auditory information, which can account for many perceptual abilities. This is partly because of the approaches that have dominated the study of auditory cortical processing to date, and future progress will unquestionably profit from the adoption of methods that have provided valuable insights into the neural basis of visual perception. At the same time, we propose that there are unique operating principles employed by the auditory cortex that relate largely to the simultaneous and sequential processing of previously derived features and that therefore need to be studied and understood in their own right.
Affiliation(s)
- Andrew J King
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK
- Israel Nelken
- Department of Neurobiology, The Silberman Institute of Life Sciences and the Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel
456
457
A rodent model for the study of invariant visual object recognition. Proc Natl Acad Sci U S A 2009; 106:8748-53. [PMID: 19429704] [DOI: 10.1073/pnas.0811583106]
Abstract
The human visual system is able to recognize objects despite tremendous variation in their appearance on the retina resulting from variation in view, size, lighting, etc. This ability, known as "invariant" object recognition, is central to visual perception, yet its computational underpinnings are poorly understood. Traditionally, nonhuman primates have been the animal model of choice for investigating the neuronal substrates of invariant recognition, because their visual systems closely mirror our own. Meanwhile, simpler and more accessible animal models such as rodents have been largely overlooked as possible models of higher-level visual functions, because their brains are often assumed to lack advanced visual processing machinery. As a result, little is known about rodents' ability to process complex visual stimuli in the face of real-world image variation. In the present work, we show that rats possess more advanced visual abilities than previously appreciated. Specifically, we trained pigmented rats to perform a visual task that required them to recognize objects despite substantial variation in their appearance, due to changes in size, view, and lighting. Critically, rats were able to spontaneously generalize to previously unseen transformations of learned objects. These results provide the first systematic evidence for invariant object recognition in rats and argue for an increased focus on rodents as models for studying high-level visual processing.
458
Liu H, Agam Y, Madsen JR, Kreiman G. Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex. Neuron 2009; 62:281-90. [PMID: 19409272] [PMCID: PMC2921507] [DOI: 10.1016/j.neuron.2009.02.025]
Abstract
The difficulty of visual recognition stems from the need to achieve high selectivity while maintaining robustness to object transformations within hundreds of milliseconds. Theories of visual recognition differ in whether the neuronal circuits invoke recurrent feedback connections or not. The timing of neurophysiological responses in visual cortex plays a key role in distinguishing between bottom-up and top-down theories. Here, we quantified at millisecond resolution the amount of visual information conveyed by intracranial field potentials from 912 electrodes in 11 human subjects. We could decode object category information from human visual cortex in single trials as early as 100 ms poststimulus. Decoding performance was robust to depth rotation and scale changes. The results suggest that physiological activity in the temporal lobe can account for key properties of visual recognition. The fast decoding in single trials is compatible with feedforward theories and provides strong constraints for computational models of human vision.
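Millisecond-resolution decoding of this kind can be sketched as a classifier applied independently within each sliding time window, yielding single-trial accuracy as a function of time. The sketch below uses synthetic "field potentials" and a simple leave-one-out nearest-class-mean classifier; the dimensions, onset bin, and classifier are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_time, n_elec = 80, 60, 8    # toy dimensions
labels = rng.integers(0, 2, n_trials)   # two object categories

# Synthetic data: category information appears only after "stimulus
# onset" at time bin 20 (standing in for ~100 ms post-stimulus).
X = rng.standard_normal((n_trials, n_time, n_elec))
X[:, 20:, :] += 2.0 * labels[:, None, None]

def window_accuracy(X, labels, start, width=5):
    """Leave-one-out nearest-class-mean decoding on the mean signal
    inside one time window."""
    feats = X[:, start:start + width, :].mean(axis=1)  # trials x electrodes
    correct = 0
    for i in range(len(labels)):
        train = np.ones(len(labels), bool)
        train[i] = False
        m0 = feats[train & (labels == 0)].mean(axis=0)
        m1 = feats[train & (labels == 1)].mean(axis=0)
        pred = int(np.linalg.norm(feats[i] - m1) < np.linalg.norm(feats[i] - m0))
        correct += (pred == labels[i])
    return correct / len(labels)

acc_pre = window_accuracy(X, labels, start=5)    # before onset: ~chance
acc_post = window_accuracy(X, labels, start=25)  # after onset: high
```

Sliding `start` across all time bins traces out the latency at which category information first becomes decodable.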
Affiliation(s)
- Hesheng Liu
- Department of Neuroscience and Ophthalmology, Children's Hospital Boston, Harvard Medical School, Boston, MA 02115, USA
459
Mundhenk TN, Einhäuser W, Itti L. Automatic computation of an image's statistical surprise predicts performance of human observers on a natural image detection task. Vision Res 2009; 49:1620-37. [PMID: 19351543] [DOI: 10.1016/j.visres.2009.03.025]
Abstract
To understand the neural mechanisms underlying humans' exquisite ability at processing briefly flashed visual scenes, we present a computer model that predicts human performance in a Rapid Serial Visual Presentation (RSVP) task. The model processes streams of natural scene images presented at a rate of 20 Hz to human observers, and attempts to predict when subjects will correctly detect whether one of the presented images contains an animal (target). We find that metrics of Bayesian surprise, which model both spatial and temporal aspects of human attention, differ significantly between RSVP sequences on which subjects will detect the target (easy) and those on which subjects miss the target (hard). Extending beyond previous studies, we here assess the contribution of individual image features, including color opponencies and Gabor edges. We also investigate the effects of the spatial location of surprise in the visual field, rather than only using a single aggregate measure. A physiologically plausible feed-forward system, which optimally combines spatial and temporal surprise metrics for all features, predicts performance in 79.5% of human trials correctly. This is significantly better than a baseline maximum-likelihood Bayesian model (71.7%). Attention, as measured by surprise, thus accounts for a large proportion of observer performance in RSVP. The time course of surprise in different feature types (channels) provides additional quantitative insight into the rapid bottom-up processes of human visual attention and recognition, and illuminates the phenomena of attentional blink and lag-1 sparing. Surprise also reveals classical Type-B-like masking effects intrinsic to natural image RSVP sequences. We summarize these findings in a discussion of a multistage model of visual attention.
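Bayesian surprise is defined as the KL divergence between a prior belief about a feature's value and the posterior after a new observation. The original surprise model uses Gamma priors over Poisson firing rates; the sketch below illustrates the same idea with a conjugate Gaussian simplification and made-up parameter values.

```python
import numpy as np

def gaussian_surprise(mu0, var0, x, obs_var):
    """KL(posterior || prior) after a conjugate Gaussian update of a
    N(mu0, var0) prior with one observation x ~ N(mu, obs_var)."""
    # Conjugate posterior update
    var1 = 1.0 / (1.0 / var0 + 1.0 / obs_var)
    mu1 = var1 * (mu0 / var0 + x / obs_var)
    # Closed-form KL divergence between the two Gaussians
    return 0.5 * (var1 / var0 + (mu1 - mu0) ** 2 / var0
                  - 1.0 + np.log(var0 / var1))

# An expected observation is unsurprising; an outlier is surprising.
low = gaussian_surprise(mu0=0.0, var0=1.0, x=0.0, obs_var=1.0)
high = gaussian_surprise(mu0=0.0, var0=1.0, x=5.0, obs_var=1.0)
```

Computed per feature, per location, and per frame, such surprise values are exactly the kind of spatiotemporal map the model aggregates to predict which RSVP sequences will be easy or hard.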
Affiliation(s)
- T Nathan Mundhenk
- Department of Computer Science, University of Southern California, Hedco Neuroscience Building, HNB 10 Los Angeles, CA 90089-2520, USA.
460
Westphal G, Würtz RP. Combining feature- and correspondence-based methods for visual object recognition. Neural Comput 2009; 21:1952-89. [PMID: 19292649] [DOI: 10.1162/neco.2009.12-07-675]
Abstract
We present an object recognition system built on a combination of feature- and correspondence-based pattern recognizers. The feature-based part, called preselection network, is a single-layer feedforward network weighted with the amount of information contributed by each feature to the decision at hand. For processing arbitrary objects, we employ small, regular graphs whose nodes are attributed with Gabor amplitudes, termed parquet graphs. The preselection network can quickly rule out most irrelevant matches and leaves only the ambiguous cases, so-called model candidates, to be verified by a rudimentary version of elastic graph matching, a standard correspondence-based technique for face and object recognition. According to the model, graphs are constructed that describe the object in the input image well. We report the results of experiments on standard databases for object recognition. The method achieved high recognition rates on identity and pose. Unlike many other models, it can also cope with varying background, multiple objects, and partial occlusion.
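The preselection idea, weighting each feature by how informative it is and ranking stored models by weighted overlap with the observed features, can be sketched as follows. This is a loose sketch using entropy-based weights on toy binary features; the paper's actual parquet-graph features and information weighting differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n_models, n_feats = 5, 40
# Binary feature dictionary: which features each stored model contains.
models = rng.random((n_models, n_feats)) < 0.3

def feature_weights(models):
    """Weight each feature by its binary entropy across the model set:
    features present in about half the models discriminate best."""
    p = models.mean(axis=0).clip(1e-9, 1 - 1e-9)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def preselect(observed, models, top_k=2):
    """Rank stored models by weighted feature agreement with the
    observed image; return the top candidates, which would then be
    verified by a correspondence-based matcher."""
    w = feature_weights(models)
    scores = (models == observed).astype(float) @ w
    return np.argsort(scores)[::-1][:top_k]

observed = models[3].copy()
observed[:4] ^= True             # corrupt a few observed features
candidates = preselect(observed, models)
```

The cheap weighted-sum stage rules out most models; only the surviving candidates incur the cost of elastic graph matching.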
Affiliation(s)
- Günter Westphal
- Mobile Vision Systems, Blücherstrasse 19, D-46397 Bocholt, Germany
461
Fox CW, Mitchinson B, Pearson MJ, Pipe AG, Prescott TJ. Contact type dependency of texture classification in a whiskered mobile robot. Auton Robots 2009. [DOI: 10.1007/s10514-009-9109-z]
462
Greene MR, Oliva A. Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cogn Psychol 2009; 58:137-76. [PMID: 18762289] [PMCID: PMC2759758] [DOI: 10.1016/j.cogpsych.2008.06.001]
Abstract
Human observers are able to rapidly and accurately categorize natural scenes, but the representation mediating this feat is still unknown. Here we propose a framework of rapid scene categorization that does not segment a scene into objects and instead uses a vocabulary of global, ecological properties that describe spatial and functional aspects of scene space (such as navigability or mean depth). In Experiment 1, we obtained ground truth rankings on global properties for use in Experiments 2-4. To what extent do human observers use global property information when rapidly categorizing natural scenes? In Experiment 2, we found that global property resemblance was a strong predictor of both false alarm rates and reaction times in a rapid scene categorization experiment. To what extent is global property information alone a sufficient predictor of rapid natural scene categorization? In Experiment 3, we found that the performance of a classifier representing only these properties is indistinguishable from human performance in a rapid scene categorization task in terms of both accuracy and false alarms. To what extent is this high predictability unique to a global property representation? In Experiment 4, we compared two models that represent scene object information to human categorization performance and found that these models had lower fidelity at representing the patterns of performance than the global property model. These results provide support for the hypothesis that rapid categorization of natural scenes may not be mediated primarily through objects and parts, but also through global properties of structure and affordance.
Affiliation(s)
- Michelle R Greene
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Avenue 46-4078, Cambridge, MA 02139, USA.
463
464
Abstract
The human visual system is remarkably tolerant to degradation in image resolution: human performance in scene categorization remains high no matter whether low-resolution images or multimegapixel images are used. This observation raises the question of how many pixels are required to form a meaningful representation of an image and identify the objects it contains. In this article, we show that very small thumbnail images at the spatial resolution of 32 × 32 color pixels provide enough information to identify the semantic category of real-world scenes. Most strikingly, this low resolution permits observers to report, with 80% accuracy, four to five of the objects that the scene contains, despite the fact that some of these objects are unrecognizable in isolation. The robustness of the information available at very low resolution for describing the semantic content of natural images could be an important asset to explain the speed and efficiency with which the human brain comprehends the gist of visual scenes.
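The core manipulation, reducing an image to a 32 × 32 thumbnail, amounts to block-averaging. A minimal sketch on a synthetic image follows; the study itself used real scene photographs.

```python
import numpy as np

def thumbnail(img, size=32):
    """Downsample a square image to size x size by averaging
    non-overlapping blocks (side length must be a multiple of size)."""
    h, w = img.shape[:2]
    bh, bw = h // size, w // size
    trimmed = img[:bh * size, :bw * size]
    return trimmed.reshape(size, bh, size, bw, *img.shape[2:]).mean(axis=(1, 3))

rng = np.random.default_rng(3)
scene = rng.random((256, 256, 3))   # stand-in for a color photograph
tiny = thumbnail(scene)             # 32 x 32 x 3 thumbnail
```

Block-averaging preserves the image's coarse spatial layout and mean intensity while discarding high spatial frequencies, which is exactly the information regime the experiments probe.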
465
466
Task-specific codes for face recognition: how they shape the neural representation of features for detection and individuation. PLoS One 2008; 3:e3978. [PMID: 19112516] [PMCID: PMC2607027] [DOI: 10.1371/journal.pone.0003978]
Abstract
BACKGROUND The variety of ways in which faces are categorized makes face recognition challenging for both synthetic and biological vision systems. Here we focus on two face processing tasks, detection and individuation, and explore whether differences in task demands lead to differences both in the features most effective for automatic recognition and in the featural codes recruited by neural processing. METHODOLOGY/PRINCIPAL FINDINGS Our study appeals to a computational framework characterizing the features representing object categories as sets of overlapping image fragments. Within this framework, we assess the extent to which task-relevant information differs across image fragments. Based on objective differences we find among task-specific representations, we test the sensitivity of the human visual system to these different face descriptions independently of one another. Both behavior and functional magnetic resonance imaging reveal effects elicited by objective task-specific levels of information. Behaviorally, recognition performance with image fragments improves with increasing task-specific information carried by different face fragments. Neurally, this sensitivity to the two tasks manifests as differential localization of neural responses across the ventral visual pathway. Fragments diagnostic for detection evoke larger neural responses than non-diagnostic ones in the right posterior fusiform gyrus and bilaterally in the inferior occipital gyrus. In contrast, fragments diagnostic for individuation evoke larger responses than non-diagnostic ones in the anterior inferior temporal gyrus. Finally, for individuation only, pattern analysis reveals sensitivity to task-specific information within the right "fusiform face area". 
CONCLUSIONS/SIGNIFICANCE Our results demonstrate that: 1) information diagnostic for face detection and individuation is roughly separable; 2) the human visual system is independently sensitive to both types of information; 3) neural responses differ according to the type of task-relevant information considered. More generally, these findings provide evidence for the computational utility and the neural validity of fragment-based visual representation and recognition.
467
Kriegeskorte N, Mur M, Ruff DA, Kiani R, Bodurka J, Esteky H, Tanaka K, Bandettini PA. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 2008; 60:1126-41. [PMID: 19109916] [PMCID: PMC3143574] [DOI: 10.1016/j.neuron.2008.10.043]
Abstract
Inferior temporal (IT) object representations have been intensively studied in monkeys and humans, but representations of the same particular objects have never been compared between the species. Moreover, IT's role in categorization is not well understood. Here, we presented monkeys and humans with the same images of real-world objects and measured the IT response pattern elicited by each image. In order to relate the representations between the species and to computational models, we compare response-pattern dissimilarity matrices. IT response patterns form category clusters, which match between man and monkey. The clusters correspond to animate and inanimate objects; within the animate objects, faces and bodies form subclusters. Within each category, IT distinguishes individual exemplars, and the within-category exemplar similarities also match between the species. Our findings suggest that primate IT across species may host a common code, which combines a categorical and a continuous representation of objects.
Affiliation(s)
- Nikolaus Kriegeskorte
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, NIH, Bethesda, MD 20892-1148, USA.
468
Kriegeskorte N, Mur M, Bandettini P. Representational similarity analysis - connecting the branches of systems neuroscience. Front Syst Neurosci 2008; 2:4. [PMID: 19104670] [PMCID: PMC2605405] [DOI: 10.3389/neuro.06.004.2008]
Abstract
A fundamental challenge for systems neuroscience is to quantitatively relate its three major branches of research: brain-activity measurement, behavioral measurement, and computational modeling. Using measured brain-activity patterns to evaluate computational network models is complicated by the need to define the correspondency between the units of the model and the channels of the brain-activity data, e.g., single-cell recordings or voxels from functional magnetic resonance imaging (fMRI). Similar correspondency problems complicate relating activity patterns between different modalities of brain-activity measurement (e.g., fMRI and invasive or scalp electrophysiology), and between subjects and species. In order to bridge these divides, we suggest abstracting from the activity patterns themselves and computing representational dissimilarity matrices (RDMs), which characterize the information carried by a given representation in a brain or model. Building on a rich psychological and mathematical literature on similarity analysis, we propose a new experimental and data-analytical framework called representational similarity analysis (RSA), in which multi-channel measures of neural activity are quantitatively related to each other and to computational theory and behavior by comparing RDMs. We demonstrate RSA by relating representations of visual objects as measured with fMRI in early visual cortex and the fusiform face area to computational models spanning a wide range of complexities. The RDMs are simultaneously related via second-level application of multidimensional scaling and tested using randomization and bootstrap techniques. We discuss the broad potential of RSA, including novel approaches to experimental design, and argue that these ideas, which have deep roots in psychology and neuroscience, will allow the integrated quantitative analysis of data from all three branches, thus contributing to a more unified systems neuroscience.
Affiliation(s)
- Nikolaus Kriegeskorte
- Section on Functional Imaging Methods, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health Bethesda, MD, USA
469
Yamane Y, Carlson ET, Bowman KC, Wang Z, Connor CE. A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nat Neurosci 2008; 11:1352-60. [PMID: 18836443] [PMCID: PMC2725445] [DOI: 10.1038/nn.2202]
Abstract
Previous investigations of the neural code for complex object shape have focused on two-dimensional pattern representation. This may be the primary mode for object vision given its simplicity and direct relation to the retinal image. In contrast, three-dimensional shape representation requires higher-dimensional coding derived from extensive computation. We found evidence for an explicit neural code for complex three-dimensional object shape. We used an evolutionary stimulus strategy and linear/nonlinear response models to characterize three-dimensional shape responses in macaque monkey inferotemporal cortex (IT). We found widespread tuning for three-dimensional spatial configurations of surface fragments characterized by their three-dimensional orientations and joint principal curvatures. Configural representation of three-dimensional shape could provide specific knowledge of object structure to support guidance of complex physical interactions and evaluation of object functionality and utility.
Affiliation(s)
- Yukako Yamane
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, Maryland 21218, USA
470
Early category-specific cortical activation revealed by visual stimulus inversion. PLoS One 2008; 3:e3503. [PMID: 18946504] [PMCID: PMC2566817] [DOI: 10.1371/journal.pone.0003503]
Abstract
Visual categorization may already start within the first 100 ms after stimulus onset, in contrast with the long-held view that during this early stage all complex stimuli are processed equally and that category-specific cortical activation occurs only at later stages. The neural basis of this proposed early stage of high-level analysis is, however, poorly understood. To address this question we used magnetoencephalography and anatomically-constrained distributed source modeling to monitor brain activity with millisecond resolution while subjects performed an orientation task on upright and upside-down presented images of three different stimulus categories: faces, houses and bodies. Significant inversion effects were found for all three stimulus categories between 70 and 100 ms after picture onset, with a highly category-specific cortical distribution. Differential responses between upright and inverted faces were found in well-established face-selective areas of the inferior occipital cortex and right fusiform gyrus. In addition, early category-specific inversion effects were found well beyond visual areas. Our results provide the first direct evidence that category-specific processing in high-level category-sensitive cortical areas already takes place within the first 100 ms of visual processing, significantly earlier than previously thought, and suggest the existence of fast category-specific neocortical routes in the human brain.
471
Perceived shape similarity among unfamiliar objects and the organization of the human object vision pathway. J Neurosci 2008; 28:10111-23. [PMID: 18829969] [DOI: 10.1523/jneurosci.2511-08.2008]
Abstract
Humans rely heavily on shape similarity among objects for object categorization and identification. Studies using functional magnetic resonance imaging (fMRI) have shown that a large region in human occipitotemporal cortex processes the shape of meaningful as well as unfamiliar objects. Here, we investigate whether the functional organization of this region as measured with fMRI is related to perceived shape similarity. We found that unfamiliar object classes that are rated as having a similar shape were associated with a very similar response pattern distributed across object-selective cortex, whereas object classes that were rated as being very different in shape were associated with a more different response pattern. Human observers, as well as object-selective cortex, were very sensitive to differences in shape features of the objects such as straight versus curved versus "spiky" edges, more so than to differences in overall shape envelope. Response patterns in retinotopic areas V1, V2, and V4 were not found to be related to perceived shape. The functional organization in area V3 was partially related to perceived shape but without a stronger sensitivity for shape features relative to overall shape envelope. Thus, for unfamiliar objects, the organization of human object-selective cortex is strongly related to perceived shape, and this shape-based organization emerges gradually throughout the object vision pathway.
Collapse
|
472
|
Abstract
The human visual system recognizes objects and their constituent parts rapidly and with high accuracy. Standard models of recognition by the visual cortex use feed-forward processing, in which an object's parts are detected before the complete object. However, parts are often ambiguous on their own and require the prior detection and localization of the entire object. We show how a cortical-like hierarchy obtains recognition and localization of objects and parts at multiple levels nearly simultaneously by a single feed-forward sweep from low to high levels of the hierarchy, followed by a feedback sweep from high- to low-level areas.
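The two-sweep scheme can be illustrated with a deliberately tiny toy example: a bottom-up pass scores whole objects from ambiguous part evidence, and a top-down pass then reinterprets each part under the winning object hypothesis. The part names, locations, and scores below are invented, not the paper's model:

```python
# Hypothetical part evidence: each location weakly supports several parts,
# so parts are ambiguous on their own.
part_evidence = {
    "loc1": {"eye": 0.4, "wheel": 0.35},
    "loc2": {"mouth": 0.45, "headlight": 0.4},
}

# Object models: which parts each object expects.
objects = {"face": ["eye", "mouth"], "car": ["wheel", "headlight"]}

# Feed-forward sweep: accumulate the best supporting evidence per object.
scores = {
    obj: sum(max(ev.get(p, 0.0) for ev in part_evidence.values()) for p in parts)
    for obj, parts in objects.items()
}
winner = max(scores, key=scores.get)

# Feedback sweep: relabel each location with the winner's best-fitting part,
# resolving an ambiguity the parts could not resolve on their own.
labels = {
    loc: max(objects[winner], key=lambda p: ev.get(p, 0.0))
    for loc, ev in part_evidence.items()
}
print(winner, labels)  # face {'loc1': 'eye', 'loc2': 'mouth'}
```

Note that both sweeps are single passes over the hierarchy, which is the paper's point: object and part interpretations arrive nearly simultaneously.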
Collapse
|
473
|
Robust Handwritten Character Recognition with Features Inspired by Visual Ventral Stream. Neural Process Lett 2008. [DOI: 10.1007/s11063-008-9084-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
474
|
Schindler K, Van Gool L, de Gelder B. Recognizing emotions expressed by body pose: a biologically inspired neural model. Neural Netw 2008; 21:1238-46. [PMID: 18585892 DOI: 10.1016/j.neunet.2008.05.003] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2007] [Accepted: 05/20/2008] [Indexed: 10/22/2022]
Abstract
Research into the visual perception of human emotion has traditionally focused on the facial expression of emotions. Recently, researchers have turned to the more challenging field of emotional body language, i.e. emotion expression through body pose and motion. In this work, we approach the recognition of basic emotional categories from a computational perspective. In keeping with recent computational models of the visual cortex, we construct a biologically plausible hierarchy of neural detectors, which can discriminate seven basic emotional states from static views of associated body poses. The model is evaluated against human test subjects on a recent set of stimuli created for research on emotional body language.
Collapse
Affiliation(s)
- Konrad Schindler
- BIWI, Eidgenössische Technische Hochschule, Zürich, Switzerland.
Collapse
|
475
|
Abstract
Motivated by the existence of highly selective, sparsely firing cells observed in the human medial temporal lobe (MTL), we present an unsupervised method for learning and recognizing object categories from unlabeled images. In our model, a network of nonlinear neurons learns a sparse representation of its inputs through an unsupervised expectation-maximization process. We show that the application of this strategy to an invariant feature-based description of natural images leads to the development of units displaying sparse, invariant selectivity for particular individuals or image categories much like those observed in the MTL data.
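A minimal sketch of this kind of unsupervised expectation-maximization, with invented two-dimensional "feature vectors" standing in for the invariant image descriptions (this is an illustrative stand-in, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented inputs: invariant feature vectors from two image categories.
cat_a = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(20, 2))
cat_b = rng.normal(loc=[0.0, 3.0], scale=0.3, size=(20, 2))
x = np.vstack([cat_a, cat_b])

# Two competing units; distinct initial weights break the symmetry.
w = np.array([[1.0, 0.0], [0.0, 1.0]])
for _ in range(25):
    # E-step: soft, sharply peaked assignment of inputs to units.
    d = ((x[:, None, :] - w[None, :, :]) ** 2).sum(-1)
    r = np.exp(-d / 0.5)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: move each unit's weights toward the inputs it claims.
    w = (r.T @ x) / r.sum(axis=0)[:, None]

# Sparse, selective responses: each unit fires for one category only,
# much like the selective MTL units that motivated the model.
resp_a = r[:20].mean(axis=0)   # mean unit activations on category-a inputs
resp_b = r[20:].mean(axis=0)
print(resp_a.round(2), resp_b.round(2))
```

The sharpness of the soft assignment (the `0.5` temperature) controls how sparse the learned responses are.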
Collapse
Affiliation(s)
- Stephen Waydo
- Control and Dynamical Systems, California Institute of Technology, Pasadena, CA 91125, U.S.A
- Christof Koch
- Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, U.S.A
Collapse
|
476
|
How position dependent is visual object recognition? Trends Cogn Sci 2008; 12:114-22. [DOI: 10.1016/j.tics.2007.12.006] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2007] [Revised: 12/07/2007] [Accepted: 12/20/2007] [Indexed: 11/24/2022]
|
477
|
Ultra-rapid categorisation in non-human primates. Anim Cogn 2008; 11:485-93. [DOI: 10.1007/s10071-008-0139-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2007] [Revised: 12/18/2007] [Accepted: 01/17/2008] [Indexed: 11/25/2022]
|
478
|
Abstract
Neuroimaging research over the past decade has revealed a detailed picture of the functional organization of the human brain. Here we focus on two fundamental questions that are raised by the detailed mapping of sensory and cognitive functions and illustrate these questions with findings from the object-vision pathway. First, are functionally specific regions that are located close together best understood as distinct cortical modules or as parts of a larger-scale cortical map? Second, what functional properties define each cortical map or module? We propose a model in which overlapping continuous maps of simple features give rise to discrete modules that are selective for complex stimuli.
Collapse
Affiliation(s)
- Hans P Op de Beeck
- Laboratory of Experimental Psychology, Katholieke Universiteit Leuven, Leuven, Belgium.
Collapse
|
479
|
|
480
|
Geiger G, Cattaneo C, Galli R, Pozzoli U, Lorusso ML, Facoetti A, Molteni M. Wide and Diffuse Perceptual Modes Characterize Dyslexics in Vision and Audition. Perception 2008; 37:1745-64. [DOI: 10.1068/p6036] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
We examined the performance of dyslexic and typically reading children on two analogous recognition tasks: one visual and the other auditory. Both tasks required recognition of centrally and peripherally presented stimuli. Dyslexics recognized letters farther into the visual periphery, and more diffusely near the center, than typical readers did. Both groups performed comparably in recognizing centrally spoken stimuli presented without peripheral interference, but in the presence of a surrounding speech mask (the ‘cocktail-party effect’) dyslexics recognized the central stimuli significantly less well than typical readers. However, dyslexics recognized a higher ratio of words from the surrounding speech mask, relative to those from the center, than typical readers did. We suggest that this evidence of wide visual and auditory perceptual modes in dyslexics indicates wider multi-dimensional neural tuning of sensory processing interacting with wider spatial attention.
Collapse
Affiliation(s)
- Carmen Cattaneo
- Scientific Institute ‘Eugenio Medea’, I 23842 Bosisio Parini (Lecco), Italy
- Raffaella Galli
- Scientific Institute ‘Eugenio Medea’, I 23842 Bosisio Parini (Lecco), Italy
- Uberto Pozzoli
- Scientific Institute ‘Eugenio Medea’, I 23842 Bosisio Parini (Lecco), Italy
- Andrea Facoetti
- Scientific Institute ‘Eugenio Medea’, I 23842 Bosisio Parini (Lecco), Italy
- Department of General Psychology, University of Padua, via Venezia 8, I 35131 Padua, Italy
- Massimo Molteni
- Scientific Institute ‘Eugenio Medea’, I 23842 Bosisio Parini (Lecco), Italy
Collapse
|
481
|
Abstract
Object recognition requires both selectivity among different objects and tolerance to vastly different retinal images of the same object, resulting from natural variation in (e.g.) position, size, illumination, and clutter. Thus, discovering neuronal responses that have object selectivity and tolerance to identity-preserving transformations is fundamental to understanding object recognition. Although selectivity and tolerance are found at the highest level of the primate ventral visual stream [the inferotemporal cortex (IT)], both properties are highly varied and poorly understood. If an IT neuron has very sharp selectivity for a unique combination of object features ("diagnostic features"), this might automatically endow it with high tolerance. However, this relationship cannot be taken as given; although some IT neurons are highly object selective and some are highly tolerant, the empirical connection of these key properties is unknown. In this study, we systematically measured both object selectivity and tolerance to different identity-preserving image transformations in the spiking responses of a population of monkey IT neurons. We found that IT neurons with high object selectivity typically have low tolerance (and vice versa), regardless of how object selectivity was quantified and the type of tolerance examined. The discovery of this trade-off illuminates object selectivity and tolerance in IT and unifies a range of previous, seemingly disparate results. This finding also argues against the idea that diagnostic conjunctions of features guarantee tolerance. Instead, it is naturally explained by object recognition models in which object selectivity is built through AND-like tuning mechanisms.
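The two properties can be made concrete with illustrative indices applied to toy response matrices of objects by retinal positions. These are simplified stand-ins, not the paper's exact measures:

```python
import numpy as np

def selectivity(resp):
    """1 minus Treves-Rolls-style sparseness of mean responses across objects."""
    m = resp.mean(axis=1)
    return 1.0 - (m.mean() ** 2) / (m ** 2).mean()

def tolerance(resp):
    """Mean correlation of the object-response profile across positions."""
    c = np.corrcoef(resp.T)                       # positions x positions
    return c[np.triu_indices(resp.shape[1], k=1)].mean()

# Toy neurons (rows: 5 objects, columns: 3 retinal positions).
# 'sharp' answers to a single object at each position, but its preferred
# object changes with position: selective, yet not tolerant.
sharp = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
# 'broad' responds gradedly to everything with a stable object ranking:
# tolerant, yet not selective.
broad = np.array([[0.9, 0.8, 0.85],
                  [0.7, 0.6, 0.65],
                  [0.5, 0.5, 0.5],
                  [0.3, 0.35, 0.3],
                  [0.1, 0.15, 0.1]])

print(selectivity(sharp), tolerance(sharp))   # higher sel., lower tol.
print(selectivity(broad), tolerance(broad))   # lower sel., higher tol.
```

The toy pair reproduces the direction of the reported trade-off; the paper establishes it empirically across a population of IT neurons and several identity-preserving transformations.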
Collapse
|
482
|
|
483
|
|
484
|
Cadieu C, Kouh M, Pasupathy A, Connor CE, Riesenhuber M, Poggio T. A model of V4 shape selectivity and invariance. J Neurophysiol 2007; 98:1733-50. [PMID: 17596412 DOI: 10.1152/jn.01265.2006] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Object recognition in primates is mediated by the ventral visual pathway and is classically described as a feedforward hierarchy of increasingly sophisticated representations. Neurons in macaque monkey area V4, an intermediate stage along the ventral pathway, have been shown to exhibit selectivity to complex boundary conformation and invariance to spatial translation. How could such a representation be derived from the signals in lower visual areas such as V1? We show that a quantitative model of hierarchical processing, which is part of a larger model of object recognition in the ventral pathway, provides a plausible mechanism for the translation-invariant shape representation observed in area V4. Simulated model neurons successfully reproduce V4 selectivity and invariance through a nonlinear, translation-invariant combination of locally selective subunits, suggesting that a similar transformation may occur or culminate in area V4. Specifically, this mechanism models the selectivity of individual V4 neurons to boundary conformation stimuli, exhibits the same degree of translation invariance observed in V4, and produces observed V4 population responses to bars and non-Cartesian gratings. This work provides a quantitative model of the widely described shape selectivity and invariance properties of area V4 and points toward a possible canonical mechanism operating throughout the ventral pathway.
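The key mechanism — a nonlinear, translation-invariant combination of locally selective subunits — can be caricatured in a few lines. The one-dimensional "stimuli" and Gaussian tuning widths below are invented for illustration:

```python
import numpy as np

def subunit(feature, preferred, sigma=0.3):
    """Gaussian tuning of one locally selective subunit."""
    return np.exp(-(feature - preferred) ** 2 / (2 * sigma ** 2))

def v4_unit(features, preferred):
    """Max over positions of tuned subunits: selectivity survives translation."""
    return max(subunit(f, preferred) for f in features)

# A preferred 'boundary fragment' coded as feature value 1.0, presented
# at the left or right of a 4-position field, plus a different conformation.
stim_left  = [1.0, 0.0, 0.0, 0.0]
stim_right = [0.0, 0.0, 0.0, 1.0]
stim_other = [0.3, 0.0, 0.2, 0.0]

r_left = v4_unit(stim_left, 1.0)
r_right = v4_unit(stim_right, 1.0)
r_other = v4_unit(stim_other, 1.0)
print(r_left, r_right, r_other)  # equal strong responses; weak to 'other'
```

The max over position is what makes the response identical for the left and right presentations while keeping it selective against the non-preferred conformation.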
Collapse
Affiliation(s)
- Charles Cadieu
- Center for Biological and Computational Learning, McGovern Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Collapse
|
485
|
Serre T, Kreiman G, Kouh M, Cadieu C, Knoblich U, Poggio T. A quantitative theory of immediate visual recognition. PROGRESS IN BRAIN RESEARCH 2007; 165:33-56. [PMID: 17925239 DOI: 10.1016/s0079-6123(06)65004-8] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Human and non-human primates excel at visual recognition tasks. The primate visual system exhibits a strong degree of selectivity while at the same time being robust to changes in the input image. We have developed a quantitative theory to account for the computations performed by the feedforward path in the ventral stream of the primate visual cortex. Here we review recent predictions by a model instantiating the theory about physiological observations in higher visual areas. We also show that the model can perform recognition tasks on datasets of complex natural images at a level comparable to psychophysical measurements on human observers during rapid categorization tasks. In sum, the evidence suggests that the theory may provide a framework to explain the first 100-150 ms of visual object recognition. The model also constitutes a vivid example of how computational models can interact with experimental observations in order to advance our understanding of a complex phenomenon. We conclude by suggesting a number of open questions, predictions, and specific experiments for visual physiology and psychophysics.
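The alternation the theory is built on — template matching ("S" units, for selectivity) followed by max pooling ("C" units, for invariance) — can be sketched in one dimension. The toy templates, pooling size, and input are illustrative only:

```python
import numpy as np

def s_layer(x, templates, sigma=0.5):
    """Tuning: Gaussian match of each local patch to each template."""
    patches = np.lib.stride_tricks.sliding_window_view(x, templates.shape[1])
    d = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * sigma ** 2))          # (positions, templates)

def c_layer(s, pool=2):
    """Invariance: max over neighbouring positions for each template."""
    n = (len(s) // pool) * pool
    return s[:n].reshape(-1, pool, s.shape[1]).max(axis=1)

templates = np.array([[1.0, -1.0], [1.0, 1.0]])   # edge-like 1-D templates
x = np.array([0.0, 1.0, -1.0, 0.0, 1.0, 1.0])

c1 = c_layer(s_layer(x, templates))
print(c1.round(2))
```

Stacking further S/C pairs on top of `c1`, with templates learned over the previous layer's outputs, yields the feedforward hierarchy the theory uses to account for the first 100-150 ms of recognition.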
Collapse
Affiliation(s)
- Thomas Serre
- Center for Biological and Computational Learning, McGovern Institute for Brain Research, Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Cambridge 02139, USA.
Collapse
|