1
|
Oleskiw TD, Lieber JD, Simoncelli EP, Movshon JA. Foundations of visual form selectivity in macaque areas V1 and V2. bioRxiv: the preprint server for biology 2025:2024.03.04.583307. [PMID: 38496618 PMCID: PMC10942284 DOI: 10.1101/2024.03.04.583307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Neurons early in the primate visual cortical pathway generate responses by combining signals from other neurons: some from downstream areas, some from within the same area, and others from areas upstream. Here we develop a model that selectively combines afferents derived from a population model of V1 cells. We use this model to account for responses we recorded of both V1 and V2 neurons in awake fixating macaque monkeys to stimuli composed of a sparse collection of locally oriented features ("droplets") designed to drive subsets of V1 neurons. The first stage computes the rectified responses of a fixed population of oriented filters at different scales that cover the visual field. The second stage computes a weighted combination of these first-stage responses, followed by a final nonlinearity, with parameters optimized to fit data from physiological recordings and constrained to encourage sparsity and locality. The fitted model accounts for the responses of both V1 and V2 neurons, capturing an average of 43% of the explainable variance for V1 and 38% for V2. The models fitted to droplet recordings predict responses to classical stimuli, such as gratings of different orientations and spatial frequencies, as well as to textures of different spectral content, which are known to be especially effective in driving V2. The models are less effective, however, at capturing the selectivity of responses to textures that include naturalistic image statistics. The pattern of afferents - defined by their weights over the 4 dimensions of spatial position, orientation, and spatial frequency - provides a common and interpretable characterization of the origin of many neuronal response properties in the early visual cortex.
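The two-stage structure described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the Gabor filter shape, the filter size, the squaring output nonlinearity, and the L1 penalty are all assumptions, and every function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor(size, theta, freq, sigma):
    """Cosine-phase Gabor filter (assumed stand-in for an oriented V1-like filter)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr)


def make_filter_bank(thetas, freqs, size=9):
    """Fixed bank of filters over orientation and spatial frequency."""
    return np.stack([gabor(size, t, f, sigma=size / 4)
                     for t in thetas for f in freqs])


def first_stage(image, filters):
    """Stage 1: rectified filter responses at every valid image position."""
    size = filters.shape[-1]
    windows = sliding_window_view(image, (size, size))
    linear = np.einsum("yxij,kij->kyx", windows, filters)
    return np.maximum(linear, 0.0)  # half-wave rectification


def second_stage(afferents, weights, bias=0.0):
    """Stage 2: weighted sum of afferents, then an assumed rectify-and-square."""
    drive = np.sum(weights * afferents) + bias
    return np.maximum(drive, 0.0) ** 2


def sparsity_penalty(weights, strength=1e-2):
    """Assumed L1 penalty encouraging sparse afferent weights during fitting."""
    return strength * np.abs(weights).sum()
```

In a fit, `weights` would have one entry per filter and position, so the learned pattern can be read directly as weights over position, orientation, and spatial frequency.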
Affiliation(s)
- Timothy D Oleskiw
- Center for Neural Science, New York University
- Center for Computational Neuroscience, Flatiron Institute
- Eero P Simoncelli
- Center for Computational Neuroscience, Flatiron Institute
- Center for Neural Science, New York University
|
2
|
Ranson RE, Scarfe P, van Dam LCJ, Hibbard PB. Depth constancy and the absolute vergence anomaly. Vision Res 2025; 226:108501. [PMID: 39488862 DOI: 10.1016/j.visres.2024.108501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 10/04/2024] [Accepted: 10/07/2024] [Indexed: 11/05/2024]
Abstract
Binocular disparity provides information about the depth structure of objects and surfaces in our environment. Since disparity depends on the distance to objects as well as the depth separation of points, information about distance is required to estimate depth from disparity. Our perception of size and shape is biased, such that far objects appear too small and flattened in depth, and near objects too big and stretched in depth. The current study assessed the extent to which the failure of depth constancy can be accounted for by the uncertainty of distance information provided by vergence. We measured individual differences in vergence noise using a nonius line task, and the degree of depth constancy using a task in which observers judged the magnitude of a depth interval relative to the vertical distance between two targets in the image plane. We found no correlation between the two measures, and show that depth constancy was much poorer than would be expected from vergence noise measured in this way. This limited ability to take account of vergence in the perception of depth is, however, consistent with our poor sensitivity to absolute disparity differences. This absolute disparity anomaly thus also applies to our poor ability to make use of vergence information for absolute distance judgements.
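The dependence of disparity on viewing distance can be illustrated with the standard small-angle approximation, in which disparity is roughly the interocular distance times the depth interval divided by the squared viewing distance. The sketch below is not taken from the paper: the 0.065 m interocular distance and both function names are assumptions for illustration.

```python
def disparity_from_depth(depth, distance, iod=0.065):
    """Approximate disparity (radians) of a depth interval (m) at a viewing distance (m)."""
    return iod * depth / distance ** 2


def depth_from_disparity(disparity, assumed_distance, iod=0.065):
    """Depth interval (m) recovered from disparity, given the distance the observer assumes."""
    return disparity * assumed_distance ** 2 / iod


# A 0.10 m depth interval viewed at 2.0 m, interpreted as if it were at 1.5 m.
true_depth, true_distance, assumed_distance = 0.10, 2.0, 1.5
disparity = disparity_from_depth(true_depth, true_distance)
perceived_depth = depth_from_disparity(disparity, assumed_distance)
```

Under this approximation the recovered depth scales with the square of the ratio of assumed to true distance, so underestimating the distance of a far object flattens it in depth, which matches the direction of the bias described in the abstract.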
Affiliation(s)
- Rebecca E Ranson
- Department of Psychology, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
- Peter Scarfe
- School of Psychology and Clinical Language Sciences, University of Reading, Early Gate, Whiteknights Road, RG6 6AL, UK
- Loes C J van Dam
- Department of Psychology, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK; Institute of Psychology, Centre for Cognitive Science, TU-Darmstadt, 64283 Darmstadt, Germany
- Paul B Hibbard
- Department of Psychology, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK; Division of Psychology, University of Stirling, Stirling, FK9 4LA, UK.
|
3
|
Necessary and sufficient conditions of proper estimators based on self density ratio for unnormalized statistical models. Neural Netw 2018; 98:263-270. [DOI: 10.1016/j.neunet.2017.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2016] [Revised: 04/12/2017] [Accepted: 11/28/2017] [Indexed: 11/20/2022]
|
4
|
Hunter DW, Hibbard PB. The effect of image position on the Independent Components of natural binocular images. Sci Rep 2018; 8:449. [PMID: 29323133 PMCID: PMC5765131 DOI: 10.1038/s41598-017-18460-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/03/2017] [Accepted: 12/01/2017] [Indexed: 11/09/2022] Open
Abstract
Human visual performance degrades substantially as the angular distance from the fovea increases. This decrease in performance is found for both binocular and monocular vision. Although analysis of the statistics of natural images has provided significant insights into human visual processing, little research has focused on the statistical content of binocular images at eccentric angles. We applied Independent Component Analysis to rectangular image patches cut from locations within binocular images corresponding to different degrees of eccentricity. The distribution of components learned from the varying locations was examined to determine how these distributions varied across eccentricity. We found a general trend towards a broader spread of horizontal and vertical position disparity tunings in eccentric regions compared to the fovea, with the horizontal spread more pronounced than the vertical spread. Eccentric locations above the centroid show a strong bias towards far-tuned components, whereas eccentric locations below the centroid show a strong bias towards near-tuned components. These distributions exhibit substantial similarities with physiological measurements in V1. However, in common with previous research, we also observe important differences, in particular distributions of binocular phase disparity that do not match physiology.
Affiliation(s)
- David W Hunter
- Prifysgol Aberystwyth University, Department of Computer Science, Aberystwyth, SY23 3DB, UK.
- Paul B Hibbard
- University of Essex, Department of Psychology, Colchester, CO4 3SQ, UK
|
5
|
Sasaki H, Gutmann MU, Shouno H, Hyvärinen A. Simultaneous Estimation of Nongaussian Components and Their Correlation Structure. Neural Comput 2017; 29:2887-2924. [PMID: 28777730 DOI: 10.1162/neco_a_01006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
The statistical dependencies that independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models have been proposed, they have usually concentrated on higher-order correlations such as energy (square) correlations. Yet linear correlations are a fundamental and informative form of dependency in many real data sets. Linear correlations are usually completely removed by ICA and related methods, so they can only be analyzed by developing new methods that explicitly allow for linearly correlated components. In this article, we propose a probabilistic model of linear nongaussian components that are allowed to have both linear and energy correlations. The precision matrix of the linear components is assumed to be randomly generated by a higher-order process and explicitly parameterized by a parameter matrix. The estimation of the parameter matrix is shown to be particularly simple because, using score matching (Hyvärinen, 2005), the objective function is a quadratic form. Using simulations with artificial data, we demonstrate that the proposed method improves the identifiability of nongaussian components by simultaneously learning their correlation structure. Applications on simulated complex cells with natural image input, as well as spectrograms of natural audio data, show that the method finds new kinds of dependencies between the components.
Affiliation(s)
- Hiroaki Sasaki
- Graduate School of Information Science, Nara Institute of Science and Technology, Nara 630-0192, Japan
- Michael U Gutmann
- School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, U.K.
- Hayaru Shouno
- Graduate School of Informatics and Engineering, University of Electro-Communications, Tokyo 182-8585, Japan
- Aapo Hyvärinen
- Helsinki Institute for Information Technology, University of Helsinki, Helsinki 00560, Finland, and Gatsby Computational Neuroscience Unit, University College London, London W1T 4JG, U.K.
|
6
|
Hosoya H, Hyvärinen A. A mixture of sparse coding models explaining properties of face neurons related to holistic and parts-based processing. PLoS Comput Biol 2017; 13:e1005667. [PMID: 28742816 PMCID: PMC5549761 DOI: 10.1371/journal.pcbi.1005667] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/03/2017] [Revised: 08/08/2017] [Accepted: 07/05/2017] [Indexed: 11/23/2022] Open
Abstract
Experimental studies have revealed evidence of both parts-based and holistic representations of objects and faces in the primate visual system. However, it is still a mystery how such seemingly contradictory types of processing can coexist within a single system. Here, we propose a novel theory called mixture of sparse coding models, inspired by the formation of category-specific subregions in the inferotemporal (IT) cortex. We developed a hierarchical network that constructed a mixture of two sparse coding submodels on top of a simple Gabor analysis. The submodels were each trained with face or non-face object images, which resulted in separate representations of facial parts and object parts. Importantly, evoked neural activities were modeled by Bayesian inference, which had a top-down explaining-away effect that enabled recognition of an individual part to depend strongly on the category of the whole input. We show that this explaining-away effect was indeed crucial for the units in the face submodel to exhibit significant selectivity to face images over object images in a similar way to actual face-selective neurons in the macaque IT cortex. Furthermore, the model explained, qualitatively and quantitatively, several tuning properties to facial features found in the middle patch of face processing in IT as documented by Freiwald, Tsao, and Livingstone (2009). These included, in particular, tuning to only a small number of facial features that were often related to geometrically large parts like face outline and hair, preference and anti-preference of extreme facial features (e.g., very large/small inter-eye distance), and reduction of the gain of feature tuning for partial face stimuli compared to whole face stimuli. Thus, we hypothesize that the coding principle of facial features in the middle patch of face processing in the macaque IT cortex may be closely related to mixture of sparse coding models. 
Does the brain represent an object as a combination of parts or as a whole? Past experiments have found both types of representation; but how can such opposing notions coexist in a single visual system? Here, we introduce a novel theory called mixture of sparse coding models for investigating the possible computational principles underlying the primate visual object processing. We constructed a hierarchical network combining two sparse coding modules that each represented one feature set, of either facial parts or non-facial object parts. Competitive computation between the modules, formalized as Bayesian inference, enabled parts to be recognized with a strong top-down influence from the category of the whole input. We show that the latter computation is crucial to explain in detail neural selectivity and tuning properties that were experimentally reported for a particular face processing region called the middle patch. Thus, we offer the first theoretical account of neural face processing in relation to parts-based and holistic representations.
Affiliation(s)
- Haruo Hosoya
- Cognitive Mechanisms Laboratories, ATR International, Kyoto, Japan
- Aapo Hyvärinen
- Department of Computer Science and HIIT, University of Helsinki, Helsinki, Finland
- Gatsby Computational Neuroscience Unit, University College London, London, UK
|
7
|
Gutmann MU, Dutta R, Kaski S, Corander J. Likelihood-free inference via classification. Statistics and Computing 2017; 28:411-425. [PMID: 31997856 PMCID: PMC6956883 DOI: 10.1007/s11222-017-9738-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 06/27/2016] [Accepted: 02/28/2017] [Indexed: 06/10/2023]
Abstract
Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
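The idea of using classification accuracy as a discrepancy measure can be sketched on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's method: the Gaussian simulator, the nearest-class-mean classifier, the single train/test split, and the grid search over candidates are all hypothetical choices made for brevity.

```python
import numpy as np


def simulate(theta, n, rng):
    """Toy simulator (assumed): unit-variance Gaussian samples with mean theta."""
    return rng.normal(loc=theta, scale=1.0, size=(n, 1))


def classification_discrepancy(observed, simulated, rng):
    """Held-out accuracy of a nearest-class-mean classifier, observed vs simulated."""
    x = np.vstack([observed, simulated])
    y = np.concatenate([np.zeros(len(observed)), np.ones(len(simulated))])
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    split = len(x) // 2
    x_train, y_train = x[:split], y[:split]
    x_test, y_test = x[split:], y[split:]
    mean_obs = x_train[y_train == 0].mean(axis=0)
    mean_sim = x_train[y_train == 1].mean(axis=0)
    dist_obs = np.linalg.norm(x_test - mean_obs, axis=1)
    dist_sim = np.linalg.norm(x_test - mean_sim, axis=1)
    predicted = (dist_sim < dist_obs).astype(float)
    return float(np.mean(predicted == y_test))


def estimate_theta(observed, candidates, rng):
    """Pick the candidate whose simulated data is hardest to tell from the observed data."""
    scores = [classification_discrepancy(
                  observed, simulate(theta, len(observed), rng), rng)
              for theta in candidates]
    return float(candidates[int(np.argmin(scores))]), scores
```

Accuracy near chance (0.5) means the classifier cannot separate simulated from observed data, so the corresponding parameter value is a plausible explanation of the observations; accuracy near 1 indicates a poor match.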
Affiliation(s)
- Ritabrata Dutta
- InterDisciplinary Institute of Data Science, Università della Svizzera italiana, Lugano, Switzerland
- Samuel Kaski
- Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Espoo, Finland
- Jukka Corander
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
8
|
Hunter DW, Hibbard PB. Ideal Binocular Disparity Detectors Learned Using Independent Subspace Analysis on Binocular Natural Image Pairs. PLoS One 2016; 11:e0150117. [PMID: 26982184 PMCID: PMC4794214 DOI: 10.1371/journal.pone.0150117] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/28/2015] [Accepted: 02/09/2016] [Indexed: 11/22/2022] Open
Abstract
An influential theory of mammalian vision, known as the efficient coding hypothesis, holds that early stages in the visual cortex attempt to form an efficient coding of ecologically valid stimuli. Although numerous authors have successfully modelled some aspects of early vision mathematically, closer inspection has found substantial discrepancies between the predictions of some of these models and observations of neurons in the visual cortex. In particular, analysis of linear-non-linear models of simple cells using Independent Component Analysis has found a strong bias towards features on the horopter. In order to investigate the link between the information content of binocular images, mathematical models of complex cells and physiological recordings, we applied Independent Subspace Analysis to binocular image patches in order to learn a set of complex-cell-like models. We found that these complex-cell-like models exhibited a wide range of binocular disparity-discriminability, although only a minority exhibited high binocular discrimination scores. However, in common with the linear-non-linear model case, we found that feature detection was limited to the horopter, suggesting that current mathematical models are limited in their ability to explain the functionality of the visual cortex.
Affiliation(s)
- David W. Hunter
- School of Psychology and Neuroscience, University of St Andrews, St Andrews, United Kingdom
- Paul B. Hibbard
- Department of Psychology, University of Essex, Colchester, United Kingdom
|
9
|
Abstract
Previous theoretical and experimental studies have demonstrated tight relationships between natural image statistics and neural representations in V1. In particular, receptive field properties similar to simple and complex cells have been shown to be inferable from sparse coding of natural images. However, whether such a relationship exists in higher areas has not been clarified. To address this question for V2, we trained a sparse coding model that took as input the output of a fixed V1-like model, which was in its turn fed a large variety of natural image patches as input. After the training, the model exhibited response properties that were qualitatively and quantitatively compatible with three major neurophysiological results on macaque V2, as follows: (1) homogeneous and heterogeneous integration of local orientations (Anzai et al., 2007); (2) a wide range of angle selectivities with biased sensitivities to one component orientation (Ito and Komatsu, 2004); and (3) exclusive length and width suppression (Schmid et al., 2014). The reproducibility was stable across variations in several model parameters. Further, a formal classification of the internal representations of the model units offered detailed interpretations of the experimental data, emphasizing that a novel type of model cell that could detect a combination of local orientations converging toward a single spatial point (potentially related to corner-like features) played an important role in reproducing tuning properties compatible with V2. These results are consistent with the idea that V2 uses a sparse code of natural images.
Significance statement: Sparse coding theory has successfully explained a number of receptive field properties in V1; but how about in V2? This question has recently become important since a variety of properties distinct from V1 have been discovered in V2, and thus a more integrative understanding is called for. Our study shows that a hierarchical sparse coding model of natural images explains three major response properties known in the macaque V2. We further provide a detailed analysis revealing the roles of different kinds of model cells in explaining the V2-specific properties. Our results thus offer the first sparse coding account for receptive field properties in V2 that has extensive biological relevance.
|
10
|
Güçlü U, van Gerven MAJ. Unsupervised feature learning improves prediction of human brain activity in response to natural images. PLoS Comput Biol 2014; 10:e1003724. [PMID: 25101625 PMCID: PMC4125038 DOI: 10.1371/journal.pcbi.1003724] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 11/05/2013] [Accepted: 04/28/2014] [Indexed: 11/18/2022] Open
Abstract
Encoding and decoding in functional magnetic resonance imaging has recently emerged as an area of research to noninvasively characterize the relationship between stimulus features and human brain activity. To overcome the challenge of formalizing what stimulus features should modulate single voxel responses, we introduce a general approach for making directly testable predictions of single voxel responses to statistically adapted representations of ecologically valid stimuli. These representations are learned from unlabeled data without supervision. Our approach is validated using a parsimonious computational model of (i) how early visual cortical representations are adapted to statistical regularities in natural images and (ii) how populations of these representations are pooled by single voxels. This computational model is used to predict single voxel responses to natural images and identify natural images from stimulus-evoked multiple voxel responses. We show that statistically adapted low-level sparse and invariant representations of natural images better span the space of early visual cortical representations and can be more effectively exploited in stimulus identification than hand-designed Gabor wavelets. Our results demonstrate the potential of our approach to better probe unknown cortical representations.
An important but difficult problem in contemporary cognitive neuroscience is to find what stimulus features best drive responses in the human brain. The conventional approach to solve this problem is to use descriptive encoding models that predict responses to stimulus features that are known a priori. In this study, we introduce an alternative to this approach that is independent of a priori knowledge. Instead, we use a normative encoding model that predicts responses to stimulus features that are learned from unlabeled data. We show that this normative encoding model learns sparse, topographic and invariant stimulus features from tens of thousands of grayscale natural image patches without supervision, and reproduces the population behavior of simple and complex cells. We find that these stimulus features significantly better drive blood-oxygen-level dependent hemodynamic responses in early visual areas than Gabor wavelets, the fundamental building blocks of the conventional approach. Our approach will improve our understanding of how sensory information is represented beyond early visual areas since it can theoretically find what stimulus features best drive responses in other sensory areas.
Affiliation(s)
- Umut Güçlü
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
- Marcel A. J. van Gerven
- Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
|
11
|
Hu X, Zhang J, Li J, Zhang B. Sparsity-regularized HMAX for visual recognition. PLoS One 2014; 9:e81813. [PMID: 24392078 PMCID: PMC3879257 DOI: 10.1371/journal.pone.0081813] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 06/05/2013] [Accepted: 10/16/2013] [Indexed: 11/19/2022] Open
Abstract
About ten years ago, HMAX was proposed as a simple and biologically feasible model for object recognition, based on how the visual cortex processes information. However, the model does not encompass sparse firing, which is a hallmark of neurons at all stages of the visual pathway. The current paper presents an improved model, called sparse HMAX, which integrates sparse firing. This model is able to learn higher-level features of objects on unlabeled training images. Unlike most other deep learning models that explicitly address global structure of images in every layer, sparse HMAX addresses local to global structure gradually along the hierarchy by applying patch-based learning to the output of the previous layer. As a consequence, the learning method can be standard sparse coding (SSC) or independent component analysis (ICA), two techniques deeply rooted in neuroscience. What makes SSC and ICA applicable at higher levels is the introduction of linear higher-order statistical regularities by max pooling. After training, high-level units display sparse, invariant selectivity for particular individuals or for image categories like those observed in human inferior temporal cortex (ITC) and medial temporal lobe (MTL). Finally, on an image classification benchmark, sparse HMAX outperforms the original HMAX by a large margin, suggesting its great potential for computer vision.
Affiliation(s)
- Xiaolin Hu
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), and Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Jianwei Zhang
- Department of Informatics, University of Hamburg, Hamburg, Germany
- Jianmin Li
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), and Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Bo Zhang
- State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology (TNList), and Department of Computer Science and Technology, Tsinghua University, Beijing, China
|
12
|
Escobar MJ, Palacios AG. Beyond the retina neural coding: on models and neural rehabilitation. J Physiol Paris 2013; 107:335-7. [PMID: 23994100 DOI: 10.1016/j.jphysparis.2013.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Affiliation(s)
- María-José Escobar
- Universidad Técnica Federico Santa María, Departamento de Electrónica, 2390123 Valparaíso, Chile.
|