1. Oleskiw TD, Lieber JD, Simoncelli EP, Movshon JA. Foundations of visual form selectivity in macaque areas V1 and V2. bioRxiv 2025:2024.03.04.583307. [PMID: 38496618; PMCID: PMC10942284; DOI: 10.1101/2024.03.04.583307]
Abstract
Neurons early in the primate visual cortical pathway generate responses by combining signals from other neurons: some from downstream areas, some from within the same area, and others from areas upstream. Here we develop a model that selectively combines afferents derived from a population model of V1 cells. We use this model to account for the responses of both V1 and V2 neurons, recorded in awake fixating macaque monkeys, to stimuli composed of a sparse collection of locally oriented features ("droplets") designed to drive subsets of V1 neurons. The first stage computes the rectified responses of a fixed population of oriented filters at different scales that cover the visual field. The second stage computes a weighted combination of these first-stage responses, followed by a final nonlinearity, with parameters optimized to fit data from physiological recordings and constrained to encourage sparsity and locality. The fitted model accounts for the responses of both V1 and V2 neurons, capturing an average of 43% of the explainable variance for V1 and 38% for V2. The models fitted to droplet recordings predict responses to classical stimuli, such as gratings of different orientations and spatial frequencies, as well as to textures of different spectral content, which are known to be especially effective in driving V2. The models are less effective, however, at capturing the selectivity of responses to textures that include naturalistic image statistics. The pattern of afferents, defined by their weights over the four dimensions of spatial position (two), orientation, and spatial frequency, provides a common and interpretable characterization of the origin of many neuronal response properties in the early visual cortex.
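The two-stage cascade described here is simple enough to sketch. Below is a minimal illustration, assuming a precomputed bank of oriented filters; the function names, shapes, and fit loop are placeholders, not the authors' code.

```python
# Minimal sketch of a two-stage afferent-combination model; illustrative only.
import numpy as np

def first_stage(images, filter_bank):
    """Rectified responses of a fixed population of oriented filters.
    images: (n_stimuli, n_pixels); filter_bank: (n_filters, n_pixels)."""
    return np.maximum(images @ filter_bank.T, 0.0)

def second_stage(afferents, w, beta=1.5):
    """Weighted combination of first-stage afferents, then an output
    nonlinearity (here a rectified power law)."""
    return np.maximum(afferents @ w, 0.0) ** beta

def fit_weights(afferents, rates, lam=1e-2, lr=1e-3, n_iter=5000, beta=1.5):
    """Least-squares fit of the afferent weights with an L1 penalty,
    standing in for the sparsity/locality constraints in the abstract."""
    rng = np.random.default_rng(0)
    w = 0.01 * rng.standard_normal(afferents.shape[1])
    for _ in range(n_iter):
        drive = afferents @ w
        pred = np.maximum(drive, 0.0) ** beta
        dpred = beta * np.maximum(drive, 0.0) ** (beta - 1.0)  # chain rule
        grad = afferents.T @ ((pred - rates) * dpred) / len(rates)
        w -= lr * (grad + lam * np.sign(w))  # L1 subgradient for sparsity
    return w
```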
Affiliation(s)
- Timothy D Oleskiw
- Center for Neural Science, New York University
- Center for Computational Neuroscience, Flatiron Institute
- Eero P Simoncelli
- Center for Computational Neuroscience, Flatiron Institute
- Center for Neural Science, New York University
2. Parthasarathy N, Hénaff OJ, Simoncelli EP. Layerwise complexity-matched learning yields an improved model of cortical area V2. arXiv 2024:arXiv:2312.11436v3. [PMID: 39070038; PMCID: PMC11275700]
Abstract
Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages compared with traditional hand-engineered models or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front-end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior. Our code and pre-trained checkpoints are available at https://github.com/nikparth/LCL-V2.git.
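The layerwise objective lends itself to a compact sketch. Below is one plausible reading, assuming a Barlow-Twins-style invariance/decorrelation loss applied to each layer in isolation; all names are illustrative, and the authors' released implementation is in the repository linked above.

```python
# One plausible layerwise loss: pull paired features of two deformed views
# together while decorrelating features across the batch. No backprop
# crosses layer boundaries. Illustrative, not the authors' code.
import torch

def lcl_layer_loss(z1, z2, off_diag_weight=5e-3):
    """z1, z2: (batch, dim) features of two locally deformed views of the
    same patches, with deformation amplitude scaled to this layer's
    receptive field size (the 'complexity matching' in the abstract)."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]                     # cross-correlation
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()  # feature similarity
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelate
    return on_diag + off_diag_weight * off_diag
```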
Affiliation(s)
- Nikhil Parthasarathy
- Center for Neural Science, New York University
- Center for Computational Neuroscience, Flatiron Institute
- Eero P Simoncelli
- Center for Neural Science, New York University
- Center for Computational Neuroscience, Flatiron Institute
3. Hassanpour MS, Merlin S, Federer F, Zaidi Q, Angelucci A. Primate V2 receptive fields derived from anatomically identified large-scale V1 inputs. Research Square 2024:rs.3.rs-4139501. [PMID: 38798339; PMCID: PMC11118708; DOI: 10.21203/rs.3.rs-4139501/v1]
Abstract
In the primate visual system, visual object recognition involves a series of cortical areas arranged hierarchically along the ventral visual pathway. As information flows through this hierarchy, neurons become progressively tuned to more complex image features. The circuit mechanisms and computations underlying the increasing complexity of these receptive fields (RFs) remain unidentified. To understand how this complexity emerges in the secondary visual area (V2), we investigated the functional organization of inputs from the primary visual cortex (V1) to V2 by combining retrograde anatomical tracing of these inputs with functional imaging of feature maps in macaque monkey V1 and V2. We found that V1 neurons sending inputs to single V2 orientation columns have a broad range of preferred orientations, but are strongly biased towards the orientation represented at the injected V2 site. For each V2 site, we then constructed a feedforward model based on the linear combination of its anatomically-identified large-scale V1 inputs, and studied the response properties of the generated V2 RFs. We found that V2 RFs derived from the linear feedforward model were either elongated versions of V1 filters or had spatially complex structures. These modeled RFs predicted V2 neuron responses to oriented grating stimuli with high accuracy. Remarkably, this simple model also explained the greater selectivity to naturalistic textures of V2 cells compared to their V1 input cells. Our results demonstrate that simple linear combinations of feedforward inputs can account for the orientation selectivity and texture sensitivity of V2 RFs.
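A toy version of this feedforward construction can be written in a few lines, assuming each labeled V1 input is summarized by a Gabor filter at its retinotopic position and preferred orientation; the helper function and data layout below are hypothetical stand-ins for the tracing data.

```python
# Toy construction of a model V2 RF as a weighted linear sum of Gabors
# placed at the positions/orientations of labeled V1 inputs. Illustrative.
import numpy as np

def gabor(size, x0, y0, theta, sf, sigma):
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = xs - x0, ys - y0
    xr = dx * np.cos(theta) + dy * np.sin(theta)      # axis along theta
    env = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    return env * np.cos(2.0 * np.pi * sf * xr)

def v2_rf_from_v1_inputs(v1_inputs, size=64):
    """v1_inputs: list of dicts with keys x, y, theta, sf, sigma, weight,
    one entry per anatomically identified V1 input neuron (hypothetical)."""
    rf = np.zeros((size, size))
    for p in v1_inputs:
        rf += p["weight"] * gabor(size, p["x"], p["y"],
                                  p["theta"], p["sf"], p["sigma"])
    return rf
```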
Affiliation(s)
- Mahlega S Hassanpour
- Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah
- Sam Merlin
- Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah
- Present address: Dept of Medical Science, School of Science, Western Sydney University
- Frederick Federer
- Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah
- Qasim Zaidi
- Graduate Center for Vision Research, State University of New York, College of Optometry
- Alessandra Angelucci
- Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah
4. Hassanpour MS, Merlin S, Federer F, Zaidi Q, Angelucci A. Primate V2 receptive fields derived from anatomically identified large-scale V1 inputs. bioRxiv 2024:2024.03.22.586002. [PMID: 38585792; PMCID: PMC10996519; DOI: 10.1101/2024.03.22.586002]
5. Mai A, Riès S, Ben-Haim S, Shih JJ, Gentner TQ. Acoustic and language-specific sources for phonemic abstraction from speech. Nat Commun 2024; 15:677. [PMID: 38263364; PMCID: PMC10805762; DOI: 10.1038/s41467-024-44844-9]
Abstract
Spoken language comprehension requires abstraction of linguistic information from speech, but the interaction between auditory and linguistic processing of speech remains poorly understood. Here, we investigate the nature of this abstraction using neural responses recorded intracranially while participants listened to conversational English speech. Capitalizing on multiple, language-specific patterns where phonological and acoustic information diverge, we demonstrate the causal efficacy of the phoneme as a unit of analysis and dissociate the unique contributions of phonemic and spectrographic information to neural responses. Quantitative higher-order response models also reveal that unique contributions of phonological information are carried in the covariance structure of the stimulus-response relationship. This suggests that linguistic abstraction is shaped by neurobiological mechanisms that involve integration across multiple spectro-temporal features and prior phonological information. These results link speech acoustics to phonology and morphosyntax, substantiating predictions about abstractness in linguistic theory and providing evidence for the acoustic features that support that abstraction.
Affiliation(s)
- Anna Mai
- University of California, San Diego, Linguistics, 9500 Gilman Dr., La Jolla, CA, 92093, USA.
- Stephanie Riès
- San Diego State University, School of Speech, Language, and Hearing Sciences, 5500 Campanile Drive, San Diego, CA, 92182, USA
- San Diego State University, Center for Clinical and Cognitive Sciences, 5500 Campanile Drive, San Diego, CA, 92182, USA
- Sharona Ben-Haim
- University of California, San Diego, Neurological Surgery, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Jerry J Shih
- University of California, San Diego, Neurosciences, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Timothy Q Gentner
- University of California, San Diego, Psychology, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- University of California, San Diego, Neurobiology, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- University of California, San Diego, Kavli Institute for Brain and Mind, 9500 Gilman Dr., La Jolla, CA, 92093, USA
6. Liu JK, Karamanlis D, Gollisch T. Simple model for encoding natural images by retinal ganglion cells with nonlinear spatial integration. PLoS Comput Biol 2022; 18:e1009925. [PMID: 35259159; PMCID: PMC8932571; DOI: 10.1371/journal.pcbi.1009925]
Abstract
A central goal in sensory neuroscience is to understand the neuronal signal processing involved in the encoding of natural stimuli. A critical step towards this goal is the development of successful computational encoding models. For ganglion cells in the vertebrate retina, the development of satisfactory models for responses to natural visual scenes is an ongoing challenge. Standard models typically apply linear integration of visual stimuli over space, yet many ganglion cells are known to show nonlinear spatial integration, in particular when stimulated with contrast-reversing gratings. We here study the influence of spatial nonlinearities in the encoding of natural images by ganglion cells, using multielectrode-array recordings from isolated salamander and mouse retinas. We assess how responses to natural images depend on first- and second-order statistics of spatial patterns inside the receptive field. This leads us to a simple extension of current standard ganglion cell models. We show that taking into account not only the weighted average of light intensity inside the receptive field but also its variance over space can partly account for nonlinear integration and substantially improve predictions of responses to novel images. For salamander ganglion cells, we find that response predictions for cell classes with large receptive fields profit most from including spatial contrast information. Finally, we demonstrate how this model framework can be used to assess the spatial scale of nonlinear integration. Our results underscore that nonlinear spatial stimulus integration translates to stimulation with natural images. Furthermore, the introduced model framework provides a simple, yet powerful extension of standard models and may serve as a benchmark for the development of more detailed models of the nonlinear structure of receptive fields.

For understanding how sensory systems operate in the natural environment, an important goal is to develop models that capture neuronal responses to natural stimuli. For retinal ganglion cells, which connect the eye to the brain, current standard models often fail to capture responses to natural visual scenes. This shortcoming is at least partly rooted in the fact that ganglion cells may combine visual signals over space in a nonlinear fashion. We here show that a simple model, which considers not only the average light intensity inside a cell’s receptive field but also the variance of light intensity over space, can partly account for these nonlinearities and thereby improve current standard models. This provides an easy-to-obtain benchmark for modeling ganglion cell responses to natural images.
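The model extension described here, adding the spatial variance of intensity to the weighted mean, translates directly into code. A minimal sketch, with illustrative parameter names:

```python
# Sketch of the extended LN model: the response depends on both the weighted
# mean of intensity inside the RF and its variance over space.
import numpy as np

def rgc_response(image_patch, rf_weights, a=1.0, b=1.0, threshold=0.0):
    """image_patch, rf_weights: flattened arrays over the receptive field;
    a and b scale the mean-intensity and spatial-contrast terms."""
    w = rf_weights / rf_weights.sum()
    mean_drive = np.dot(w, image_patch)                     # standard LN term
    var_drive = np.dot(w, (image_patch - mean_drive) ** 2)  # spatial contrast
    return np.maximum(a * mean_drive + b * var_drive - threshold, 0.0)
```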
Affiliation(s)
- Jian K. Liu
- University Medical Center Göttingen, Department of Ophthalmology, Göttingen, Germany
- Bernstein Center for Computational Neuroscience Göttingen, Göttingen, Germany
- School of Computing, University of Leeds, Leeds, United Kingdom
- Dimokratis Karamanlis
- University Medical Center Göttingen, Department of Ophthalmology, Göttingen, Germany
- Bernstein Center for Computational Neuroscience Göttingen, Göttingen, Germany
- International Max Planck Research School for Neurosciences, Göttingen, Germany
- Tim Gollisch
- University Medical Center Göttingen, Department of Ophthalmology, Göttingen, Germany
- Bernstein Center for Computational Neuroscience Göttingen, Göttingen, Germany
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, Göttingen, Germany
7. Bowren J, Sanchez-Giraldo L, Schwartz O. Inference via sparse coding in a hierarchical vision model. J Vis 2022; 22:19. [PMID: 35212744; PMCID: PMC8883180; DOI: 10.1167/jov.22.2.19]
Abstract
Sparse coding has been incorporated in models of the visual cortex for its computational advantages and connection to biology. But how the level of sparsity contributes to performance on visual tasks is not well understood. In this work, sparse coding was integrated into an existing hierarchical V2 model (Hosoya & Hyvärinen, 2015) by replacing its independent component analysis (ICA) with an explicit sparse coding stage in which the degree of sparsity can be controlled. After training, the basis functions learned at higher degrees of sparsity resembled qualitatively different structures, such as curves and corners. The contributions of the models were assessed with image classification tasks, specifically tasks associated with mid-level vision, including figure-ground classification, texture classification, and angle prediction between two line stimuli. In addition, the models were assessed against a texture sensitivity measure that has been reported in V2 (Freeman et al., 2013) and a deleted-region inference task. The results show that although sparse coding performed worse than ICA at classifying images, only sparse coding, as its degree of sparsity increased, was able to match the texture sensitivity level of V2 and to infer deleted image regions. Greater degrees of sparsity allowed for inference over larger deleted image regions. The mechanism that allows for this inference capability in sparse coding is described in this article.
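A standard way to make the degree of sparsity an explicit knob, as described above, is to infer coefficients with ISTA on the lasso objective 0.5·||x - Ds||² + λ·||s||₁, with λ (lam in the code) controlling sparsity. A minimal sketch, not the authors' implementation:

```python
# Sparse inference via ISTA; increasing lam increases the degree of sparsity.
import numpy as np

def ista(x, D, lam, n_iter=200):
    """x: image patch (n_pixels,); D: dictionary (n_pixels, n_basis)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = s - D.T @ (D @ s - x) / L                          # gradient step
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return s
```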
Affiliation(s)
- Joshua Bowren
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Luis Sanchez-Giraldo
- Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA
- Odelia Schwartz
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
8. Yan Q, Zheng Y, Jia S, Zhang Y, Yu Z, Chen F, Tian Y, Huang T, Liu JK. Revealing fine structures of the retinal receptive field by deep-learning networks. IEEE Transactions on Cybernetics 2022; 52:39-50. [PMID: 32167923; DOI: 10.1109/tcyb.2020.2972983]
Abstract
Deep convolutional neural networks (CNNs) have demonstrated impressive performance on many visual tasks. Recently, they have become useful models for the visual system in neuroscience. However, it is still not clear what CNNs learn in terms of neuronal circuits. When a deep CNN with many layers is used to model the visual system, it is not easy to compare its structural components with possible neuroscience underpinnings, owing to the highly complex circuits from the retina to the higher visual cortex. Here, we address this issue by focusing on single retinal ganglion cells with biophysical models and recording data from animals. By training CNNs with white-noise images to predict neuronal responses, we found that fine structures of the retinal receptive field can be revealed. Specifically, the learned convolutional filters resemble biological components of the retinal circuit. This suggests that a CNN learning from one single retinal cell reveals a minimal neural network implemented within that cell. Furthermore, when CNNs learned from different cells are transferred between cells, transfer-learning performance varies widely, indicating that CNNs are cell specific. Moreover, when CNNs are transferred between different types of input images, here white noise versus natural images, transfer learning performs well, which implies that CNNs indeed capture the full computational ability of a single retinal cell for different inputs. Taken together, these results suggest that CNNs could be used to reveal structural components of neuronal circuits, and provide a powerful model for neural system identification.
9. Zheng Y, Jia S, Yu Z, Liu JK, Huang T. Unraveling neural coding of dynamic natural visual scenes via convolutional recurrent neural networks. Patterns 2021; 2:100350. [PMID: 34693375; PMCID: PMC8515013; DOI: 10.1016/j.patter.2021.100350]
Abstract
Traditional models of retinal system identification analyze the neural response to artificial stimuli using models consisting of predefined components. The model design is limited by prior knowledge, and the artificial stimuli are too simple to be compared with stimuli processed by the retina. To fill this gap with an explainable model that reveals how a population of neurons works together to encode the larger field of natural scenes, here we used a deep-learning model to identify the computational elements of the retinal circuit that contribute to learning the dynamics of natural scenes. Experimental results verify that the recurrent connection plays a key role in encoding complex dynamic visual scenes while learning the biological computational underpinnings of the retinal circuit. In addition, the proposed models reveal both the shapes and the locations of the spatiotemporal receptive fields of ganglion cells.
Affiliation(s)
- Yajing Zheng
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Shanshan Jia
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Zhaofei Yu
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Institute for Artificial Intelligence, Peking University, Beijing 100871, China
- Jian K. Liu
- School of Computing, University of Leeds, Leeds LS2 9JT, UK
- Tiejun Huang
- Department of Computer Science and Technology, National Engineering Laboratory for Video Technology, Peking University, Beijing 100871, China
- Institute for Artificial Intelligence, Peking University, Beijing 100871, China
10. Jassim N, Baron-Cohen S, Suckling J. Meta-analytic evidence of differential prefrontal and early sensory cortex activity during non-social sensory perception in autism. Neurosci Biobehav Rev 2021; 127:146-157. [PMID: 33887326; DOI: 10.1016/j.neubiorev.2021.04.014]
Abstract
To date, neuroimaging research has had a limited focus on non-social features of autism. As a result, neurobiological explanations for atypical sensory perception in autism are lacking. To address this, we quantitatively condensed findings from the non-social autism fMRI literature in line with the current best practices for neuroimaging meta-analyses. Using activation likelihood estimation (ALE), we conducted a series of robust meta-analyses across 83 experiments from 52 fMRI studies investigating differences between autistic (n = 891) and typical (n = 967) participants. We found that typical controls, compared to autistic people, show greater activity in the prefrontal cortex (BA9, BA10) during perception tasks. More refined analyses revealed that, when compared to typical controls, autistic people show greater recruitment of the extrastriate V2 cortex (BA18) during visual processing. Taken together, these findings contribute to our understanding of current theories of autistic perception, and highlight some of the challenges of cognitive neuroscience research in autism.
Affiliation(s)
- Nazia Jassim
- Autism Research Centre, Department of Psychiatry, University of Cambridge, Douglas House, 18B Trumpington Road, Cambridge, CB2 8AH, United Kingdom.
- Simon Baron-Cohen
- Autism Research Centre, Department of Psychiatry, University of Cambridge, Douglas House, 18B Trumpington Road, Cambridge, CB2 8AH, United Kingdom
- John Suckling
- Autism Research Centre, Department of Psychiatry, University of Cambridge, Douglas House, 18B Trumpington Road, Cambridge, CB2 8AH, United Kingdom; Department of Psychiatry, University of Cambridge, Herchel Smith Building for Brain and Mind Sciences, Forvie Site, Robinson Way, Cambridge, CB2 0SZ, United Kingdom
11. Wei H, Xu C, Jin Z. Binocular matching model based on hierarchical V1 and V2 receptive fields with color, orientation, and region feature information. IEEE Trans Biomed Eng 2020; 67:3141-3150. [PMID: 32142415; DOI: 10.1109/tbme.2020.2977350]
Abstract
Binocular matching models serve as the core component in most stereo visual aid systems developed for people with visual impairments. However, purely computational models lack a neurobiological basis for explaining phenomena observed in neurobiology, offer no support for the development of bioengineering applications, and are overly complex for hardware implementation. In contrast, existing neurobiological models suffer from low matching accuracy. The present work therefore proposes a novel binocular matching model based on the receptive fields of simple cells rather than on image pixels; it thereby incorporates neurobiological structure, reduces hardware complexity, achieves sufficient accuracy, and can be used in visual aid systems. The proposed model computes and optimizes binocular disparity via a cost function. Specifically, we simulate the functions and structures of V1 and V2 neurons according to the discoveries of modern neurobiology. The receptive fields of V1-layer neurons are aggregated to obtain the receptive fields of the V2 layer, and the disparity is obtained in the V2 layer. The accuracy of the proposed model is verified by comparing its disparity results with those obtained using other neurobiological models, demonstrating that the model can guide the design of visual aid systems.
12. Object shape and surface properties are jointly encoded in mid-level ventral visual cortex. Curr Opin Neurobiol 2019; 58:199-208. [PMID: 31586749; DOI: 10.1016/j.conb.2019.09.009]
Abstract
Rapidly recognizing a myriad of visual objects is a hallmark of the primate visual system. Traditional theories of object recognition have focused on how crucial form features, for example the orientation of edges, may be extracted in early visual cortex and utilized to recognize objects. An alternative view argues that much of early and mid-level visual processing focuses on encoding surface characteristics, for example texture. Neurophysiological evidence from primate area V4 supports a third alternative, the joint but independent encoding of form and texture, which would be advantageous for segmenting objects from the background in natural scenes and for object recognition that is independent of surface texture. Future studies that leverage deep convolutional network models, especially by focusing on network failures to match biology and behavior, can advance our insights into how such a joint representation of form and surface properties might emerge in visual cortex.
13. Giraldo LGS, Schwartz O. Integrating flexible normalization into midlevel representations of deep convolutional neural networks. Neural Comput 2019; 31:2138-2176. [PMID: 31525314; DOI: 10.1162/neco_a_01226]
Abstract
Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to midlevel representations of deep CNNs as a tractable way to study contextual normalization mechanisms in midlevel cortical areas. This approach captures nontrivial spatial dependencies among midlevel features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high-order features geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in midlevel cortical areas. We also expect this approach to be useful as part of the CNN tool kit, therefore going beyond more restrictive fixed forms of normalization.
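For concreteness, a bare-bones divisive normalization of a midlevel CNN feature map might look as follows. This fixed-pool version omits the paper's flexible, statistics-gated recruitment of the surround; shapes and parameters are illustrative.

```python
# Divisive normalization of a feature map by a local pool of surround energy.
import torch
import torch.nn.functional as F

def divisive_normalize(fmap, pool_size=5, sigma=1.0):
    """fmap: (batch, channels, H, W) rectified activations."""
    energy = fmap.pow(2)
    pool = F.avg_pool2d(energy, pool_size, stride=1,
                        padding=pool_size // 2)   # local surround energy
    return fmap / torch.sqrt(sigma ** 2 + pool)
```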
Affiliation(s)
- Odelia Schwartz
- Computer Science Department, University of Miami, Coral Gables, FL 33146, U.S.A.
14. DiMattina C, Baker CL. Modeling second-order boundary perception: a machine learning approach. PLoS Comput Biol 2019; 15:e1006829. [PMID: 30883556; PMCID: PMC6438569; DOI: 10.1371/journal.pcbi.1006829]
Abstract
Visual pattern detection and discrimination are essential first steps for scene analysis. Numerous human psychophysical studies have modeled visual pattern detection and discrimination by estimating linear templates for classifying noisy stimuli defined by spatial variations in pixel intensities. However, such methods are poorly suited to understanding sensory processing mechanisms for complex visual stimuli such as second-order boundaries defined by spatial differences in contrast or texture. We introduce a novel machine learning framework for modeling human perception of second-order visual stimuli, using image-computable hierarchical neural network models fit directly to psychophysical trial data. This framework is applied to modeling visual processing of boundaries defined by differences in the contrast of a carrier texture pattern, in two different psychophysical tasks: (1) boundary orientation identification, and (2) fine orientation discrimination. Cross-validation analysis is employed to optimize model hyper-parameters and demonstrates that these models accurately predict human performance on novel stimulus sets not used for fitting model parameters. We find that, like the ideal observer, human observers take a region-based approach to the orientation identification task, while taking an edge-based approach to the fine orientation discrimination task. How observers integrate contrast modulation across orientation channels is investigated by fitting psychophysical data with two models representing competing hypotheses, revealing a preference for a model which combines multiple orientations at the earliest possible stage. Our results suggest that this machine learning approach has much potential to advance the study of second-order visual processing, and we outline future steps towards generalizing the method to modeling visual segmentation of natural texture boundaries.

This study demonstrates how machine learning methodology can be fruitfully applied to psychophysical studies of second-order visual processing. Many naturally occurring visual boundaries are defined by spatial differences in features other than luminance, for example by differences in texture or contrast. Quantitative models of such “second-order” boundary perception cannot be estimated using the standard regression techniques (known as “classification images”) commonly applied to “first-order”, luminance-defined stimuli. Here we present a novel machine learning approach to modeling second-order boundary perception using hierarchical neural networks. In contrast to previous quantitative studies of second-order boundary perception, we directly estimate network model parameters using psychophysical trial data. We demonstrate that our method can reveal different spatial summation strategies that human observers utilize for different kinds of second-order boundary perception tasks, and can be used to compare competing hypotheses of how contrast modulation is integrated across orientation channels. We outline extensions of the methodology to other kinds of second-order boundaries, including those in natural images.
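Second-order models of this kind are typically built on a filter-rectify-filter (FRF) cascade. The sketch below shows only the generic cascade, under assumed filter inputs; it is not the fitted hierarchical model from the paper.

```python
# Generic filter-rectify-filter (FRF) cascade for contrast-defined
# (second-order) boundaries. Filter inputs are assumed; illustrative only.
import numpy as np
from scipy.signal import fftconvolve

def frf_response(image, carrier_filters, envelope_filter):
    """First stage: small oriented filters + rectification extract the
    carrier texture. Second stage: a large-scale filter on the rectified
    map responds to the contrast-modulation boundary."""
    energy = np.zeros_like(image, dtype=float)
    for f in carrier_filters:
        energy += np.abs(fftconvolve(image, f, mode="same"))
    return fftconvolve(energy, envelope_filter, mode="same")
```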
Affiliation(s)
- Christopher DiMattina
- Computational Perception Laboratory, Department of Psychology, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Curtis L. Baker
- McGill Vision Research Unit, Department of Ophthalmology, McGill University, Montreal, Quebec, Canada
15. Sanchez-Giraldo LG, Laskar MNU, Schwartz O. Normalization and pooling in hierarchical models of natural images. Curr Opin Neurobiol 2019; 55:65-72. [PMID: 30785005; DOI: 10.1016/j.conb.2019.01.008]
Abstract
Divisive normalization and subunit pooling are two canonical classes of computation that have become widely used in descriptive (what) models of visual cortical processing. Normative (why) models from natural image statistics can help constrain the form and parameters of such classes of models. We focus on recent advances in two particular directions, namely deriving richer forms of divisive normalization, and advances in learning pooling from image statistics. We discuss the incorporation of such components into hierarchical models. We consider both hierarchical unsupervised learning from image statistics, and discriminative supervised learning in deep convolutional neural networks (CNNs). We further discuss studies on the utility and extensions of the convolutional architecture, which has also been adopted by recent descriptive models. We review the recent literature and discuss the current promises and gaps of using such approaches to gain a better understanding of how cortical neurons represent and process complex visual stimuli.
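As a reference point, one standard textbook form of the divisive normalization discussed in this review (given here for orientation, not taken from the review itself) is:

```latex
R_i \;=\; \frac{L_i^{\,n}}{\sigma^{n} \;+\; \sum_{j \in \mathcal{P}(i)} w_{ij}\, L_j^{\,n}}
```

where L_i is the linear filter response of unit i, P(i) is its normalization pool with weights w_ij, and sigma and n set the semisaturation constant and exponent.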
Affiliation(s)
- Luis G Sanchez-Giraldo
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States.
- Md Nasir Uddin Laskar
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
- Odelia Schwartz
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
16. Turner MH, Sanchez Giraldo LG, Schwartz O, Rieke F. Stimulus- and goal-oriented frameworks for understanding natural vision. Nat Neurosci 2019; 22:15-24. [PMID: 30531846; PMCID: PMC8378293; DOI: 10.1038/s41593-018-0284-0]
Abstract
Our knowledge of sensory processing has advanced dramatically in the last few decades, but this understanding remains far from complete, especially for stimuli with the large dynamic range and strong temporal and spatial correlations characteristic of natural visual inputs. Here we describe some of the issues that make understanding the encoding of natural images a challenge. We highlight two broad strategies for approaching this problem: a stimulus-oriented framework and a goal-oriented one. Different contexts can call for one framework or the other. Looking forward, recent advances, particularly those based on machine learning, show promise in borrowing key strengths of both frameworks and, in doing so, illuminating a path to a more comprehensive understanding of the encoding of natural stimuli.
Affiliation(s)
- Maxwell H Turner
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA
- Graduate Program in Neuroscience, University of Washington, Seattle, WA, USA
- Odelia Schwartz
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Fred Rieke
- Department of Physiology and Biophysics, University of Washington, Seattle, WA, USA.
17. Zhang Y, Lee TS, Li M, Liu F, Tang S. Convolutional neural network models of V1 responses to complex patterns. J Comput Neurosci 2018; 46:33-54. [PMID: 29869761; DOI: 10.1007/s10827-018-0687-7]
Abstract
In this study, we evaluated the convolutional neural network (CNN) method for modeling V1 neurons of awake macaque monkeys in response to a large set of complex pattern stimuli. CNN models outperformed all the other baseline models, such as Gabor-based standard models for V1 cells and various variants of generalized linear models. We then systematically dissected different components of the CNN and found two key factors that made CNNs outperform other models: thresholding nonlinearity and convolution. In addition, we fitted our data using a pre-trained deep CNN via transfer learning. The deep CNN's higher layers, which encode more complex patterns, outperformed lower ones, and this result was consistent with our earlier work on the complexity of the V1 neural code. Our study systematically evaluates the relative merits of different CNN components in the context of V1 neuron modeling.
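The class of model evaluated here is compact enough to sketch. A representative one-convolutional-layer CNN with a thresholding nonlinearity and linear readout; the architecture and hyperparameters below are illustrative, not the paper's exact model.

```python
# Compact CNN for predicting a single V1 neuron's firing rate.
import torch.nn as nn

class V1CNN(nn.Module):
    def __init__(self, n_channels=8, kernel=9, input_size=20):
        super().__init__()
        self.conv = nn.Conv2d(1, n_channels, kernel)   # local filters
        self.relu = nn.ReLU()                          # thresholding
        side = input_size - kernel + 1
        self.readout = nn.Linear(n_channels * side * side, 1)

    def forward(self, x):                              # x: (B, 1, H, W)
        h = self.relu(self.conv(x))
        return self.relu(self.readout(h.flatten(1)))   # non-negative rate
```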
Affiliation(s)
- Yimeng Zhang
- Center for the Neural Basis of Cognition and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Tai Sing Lee
- Center for the Neural Basis of Cognition and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Ming Li
- Peking University School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
- IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
- Fang Liu
- Peking University School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
- IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
- Shiming Tang
- Peking University School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Beijing, 100871, China
- IDG/McGovern Institute for Brain Research at Peking University, Beijing, 100871, China
18. Development of cross-orientation suppression and size tuning and the role of experience. J Neurosci 2018; 38:2656-2670. [PMID: 29431651; DOI: 10.1523/jneurosci.2886-17.2018]
Abstract
Many sensory neural circuits exhibit response normalization, which occurs when the response of a neuron to a combination of multiple stimuli is less than the sum of the responses to the individual stimuli presented alone. In the visual cortex, normalization takes the forms of cross-orientation suppression and surround suppression. At the onset of visual experience, visual circuits are partially developed and exhibit some mature features such as orientation selectivity, but it is unknown whether cross-orientation suppression is present at the onset of visual experience or requires visual experience for its emergence. We characterized the development of normalization and its dependence on visual experience in female ferrets. Visual experience was varied across the following three conditions: typical rearing, dark rearing, and dark rearing with daily exposure to simple sinusoidal gratings (14-16 h total). Cross-orientation suppression and surround suppression were noted in the earliest observations, and did not vary considerably with experience. We also observed evidence of continued maturation of receptive field properties in the second month of visual experience: substantial length summation was observed only in the oldest animals (postnatal day 90); evoked firing rates were greatly increased in older animals; and direction selectivity required experience, but declined slightly in older animals. These results constrain the space of possible circuit implementations of these features.

SIGNIFICANCE STATEMENT: The development of the brain depends on both nature (factors that are independent of the experience of an individual animal) and nurture (factors that depend on experience). While orientation selectivity, one of the major response properties of neurons in visual cortex, is already present at the onset of visual experience, it is unknown whether response properties that depend on interactions among multiple stimuli develop without experience. We find that the properties of cross-orientation suppression and surround suppression are present at eye opening, and do not depend on visual experience. Our results are consistent with the idea that a majority of the basic properties of sensory neurons in primary visual cortex are derived independent of the experience of an individual animal.
19. On texture, form, and fixational eye movements. Curr Opin Neurobiol 2017; 46:228-233. [PMID: 28961499; DOI: 10.1016/j.conb.2017.09.002]
Abstract
Recent studies show that small movements of the eye that occur during fixation are controlled in the brain by similar neural mechanisms as large eye movements. Information theory has been successful in explaining many properties of large eye movements. Could it also help us understand the smaller eye movements that are much more difficult to study experimentally? Here I describe new predictions for how small amplitude fixational eye movements should be modulated by visual context in order to improve visual perception. In particular, the amplitude of fixational eye movements is predicted to differ when localizing edges defined by changes in texture or luminance.