1. Pavuluri A, Kohn A. The representational geometry for naturalistic textures in macaque V1 and V2. bioRxiv 2024:2024.10.18.619102. PMID: 39484570; PMCID: PMC11526966; DOI: 10.1101/2024.10.18.619102.
Abstract
Our understanding of visual cortical processing has relied primarily on studying the selectivity of individual neurons in different areas. A complementary approach is to study how the representational geometry of neuronal populations differs across areas. Though the geometry is derived from individual neuronal selectivity, it can reveal encoding strategies difficult to infer from single neuron responses. In addition, recent theoretical work has begun to relate distinct functional objectives to different representational geometries. To understand how the representational geometry changes across stages of processing, we measured neuronal population responses in primary visual cortex (V1) and area V2 of macaque monkeys to an ensemble of synthetic, naturalistic textures. Responses were lower dimensional in V2 than V1, and there was a better alignment of V2 population responses to different textures. The representational geometry in V2 afforded better discriminability between out-of-sample textures. We performed complementary analyses of standard convolutional network models, which did not replicate the representational geometry of cortex. We conclude that there is a shift in the representational geometry between V1 and V2, with the V2 representation exhibiting features of a low-dimensional, systematic encoding of different textures and of different instantiations of each texture. Our results suggest that comparisons of representational geometry can reveal important transformations that occur across successive stages of visual processing.

2. Ziemba CM, Goris RLT, Stine GM, Perez RK, Simoncelli EP, Movshon JA. Neuronal and Behavioral Responses to Naturalistic Texture Images in Macaque Monkeys. J Neurosci 2024; 44:e0349242024. PMID: 39197942; PMCID: PMC11484546; DOI: 10.1523/jneurosci.0349-24.2024.
Abstract
The visual world is richly adorned with texture, which can serve to delineate important elements of natural scenes. In anesthetized macaque monkeys, selectivity for the statistical features of natural texture is weak in V1, but substantial in V2, suggesting that neuronal activity in V2 might directly support texture perception. To test this, we investigated the relation between single cell activity in macaque V1 and V2 and simultaneously measured behavioral judgments of texture. We generated stimuli along a continuum between naturalistic texture and phase-randomized noise and trained two macaque monkeys to judge whether a sample texture more closely resembled one or the other extreme. Analysis of responses revealed that individual V1 and V2 neurons carried much less information about texture naturalness than behavioral reports. However, the sensitivity of V2 neurons, especially those preferring naturalistic textures, was significantly closer to that of behavior compared with V1. The firing of both V1 and V2 neurons predicted perceptual choices in response to repeated presentations of the same ambiguous stimulus in one monkey, despite low individual neural sensitivity. However, neither population predicted choice in the second monkey. We conclude that neural responses supporting texture perception likely continue to develop downstream of V2. Further, combined with neural data recorded while the same two monkeys performed an orientation discrimination task, our results demonstrate that choice-correlated neural activity in early sensory cortex is unstable across observers and tasks, untethered from neuronal sensitivity, and therefore unlikely to directly reflect the formation of perceptual decisions.
Affiliation(s)
- Corey M Ziemba, Center for Neural Science, New York University, New York, NY
- Robbe L T Goris, Center for Neural Science, New York University, New York, NY
- Gabriel M Stine, Center for Neural Science, New York University, New York, NY
- Richard K Perez, Center for Neural Science, New York University, New York, NY
- Eero P Simoncelli, Center for Neural Science, New York University, New York, NY; Center for Computational Neuroscience, Flatiron Institute, New York, NY

3. Lee GM, Rodríguez Deliz CL, Bushnell BN, Majaj NJ, Movshon JA, Kiorpes L. Developmentally stable representations of naturalistic image structure in macaque visual cortex. Cell Rep 2024; 43:114534. PMID: 39067025; PMCID: PMC11491121; DOI: 10.1016/j.celrep.2024.114534.
Abstract
To determine whether post-natal improvements in form vision result from changes in mid-level visual cortex, we studied neuronal and behavioral responses to texture stimuli that were matched in local spectral content but varied in "naturalistic" structure. We made longitudinal measurements of visual behavior from 16 to 95 weeks of age, and of neural responses from 20 to 56 weeks. We also measured behavioral and neural responses in near-adult animals more than 3 years old. Behavioral sensitivity reached half-maximum around 25 weeks of age, but neural sensitivities remained stable through all ages tested. Neural sensitivity to naturalistic structure was highest in V4, lower in V2 and inferotemporal cortex (IT), and barely discernible in V1. Our results show a dissociation between stable neural performance and improving behavioral performance, which may reflect improved processing capacity in circuits downstream of visual cortex.
Affiliation(s)
- Gerick M Lee, Center for Neural Science, New York University, New York, NY 10003, USA
- Najib J Majaj, Center for Neural Science, New York University, New York, NY 10003, USA
- J Anthony Movshon, Center for Neural Science, New York University, New York, NY 10003, USA
- Lynne Kiorpes, Center for Neural Science, New York University, New York, NY 10003, USA

4. Parthasarathy N, Hénaff OJ, Simoncelli EP. Layerwise complexity-matched learning yields an improved model of cortical area V2. arXiv 2024:arXiv:2312.11436v3. PMID: 39070038; PMCID: PMC11275700.
Abstract
Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compared to traditional hand-engineered models, or models optimized for coding efficiency or prediction. Moreover, the gradient backpropagation used in end-to-end learning is generally considered to be biologically implausible. Here, we overcome both of these limitations by developing a bottom-up self-supervised training methodology that operates independently on successive layers. Specifically, we maximize feature similarity between pairs of locally-deformed natural image patches, while decorrelating features across patches sampled from other images. Crucially, the deformation amplitudes are adjusted proportionally to receptive field sizes in each layer, thus matching the task complexity to the capacity at each stage of processing. In comparison with architecture-matched versions of previous models, we demonstrate that our layerwise complexity-matched learning (LCL) formulation produces a two-stage model (LCL-V2) that is better aligned with selectivity properties and neural activity in primate area V2. We demonstrate that the complexity-matched learning paradigm is responsible for much of the emergence of the improved biological alignment. Finally, when the two-stage model is used as a fixed front-end for a deep network trained to perform object recognition, the resultant model (LCL-V2Net) is significantly better than standard end-to-end self-supervised, supervised, and adversarially-trained models in terms of generalization to out-of-distribution tasks and alignment with human behavior. 
Our code and pre-trained checkpoints are available at https://github.com/nikparth/LCL-V2.git.
Affiliation(s)
- Nikhil Parthasarathy, Center for Neural Science, New York University; Center for Computational Neuroscience, Flatiron Institute
- Eero P Simoncelli, Center for Neural Science, New York University; Center for Computational Neuroscience, Flatiron Institute

5. Shook EN, Barlow GT, Garcia-Rosales D, Gibbons CJ, Montague TG. Dynamic skin behaviors in cephalopods. Curr Opin Neurobiol 2024; 86:102876. PMID: 38652980; DOI: 10.1016/j.conb.2024.102876.
Abstract
The coleoid cephalopods (cuttlefish, octopus, and squid) are a group of soft-bodied mollusks that exhibit a wealth of complex behaviors, including dynamic camouflage, object mimicry, skin-based visual communication, and dynamic body patterns during sleep. Many of these behaviors are visually driven and engage the animals' color changing skin, a pixelated display that is directly controlled by neurons projecting from the brain. Thus, cephalopod skin provides a direct readout of neural activity in the brain. During camouflage, cephalopods recreate on their skin an approximation of what they see, providing a window into perceptual processes in the brain. Additionally, cephalopods communicate their internal state during social encounters using innate skin patterns, and create waves of pigmentation on their skin during periods of arousal. Thus, by leveraging the visual displays of cephalopods, we can gain insight into how the external world is represented in the brain and how this representation is transformed into a recapitulation of the world on the skin. Here, we describe the rich skin behaviors of the coleoid cephalopods, what is known about cephalopod neuroanatomy, and how advancements in gene editing, machine learning, optical imaging, and electrophysiological tools may provide an opportunity to explore the neural bases of these fascinating behaviors.
Affiliation(s)
- Erica N Shook, The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027, USA
- George Thomas Barlow, The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027, USA
- Daniella Garcia-Rosales, The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027, USA
- Connor J Gibbons, The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027, USA
- Tessa G Montague, The Mortimer B. Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY 10027, USA; Howard Hughes Medical Institute, Columbia University, New York, NY 10027, USA

6. Hassanpour MS, Merlin S, Federer F, Zaidi Q, Angelucci A. Primate V2 Receptive Fields Derived from Anatomically Identified Large-Scale V1 Inputs. Research Square 2024:rs.3.rs-4139501. PMID: 38798339; PMCID: PMC11118708; DOI: 10.21203/rs.3.rs-4139501/v1.
Abstract
In the primate visual system, visual object recognition involves a series of cortical areas arranged hierarchically along the ventral visual pathway. As information flows through this hierarchy, neurons become progressively tuned to more complex image features. The circuit mechanisms and computations underlying the increasing complexity of these receptive fields (RFs) remain unidentified. To understand how this complexity emerges in the secondary visual area (V2), we investigated the functional organization of inputs from the primary visual cortex (V1) to V2 by combining retrograde anatomical tracing of these inputs with functional imaging of feature maps in macaque monkey V1 and V2. We found that V1 neurons sending inputs to single V2 orientation columns have a broad range of preferred orientations, but are strongly biased towards the orientation represented at the injected V2 site. For each V2 site, we then constructed a feedforward model based on the linear combination of its anatomically-identified large-scale V1 inputs, and studied the response properties of the generated V2 RFs. We found that V2 RFs derived from the linear feedforward model were either elongated versions of V1 filters or had spatially complex structures. These modeled RFs predicted V2 neuron responses to oriented grating stimuli with high accuracy. Remarkably, this simple model also explained the greater selectivity to naturalistic textures of V2 cells compared to their V1 input cells. Our results demonstrate that simple linear combinations of feedforward inputs can account for the orientation selectivity and texture sensitivity of V2 RFs.
Affiliation(s)
- Mahlega S Hassanpour, Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah
- Sam Merlin, Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah (present address: Dept. of Medical Science, School of Science, Western Sydney University)
- Frederick Federer, Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah
- Qasim Zaidi, Graduate Center for Vision Research, State University of New York, College of Optometry
- Alessandra Angelucci, Dept. of Ophthalmology and Visual Science, Moran Eye Institute, University of Utah

7. Keshvari S, Wijntjes MWA. Peripheral material perception. J Vis 2024; 24:13. PMID: 38625088; PMCID: PMC11033595; DOI: 10.1167/jov.24.4.13.
Abstract
Humans can rapidly identify materials, such as wood or leather, even within a complex visual scene. Given a single image, one can easily identify the underlying "stuff," even though a given material can have highly variable appearance; fabric comes in unlimited variations of shape, pattern, color, and smoothness, yet we have little trouble categorizing it as fabric. What visual cues do we use to determine material identity? Prior research suggests that simple "texture" features of an image, such as the power spectrum, capture information about material properties and identity. Few studies, however, have tested richer and biologically motivated models of texture. We compared baseline material classification performance to performance with synthetic textures generated from the Portilla-Simoncelli model and several common image degradations. The textures retain statistical information but are otherwise random. We found that performance with textures and most degradations was well below baseline, suggesting insufficient information to support foveal material perception. Interestingly, modern research suggests that peripheral vision might use a statistical, texture-like representation. In a second set of experiments, we found that peripheral performance is more closely predicted by texture and other image degradations. These findings delineate the nature of peripheral material classification.
Affiliation(s)
- Maarten W A Wijntjes, Perceptual Intelligence Lab, Industrial Design Engineering, Delft University of Technology, Delft, Netherlands

8. Lee GM, Rodríguez-Deliz CL, Bushnell BN, Majaj NJ, Movshon JA, Kiorpes L. Developmentally stable representations of naturalistic image structure in macaque visual cortex. bioRxiv 2024:2024.02.24.581889. PMID: 38463955; PMCID: PMC10925106; DOI: 10.1101/2024.02.24.581889.
Abstract
We studied visual development in macaque monkeys using texture stimuli, matched in local spectral content but varying in "naturalistic" structure. In adult monkeys, naturalistic textures preferentially drive neurons in areas V2 and V4, but not V1. We paired behavioral measurements of naturalness sensitivity with separately-obtained neuronal population recordings from neurons in areas V1, V2, V4, and inferotemporal cortex (IT). We made behavioral measurements from 16 weeks of age and physiological measurements as early as 20 weeks, and continued through 56 weeks. Behavioral sensitivity reached half of maximum at roughly 25 weeks of age. Neural sensitivities remained stable from the earliest ages tested. As in adults, neural sensitivity to naturalistic structure increased from V1 to V2 to V4. While sensitivities in V2 and IT were similar, the dimensionality of the IT representation was more similar to V4's than to V2's.
Affiliation(s)
- Gerick M. Lee, Center for Neural Science, New York University, New York, NY 10003, USA
- Najib J. Majaj, Center for Neural Science, New York University, New York, NY 10003, USA
- Lynne Kiorpes, Center for Neural Science, New York University, New York, NY 10003, USA

9. Hassanpour MS, Merlin S, Federer F, Zaidi Q, Angelucci A. Primate V2 Receptive Fields Derived from Anatomically Identified Large-Scale V1 Inputs. bioRxiv 2024:2024.03.22.586002. PMID: 38585792; PMCID: PMC10996519; DOI: 10.1101/2024.03.22.586002.
Abstract
In the primate visual system, visual object recognition involves a series of cortical areas arranged hierarchically along the ventral visual pathway. As information flows through this hierarchy, neurons become progressively tuned to more complex image features. The circuit mechanisms and computations underlying the increasing complexity of these receptive fields (RFs) remain unidentified. To understand how this complexity emerges in the secondary visual area (V2), we investigated the functional organization of inputs from the primary visual cortex (V1) to V2 by combining retrograde anatomical tracing of these inputs with functional imaging of feature maps in macaque monkey V1 and V2. We found that V1 neurons sending inputs to single V2 orientation columns have a broad range of preferred orientations, but are strongly biased towards the orientation represented at the injected V2 site. For each V2 site, we then constructed a feedforward model based on the linear combination of its anatomically-identified large-scale V1 inputs, and studied the response properties of the generated V2 RFs. We found that V2 RFs derived from the linear feedforward model were either elongated versions of V1 filters or had spatially complex structures. These modeled RFs predicted V2 neuron responses to oriented grating stimuli with high accuracy. Remarkably, this simple model also explained the greater selectivity to naturalistic textures of V2 cells compared to their V1 input cells. Our results demonstrate that simple linear combinations of feedforward inputs can account for the orientation selectivity and texture sensitivity of V2 RFs.

10. Bolaños F, Orlandi JG, Aoki R, Jagadeesh AV, Gardner JL, Benucci A. Efficient coding of natural images in the mouse visual cortex. Nat Commun 2024; 15:2466. PMID: 38503746; PMCID: PMC10951403; DOI: 10.1038/s41467-024-45919-3.
Abstract
How the activity of neurons gives rise to natural vision remains a matter of intense investigation. The mid-level visual areas along the ventral stream are selective to a common class of natural images, textures, but a circuit-level understanding of this selectivity and its link to perception remains unclear. We addressed these questions in mice, first showing that they can perceptually discriminate between textures and statistically simpler spectrally matched stimuli, and between texture types. Then, at the neural level, we found that the secondary visual area (LM) exhibited a higher degree of selectivity for textures compared to the primary visual area (V1). Furthermore, textures were represented in distinct neural activity subspaces whose relative distances were found to correlate with the statistical similarity of the images and the mice's ability to discriminate between them. Notably, these dependencies were more pronounced in LM, where the texture-related subspaces were smaller than in V1, resulting in superior stimulus decoding capabilities. Together, our results demonstrate texture vision in mice, finding a linking framework between stimulus statistics, neural representations, and perceptual sensitivity, a distinct hallmark of efficient coding computations.
Affiliation(s)
- Federico Bolaños, University of British Columbia, Neuroimaging and NeuroComputation Centre, Vancouver, BC, V6T, Canada
- Javier G Orlandi, University of Calgary, Department of Physics and Astronomy, Calgary, AB, T2N 1N4, Canada
- Ryo Aoki, RIKEN Center for Brain Science, Laboratory for Neural Circuits and Behavior, Wakoshi, Japan
- Justin L Gardner, Stanford University, Wu Tsai Neurosciences Institute, Stanford, CA, USA
- Andrea Benucci, RIKEN Center for Brain Science, Laboratory for Neural Circuits and Behavior, Wakoshi, Japan; Queen Mary, University of London, School of Biological and Behavioral Science, London, E1 4NS, UK

11. Peng F, Harper NS, Mishra AP, Auksztulewicz R, Schnupp JWH. Dissociable Roles of the Auditory Midbrain and Cortex in Processing the Statistical Features of Natural Sound Textures. J Neurosci 2024; 44:e1115232023. PMID: 38267259; PMCID: PMC10919253; DOI: 10.1523/jneurosci.1115-23.2023.
Abstract
Sound texture perception takes advantage of a hierarchy of time-averaged statistical features of acoustic stimuli, but much remains unclear about how these statistical features are processed along the auditory pathway. Here, we compared the neural representation of sound textures in the inferior colliculus (IC) and auditory cortex (AC) of anesthetized female rats. We recorded responses to texture morph stimuli that gradually add statistical features of increasingly higher complexity. For each texture, several different exemplars were synthesized using different random seeds. An analysis of transient and ongoing multiunit responses showed that the IC units were sensitive to every type of statistical feature, albeit to a varying extent. In contrast, only a small proportion of AC units were overtly sensitive to any statistical features. Differences in texture types explained more of the variance of IC neural responses than did differences in exemplars, indicating a degree of "texture type tuning" in the IC, but the same was, perhaps surprisingly, not the case for AC responses. We also evaluated the accuracy of texture type classification from single-trial population activity and found that IC responses became more informative as more summary statistics were included in the texture morphs, while for AC population responses, classification performance remained consistently very low. These results argue against the idea that AC neurons encode sound type via an overt sensitivity in neural firing rate to fine-grain spectral and temporal statistical features.
Affiliation(s)
- Fei Peng, Department of Neuroscience, City University of Hong Kong, Hong Kong, China
- Nicol S Harper, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 2JD, United Kingdom
- Ambika P Mishra, Department of Neuroscience, City University of Hong Kong, Hong Kong, China
- Ryszard Auksztulewicz, Department of Neuroscience, City University of Hong Kong, Hong Kong, China; Center for Cognitive Neuroscience Berlin, Free University Berlin, Berlin 14195, Germany
- Jan W H Schnupp, Department of Neuroscience, City University of Hong Kong, Hong Kong, China

12. Ziemba CM, Goris RLT, Stine GM, Perez RK, Simoncelli EP, Movshon JA. Neuronal and behavioral responses to naturalistic texture images in macaque monkeys. bioRxiv 2024:2024.02.22.581645. PMID: 38464304; PMCID: PMC10925125; DOI: 10.1101/2024.02.22.581645.
Abstract
The visual world is richly adorned with texture, which can serve to delineate important elements of natural scenes. In anesthetized macaque monkeys, selectivity for the statistical features of natural texture is weak in V1, but substantial in V2, suggesting that neuronal activity in V2 might directly support texture perception. To test this, we investigated the relation between single cell activity in macaque V1 and V2 and simultaneously measured behavioral judgments of texture. We generated stimuli along a continuum between naturalistic texture and phase-randomized noise and trained two macaque monkeys to judge whether a sample texture more closely resembled one or the other extreme. Analysis of responses revealed that individual V1 and V2 neurons carried much less information about texture naturalness than behavioral reports. However, the sensitivity of V2 neurons, especially those preferring naturalistic textures, was significantly closer to that of behavior compared with V1. The firing of both V1 and V2 neurons predicted perceptual choices in response to repeated presentations of the same ambiguous stimulus in one monkey, despite low individual neural sensitivity. However, neither population predicted choice in the second monkey. We conclude that neural responses supporting texture perception likely continue to develop downstream of V2. Further, combined with neural data recorded while the same two monkeys performed an orientation discrimination task, our results demonstrate that choice-correlated neural activity in early sensory cortex is unstable across observers and tasks, untethered from neuronal sensitivity, and thus unlikely to reflect a critical aspect of the formation of perceptual decisions.

Significance statement

As visual signals propagate along the cortical hierarchy, they encode increasingly complex aspects of the sensory environment and likely have a more direct relationship with perceptual experience. We replicate and extend previous results from anesthetized monkeys differentiating the selectivity of neurons along the first step in cortical vision from area V1 to V2. However, our results further complicate efforts to establish neural signatures that reveal the relationship between perception and the neuronal activity of sensory populations. We find that choice-correlated activity in V1 and V2 is unstable across different observers and tasks, and also untethered from neuronal sensitivity and other features of nonsensory response modulation.

13. Matteucci G, Piasini E, Zoccolan D. Unsupervised learning of mid-level visual representations. Curr Opin Neurobiol 2024; 84:102834. PMID: 38154417; DOI: 10.1016/j.conb.2023.102834.
Abstract
Recently, a confluence between trends in neuroscience and machine learning has brought a renewed focus on unsupervised learning, where sensory processing systems learn to exploit the statistical structure of their inputs in the absence of explicit training targets or rewards. Sophisticated experimental approaches have enabled the investigation of the influence of sensory experience on neural self-organization and its synaptic bases. Meanwhile, novel algorithms for unsupervised and self-supervised learning have become increasingly popular both as inspiration for theories of the brain, particularly for the function of intermediate visual cortical areas, and as building blocks of real-world learning machines. Here we review some of these recent developments, placing them in historical context and highlighting some research lines that promise exciting breakthroughs in the near future.
Affiliation(s)
- Giulio Matteucci, Department of Basic Neurosciences, University of Geneva, Geneva, 1206, Switzerland
- Eugenio Piasini, International School for Advanced Studies (SISSA), Trieste, 34136, Italy
- Davide Zoccolan, International School for Advanced Studies (SISSA), Trieste, 34136, Italy

14. Pan X, DeForge A, Schwartz O. Generalizing biological surround suppression based on center surround similarity via deep neural network models. PLoS Comput Biol 2023; 19:e1011486. PMID: 37738258; PMCID: PMC10550176; DOI: 10.1371/journal.pcbi.1011486.
Abstract
Sensory perception is dramatically influenced by the context. Models of contextual neural surround effects in vision have mostly accounted for Primary Visual Cortex (V1) data, via nonlinear computations such as divisive normalization. However, surround effects are not well understood within a hierarchy, for neurons with more complex stimulus selectivity beyond V1. We utilized feedforward deep convolutional neural networks and developed a gradient-based technique to visualize the most suppressive and excitatory surround. We found that deep neural networks exhibited a key signature of surround effects in V1, highlighting center stimuli that visually stand out from the surround and suppressing responses when the surround stimulus is similar to the center. We found that in some neurons, especially in late layers, when the center stimulus was altered, the most suppressive surround surprisingly can follow the change. Through the visualization approach, we generalized previous understanding of surround effects to more complex stimuli, in ways that have not been revealed in visual cortices. In contrast, the suppression based on center surround similarity was not observed in an untrained network. We identified further successes and mismatches of the feedforward CNNs to the biology. Our results provide a testable hypothesis of surround effects in higher visual cortices, and the visualization approach could be adopted in future biological experimental designs.
Affiliation(s)
- Xu Pan: Department of Computer Science, University of Miami, Coral Gables, FL, United States of America
- Annie DeForge: School of Information, University of California, Berkeley, CA, United States of America; Bentley University, Waltham, MA, United States of America
- Odelia Schwartz: Department of Computer Science, University of Miami, Coral Gables, FL, United States of America
|
15
|
Bredenberg C, Savin C. Desiderata for normative models of synaptic plasticity. ARXIV 2023:arXiv:2308.04988v1. [PMID: 37608931 PMCID: PMC10441445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 08/24/2023]
Abstract
Normative models of synaptic plasticity use a combination of mathematics and computational simulations to arrive at predictions of behavioral and network-level adaptive phenomena. In recent years, there has been an explosion of theoretical work on these models, but experimental confirmation is relatively limited. In this review, we organize work on normative plasticity models in terms of a set of desiderata which, when satisfied, are designed to guarantee that a model has a clear link between plasticity and adaptive behavior, consistency with known biological evidence about neural plasticity, and specific testable predictions. We then discuss how new models have begun to improve on these criteria and suggest avenues for further development. As prototypes, we provide detailed analyses of two specific models - REINFORCE and the Wake-Sleep algorithm. We provide a conceptual guide to help develop neural learning theories that are precise, powerful, and experimentally testable.
Affiliation(s)
- Colin Bredenberg: Center for Neural Science, New York University, New York, NY 10003, USA; Mila-Quebec AI Institute, 6666 Rue Saint-Urbain, Montréal, QC H2S 3H1
- Cristina Savin: Center for Neural Science, New York University, New York, NY 10003, USA; Center for Data Science, New York University, New York, NY 10011, USA
|
16
|
Maruyama H, Okada K, Motoyoshi I. A two-stage spectral model for sound texture perception: Synthesis and psychophysics. Iperception 2023; 14:20416695231157349. [PMID: 36845027 PMCID: PMC9950610 DOI: 10.1177/20416695231157349] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 01/30/2023] [Indexed: 02/25/2023] Open
Abstract
The natural environment is filled with a variety of auditory events such as wind blowing, water flowing, and fire crackling. It has been suggested that the perception of such textural sounds is based on the statistics of natural auditory events. Inspired by a recent spectral model of visual texture perception, we propose a model that describes perceived sound texture using only the linear spectrum and the energy spectrum. We tested the validity of the model using synthetic noise sounds that preserve the two-stage amplitude spectra of the original sound. A psychophysical experiment showed that our synthetic noises were perceived as similar to the original sounds for 120 real-world auditory events. Performance was comparable to that of synthetic sounds produced by McDermott and Simoncelli's model, which considers various classes of auditory statistics. The results support the notion that the perception of natural sound textures is predictable from two-stage spectral signals.
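The core synthesis move in models of this family is to keep an amplitude spectrum fixed while scrambling Fourier phases. A minimal sketch of that step (first-order spectrum only; the paper's full model also constrains the second-order energy spectrum, which is not shown here, and the waveform is a stand-in):

```python
import numpy as np

def phase_randomize(signal, seed=0):
    """Noise with the amplitude spectrum of `signal` but uniformly
    random Fourier phases (DC and Nyquist bins stay real)."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(signal)
    amplitude = np.abs(spectrum)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=amplitude.shape)
    phases[0] = 0.0
    if len(signal) % 2 == 0:
        phases[-1] = 0.0
    return np.fft.irfft(amplitude * np.exp(1j * phases), n=len(signal))

sound = np.random.default_rng(1).standard_normal(1024)  # stand-in waveform
noise = phase_randomize(sound)
# Same amplitude spectrum, different waveform.
assert np.allclose(np.abs(np.fft.rfft(sound)), np.abs(np.fft.rfft(noise)))
assert not np.allclose(sound, noise)
```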
Affiliation(s)
- Isamu Motoyoshi: Department of Life Sciences, The University of Tokyo, Japan
|
17
|
Williams N, Olson CR. Independent repetition suppression in macaque area V2 and inferotemporal cortex. J Neurophysiol 2022; 128:1421-1434. [PMID: 36350050 PMCID: PMC9678433 DOI: 10.1152/jn.00043.2022] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Revised: 10/11/2022] [Accepted: 10/23/2022] [Indexed: 11/11/2022] Open
Abstract
When a complexly structured natural image is presented twice in succession, first as adapter and then as test, neurons in area TE of macaque inferotemporal cortex exhibit repetition suppression, responding less strongly to the second presentation than to the first. This phenomenon, which has been studied primarily in TE, might plausibly be argued to arise in TE because TE neurons respond selectively to complex images and thus carry information adequate for determining whether an image is or is not a repeat. However, the idea has never been put to a direct test. To resolve this issue, we monitored neuronal responses to sequences of complex natural images under identical conditions in areas V2 and TE. We found that repetition suppression occurs in both areas. Moreover, in each area, suppression takes the form of a dynamic alteration whereby the initial peak of excitation is followed by a trough and then a rebound of firing rate. To assess whether repetition suppression in either area is transmitted from the other area, we analyzed the timing of the phenomenon and its degree of spatial generalization. Suppression occurs at shorter latency in V2 than in TE. Therefore it is not simply fed back from TE. Suppression occurs in TE but not in V2 under conditions in which the test and adapter are presented in different visual field quadrants. Therefore it is not simply fed forward from V2. We conclude that repetition suppression occurs independently in V2 and TE. NEW & NOTEWORTHY: When a complexly structured natural image is presented twice in rapid succession, neurons in inferotemporal area TE exhibit repetition suppression, responding less strongly to the second than to the first presentation. We have explored whether this phenomenon is confined to high-order areas where neurons respond selectively to such images and thus carry information relevant to recognizing a repeat. We have found surprisingly that repetition suppression occurs even in low-order visual area V2.
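Repetition suppression of the kind measured here is often quantified with a normalized index of the adapter-versus-test response difference. A minimal sketch using one common convention (this particular formula and the firing rates are illustrative, not necessarily the study's exact measure):

```python
def suppression_index(adapter_rate, test_rate):
    """Normalized difference between responses to the first (adapter)
    and second (test) presentations; positive means suppression."""
    return (adapter_rate - test_rate) / (adapter_rate + test_rate)

# Hypothetical trial-averaged firing rates (spikes/s).
assert round(suppression_index(40.0, 20.0), 3) == 0.333   # suppressed
assert suppression_index(30.0, 30.0) == 0.0               # no adaptation
```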
Affiliation(s)
- Nathaniel Williams: Neuroscience Institute, Department of Biological Sciences, and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Carl R Olson: Neuroscience Institute, Department of Biological Sciences, and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania
|
18
|
Ensemble perception without phenomenal awareness of elements. Sci Rep 2022; 12:11922. [PMID: 35831387 PMCID: PMC9279487 DOI: 10.1038/s41598-022-15850-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Accepted: 06/30/2022] [Indexed: 11/09/2022] Open
Abstract
Humans efficiently recognize complex scenes by grouping multiple features and objects into ensembles. It has been suggested that ensemble processing does not require, or even impairs, conscious discrimination of individual element properties. The present study examined whether ensemble perception requires phenomenal awareness of elements. We asked observers to judge the mean orientation of a line-based texture pattern whose central region was made invisible by backward masks. Masks were composed of either a Mondrian pattern (Exp. 1) or an annular contour (Exp. 2) which, unlike the Mondrian, did not overlap spatially with elements in the central region. In the Mondrian-mask experiment, perceived mean orientation was determined only by visible elements outside the central region. In the annular-mask experiment, however, perceived mean orientation matched the mean orientation of all elements, including invisible elements within the central region. These results suggest that the visual system can compute spatial ensembles even without phenomenal awareness of the stimuli.
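Averaging line orientations must respect their 180-degree periodicity, so mean orientation is conventionally computed by doubling the angles, vector-averaging, and halving the result. A minimal sketch of that computation (the stimulus values are made up; this is not the study's analysis code):

```python
import math

def mean_orientation(degrees):
    """Mean of axial orientations (period 180 deg): double the angles,
    vector-average, then halve the resulting angle."""
    s = sum(math.sin(math.radians(2.0 * d)) for d in degrees)
    c = sum(math.cos(math.radians(2.0 * d)) for d in degrees)
    return (math.degrees(math.atan2(s, c)) / 2.0) % 180.0

# 80 and 100 deg average to 90 deg; 170 and 10 deg average to the
# 0/180 axis, which a naive arithmetic mean (90) would get wrong.
assert abs(mean_orientation([80.0, 100.0]) - 90.0) < 1e-6
m = mean_orientation([170.0, 10.0])
assert min(m, 180.0 - m) < 1e-6
```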
|
19
|
Yu Y, Stirman JN, Dorsett CR, Smith SL. Selective representations of texture and motion in mouse higher visual areas. Curr Biol 2022; 32:2810-2820.e5. [PMID: 35609609 DOI: 10.1016/j.cub.2022.04.091] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Revised: 03/22/2022] [Accepted: 04/28/2022] [Indexed: 10/18/2022]
Abstract
The mouse visual cortex contains interconnected higher visual areas, but their functional specializations are unclear. Here, we used a data-driven approach to examine the representations of complex visual stimuli by L2/3 neurons across mouse higher visual areas, measured using large-field-of-view two-photon calcium imaging. Using specialized stimuli, we found higher fidelity representations of texture in area LM, compared to area AL. Complementarily, we found higher fidelity representations of motion in area AL, compared to area LM. We also observed this segregation of information in response to naturalistic videos. Finally, we explored how receptive field models of visual cortical neurons could produce the segregated representations of texture and motion we observed. These selective representations could aid in behaviors such as visually guided navigation.
Affiliation(s)
- Yiyi Yu: Department of Electrical & Computer Engineering, Center for BioEngineering, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
- Jeffrey N Stirman: Neuroscience Research Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Christopher R Dorsett: Neuroscience Research Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Spencer L Smith: Department of Electrical & Computer Engineering, Center for BioEngineering, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA 93106, USA
|
20
|
Abstract
Humans are exquisitely sensitive to the spatial arrangement of visual features in objects and scenes, but not in visual textures. Category-selective regions in the visual cortex are widely believed to underlie object perception, suggesting such regions should distinguish natural images of objects from synthesized images containing similar visual features in scrambled arrangements. To the contrary, we demonstrate that representations in category-selective cortex do not discriminate natural images from feature-matched scrambles but can discriminate images of different categories, suggesting a texture-like encoding. We find similar insensitivity to feature arrangement in Imagenet-trained deep convolutional neural networks. This suggests the need to reconceptualize the role of category-selective cortex as representing a basis set of complex texture-like features, useful for a myriad of behaviors.

The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of the visual cortex. These representations could support object vision by specifically representing objects or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real-world objects, that is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses in category-selective regions, as well as a model of macaque inferotemporal cortex and Imagenet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to a lack of signal to noise, as all observer models could predict human performance in image categorization tasks. How then might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for natural object discrimination is available. Thus, our results suggest that the role of the human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
|
21
|
|
22
|
Mo C, Zhang S, Lu J, Yu M, Yao Y. Attention impedes neural representation of interpolated orientation during perceptual completion. Psychophysiology 2022; 59:e14031. [PMID: 35239985 DOI: 10.1111/psyp.14031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Revised: 01/07/2022] [Accepted: 01/21/2022] [Indexed: 11/30/2022]
Abstract
One of the most remarkable functional feats accomplished by the visual system is the interpolation of missing retinal inputs based on surrounding information, a process known as perceptual completion. Perceptual completion enables the active construction of coherent, vivid percepts from the spatially discontinuous visual information that is prevalent in real-life visual scenes. Despite mounting evidence linking sensory activity enhancement and perceptual completion, surprisingly little is known about whether and how attention, a fundamental modulator of sensory activity, affects perceptual completion. Using an EEG-based time-resolved inverted encoding model (IEM), we reconstructed the moment-to-moment representation of an illusory grating produced by spatially interpolating the orientation of surrounding inducers. We found that, despite manipulation of observers' attentional focus, the illusory grating representation unfolded in time in a similar manner. Critically, attention to the surrounding inducers both attenuated the illusory grating representation and delayed its temporal development. Our findings disclose, for the first time, the suppressive role of selective attention in perceptual completion and suggest a fast, automatic neural machinery that implements the interpolation of missing visual information.
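An inverted encoding model has two steps: fit a linear mapping from hypothesized feature channels to measured signals, then invert that mapping on new data to reconstruct a channel response profile. A toy sketch on simulated data (the basis set, sensor count, and noise level are invented; this is not the study's EEG pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Basis set: 6 rectified-cosine orientation channels (period 180 deg).
centers = np.arange(0, 180, 30)

def channel_responses(theta_deg):
    d = np.deg2rad(2.0 * (theta_deg - centers))   # doubled axial difference
    return np.maximum(np.cos(d), 0.0) ** 5

# Simulated "sensor" data: a fixed linear mixture of channel responses.
train_thetas = rng.uniform(0, 180, 200)
C_train = np.stack([channel_responses(t) for t in train_thetas], axis=1)  # 6 x 200
W_true = rng.standard_normal((32, 6))                                     # 32 sensors
B_train = W_true @ C_train + 0.05 * rng.standard_normal((32, 200))

# Step 1 (training): estimate sensor weights from known channel responses.
W_hat = B_train @ np.linalg.pinv(C_train)
# Step 2 (inversion): reconstruct the channel profile of a new stimulus.
b_test = W_true @ channel_responses(60.0)
c_hat = np.linalg.pinv(W_hat) @ b_test

# The reconstruction peaks at the channel centered on the test orientation.
assert centers[int(np.argmax(c_hat))] == 60
```

Applying the inversion at each EEG time point is what yields the time-resolved representation described above.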
Affiliation(s)
- Ce Mo: Department of Psychology, Sun-Yat-Sen University, Guangzhou, China
- Shijia Zhang: Center for Studies of Psychological Application, School of Psychology, South China Normal University, Guangzhou, China
- Junshi Lu: School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Mengxia Yu: Bilingual Cognition and Development Lab, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou, China
- Yujie Yao: Center for Studies of Psychological Application, School of Psychology, South China Normal University, Guangzhou, China
|
23
|
Gu Z, Jamison KW, Khosla M, Allen EJ, Wu Y, St-Yves G, Naselaris T, Kay K, Sabuncu MR, Kuceyeski A. NeuroGen: Activation optimized image synthesis for discovery neuroscience. Neuroimage 2022; 247:118812. [PMID: 34936922 PMCID: PMC8845078 DOI: 10.1016/j.neuroimage.2021.118812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2021] [Revised: 10/11/2021] [Accepted: 12/12/2021] [Indexed: 11/24/2022] Open
Abstract
Functional MRI (fMRI) is a powerful technique that has allowed us to characterize visual cortex responses to stimuli, yet such experiments are inherently limited: they are constructed from a priori hypotheses, restricted to the set of images presented while the individual is in the scanner, subject to noise in the observed brain responses, and may vary widely across individuals. In this work, we propose a novel computational strategy, which we call NeuroGen, to overcome these limitations and provide a powerful tool for discovery in human vision neuroscience. NeuroGen combines an fMRI-trained neural encoding model of human vision with a deep generative network to synthesize images predicted to achieve a target pattern of macro-scale brain activation. We demonstrate that the noise reduction provided by the encoding model, coupled with the generative network's ability to produce high-fidelity images, results in a robust discovery architecture for visual neuroscience. Using only a small number of synthetic images created by NeuroGen, we demonstrate that we can detect and amplify differences in regional and individual human brain response patterns to visual stimuli. We then verify that these discoveries are reflected in the several thousand observed image responses measured with fMRI. We further demonstrate that NeuroGen can create synthetic images predicted to achieve regional response patterns not achievable by the best-matching natural images. The NeuroGen framework extends the utility of brain encoding models and opens a new avenue for exploring, and possibly precisely controlling, the human visual system.
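At its core, this kind of synthesis is gradient ascent on an encoding model's predicted response, with a generative prior constraining the stimulus. A deliberately tiny sketch with a linear "encoder" and a unit-norm constraint standing in for the generative network (everything here, including `encode`, is a hypothetical stand-in for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(100)        # toy linear "encoding model" weights

def encode(x):
    """Predicted response of the toy encoder to stimulus x."""
    return float(w @ x)

# Gradient ascent on the predicted response; the unit-norm constraint
# is a crude stand-in for the generative network's image prior.
x = rng.standard_normal(100)
x /= np.linalg.norm(x)
for _ in range(200):
    x += 0.1 * w                    # gradient of (w @ x) with respect to x
    x /= np.linalg.norm(x)

opt = w / np.linalg.norm(w)         # analytic optimum on the unit sphere
assert float(x @ opt) > 1.0 - 1e-9  # the ascent converges to it
```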
Affiliation(s)
- Zijin Gu: School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA
- Meenakshi Khosla: School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA
- Emily J Allen: Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA; Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA
- Yihan Wu: Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
- Ghislain St-Yves: Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA; Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA
- Thomas Naselaris: Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA; Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA
- Kendrick Kay: Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA
- Mert R Sabuncu: School of Electrical and Computer Engineering, Cornell University, Ithaca, New York, USA
- Amy Kuceyeski: Department of Radiology, Weill Cornell Medicine, New York, New York, USA
|
24
|
Bowren J, Sanchez-Giraldo L, Schwartz O. Inference via sparse coding in a hierarchical vision model. J Vis 2022; 22:19. [PMID: 35212744 PMCID: PMC8883180 DOI: 10.1167/jov.22.2.19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
Abstract
Sparse coding has been incorporated in models of the visual cortex for its computational advantages and its connection to biology. But how the level of sparsity contributes to performance on visual tasks is not well understood. In this work, sparse coding was integrated into an existing hierarchical V2 model (Hosoya & Hyvärinen, 2015) by replacing its independent component analysis (ICA) with an explicit sparse coding stage in which the degree of sparsity can be controlled. After training, the sparse coding basis functions with a higher degree of sparsity resembled qualitatively different structures, such as curves and corners. The contributions of the models were assessed with image classification tasks associated with mid-level vision, including figure–ground classification, texture classification, and angle prediction between two line stimuli. In addition, the models were compared against a texture sensitivity measure that has been reported in V2 (Freeman et al., 2013) and assessed on a deleted-region inference task. The results show that although sparse coding performed worse than ICA at classifying images, only sparse coding was able to match the texture sensitivity level of V2 and to infer deleted image regions, both achieved by increasing the degree of sparsity. Greater degrees of sparsity allowed inference over larger deleted image regions. The mechanism that enables this inference capability in sparse coding is described in this article.
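Explicit sparse coding of the kind described here infers, for each input, a code that trades reconstruction error against an L1 sparsity penalty. A minimal sketch using ISTA, a standard solver for this objective (the dictionary, penalty weight, and data are synthetic; this is not the paper's model or training procedure):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista(D, x, lam, n_iter=500):
    """Infer a sparse code a for x, minimizing
    0.5 * ||x - D a||^2 + lam * ||a||_1 by iterative shrinkage (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + D.T @ (x - D @ a) / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
a_true = np.zeros(128)
a_true[[3, 40, 99]] = [1.5, -2.0, 1.0]
x = D @ a_true

a_hat = ista(D, x, lam=0.05)             # larger lam -> sparser code
# The three truly active atoms dominate the inferred code.
assert set(np.argsort(-np.abs(a_hat))[:3]) == {3, 40, 99}
```

The penalty weight `lam` plays the role of the controllable "degree of sparsity" discussed in the abstract.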
Affiliation(s)
- Joshua Bowren: Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Luis Sanchez-Giraldo: Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA
- Odelia Schwartz: Department of Computer Science, University of Miami, Coral Gables, FL, USA
|
25
|
Dong X, Gao Y, Dong J, Chantler MJ. The Importance of Phase to Texture Discrimination and Similarity. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3755-3768. [PMID: 32191889 DOI: 10.1109/tvcg.2020.2981063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, we investigate the importance of phase to texture discrimination and similarity estimation tasks. We first use two psychophysical experiments to examine the relative importance of the phase and magnitude spectra for human texture discrimination and similarity estimation. The results show that phase is more important to humans for both tasks. We then examine the ability of 51 computational feature sets to perform these two tasks. In contrast with the psychophysical experiments, the magnitude data prove more important to these computational feature sets than the phase data. We hypothesise that this inconsistency is due to the difference between the abilities of humans and the computational feature sets to utilise phase data. This motivates us to apply the 51 feature sets to phase-only images in addition to the original data set, an investigation we extend to exploit Convolutional Neural Network (CNN) features. The results show that our feature fusion scheme improves the average performance of those feature sets for estimating humans' perceptual texture similarity; this superior performance should be attributed to the importance of phase to texture similarity.
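Separating phase from magnitude is straightforward in the Fourier domain: combine one image's magnitude spectrum with another's phase spectrum and invert. A minimal sketch of this classic construction (random arrays stand in for texture images; this is not the authors' stimulus-generation code):

```python
import numpy as np

def swap_spectra(img_a, img_b):
    """Hybrid image with the Fourier magnitude of img_a and the
    phase of img_b (the classic Oppenheim & Lim demonstration)."""
    Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    return np.real(np.fft.ifft2(np.abs(Fa) * np.exp(1j * np.angle(Fb))))

rng = np.random.default_rng(0)
a = rng.standard_normal((32, 32))        # stand-ins for texture images
b = rng.standard_normal((32, 32))
h = swap_spectra(a, b)
# The hybrid inherits a's magnitude spectrum exactly.
assert np.allclose(np.abs(np.fft.fft2(h)), np.abs(np.fft.fft2(a)))
```

A phase-only image, as used in the fusion scheme, is the special case where `img_a` has a flat (constant) magnitude spectrum.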
|
26
|
Ziemba CM, Simoncelli EP. Opposing effects of selectivity and invariance in peripheral vision. Nat Commun 2021; 12:4597. [PMID: 34321483 PMCID: PMC8319169 DOI: 10.1038/s41467-021-24880-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Accepted: 07/08/2021] [Indexed: 02/07/2023] Open
Abstract
Sensory processing necessitates discarding some information in service of preserving and reformatting more behaviorally relevant information. Sensory neurons seem to achieve this by responding selectively to particular combinations of features in their inputs, while averaging over or ignoring irrelevant combinations. Here, we expose the perceptual implications of this tradeoff between selectivity and invariance, using stimuli and tasks that explicitly reveal their opposing effects on discrimination performance. We generate texture stimuli with statistics derived from natural photographs, and ask observers to perform two different tasks: Discrimination between images drawn from families with different statistics, and discrimination between image samples with identical statistics. For both tasks, the performance of an ideal observer improves with stimulus size. In contrast, humans become better at family discrimination but worse at sample discrimination. We demonstrate through simulations that these behaviors arise naturally in an observer model that relies on a common set of physiologically plausible local statistical measurements for both tasks.
Affiliation(s)
- Corey M Ziemba: Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA; Center for Neural Science, New York University, New York, NY, USA
- Eero P Simoncelli: Center for Neural Science, New York University, New York, NY, USA; Flatiron Institute, Simons Foundation, New York, NY, USA
|
27
|
Okada K, Motoyoshi I. Human Texture Vision as Multi-Order Spectral Analysis. Front Comput Neurosci 2021; 15:692334. [PMID: 34381346 PMCID: PMC8349988 DOI: 10.3389/fncom.2021.692334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022] Open
Abstract
Texture information plays a critical role in the rapid perception of scenes, objects, and materials. Here, we propose a novel model in which visual texture perception is essentially determined by the 1st-order (2D luminance) and 2nd-order (4D energy) spectra. This model extends the dimensionality of the filter-rectify-filter (FRF) model and corresponds to the frequency representation of the Portilla-Simoncelli (PS) statistics. We show that preserving these two spectra while randomizing the phases of a natural texture image yields a perceptually similar texture, strongly supporting the model. Based on only two spectral spaces, this model provides a simpler framework for describing and predicting texture representations in the primate visual system. The idea of multi-order spectral analysis is consistent with the hierarchical processing principle of the visual cortex, which is approximated by a multi-layer convolutional network.
Affiliation(s)
- Kosuke Okada: Department of Life Sciences, The University of Tokyo, Tokyo, Japan
- Isamu Motoyoshi: Department of Life Sciences, The University of Tokyo, Tokyo, Japan
|
28
|
Mohr KS, Carr N, Georgel R, Kelly SP. Modulation of the Earliest Component of the Human VEP by Spatial Attention: An Investigation of Task Demands. Cereb Cortex Commun 2021; 1:tgaa045. [PMID: 34296113 PMCID: PMC8152881 DOI: 10.1093/texcom/tgaa045] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Revised: 07/29/2020] [Accepted: 07/29/2020] [Indexed: 11/17/2022] Open
Abstract
Spatial attention modulations of initial afferent activity in area V1, indexed by the first component “C1” of the human visual evoked potential, are rarely found. It has thus been suggested that early modulation is induced only by special task conditions, but what these conditions are remains unknown. Recent failed replications—findings of no C1 modulation using a certain task that had previously produced robust modulations—present a strong basis for examining this question. We ran 3 experiments, the first to more exactly replicate the stimulus and behavioral conditions of the original task, and the second and third to manipulate 2 key factors that differed in the failed replication studies: the provision of informative performance feedback, and the degree to which the probed stimulus features matched those facilitating target perception. Although there was an overall significant C1 modulation of 11%, individually, only experiments 1 and 2 showed reliable effects, underlining that the modulations do occur but not consistently. Better feedback induced greater P1, but not C1, modulations. Target-probe feature matching had an inconsistent influence on modulation patterns, with behavioral performance differences and signal-overlap analyses suggesting interference from extrastriate modulations as a potential cause.
Affiliation(s)
- Kieran S Mohr: Cognitive Neural Systems Lab, School of Electrical and Electronic Engineering and UCD Centre for Biomedical Engineering, University College Dublin, Dublin 4, Ireland
- Niamh Carr: Cognitive Neural Systems Lab, School of Electrical and Electronic Engineering and UCD Centre for Biomedical Engineering, University College Dublin, Dublin 4, Ireland
- Rachel Georgel: Cognitive Neural Systems Lab, School of Electrical and Electronic Engineering and UCD Centre for Biomedical Engineering, University College Dublin, Dublin 4, Ireland
- Simon P Kelly: Cognitive Neural Systems Lab, School of Electrical and Electronic Engineering and UCD Centre for Biomedical Engineering, University College Dublin, Dublin 4, Ireland
|
29
|
Redundancy between spectral and higher-order texture statistics for natural image segmentation. Vision Res 2021; 187:55-65. [PMID: 34217005 DOI: 10.1016/j.visres.2021.06.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 06/09/2021] [Accepted: 06/11/2021] [Indexed: 11/23/2022]
Abstract
Visual texture, defined by local image statistics, provides important information to the human visual system for perceptual segmentation. Second-order or spectral statistics (equivalent to the Fourier power spectrum) are a well-studied segmentation cue. However, the role of higher-order statistics (HOS) in segmentation remains unclear, particularly for natural images. Recent experiments indicate that, in peripheral vision, the HOS of the widely adopted Portilla-Simoncelli texture model are a weak segmentation cue compared to spectral statistics, despite the fact that both are necessary to explain other perceptual phenomena and to support high-quality texture synthesis. Here we test whether this discrepancy reflects a property of natural image statistics. First, we observe that differences in spectral statistics across segments of natural images are redundant with differences in HOS. Second, using linear and nonlinear classifiers, we show that each set of statistics individually affords high performance in natural scenes and texture segmentation tasks, but combining spectral statistics and HOS produces relatively small improvements. Third, we find that HOS improve segmentation for a subset of images, although these images are difficult to identify. We also find that different subsets of HOS improve segmentation to a different extent, in agreement with previous physiological and perceptual work. These results show that the HOS add modestly to spectral statistics for natural image segmentation. We speculate that tuning to natural image statistics under resource constraints could explain the weak contribution of HOS to perceptual segmentation in human peripheral vision.
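The redundancy argument above can be reproduced in miniature: when two feature sets are noisy copies of the same underlying signal, combining them improves classification only modestly over either alone. A synthetic sketch (the Gaussian features and nearest-centroid classifier are stand-ins, not the paper's statistics or classifiers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)                       # segment labels
# Two redundant feature sets: both are noisy copies of the same signal.
spectral = y[:, None] + 0.8 * rng.standard_normal((n, 3))
hos = y[:, None] + 0.8 * rng.standard_normal((n, 3))

def centroid_accuracy(X, y):
    """Accuracy of a nearest-class-centroid classifier on (X, y)."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    pred = np.linalg.norm(X - m1, axis=1) < np.linalg.norm(X - m0, axis=1)
    return (pred == y).mean()

acc_s = centroid_accuracy(spectral, y)
acc_h = centroid_accuracy(hos, y)
acc_both = centroid_accuracy(np.hstack([spectral, hos]), y)
# Each set alone classifies well; combining redundant sets adds little.
assert acc_both >= max(acc_s, acc_h) - 0.02
```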
|
30
|
Kuroki 黒木 忍 S, Sawayama 澤山 正貴 M, Nishida 西田 眞也 S. The roles of lower- and higher-order surface statistics in tactile texture perception. J Neurophysiol 2021; 126:95-111. [PMID: 34038163 DOI: 10.1152/jn.00577.2020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Humans can haptically discriminate surface textures when there is a significant difference in the statistics of the surface profile. Previous studies on tactile texture discrimination have emphasized the perceptual effects of lower-order statistical features such as carving depth, inter-ridge distance, and anisotropy, which can be characterized by local amplitude spectra or spatial-frequency/orientation subband histograms. However, the real-world surfaces we encounter in everyday life also differ in higher-order statistics, such as correlations between nearby spatial frequencies and orientations. In another modality, vision, the human brain can use textural differences in both higher- and lower-order image statistics. In this work, we examined whether haptic texture perception can use higher-order surface statistics as visual texture perception does, by three-dimensional (3-D) printing textured surfaces transcribed from different "photos" of natural scenes such as stones and leaves. Even though the maximum carving depth was well above the haptic detection threshold, some texture pairs were hard to discriminate. Specifically, texture pairs with similar amplitude spectra were difficult to discriminate, which suggests that lower-order statistics have the dominant effect on tactile texture discrimination. To directly test the poor sensitivity of tactile texture perception to higher-order surface statistics, we matched the lower-order statistics across different textures using a texture synthesis algorithm and found that haptic discrimination of the matched textures was nearly impossible unless the stimuli contained salient local features. We found no evidence that the human tactile system uses higher-order surface statistics for texture discrimination. NEW & NOTEWORTHY Humans can discriminate subtle spatial pattern differences in the surrounding world through their hands, but the underlying computation remains poorly understood. Here, we 3-D-printed textured surfaces and analyzed tactile discrimination performance with respect to sensitivity to surface statistics. The results suggest that observers are sensitive to lower-order statistics but not to higher-order statistics. That is, touch differs from vision not only in spatiotemporal resolution but also in (in)sensitivity to high-level surface statistics.
Affiliation(s)
- Masataka Sawayama
- NTT Communication Science Laboratories, NTT Corporation, Atsugi, Japan; Inria, Bordeaux, France
- Shin'ya Nishida
- NTT Communication Science Laboratories, NTT Corporation, Atsugi, Japan; Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
|
31
|
Li Z. Unique Neural Activity Patterns Among Lower Order Cortices and Shared Patterns Among Higher Order Cortices During Processing of Similar Shapes With Different Stimulus Types. Iperception 2021; 12:20416695211018222. [PMID: 34104383 PMCID: PMC8161881 DOI: 10.1177/20416695211018222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2020] [Accepted: 04/28/2021] [Indexed: 11/16/2022] Open
Abstract
We investigated the neural mechanism of the processing of three-dimensional (3D) shapes defined by disparity and perspective. We measured blood oxygenation level-dependent signals as participants viewed and classified 3D images of convex-concave shapes. According to the cue (disparity or perspective) and element type (random dots or black and white dotted lines), three types of stimuli were used: random dot stereogram, black and white dotted lines with perspective, and black and white dotted lines with binocular disparity. The blood oxygenation level-dependent images were then classified by multivoxel pattern analysis. To identify areas selective to shape, we assessed convex-concave classification accuracy with classifiers trained and tested using signals evoked by the same stimulus type (same cue and element type). To identify cortical regions with similar neural activity patterns regardless of stimulus type, we assessed the convex-concave classification accuracy of transfer classification in which classifiers were trained and tested using different stimulus types (different cues or element types). Classification accuracy using the same stimulus type was high in the early visual areas and subregions of the intraparietal sulcus (IPS), whereas transfer classification accuracy was high in the dorsal subregions of the IPS. These results indicate that the early visual areas process the specific features of stimuli, whereas the IPS regions perform more generalized processing of 3D shapes, independent of a specific stimulus type.
Affiliation(s)
- Zhen Li
- Department of Psychology, The University of Hong Kong, Hong Kong, China; Graduate School of Engineering, Kochi University of Technology, Kochi, Japan
|
32
|
Herrera-Esposito D, Coen-Cagli R, Gomez-Sena L. Flexible contextual modulation of naturalistic texture perception in peripheral vision. J Vis 2021; 21:1. [PMID: 33393962 PMCID: PMC7794279 DOI: 10.1167/jov.21.1.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2020] [Accepted: 12/01/2020] [Indexed: 11/24/2022] Open
Abstract
Peripheral vision comprises most of our visual field and is essential in guiding visual behavior. The most influential theory of peripheral vision explains its characteristic capabilities and limitations, which distinguish it from foveal vision, as the product of representing the visual input using summary statistics. Despite its success, this account may provide a limited understanding of peripheral vision, because it neglects processes of perceptual grouping and segmentation. To test this hypothesis, we studied how contextual modulation, namely the modulation of the perception of a stimulus by its surrounds, interacts with segmentation in human peripheral vision. We used naturalistic textures, which are directly related to summary-statistics representations. We show that segmentation cues affect contextual modulation, and that this is not captured by our implementation of the summary-statistics model. We then characterize the effects of different texture statistics on contextual modulation, providing guidance for extending the model, as well as for probing neural mechanisms of peripheral vision.
Affiliation(s)
- Daniel Herrera-Esposito
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Ruben Coen-Cagli
- Department of Systems and Computational Biology and Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Leonel Gomez-Sena
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
|
33
|
Papale P, Leo A, Handjaras G, Cecchetti L, Pietrini P, Ricciardi E. Shape coding in occipito-temporal cortex relies on object silhouette, curvature, and medial axis. J Neurophysiol 2020; 124:1560-1570. [PMID: 33052726 DOI: 10.1152/jn.00212.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open
Abstract
Object recognition relies on different transformations of the retinal input, carried out by the visual system, that range from local contrast to object shape and category. While some of those transformations are thought to occur at specific stages of the visual hierarchy, the features they represent are correlated (e.g., object shape and identity) and selectivity for the same feature overlaps in many brain regions. This may be explained either by collinearity across representations or may instead reflect the coding of multiple dimensions by the same cortical population. Moreover, orthogonal and shared components may differently impact distinctive stages of the visual hierarchy. We recorded functional MRI activity while participants passively attended to object images and employed a statistical approach that partitioned orthogonal and shared object representations to reveal their relative impact on brain processing. Orthogonal shape representations (silhouette, curvature, and medial axis) independently explained distinct and overlapping clusters of selectivity in the occipitotemporal and parietal cortex. Moreover, we show that the relevance of shared representations linearly increases moving from posterior to anterior regions. These results indicate that the visual cortex encodes shared relations between different features in a topographic fashion and that object shape is encoded along different dimensions, each representing orthogonal features. NEW & NOTEWORTHY There are several possible ways of characterizing the shape of an object. Which shape description best describes our brain responses while we passively perceive objects? Here, we employed three competing shape models to explain brain representations when viewing real objects. We found that object shape is encoded in a multidimensional fashion and thus defined by the interaction of multiple features.
Affiliation(s)
- Paolo Papale
- Molecular Mind Laboratory, IMT School for Advanced Studies Lucca, Italy; Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
- Andrea Leo
- Molecular Mind Laboratory, IMT School for Advanced Studies Lucca, Italy; Department of Translational Research and Advanced Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Giacomo Handjaras
- Molecular Mind Laboratory, IMT School for Advanced Studies Lucca, Italy
- Luca Cecchetti
- Molecular Mind Laboratory, IMT School for Advanced Studies Lucca, Italy
- Pietro Pietrini
- Molecular Mind Laboratory, IMT School for Advanced Studies Lucca, Italy
|
34
|
Vacher J, Davila A, Kohn A, Coen-Cagli R. Texture Interpolation for Probing Visual Perception. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2020; 33:22146-22157. [PMID: 36420050 PMCID: PMC9681139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Texture synthesis models are important tools for understanding visual processing. In particular, statistical approaches based on neurally relevant features have been instrumental in understanding aspects of visual perception and of neural coding. New deep learning-based approaches further improve the quality of synthetic textures. Yet, it is still unclear why deep texture synthesis performs so well, and applications of this new framework to probe visual perception are scarce. Here, we show that distributions of deep convolutional neural network (CNN) activations of a texture are well described by elliptical distributions and that therefore, following optimal transport theory, constraining their mean and covariance is sufficient to generate new texture samples. Then, we propose using the natural geodesics (i.e., the shortest paths between two points) that arise under the optimal transport metric to interpolate between arbitrary textures. Compared to other CNN-based approaches, our interpolation method appears to match more closely the geometry of texture perception, and our mathematical framework is better suited to studying its statistical nature. We apply our method by measuring the perceptual scale associated with the interpolation parameter in human observers, and the neural sensitivity of different areas of visual cortex in macaque monkeys.
Affiliation(s)
- Jonathan Vacher
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, 10461 Bronx, NY, USA
- Aida Davila
- Albert Einstein College of Medicine, Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
- Adam Kohn
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, and Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
- Ruben Coen-Cagli
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, and Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
|
35
|
Distinct neural ensemble response statistics are associated with recognition and discrimination of natural sound textures. Proc Natl Acad Sci U S A 2020; 117:31482-31493. [PMID: 33219122 DOI: 10.1073/pnas.2005644117] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The perception of sound textures, a class of natural sounds defined by statistical sound structure such as fire, wind, and rain, has been proposed to arise through the integration of time-averaged summary statistics. Where and how the auditory system might encode these summary statistics to create internal representations of these stationary sounds, however, is unknown. Here, using natural textures and synthetic variants with reduced statistics, we show that summary statistics modulate the correlations between frequency organized neuron ensembles in the awake rabbit inferior colliculus (IC). These neural ensemble correlation statistics capture high-order sound structure and allow for accurate neural decoding in a single trial recognition task with evidence accumulation times approaching 1 s. In contrast, the average activity across the neural ensemble (neural spectrum) provides a fast (tens of milliseconds) and salient signal that contributes primarily to texture discrimination. Intriguingly, perceptual studies in human listeners reveal analogous trends: the sound spectrum is integrated quickly and serves as a salient discrimination cue while high-order sound statistics are integrated slowly and contribute substantially more toward recognition. The findings suggest statistical sound cues such as the sound spectrum and correlation structure are represented by distinct response statistics in auditory midbrain ensembles, and that these neural response statistics may have dissociable roles and time scales for the recognition and discrimination of natural sounds.
|
36
|
Nagy DG, Török B, Orbán G. Optimal forgetting: Semantic compression of episodic memories. PLoS Comput Biol 2020; 16:e1008367. [PMID: 33057380 PMCID: PMC7591090 DOI: 10.1371/journal.pcbi.1008367] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2020] [Revised: 10/27/2020] [Accepted: 09/03/2020] [Indexed: 11/26/2022] Open
Abstract
It has been extensively documented that human memory exhibits a wide range of systematic distortions, which have been associated with resource constraints. Resource constraints on memory can be formalised in the normative framework of lossy compression; however, traditional lossy compression algorithms produce qualitatively different distortions from those found in experiments with humans. We argue that the form of the distortions is characteristic of relying on a generative model adapted to the environment for compression. We show that this semantic compression framework can provide a unifying explanation of a wide variety of memory phenomena. We harness recent advances in learning deep generative models, which yield powerful tools for approximating generative models of complex data. We use three datasets, chess games, natural text, and hand-drawn sketches, to demonstrate the effects of semantic compression on memory performance. Our model accounts for memory distortions related to domain expertise, gist-based distortions, contextual effects, and delayed recall.
Affiliation(s)
- David G. Nagy
- Computational Systems Neuroscience Lab, Wigner Research Centre for Physics, Budapest, Hungary
- Institute of Physics, Eötvös Loránd University, Budapest, Hungary
- Balázs Török
- Computational Systems Neuroscience Lab, Wigner Research Centre for Physics, Budapest, Hungary
- Department of Cognitive Science, Budapest University of Technology and Economics, Budapest, Hungary
- Gergő Orbán
- Computational Systems Neuroscience Lab, Wigner Research Centre for Physics, Budapest, Hungary
|
37
|
Abstract
Area V4-the focus of this review-is a mid-level processing stage along the ventral visual pathway of the macaque monkey. V4 is extensively interconnected with other visual cortical areas along the ventral and dorsal visual streams, with frontal cortical areas, and with several subcortical structures. Thus, it is well poised to play a broad and integrative role in visual perception and recognition-the functional domain of the ventral pathway. Neurophysiological studies in monkeys engaged in passive fixation and behavioral tasks suggest that V4 responses are dictated by tuning in a high-dimensional stimulus space defined by form, texture, color, depth, and other attributes of visual stimuli. This high-dimensional tuning may underlie the development of object-based representations in the visual cortex that are critical for tracking, recognizing, and interacting with objects. Neurophysiological and lesion studies also suggest that V4 responses are important for guiding perceptual decisions and higher-order behavior.
Affiliation(s)
- Anitha Pasupathy
- Department of Biological Structure, University of Washington, Seattle, Washington 98195, USA
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98121, USA
- Dina V Popovkina
- Department of Psychology, University of Washington, Seattle, Washington 98105, USA
- Taekjun Kim
- Department of Biological Structure, University of Washington, Seattle, Washington 98195, USA
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98121, USA
|
38
|
Laskar MNU, Sanchez Giraldo LG, Schwartz O. Deep neural networks capture texture sensitivity in V2. J Vis 2020; 20:21-1. [PMID: 32692830 PMCID: PMC7424103 DOI: 10.1167/jov.20.7.21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2018] [Accepted: 02/28/2020] [Indexed: 11/24/2022] Open
Abstract
Deep convolutional neural networks (CNNs) trained on visual objects have shown an intriguing ability to predict some response properties of visual cortical neurons. However, it is poorly understood which factors (e.g., whether the model is trained, receptive field size) and computations (e.g., convolution, rectification, pooling, normalization) give rise to this ability, at what level they do so, and what role intermediate processing stages play in explaining changes that develop across areas of the cortical hierarchy. We focused on sensitivity to textures as a paradigmatic example, since recent neurophysiology experiments provide rich data pointing to texture sensitivity in secondary (but not primary) visual cortex (V2). We initially explored the CNN without any fitting to the neural data and found that the first two layers of the CNN showed qualitative correspondence to the first two cortical areas in terms of texture sensitivity. We therefore developed a quantitative approach to select a population of CNN model neurons that best fits the brain neural recordings. We found that the CNN could develop compatibility with secondary cortex in the second layer following rectification, and that this was improved following pooling but only mildly influenced by the local normalization operation. Higher layers of the CNN could further, though modestly, improve the compatibility with the V2 data. The compatibility was reduced when incorporating random rather than learned weights. Our results show that the CNN class of model is effective for capturing changes that develop across early areas of cortex, and has the potential to help identify the computations that give rise to hierarchical processing in the brain (code is available on GitHub).
Affiliation(s)
- Odelia Schwartz
- Department of Computer Science, University of Miami, FL, USA
|
39
|
Hénaff OJ, Boundy-Singer ZM, Meding K, Ziemba CM, Goris RLT. Representation of visual uncertainty through neural gain variability. Nat Commun 2020; 11:2513. [PMID: 32427825 PMCID: PMC7237668 DOI: 10.1038/s41467-020-15533-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2019] [Accepted: 03/14/2020] [Indexed: 01/25/2023] Open
Abstract
Uncertainty is intrinsic to perception. Neural circuits which process sensory information must therefore also represent the reliability of this information. How they do so is a topic of debate. We propose a model of visual cortex in which average neural response strength encodes stimulus features, while cross-neuron variability in response gain encodes the uncertainty of these features. To test this model, we studied spiking activity of neurons in macaque V1 and V2 elicited by repeated presentations of stimuli whose uncertainty was manipulated in distinct ways. We show that gain variability of individual neurons is tuned to stimulus uncertainty, that this tuning is specific to the features encoded by these neurons and largely invariant to the source of uncertainty. We demonstrate that this behavior naturally arises from known gain-control mechanisms, and illustrate how downstream circuits can jointly decode stimulus features and their uncertainty from sensory population activity.
Affiliation(s)
- Olivier J Hénaff
- Center for Neural Science, New York University, New York, NY, USA; DeepMind, London, UK
- Zoe M Boundy-Singer
- Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
- Kristof Meding
- Neural Information Processing Group, University of Tübingen, Tübingen, Germany
- Corey M Ziemba
- Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
- Robbe L T Goris
- Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
|
40
|
Chin BM, Burge J. Predicting the Partition of Behavioral Variability in Speed Perception with Naturalistic Stimuli. J Neurosci 2020; 40:864-879. [PMID: 31772139 PMCID: PMC6975300 DOI: 10.1523/jneurosci.1904-19.2019] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2019] [Revised: 11/12/2019] [Accepted: 11/17/2019] [Indexed: 11/21/2022] Open
Abstract
A core goal of visual neuroscience is to predict human perceptual performance from natural signals. Performance in any natural task can be limited by at least three sources of uncertainty: stimulus variability, internal noise, and suboptimal computations. Determining the relative importance of these factors has been a focus of interest for decades but requires methods for predicting the fundamental limits imposed by stimulus variability on sensory-perceptual precision. Most successes have been limited to simple stimuli and simple tasks. But perception science ultimately aims to understand how vision works with natural stimuli. Successes in this domain have proven elusive. Here, we develop a model of humans based on an image-computable (images in, estimates out) Bayesian ideal observer. Given biological constraints, the ideal optimally uses the statistics relating local intensity patterns in moving images to speed, specifying the fundamental limits imposed by natural stimuli. Next, we propose a theoretical link between two key decision-theoretic quantities that suggests how to experimentally disentangle the impacts of internal noise and deterministic suboptimal computations. In several interlocking discrimination experiments with three male observers, we confirm this link and determine the quantitative impact of each candidate performance-limiting factor. Human performance is near-exclusively limited by natural stimulus variability and internal noise, and humans use near-optimal computations to estimate speed from naturalistic image movies. The findings indicate that the partition of behavioral variability can be predicted from a principled analysis of natural images and scenes. The approach should be extendable to studies of neural variability with natural signals. SIGNIFICANCE STATEMENT Accurate estimation of speed is critical for determining motion in the environment, but humans cannot perform this task without error.
Different objects moving at the same speed cast different images on the eyes. This stimulus variability imposes fundamental external limits on the human ability to estimate speed. Predicting these limits has proven difficult. Here, by analyzing natural signals, we predict the quantitative impact of natural stimulus variability on human performance given biological constraints. With integrated experiments, we compare its impact to well-studied performance-limiting factors internal to the visual system. The results suggest that the deterministic computations humans perform are near optimal, and that behavioral responses to natural stimuli can be studied with the rigor and interpretability defining work with simpler stimuli.
Affiliation(s)
- Johannes Burge
- Department of Psychology, Neuroscience Graduate Group, and Bioengineering Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 19104
|
41
|
Laminar Differences in Responses to Naturalistic Texture in Macaque V1 and V2. J Neurosci 2019; 39:9748-9756. [PMID: 31666355 DOI: 10.1523/jneurosci.1743-19.2019] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2019] [Revised: 10/02/2019] [Accepted: 10/16/2019] [Indexed: 11/21/2022] Open
Abstract
Most single units recorded from macaque secondary visual cortex (V2) respond with higher firing rates to synthetic texture images containing "naturalistic" higher-order statistics than to spectrally matched "noise" images lacking these statistics. In contrast, few single units in V1 show this property. We explored how the strength and dynamics of response vary across the different layers of visual cortex by recording multiunit (defined as high-frequency power in the local field potential) and gamma-band activity evoked by brief presentations of naturalistic and noise images in V1 and V2 of anesthetized macaque monkeys of both sexes. As previously reported, recordings in V2 showed consistently stronger responses to naturalistic texture than to spectrally matched noise. In contrast to single-unit recordings, V1 multiunit activity showed a preference for images with naturalistic statistics, and in gamma-band activity this preference was comparable across V1 and V2. Sensitivity to naturalistic image structure was strongest in the supragranular and infragranular layers of V1, but weak in granular layers, suggesting that it might reflect feedback from V2. Response timing was consistent with this idea. Visual responses appeared first in V1, followed by V2. Sensitivity to naturalistic texture emerged first in V2, followed by the supragranular and infragranular layers of V1, and finally in the granular layers of V1. Our results demonstrate laminar differences in the encoding of higher-order statistics of natural texture, and suggest that this sensitivity first arises in V2 and is fed back to modulate activity in V1. SIGNIFICANCE STATEMENT The circuit mechanisms responsible for visual representations of intermediate complexity are largely unknown. We used a well validated set of synthetic texture stimuli to probe the temporal and laminar profile of sensitivity to the higher-order statistical structure of natural images.
We found that this sensitivity emerges first and most strongly in V2 but soon after in V1. However, sensitivity in V1 is higher in the laminae (extragranular) and recording modalities (local field potential) most likely affected by V2 connections, suggesting a feedback origin. Our results show how sensitivity to naturalistic image structure emerges across time and circuitry in the early visual cortex.
|
42
|
Giraldo LGS, Schwartz O. Integrating Flexible Normalization into Midlevel Representations of Deep Convolutional Neural Networks. Neural Comput 2019; 31:2138-2176. [PMID: 31525314 DOI: 10.1162/neco_a_01226] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to midlevel representations of deep CNNs as a tractable way to study contextual normalization mechanisms in midlevel cortical areas. This approach captures nontrivial spatial dependencies among midlevel features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high-order features geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in midlevel cortical areas. We also expect this approach to be useful as part of the CNN tool kit, therefore going beyond more restrictive fixed forms of normalization.
Affiliation(s)
- Odelia Schwartz
- Computer Science Department, University of Miami, Coral Gables, FL 33146, U.S.A.
|
43
|
Adesnik H, Naka A. Cracking the Function of Layers in the Sensory Cortex. Neuron 2019; 100:1028-1043. [PMID: 30521778 DOI: 10.1016/j.neuron.2018.10.032] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2018] [Revised: 08/08/2018] [Accepted: 10/18/2018] [Indexed: 12/24/2022]
Abstract
Understanding how cortical activity generates sensory perceptions requires a detailed dissection of the function of cortical layers. Despite our relatively extensive knowledge of their anatomy and wiring, we have a limited grasp of what each layer contributes to cortical computation. We need to develop a theory of cortical function that is rooted solidly in each layer's component cell types and fine circuit architecture and produces predictions that can be validated by specific perturbations. Here we briefly review the progress toward such a theory and suggest an experimental road map toward this goal. We discuss new methods for the all-optical interrogation of cortical layers, for correlating in vivo function with precise identification of transcriptional cell type, and for mapping local and long-range activity in vivo with synaptic resolution. The new technologies that can crack the function of cortical layers are finally on the immediate horizon.
Affiliation(s)
- Hillel Adesnik
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA; The Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA.
- Alexander Naka
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA; The Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, USA
44
Nakamura D, Satoh S. Simple speed estimators reproduce MT responses and identify strength of visual illusion. Neural Comput Appl 2019. [DOI: 10.1007/s00521-017-3211-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
45
Zhaoping L. A new framework for understanding vision from the perspective of the primary visual cortex. Curr Opin Neurobiol 2019; 58:1-10. [PMID: 31271931] [DOI: 10.1016/j.conb.2019.06.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/07/2019] [Revised: 06/02/2019] [Accepted: 06/10/2019] [Indexed: 11/25/2022]
Abstract
Visual attention selects only a tiny fraction of visual input information for further processing. Selection starts in the primary visual cortex (V1), which creates a bottom-up saliency map to guide the fovea to selected visual locations via gaze shifts. This motivates a new framework that views vision as consisting of encoding, selection, and decoding stages, placing selection on center stage. It suggests a massive loss of non-selected information from V1 downstream along the visual pathway. Hence, feedback from downstream visual cortical areas to V1 for better decoding (recognition), through analysis-by-synthesis, should query for additional information and be mainly directed at the foveal region. Accordingly, non-foveal vision is not only poorer in spatial resolution, but also more susceptible to many illusions.
Affiliation(s)
- Li Zhaoping
- University of Tübingen, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
46
Wallis TS, Funke CM, Ecker AS, Gatys LA, Wichmann FA, Bethge M. Image content is more important than Bouma's Law for scene metamers. eLife 2019; 8:e42512. [PMID: 31038458] [PMCID: PMC6491040] [DOI: 10.7554/elife.42512] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/03/2018] [Accepted: 03/09/2019] [Indexed: 11/16/2022]
Abstract
We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated 'Bouma's Law' of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.

As you read this digest, your eyes move to follow the lines of text. But now try to hold your eyes in one position, while reading the text on either side and below: it soon becomes clear that peripheral vision is not as good as we tend to assume. It is not possible to read text far away from the center of your line of vision, but you can see 'something' out of the corner of your eye. You can see that there is text there, even if you cannot read it, and you can see where your screen or page ends. So how does the brain generate peripheral vision, and why does it differ from what you see when you look straight ahead? One idea is that the visual system averages information over areas of the peripheral visual field. This gives rise to texture-like patterns, as opposed to images made up of fine details. Imagine looking at an expanse of foliage, gravel or fur, for example. Your eyes cannot make out the individual leaves, pebbles or hairs. Instead, you perceive an overall pattern in the form of a texture. Our peripheral vision may also consist of such textures, created when the brain averages information over areas of space. Wallis, Funke et al. have now tested this idea using an existing computer model that averages visual input in this way. By giving the model a series of photographs to process, Wallis, Funke et al. obtained images that should in theory simulate peripheral vision. If the model mimics the mechanisms that generate peripheral vision, then healthy volunteers should be unable to distinguish the processed images from the original photographs. But in fact, the participants could easily discriminate the two sets of images. This suggests that the visual system does not solely use textures to represent information in the peripheral visual field. Wallis, Funke et al. propose that other factors, such as how the visual system separates and groups objects, may instead determine what we see in our peripheral vision. This knowledge could ultimately benefit patients with eye diseases such as macular degeneration, a condition that causes loss of vision in the center of the visual field and forces patients to rely on their peripheral vision.
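The pooling account being tested can be caricatured in a few lines. The sketch below is our own illustration, not the model used in the study (`pool_with_eccentricity_scaling` and its `scaling` parameter are hypothetical names): a 1D signal is averaged in windows whose radius is a fixed fraction of eccentricity, so detail survives near the fovea and is averaged into a texture-like summary in the periphery.

```python
def pool_with_eccentricity_scaling(image, fovea_index, scaling=0.5):
    """Average a 1D signal in windows that grow with eccentricity.

    The window radius at position i is scaling * |i - fovea_index|, a crude
    stand-in for pooling regions whose size is a fixed fraction of retinal
    eccentricity. Near the fovea the window shrinks to a single sample, so
    fine detail survives; far from it, detail is averaged away.
    """
    pooled = []
    for i in range(len(image)):
        radius = int(scaling * abs(i - fovea_index))
        lo, hi = max(0, i - radius), min(len(image), i + radius + 1)
        window = image[lo:hi]
        pooled.append(sum(window) / len(window))
    return pooled

# A fine alternating pattern survives at the fovea but washes out peripherally.
signal = [0, 1] * 8
out = pool_with_eccentricity_scaling(signal, fovea_index=0)
```

In these terms, the paper's finding is that observers could still tell model-generated images from originals at a V2-like scaling of about 0.5, so much smaller pooling scales would be needed to produce true metamers.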
Affiliation(s)
- Thomas SA Wallis
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Christina M Funke
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany
- Alexander S Ecker
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany; Bernstein Center for Computational Neuroscience, Berlin, Germany; Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Leon A Gatys
- Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Felix A Wichmann
- Neural Information Processing Group, Faculty of Science, Eberhard Karls Universität Tübingen, Tübingen, Germany
- Matthias Bethge
- Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, United States; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen, Tübingen, Germany; Max Planck Institute for Biological Cybernetics, Tübingen, Germany
47
Kim T, Bair W, Pasupathy A. Neural Coding for Shape and Texture in Macaque Area V4. J Neurosci 2019; 39:4760-4774. [PMID: 30948478] [DOI: 10.1523/jneurosci.3073-18.2019] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 12/06/2018] [Revised: 03/19/2019] [Accepted: 04/01/2019] [Indexed: 11/21/2022]
Abstract
The distinct visual sensations of shape and texture have been studied separately in cortex; therefore, it remains unknown whether separate neuronal populations encode each of these properties or one population carries a joint encoding. We directly compared shape and texture selectivity of individual V4 neurons in awake macaques (1 male, 1 female) and found that V4 neurons lie along a continuum from strong tuning for boundary curvature of shapes to strong tuning for perceptual dimensions of texture. Among neurons tuned to both attributes, tuning for shape and texture was largely separable, with the latter delayed by ∼30 ms. We also found that shape stimuli typically evoked stronger, more selective responses than did texture patches, regardless of whether the latter were contained within or extended beyond the receptive field. These results suggest that there are separate specializations in mid-level cortical processing for visual attributes of shape and texture.

SIGNIFICANCE STATEMENT: Object recognition depends on our ability to see both the shape of the boundaries of objects and properties of their surfaces. However, neuroscientists have never before examined how shape and texture are linked together in mid-level visual cortex. In this study, we used systematically designed sets of simple shapes and texture patches to probe the responses of individual neurons in the primate visual cortex. Our results provide the first evidence that some cortical neurons specialize in processing shape whereas others specialize in processing textures. Most neurons lie between the ends of this continuum, and in these neurons we find that shape and texture encoding are largely independent.
48
Kindel WF, Christensen ED, Zylberberg J. Using deep learning to probe the neural code for images in primary visual cortex. J Vis 2019; 19:29. [PMID: 31026016] [PMCID: PMC6485988] [DOI: 10.1167/19.4.29] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 11/28/2017] [Accepted: 01/01/2019] [Indexed: 11/24/2022]
Abstract
Primary visual cortex (V1) is the first stage of cortical image processing, and a major effort in systems neuroscience is devoted to understanding how it encodes information about visual stimuli. Within V1, many neurons respond selectively to edges of a given preferred orientation: These are known as either simple or complex cells. Other neurons respond to localized center-surround image features. Still others respond selectively to certain image stimuli, but the specific features that excite them are unknown. Moreover, even for the simple and complex cells (the best-understood V1 neurons), it is challenging to predict how they will respond to natural image stimuli. Thus, there are important gaps in our understanding of how V1 encodes images. To fill these gaps, we trained deep convolutional neural networks to predict the firing rates of V1 neurons in response to natural image stimuli, and we find that the predicted firing rates are highly correlated (mean normalized correlation CC_norm = 0.556 ± 0.01) with the neurons' actual firing rates over a population of 355 neurons. This performance value is quoted for all neurons, with no selection filter. Performance is better for more active neurons: When evaluated only on neurons with mean firing rates above 5 Hz, our predictors achieve correlations of CC_norm = 0.69 ± 0.01 with the neurons' true firing rates. We find that the firing rates of both orientation-selective and non-orientation-selective neurons can be predicted with high accuracy. Additionally, we use a variety of models to benchmark performance and find that our convolutional neural-network model makes more accurate predictions.
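A normalized correlation of this kind divides the raw prediction-to-response correlation by an estimate of the best correlation any model could achieve given trial-to-trial noise. The sketch below is one common recipe under the assumption of repeated trials (exact estimators vary across papers, and the function names here are ours, not from the study):

```python
import math

def _variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def cc_norm(prediction, trials):
    """Correlate a model prediction with the trial-averaged response, then
    normalize by an estimate of the maximum achievable correlation."""
    n = len(trials)
    n_stim = len(trials[0])
    mean_resp = [sum(t[i] for t in trials) / n for i in range(n_stim)]
    # Signal-power estimate from repeated trials: variance of the mean,
    # corrected for the noise variance that repeats share.
    var_mean = _variance(mean_resp)
    var_trial = sum(_variance(t) for t in trials) / n
    signal_power = (n * var_mean - var_trial) / (n - 1)
    cc_max = math.sqrt(max(signal_power, 1e-12) / var_mean)
    return pearson(prediction, mean_resp) / cc_max

# With noiseless repeats, cc_max = 1 and cc_norm reduces to plain Pearson r.
trials = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
r = cc_norm([1.5, 2.0, 3.0, 4.5], trials)
```

With noisy repeats, `cc_max` drops below 1, so a model that captures all the stimulus-driven signal can still reach a normalized correlation near 1 even when its raw correlation is modest.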
Affiliation(s)
- William F Kindel
- Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, CO, USA
- Elijah D Christensen
- Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, CO, USA
- Joel Zylberberg
- Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, CO, USA
- Learning in Machines and Brains Program, Canadian Institute for Advanced Research, Toronto, Canada
49
DiMattina C, Baker CL. Modeling second-order boundary perception: A machine learning approach. PLoS Comput Biol 2019; 15:e1006829. [PMID: 30883556] [PMCID: PMC6438569] [DOI: 10.1371/journal.pcbi.1006829] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/05/2018] [Revised: 03/28/2019] [Accepted: 01/15/2019] [Indexed: 11/18/2022]
Abstract
Visual pattern detection and discrimination are essential first steps for scene analysis. Numerous human psychophysical studies have modeled visual pattern detection and discrimination by estimating linear templates for classifying noisy stimuli defined by spatial variations in pixel intensities. However, such methods are poorly suited to understanding sensory processing mechanisms for complex visual stimuli such as second-order boundaries defined by spatial differences in contrast or texture. We introduce a novel machine learning framework for modeling human perception of second-order visual stimuli, using image-computable hierarchical neural network models fit directly to psychophysical trial data. This framework is applied to modeling visual processing of boundaries defined by differences in the contrast of a carrier texture pattern, in two different psychophysical tasks: (1) boundary orientation identification, and (2) fine orientation discrimination. Cross-validation analysis is employed to optimize model hyper-parameters and to demonstrate that these models are able to accurately predict human performance on novel stimulus sets not used for fitting model parameters. We find that, like the ideal observer, human observers take a region-based approach to the orientation identification task, while taking an edge-based approach to the fine orientation discrimination task. How observers integrate contrast modulation across orientation channels is investigated by fitting psychophysical data with two models representing competing hypotheses, revealing a preference for a model which combines multiple orientations at the earliest possible stage. Our results suggest that this machine learning approach has much potential to advance the study of second-order visual processing, and we outline future steps towards generalizing the method to modeling visual segmentation of natural texture boundaries.
This study demonstrates how machine learning methodology can be fruitfully applied to psychophysical studies of second-order visual processing. Many naturally occurring visual boundaries are defined by spatial differences in features other than luminance, for example by differences in texture or contrast. Quantitative models of such “second-order” boundary perception cannot be estimated using the standard regression techniques (known as “classification images”) commonly applied to “first-order”, luminance-defined stimuli. Here we present a novel machine learning approach to modeling second-order boundary perception using hierarchical neural networks. In contrast to previous quantitative studies of second-order boundary perception, we directly estimate network model parameters using psychophysical trial data. We demonstrate that our method can reveal different spatial summation strategies that human observers utilize for different kinds of second-order boundary perception tasks, and can be used to compare competing hypotheses of how contrast modulation is integrated across orientation channels. We outline extensions of the methodology to other kinds of second-order boundaries, including those in natural images.
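Fitting a model directly to psychophysical trial data amounts to maximizing the likelihood of the observed binary responses. A minimal sketch of the principle, using a one-slope logistic observer fit by gradient ascent (our own illustration; the study fits far richer hierarchical network models, and all names here are hypothetical):

```python
import math, random

def fit_psychometric(stimuli, responses, lr=0.5, steps=2000):
    """Fit p(correct) = sigmoid(a * x + b) to trial-by-trial data by
    gradient ascent on the Bernoulli log-likelihood, the same principle,
    in miniature, as fitting a network model directly to trial data."""
    a, b = 0.0, 0.0
    n = len(stimuli)
    for _ in range(steps):
        ga = gb = 0.0
        for x, r in zip(stimuli, responses):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            ga += (r - p) * x / n  # d log-likelihood / d a
            gb += (r - p) / n      # d log-likelihood / d b
        a, b = a + lr * ga, b + lr * gb
    return a, b

# Synthetic observer: more likely correct as the boundary signal increases.
random.seed(0)
stims = [i / 10 for i in range(-20, 21)]
resps = [1 if random.random() < 1 / (1 + math.exp(-2 * x)) else 0
         for x in stims]
a, b = fit_psychometric(stims, resps)
```

Cross-validation, as used in the paper, would then evaluate the fitted `(a, b)` on held-out trials rather than on the trials used for fitting.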
Affiliation(s)
- Christopher DiMattina
- Computational Perception Laboratory, Department of Psychology, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Curtis L. Baker
- McGill Vision Research Unit, Department of Ophthalmology, McGill University, Montreal, Quebec, Canada
50
Sanchez-Giraldo LG, Laskar MNU, Schwartz O. Normalization and pooling in hierarchical models of natural images. Curr Opin Neurobiol 2019; 55:65-72. [PMID: 30785005] [DOI: 10.1016/j.conb.2019.01.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/24/2018] [Revised: 12/29/2018] [Accepted: 01/13/2019] [Indexed: 11/17/2022]
Abstract
Divisive normalization and subunit pooling are two canonical classes of computation that have become widely used in descriptive (what) models of visual cortical processing. Normative (why) models from natural image statistics can help constrain the form and parameters of such classes of models. We focus on recent advances in two particular directions, namely deriving richer forms of divisive normalization, and advances in learning pooling from image statistics. We discuss the incorporation of such components into hierarchical models. We consider both hierarchical unsupervised learning from image statistics, and discriminative supervised learning in deep convolutional neural networks (CNNs). We further discuss studies on the utility and extensions of the convolutional architecture, which has also been adopted by recent descriptive models. We review the recent literature and discuss the current promises and gaps of using such approaches to gain a better understanding of how cortical neurons represent and process complex visual stimuli.
Affiliation(s)
- Luis G Sanchez-Giraldo
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States.
- Md Nasir Uddin Laskar
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States
- Odelia Schwartz
- Computational Neuroscience Lab, Dept. of Computer Science, University of Miami, FL 33146, United States