1. Sharvashidze N, Valsecchi M, Schütz AC. Transsaccadic perception of changes in object regularity. J Vis 2024; 24:3. [PMID: 39630465] [PMCID: PMC11627247] [DOI: 10.1167/jov.24.13.3]
Abstract
The visual system compensates for differences between peripheral and foveal vision using different mechanisms. Although peripheral vision is characterized by higher spatial uncertainty and lower resolution than foveal vision, observers reported objects to be less distorted and less blurry in the periphery than in the fovea in a visual matching task during fixation (Valsecchi et al., 2018). Here, we asked whether a similar overcompensation could be found across saccadic eye movements and whether it would bias the detection of transsaccadic changes in object regularity. The blur and distortion levels of simple geometric shapes were manipulated with the Eidolons algorithm (Koenderink et al., 2017). In an appearance discrimination task, participants had to judge the appearance of blur (experiment 1) and distortion (experiment 2) separately before and after a saccade. Objects appeared less blurry before a saccade (in the periphery) than after a saccade (in the fovea). No differences were found in the appearance of distortion. In a change discrimination task, participants had to judge whether blur (experiment 1) and distortion (experiment 2) increased or decreased during a saccade. Overall, they showed a tendency to report an increase in both blur and distortion across saccades. The precision of the responses was improved by a 200-ms postsaccadic blank. Results from the change discrimination task of both experiments suggest that a transsaccadic decrease in regularity is more visible than an increase in regularity. In line with the previous study that reported a peripheral overcompensation in the visual matching task, we found a similar mechanism, exhibiting a phenomenological sharpening of blurry edges before a saccade. These results generalize peripheral-foveal differences observed during fixation to the dynamic, transsaccadic conditions tested here, where they contribute to biases in transsaccadic change detection.
Affiliation(s)
- Nino Sharvashidze
- Allgemeine und Biologische Psychologie, Philipps-Universität Marburg, Marburg, Germany
- Matteo Valsecchi
- Dipartimento di Psicologia, Università di Bologna, Bologna, Italy
- Alexander C Schütz
- Allgemeine und Biologische Psychologie, Philipps-Universität Marburg, Marburg, Germany
- Center for Mind, Brain and Behavior, Universities of Marburg, Giessen, and Darmstadt, Germany
- https://www.uni-marburg.de/en/fb04/team-schuetz/team/alexander-schutz
2. Cicchini GM, D'Errico G, Burr DC. Color crowding considered as adaptive spatial integration. J Vis 2024; 24:9. [PMID: 39656167] [PMCID: PMC11636666] [DOI: 10.1167/jov.24.13.9]
Abstract
Crowding is the inability to recognize an object in clutter, classically considered a fundamental low-level bottleneck to object recognition. Recently, however, it has been suggested that crowding, like predictive phenomena such as serial dependence, may result from optimizing strategies that exploit redundancies in natural scenes. This notion leads to several testable predictions, such as that crowding should be greater for nonsalient targets and, counterintuitively, that flanker interference should be associated with higher precision in judgements, leading to a lower overall error rate. Here we measured color discrimination for targets flanked by stimuli of variable color. The results verified both predictions, showing that although crowding can affect object recognition, it may be better understood not as a processing bottleneck, but rather as a consequence of mechanisms evolved to efficiently exploit the spatial redundancies of the natural world. Analyses of the reaction times of judgments show that the integration occurs at sensory rather than decisional levels.
Affiliation(s)
- David Charles Burr
- Department of Neurosciences, Psychology, Drug Research and Child Health, University of Florence, Firenze, Italy
- School of Psychology, University of Sydney, Camperdown, NSW, Australia
3. Lande KJ. Compositionality in perception: A framework. Wiley Interdisciplinary Reviews: Cognitive Science 2024; 15:e1691. [PMID: 38807187] [DOI: 10.1002/wcs.1691]
Abstract
Perception involves the processing of content or information about the world. In what form is this content represented? I argue that perception is widely compositional. The perceptual system represents many stimulus features (including shape, orientation, and motion) in terms of combinations of other features (such as shape parts, slant and tilt, common and residual motion vectors). But compositionality can take a variety of forms. The ways in which perceptual representations compose are markedly different from the ways in which sentences or thoughts are thought to be composed. I suggest that the thesis that perception is compositional is not itself a concrete hypothesis with specific predictions; rather it affords a productive framework for developing and evaluating specific empirical hypotheses about the form and content of perceptual representations. The question is not just whether perception is compositional, but how. Answering this latter question can provide fundamental insights into perception. This article is categorized under: Philosophy > Representation; Philosophy > Foundations of Cognitive Science; Psychology > Perception and Psychophysics.
Affiliation(s)
- Kevin J Lande
- Department of Philosophy and Centre for Vision Research, York University, Toronto, Canada
4. Lu X, Jiang R, Song M, Wu Y, Ge Y, Chen N. Seeing in crowds: Averaging first, then max. Psychon Bull Rev 2024; 31:1856-1866. [PMID: 38337141] [DOI: 10.3758/s13423-024-02468-6]
Abstract
Crowding, a fundamental limit in object recognition, is believed to result from excessive integration of nearby items in peripheral vision. To understand its pooling mechanisms, we measured subjects' internal response distributions in an orientation crowding task. Contrary to the prediction of an averaging model, we observed a pattern suggesting that the perceptual judgement is made based on choosing the largest response across the noise-perturbed items. A model featuring first-stage averaging and second-stage signed-max operation predicts the diverse errors made by human observers under various signal strength levels. These findings suggest that different rules operate to resolve the bottleneck at early and high-level stages of visual processing, implementing a combination of linear and nonlinear pooling strategies.
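To make the two-stage read-out described in this abstract concrete, here is a minimal Python sketch of the general idea: linear averaging within local pools followed by a signed-max decision across pools. The Gaussian noise assumption, pool size, noise level, and example stimulus values are arbitrary illustrative choices, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def signed_max(values):
    """Return the element with the largest absolute value, keeping its sign."""
    values = np.asarray(values)
    return values[np.argmax(np.abs(values))]

def simulate_trial(orientations, pool_size=3, noise_sd=5.0):
    """Toy two-stage pooling: average within local pools, then signed-max across pools.

    orientations: signed tilts (deg) of the crowded items, target included.
    Returns the model's reported tilt sign (+1 = clockwise, -1 = counter-clockwise).
    """
    noisy = np.asarray(orientations, float) + rng.normal(0.0, noise_sd, len(orientations))
    # First stage: linear pooling (average) over neighbouring items.
    pools = [noisy[i:i + pool_size].mean() for i in range(0, len(noisy), pool_size)]
    # Second stage: nonlinear read-out -- take the pooled response with the
    # largest magnitude and report its sign.
    return np.sign(signed_max(pools))

# Example: weak clockwise target (+2 deg) among stronger, randomly tilted flankers.
items = [2.0, -8.0, 6.0, -4.0, 7.0, -6.0]
responses = [simulate_trial(items) for _ in range(1000)]
print("P(report clockwise) =", np.mean(np.array(responses) > 0))
```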
Affiliation(s)
- Xincheng Lu
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 506, Weiqing Building, Beijing, 100084, People's Republic of China
- Ruijie Jiang
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 506, Weiqing Building, Beijing, 100084, People's Republic of China
- Meng Song
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 506, Weiqing Building, Beijing, 100084, People's Republic of China
- Yiting Wu
- Khoury College of Computer Sciences, Northeastern University, 360 Huntington Ave, Boston, MA, 02115, USA
- Yiran Ge
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 506, Weiqing Building, Beijing, 100084, People's Republic of China
- Nihong Chen
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 506, Weiqing Building, Beijing, 100084, People's Republic of China.
- IDG/McGovern Institute for Brain Research at Tsinghua University, Beijing, 100084, People's Republic of China.
5. Kamble V, Buyle M, Crollen V. Investigating the crowding effect on letters and symbols in deaf adults. Sci Rep 2024; 14:16161. [PMID: 38997432] [PMCID: PMC11245469] [DOI: 10.1038/s41598-024-66832-1]
Abstract
Reading requires the transformation of a complex array of visual features into sounds and meaning. For deaf signers who experience changes in visual attention and have little or no access to the sounds of the language they read, understanding the visual constraints underlying reading is crucial. This study aims to explore a fundamental aspect of visual perception intertwined with reading: the crowding effect. This effect manifests as the struggle to distinguish a target letter when surrounded by flanker letters. Through a two-alternative forced choice task, we assessed the recognition of letters and symbols presented in isolation or flanked by two or four characters, positioned either to the left or right of fixation. Our findings reveal that while deaf individuals exhibit higher accuracy in processing letters compared to symbols, their performance falls short of that of their hearing counterparts. Interestingly, despite their proficiency with letters, deaf individuals didn't demonstrate quicker letter identification, particularly in the most challenging scenario where letters were flanked by four characters. These outcomes imply the development of a specialized letter processing system among deaf individuals, albeit one that may subtly diverge from that of their hearing counterparts.
Affiliation(s)
- Veena Kamble
- Institut de Recherche en Sciences Psychologiques, Université Catholique de Louvain, Place de l'Université, Louvain-la-Neuve, Belgium.
- Margot Buyle
- Institut de Recherche en Sciences Psychologiques, Université Catholique de Louvain, Place de l'Université, Louvain-la-Neuve, Belgium
- Virginie Crollen
- Institut de Recherche en Sciences Psychologiques, Université Catholique de Louvain, Place de l'Université, Louvain-la-Neuve, Belgium
6. Yildirim-Keles FZ, Coates DR, Sayim B. Attention in redundancy masking. Atten Percept Psychophys 2024; 86:1-14. [PMID: 38750302] [DOI: 10.3758/s13414-024-02885-8]
Abstract
Peripheral vision is limited due to several factors, such as visual resolution, crowding, and attention. When attention is not directed towards a stimulus, detection, discrimination, and identification are often compromised. Recent studies have found a new phenomenon that strongly limits peripheral vision, "redundancy masking". In redundancy masking, the number of perceived items in repeating patterns is reduced. For example, when presenting three lines in the peripheral visual field and asking participants to report the number of lines, often only two lines are reported. Here, we investigated what role attention plays in redundancy masking. If redundancy masking was due to limited attention to the target, it should be stronger when less attention is allocated to the target, and absent when attention is maximally focused on the target. Participants were presented with line arrays and reported the number of lines in three cueing conditions (i.e., single cue, double cue, and no cue). Redundancy masking was observed in all cueing conditions, with observers reporting fewer lines than presented in the single, double, and no cue conditions. These results suggest that redundancy masking is not due to limited attention. The number of lines reported was closer to the correct number of lines in the single compared to the double and the no cue conditions, suggesting that reduced attention additionally compromised stimulus discrimination, and replicating typical effects of diminished attention. Taken together, our results suggest that the extent of attention to peripherally presented stimuli modulates discrimination performance, but does not account for redundancy masking.
Affiliation(s)
- Fazilet Zeynep Yildirim-Keles
- Institute of Psychology, University of Bern, Fabrikstrasse 8, 3012, Bern, Switzerland.
- Department of Psychology, University of Fribourg, Faucigny 2, 1700, Fribourg, Switzerland.
- Daniel R Coates
- Institute of Psychology, University of Bern, Fabrikstrasse 8, 3012, Bern, Switzerland
- College of Optometry, University of Houston, Houston, TX, 77204, USA
- Bilge Sayim
- Institute of Psychology, University of Bern, Fabrikstrasse 8, 3012, Bern, Switzerland
- Univ. Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, 59000, Lille, France
7. Kim T, Pasupathy A. Neural Correlates of Crowding in Macaque Area V4. J Neurosci 2024; 44:e2260232024. [PMID: 38670806] [PMCID: PMC11170949] [DOI: 10.1523/jneurosci.2260-23.2024]
Abstract
Visual crowding refers to the phenomenon where a target object that is easily identifiable in isolation becomes difficult to recognize when surrounded by other stimuli (distractors). Many psychophysical studies have investigated this phenomenon and proposed alternative models for the underlying mechanisms. One prominent hypothesis, albeit with mixed psychophysical support, posits that crowding arises from the loss of information due to pooled encoding of features from target and distractor stimuli in the early stages of cortical visual processing. However, neurophysiological studies have not rigorously tested this hypothesis. We studied the responses of single neurons in macaque (one male, one female) area V4, an intermediate stage of the object-processing pathway, to parametrically designed crowded displays and texture statistics-matched metameric counterparts. Our investigations reveal striking parallels between how crowding parameters-number, distance, and position of distractors-influence human psychophysical performance and V4 shape selectivity. Importantly, we also found that enhancing the salience of a target stimulus could alleviate crowding effects in highly cluttered scenes, and this could be temporally protracted reflecting a dynamical process. Thus, a pooled encoding of nearby stimuli cannot explain the observed responses, and we propose an alternative model where V4 neurons preferentially encode salient stimuli in crowded displays. Overall, we conclude that the magnitude of crowding effects is determined not just by the number of distractors and target-distractor separation but also by the relative salience of targets versus distractors based on their feature attributes-the similarity of distractors and the contrast between target and distractor stimuli.
Affiliation(s)
- Taekjun Kim
- Department of Biological Structure, University of Washington, Seattle, Washington 98195
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98195
- Anitha Pasupathy
- Department of Biological Structure, University of Washington, Seattle, Washington 98195
- Washington National Primate Research Center, University of Washington, Seattle, Washington 98195
8. Cutler J, Bodet A, Rivest J, Cavanagh P. The word superiority effect overcomes crowding. Vision Res 2024; 222:108436. [PMID: 38820621] [DOI: 10.1016/j.visres.2024.108436]
Abstract
Crowding and the word superiority effect are two perceptual phenomena that influence reading. The identification of the inner letters of a word can be hindered by crowding from adjacent letters, but it can be facilitated by the word context itself (the word superiority effect). In the present study, four-letter strings (words and non-words) with different inter-letter spacings (ranging from a spacing optimal for producing crowding to a spacing too large to produce crowding) were presented briefly in the periphery, and participants were asked to identify the third letter of the string. Each word had a partner word that was identical except for its third letter (e.g., COLD, CORD) so that guessing could be ruled out as the source of the improved performance for words. Unsurprisingly, letter identification accuracy was better for words than for non-words. For non-words, it was lowest at closer spacings, confirming crowding. However, for words, accuracy remained high at all inter-letter spacings, showing that crowding did not prevent identification of the inner letters. This result supports models of "holistic" word recognition in which partial cues can lead to recognition without first identifying individual letters. Once the word is recognized, its inner letters can be recovered, despite the feature loss produced by crowding.
Affiliation(s)
- June Cutler
- Department of Psychology, Glendon College, York University, Toronto, ON, M4N 3M6, Canada
- Alexandre Bodet
- Department of Psychology, Glendon College, York University, Toronto, ON, M4N 3M6, Canada
- Josée Rivest
- Department of Psychology, Glendon College, York University, Toronto, ON, M4N 3M6, Canada; Centre for Vision Research, York University, Toronto, ON, M3J 1P3, Canada.
- Patrick Cavanagh
- Department of Psychology, Glendon College, York University, Toronto, ON, M4N 3M6, Canada; Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA; Centre for Vision Research, York University, Toronto, ON, M3J 1P3, Canada
9. Bertamini M, Oletto CM, Contemori G. The Role of Uniform Textures in Making Texture Elements Visible in the Visual Periphery. Open Mind (Camb) 2024; 8:462-482. [PMID: 38665546] [PMCID: PMC11045036] [DOI: 10.1162/opmi_a_00136]
Abstract
There are important differences between central and peripheral vision. With respect to shape, contours retain phenomenal sharpness, although some contours disappear if they are near other contours. This leads some uniform textures to appear non-uniform (Honeycomb illusion; Bertamini et al., 2016). Unlike other phenomena of shape perception in the periphery, this illusion shows that continuity of the texture does not contribute to phenomenal continuity. We systematically varied the relationship between central and peripheral regions, and we collected subjective reports (how far out one can see the lines) as well as judgments of line orientation. We used extended textures created with a square grid and some additional lines that are invisible when they are located at the corners of the grid, or visible when they are separated from the grid (control condition). With respect to subjective reports, we compared the region of visibility for cases in which the texture was uniform (Exp 1a) or in which the lines in a central region were different (Exp 1b). There were no differences, showing no role of objective uniformity on visibility. Next, in addition to the region of visibility, we measured sensitivity using a forced-choice task (line tilted left or right) (Exp 2). The drop in sensitivity with eccentricity matched the size of the region in which lines were perceived in the illusion condition, but not in the control condition. When participants were offered a choice to report whether the lines were present or absent (Exp 3), they confirmed that they did not see them in the illusion condition but saw them in the control condition. We conclude that mechanisms that control perception of contours operate differently in the periphery and override prior expectations, including that of uniformity. Conversely, when elements are detected in the periphery, we assign to them properties based on information from central vision, but these shapes cannot be identified correctly when the task requires such discrimination.
10. Semizer Y, Yu D, Rosenholtz R. Peripheral vision and crowding in mental maze-solving. J Vis 2024; 24:22. [PMID: 38662347] [PMCID: PMC11055501] [DOI: 10.1167/jov.24.4.22]
Abstract
Solving a maze effectively relies on both perception and cognition. Studying maze-solving behavior contributes to our knowledge about these important processes. Through psychophysical experiments and modeling simulations, we examine the role of peripheral vision, specifically visual crowding in the periphery, in mental maze-solving. Experiment 1 measured gaze patterns while varying maze complexity, revealing a direct relationship between visual complexity and maze-solving efficiency. Simulations of the maze-solving task using a peripheral vision model confirmed the observed crowding effects while making an intriguing prediction that saccades provide a conservative measure of how far ahead observers can perceive the path. Experiment 2 confirms that observers can judge whether a point lies on the path at considerably greater distances than their average saccade. Taken together, our findings demonstrate that peripheral vision plays a key role in mental maze-solving.
Affiliation(s)
- Yelda Semizer
- Department of Humanities and Social Sciences, New Jersey Institute of Technology, Newark, NJ, USA
- Dian Yu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ruth Rosenholtz
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
11. Keshvari S, Wijntjes MWA. Peripheral material perception. J Vis 2024; 24:13. [PMID: 38625088] [PMCID: PMC11033595] [DOI: 10.1167/jov.24.4.13]
Abstract
Humans can rapidly identify materials, such as wood or leather, even within a complex visual scene. Given a single image, one can easily identify the underlying "stuff," even though a given material can have highly variable appearance; fabric comes in unlimited variations of shape, pattern, color, and smoothness, yet we have little trouble categorizing it as fabric. What visual cues do we use to determine material identity? Prior research suggests that simple "texture" features of an image, such as the power spectrum, capture information about material properties and identity. Few studies, however, have tested richer and biologically motivated models of texture. We compared baseline material classification performance to performance with synthetic textures generated from the Portilla-Simoncelli model and several common image degradations. The textures retain statistical information but are otherwise random. We found that performance with textures and most degradations was well below baseline, suggesting insufficient information to support foveal material perception. Interestingly, modern research suggests that peripheral vision might use a statistical, texture-like representation. In a second set of experiments, we found that peripheral performance is more closely predicted by texture and other image degradations. These findings delineate the nature of peripheral material classification.
Affiliation(s)
- Maarten W A Wijntjes
- Perceptual Intelligence Lab, Industrial Design Engineering, Delft University of Technology, Delft, Netherlands
12. Zhaoping L. Peripheral vision is mainly for looking rather than seeing. Neurosci Res 2024; 201:18-26. [PMID: 38000447] [DOI: 10.1016/j.neures.2023.11.006]
Abstract
Vision includes looking and seeing. Looking, mainly via gaze shifts, selects a fraction of visual input information for passage through the brain's information bottleneck. The selected input is placed within the attentional spotlight, typically in the central visual field. Seeing decodes, i.e., recognizes and discriminates, the selected inputs. Hence, peripheral vision should be mainly devoted to looking, in particular, deciding where to shift the gaze. Looking is often guided exogenously by a saliency map created by the primary visual cortex (V1), and can be effective with no seeing and limited awareness. In seeing, peripheral vision not only suffers from poor spatial resolution, but is also subject to crowding and is more vulnerable to illusions caused by misleading, ambiguous, and impoverished visual inputs. Central vision, mainly for seeing, enjoys the top-down feedback that aids seeing in light of the bottleneck, which is hypothesized to start from V1 onward to higher areas. This feedback queries for additional information from lower visual cortical areas such as V1 for ongoing recognition. Peripheral vision is deficient in this feedback according to the Central-peripheral Dichotomy (CPD) theory. The saccades engendered by peripheral vision allow looking to combine with seeing to give human observers the impression of seeing the whole scene clearly despite inattentional blindness.
Affiliation(s)
- Li Zhaoping
- University of Tübingen, Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
13. Chang TY, Cha O, McGugin R, Tomarken A, Gauthier I. How general is ensemble perception? Psychological Research 2024; 88:695-708. [PMID: 37861726] [DOI: 10.1007/s00426-023-01883-z]
Abstract
People can summarize features of groups of objects (e.g., the mean size of apples). Claims of dissociations or common mechanisms supporting such ensemble perception (EP) judgments have generally been made on the basis of correlations between pairs of tasks. These correlations can be inflated because they use the same stimuli, summary statistics, and/or task format. Performance on EP tasks also correlates with that on object recognition (OR) tasks. Here, we seek evidence for a general EP ability that is also distinct from OR ability. Two hundred participants completed three tasks that did not overlap in stimuli, summary statistic, or task format. Participants performed a diversity comparison for arrays of nonsense blobs, a mean identity judgment with ensembles of Transformer toys, and the novel object memory task with novel objects (NOMT-Greeble). We hypothesized that EP contributes to the first two of these tasks, while OR contributes only to the last two. Performance on the two tasks thought to tap EP ability was correlated after controlling for the third task. Confirmatory factor analysis was used to test our predictions without the confound of measurement error. Correlations between factors assumed to share influence from EP or from OR were higher than the correlation between factors that were not expected to share these influences. The results provide the first clear evidence for a domain-general EP ability distinct from OR. We argue that understanding such a general ability will require a change in designs and analytical approaches in the study of individual differences in EP.
Affiliation(s)
- Ting-Yun Chang
- Department of Psychology, Vanderbilt University, 111 21St Avenue South, Nashville, TN, 37240, USA.
- Oakyoon Cha
- Department of Psychology, Sungshin Women's University, Seoul, South Korea
- Rankin McGugin
- Department of Psychology, Vanderbilt University, 111 21St Avenue South, Nashville, TN, 37240, USA
- Andrew Tomarken
- Department of Psychology, Vanderbilt University, 111 21St Avenue South, Nashville, TN, 37240, USA
- Isabel Gauthier
- Department of Psychology, Vanderbilt University, 111 21St Avenue South, Nashville, TN, 37240, USA.
14. Tiurina NA, Markov YA, Whitney D, Pascucci D. The functional role of spatial anisotropies in ensemble perception. BMC Biol 2024; 22:28. [PMID: 38317216] [PMCID: PMC10845794] [DOI: 10.1186/s12915-024-01822-3]
Abstract
BACKGROUND: The human brain can rapidly represent sets of similar stimuli by their ensemble summary statistics, like the average orientation or size. Classic models assume that ensemble statistics are computed by integrating all elements with equal weight. Challenging this view, here, we show that ensemble statistics are estimated by combining parafoveal and foveal statistics in proportion to their reliability. In a series of experiments, observers reproduced the average orientation of an ensemble of stimuli under varying levels of visual uncertainty.
RESULTS: Ensemble statistics were affected by multiple spatial biases, in particular, a strong and persistent bias towards the center of the visual field. This bias, evident in the majority of subjects and in all experiments, scaled with uncertainty: the higher the uncertainty in the ensemble statistics, the larger the bias towards the element shown at the fovea.
CONCLUSION: Our findings indicate that ensemble perception cannot be explained by simple uniform pooling. The visual system weights information anisotropically from both the parafovea and the fovea, taking the intrinsic spatial anisotropies of vision into account to compensate for visual uncertainty.
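The reliability-proportional combination described in the conclusion is, in its simplest form, an inverse-variance weighted average. Below is a minimal sketch of that generic rule with made-up numbers; the actual model, stimuli, and parameter estimates are those reported in the paper.

```python
import numpy as np

def combine_estimates(mean_fovea, sd_fovea, mean_parafovea, sd_parafovea):
    """Reliability-weighted (inverse-variance) combination of two local averages.

    The weight on each source is its inverse variance, so as parafoveal
    uncertainty grows the combined estimate is pulled towards the foveal element.
    """
    w_fovea = 1.0 / sd_fovea**2
    w_para = 1.0 / sd_parafovea**2
    combined = (w_fovea * mean_fovea + w_para * mean_parafovea) / (w_fovea + w_para)
    combined_sd = np.sqrt(1.0 / (w_fovea + w_para))
    return combined, combined_sd

# Example: foveal element at 10 deg (precise), parafoveal average at 20 deg (noisy).
est, sd = combine_estimates(10.0, 2.0, 20.0, 6.0)
print(f"combined estimate = {est:.1f} deg (sd = {sd:.1f})")  # pulled towards the fovea
```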
Affiliation(s)
- Natalia A Tiurina
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
- Department of Psychology, Technische Universität Dresden, Dresden, Germany.
- Yuri A Markov
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
- David Whitney
- Vision Science Graduate Group, University of California, Berkeley, Berkeley, USA
- Department of Psychology, University of California, Berkeley, Berkeley, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, USA
- David Pascucci
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
15. Van der Burg E, Cass J, Olivers CNL. A CODE model bridging crowding in sparse and dense displays. Vision Res 2024; 215:108345. [PMID: 38142531] [DOI: 10.1016/j.visres.2023.108345]
Abstract
Visual crowding is arguably the strongest limitation imposed on extrafoveal vision, and is a relatively well-understood phenomenon. However, most investigations and theories are based on sparse displays consisting of a target and at most a handful of flanker objects. Recent findings suggest that the laws thought to govern crowding may not hold for densely cluttered displays, and that grouping and nearest neighbour effects may be more important. Here we present a computational model that accounts for crowding effects in both sparse and dense displays. The model is an adaptation and extension of an earlier model that has previously successfully accounted for spatial clustering, numerosity and object-based attention phenomena. Our model combines grouping by proximity and similarity with a nearest neighbour rule, and defines crowding as the extent to which target and flankers fail to segment. We show that when the model is optimized for explaining crowding phenomena in classic, sparse displays, it also does a good job in capturing novel crowding patterns in dense displays, in both existing and new data sets. The model thus ties together different principles governing crowding, specifically Bouma's law, grouping, and nearest neighbour similarity effects.
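The abstract's core ingredients — grouping by proximity and similarity plus a nearest-neighbour rule, with crowding defined as a failure of the target to segment from the flankers — can be caricatured in a few lines. This is a toy illustration with arbitrary Gaussian kernels and parameter values, not the published CODE implementation.

```python
import numpy as np

def grouping_strength(pos_a, pos_b, feat_a, feat_b, sigma_space=1.0, sigma_feat=20.0):
    """Toy pairwise grouping strength: closer and more similar items group more."""
    d_space = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b))
    d_feat = abs(feat_a - feat_b)
    return np.exp(-d_space**2 / (2 * sigma_space**2)) * np.exp(-d_feat**2 / (2 * sigma_feat**2))

def crowding_index(target_pos, target_feat, flanker_pos, flanker_feat):
    """Crowding as failure to segment: strength of the target's link to the
    flanker with which it groups most strongly (its 'nearest neighbour')."""
    strengths = [grouping_strength(target_pos, p, target_feat, f)
                 for p, f in zip(flanker_pos, flanker_feat)]
    return max(strengths) if strengths else 0.0

# Sparse display: one nearby, similar flanker -> strong grouping, strong crowding.
print(crowding_index((0, 0), 0.0, [(0.8, 0)], [5.0]))
# Dense display: more flankers, but the nearest one is dissimilar -> weaker crowding.
print(crowding_index((0, 0), 0.0, [(0.8, 0), (1.6, 0), (2.4, 0)], [60.0, 5.0, 5.0]))
```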
Affiliation(s)
- John Cass
- MARCS Institute of Brain, Behaviour & Development, Western Sydney University, Australia
- Christian N L Olivers
- Institute for Brain and Behaviour Amsterdam, the Netherlands; Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, the Netherlands
16. Moore CM, Zheng Q. Limited midlevel mediation of visual crowding: Surface completion fails to support uncrowding. J Vis 2024; 24:11. [PMID: 38294775] [PMCID: PMC10839818] [DOI: 10.1167/jov.24.1.11]
Abstract
Visual crowding refers to impaired object recognition that is caused by nearby stimuli. It increases with eccentricity. Image-level explanations of crowding maintain that it is caused by information loss within early encoding processes that vary in functionality with eccentricity. Alternative explanations maintain that the interference is not limited to two-dimensional image-level interactions but that it is mediated within representations that reflect three-dimensional scene structure. Uncrowding refers to when adding stimulus information to a display, which increases the noise at an image level, nonetheless decreasing the amount of crowding that occurs. Uncrowding has been interpreted as evidence of midlevel mediation of crowding because the additional information tends to provide an opportunity for perceptually organizing stimuli into distinct and therefore protected representations. It is difficult, however, to rule out image-level explanations of crowding and uncrowding when stimulus differences exist between conditions. We adapted displays of a specific form of uncrowding to minimize stimulus differences across conditions, while retaining the potential for perceptual organization, specifically perceptual surface completion. Uncrowding under these conditions would provide strong support for midlevel mediation of crowding. In five experiments, however, we found no evidence of midlevel mediation of crowding, indicating that at least for this version of uncrowding, image-level explanations cannot be ruled out.
Affiliation(s)
- Cathleen M Moore
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Qingzi Zheng
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
17. Balas B, Greene MR. The role of texture summary statistics in material recognition from drawings and photographs. J Vis 2023; 23:3. [PMID: 38064227] [PMCID: PMC10709799] [DOI: 10.1167/jov.23.14.3]
Abstract
Material depictions in artwork are useful tools for revealing image features that support material categorization. For example, artistic recipes for drawing specific materials make explicit the critical information leading to recognizable material properties (Di Cicco, Wijntjes, & Pont, 2020), and investigating the recognizability of material renderings as a function of their visual features supports conclusions about the vocabulary of material perception. Here, we examined how the recognition of materials from photographs and drawings was affected by the application of the Portilla-Simoncelli texture synthesis model. This manipulation allowed us to examine how categorization may be affected differently across materials and image formats when only summary statistic information about appearance was retained. Further, we compared human performance to the categorization accuracy obtained from a pretrained deep convolutional neural network to determine if observers' performance was reflected in the network. Although we found some similarities between human and network performance for photographic images, the results obtained from drawings differed substantially. Our results demonstrate that texture statistics play a variable role in material categorization across rendering formats and material categories and that the human perception of material drawings is not effectively captured by deep convolutional neural networks trained for object recognition.
Affiliation(s)
- Benjamin Balas
- Psychology Department, North Dakota State University, Fargo, ND, USA
- Michelle R Greene
- Psychology Department, Barnard College, Columbia University, New York, NY, USA
18. Feather J, Leclerc G, Mądry A, McDermott JH. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat Neurosci 2023; 26:2017-2034. [PMID: 37845543] [PMCID: PMC10620097] [DOI: 10.1038/s41593-023-01442-0]
Abstract
Deep neural network models of sensory systems are often proposed to learn representational transformations with invariances like those in the brain. To reveal these invariances, we generated 'model metamers', stimuli whose activations within a model stage are matched to those of a natural stimulus. Metamers for state-of-the-art supervised and unsupervised neural network models of vision and audition were often completely unrecognizable to humans when generated from late model stages, suggesting differences between model and human invariances. Targeted model changes improved human recognizability of model metamers but did not eliminate the overall human-model discrepancy. The human recognizability of a model's metamers was well predicted by their recognizability by other models, suggesting that models contain idiosyncratic invariances in addition to those required by the task. Metamer recognizability dissociated from both traditional brain-based benchmarks and adversarial vulnerability, revealing a distinct failure mode of existing sensory models and providing a complementary benchmark for model assessment.
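The general recipe for generating a model metamer — optimize a noise image until its activations at a chosen model stage match those of a reference stimulus — can be sketched as follows. The tiny stand-in network, image size, and optimizer settings are illustrative assumptions; they are not the models, stages, or optimization procedure used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in "model stage": two convolutional layers with rectification.
stage = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=5, padding=2, stride=2), nn.ReLU(),
)
stage.eval()

def make_metamer(reference_image, n_steps=500, lr=0.05):
    """Optimize a noise image so its activations at `stage` match those of
    `reference_image` (the generic metamer recipe, not the paper's exact setup)."""
    with torch.no_grad():
        target_act = stage(reference_image)
    metamer = torch.rand_like(reference_image, requires_grad=True)
    opt = torch.optim.Adam([metamer], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = F.mse_loss(stage(metamer), target_act)
        loss.backward()
        opt.step()
    return metamer.detach()

image = torch.rand(1, 1, 64, 64)   # placeholder for a natural stimulus
metamer = make_metamer(image)
print("activation match error:", F.mse_loss(stage(metamer), stage(image)).item())
```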
Affiliation(s)
- Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Computational Neuroscience, Flatiron Institute, Cambridge, MA, USA.
- Guillaume Leclerc
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aleksander Mądry
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
19. Robinson MM, Brady TF. A quantitative model of ensemble perception as summed activation in feature space. Nat Hum Behav 2023; 7:1638-1651. [PMID: 37402880] [PMCID: PMC10810262] [DOI: 10.1038/s41562-023-01602-z]
Abstract
Ensemble perception is a process by which we summarize complex scenes. Despite the importance of ensemble perception to everyday cognition, there are few computational models that provide a formal account of this process. Here we develop and test a model in which ensemble representations reflect the global sum of activation signals across all individual items. We leverage this set of minimal assumptions to formally connect a model of memory for individual items to ensembles. We compare our ensemble model against a set of alternative models in five experiments. Our approach uses performance on a visual memory task for individual items to generate zero-free-parameter predictions of interindividual and intraindividual differences in performance on an ensemble continuous-report task. Our top-down modelling approach formally unifies models of memory for individual items and ensembles and opens a venue for building and comparing models of distinct memory processes and representations.
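A bare-bones sketch of the "summed activation in feature space" idea: each item contributes an activation bump along a continuous feature axis, and the ensemble representation is the sum of those bumps. The Gaussian bump shape, tuning width, peak read-out, and example values are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def summed_activation(item_features, feature_axis, tuning_sd=10.0):
    """Ensemble representation as the summed activation that each item
    contributes along a continuous feature axis (Gaussian 'bumps')."""
    acts = np.zeros_like(feature_axis, dtype=float)
    for f in item_features:
        acts += np.exp(-(feature_axis - f)**2 / (2 * tuning_sd**2))
    return acts

feature_axis = np.linspace(0, 180, 181)   # e.g., orientation in degrees
items = [40.0, 55.0, 60.0, 95.0]          # the individual display items
ensemble = summed_activation(items, feature_axis)

# A simple read-out: report the feature value with the highest summed activation.
print("ensemble report:", feature_axis[np.argmax(ensemble)], "deg")
print("true mean of items:", np.mean(items), "deg")
```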
Affiliation(s)
- Maria M Robinson
- Psychology Department, University of California, San Diego, La Jolla, CA, USA.
- Timothy F Brady
- Psychology Department, University of California, San Diego, La Jolla, CA, USA.
20. Smithers SP, Shao Y, Altham J, Bex PJ. Large depth differences between target and flankers can increase crowding: Evidence from a multi-depth plane display. eLife 2023; 12:e85143. [PMID: 37665324] [PMCID: PMC10476968] [DOI: 10.7554/elife.85143]
Abstract
Crowding occurs when the presence of nearby features causes highly visible objects to become unrecognizable. Although crowding has implications for many everyday tasks, and the tremendous amount of research on it reflects its importance, surprisingly little is known about how depth affects crowding. Most available studies show that stereoscopic disparity reduces crowding, indicating that crowding may be relatively unimportant in three-dimensional environments. However, most previous studies tested only small stereoscopic differences in depth in which disparity, defocus blur, and accommodation are inconsistent with the real world. Using a novel multi-depth plane display, this study investigated how large (0.54-2.25 diopters), real differences in target-flanker depth, representative of those experienced between many objects in the real world, affect crowding. Our findings show that large differences in target-flanker depth increased crowding in the majority of observers, contrary to previous work showing reduced crowding in the presence of small depth differences. Furthermore, when the target was at fixation depth, crowding was generally more pronounced when the flankers were behind the target as opposed to in front of it. However, when the flankers were at fixation depth, crowding was generally more pronounced when the target was behind the flankers. These findings suggest that crowding from clutter outside the limits of binocular fusion can still have a significant impact on object recognition and visual perception in the peripheral field.
Affiliation(s)
- Samuel P Smithers
- Department of Psychology, Northeastern University, Boston, United States
- Yulong Shao
- Department of Psychology, Northeastern University, Boston, United States
- James Altham
- Department of Psychology, Northeastern University, Boston, United States
- Peter J Bex
- Department of Psychology, Northeastern University, Boston, United States
21. Henderson MM, Tarr MJ, Wehbe L. A Texture Statistics Encoding Model Reveals Hierarchical Feature Selectivity across Human Visual Cortex. J Neurosci 2023; 43:4144-4161. [PMID: 37127366] [PMCID: PMC10255092] [DOI: 10.1523/jneurosci.1822-22.2023]
Abstract
Midlevel features, such as contour and texture, provide a computational link between low- and high-level visual representations. Although the nature of midlevel representations in the brain is not fully understood, past work has suggested that a texture statistics model, called the P-S model (Portilla and Simoncelli, 2000), is a candidate for predicting neural responses in areas V1-V4 as well as human behavioral data. However, it is not currently known how well this model accounts for the responses of higher visual cortex to natural scene images. To examine this, we constructed single-voxel encoding models based on P-S statistics and fit the models to fMRI data from human subjects (both sexes) from the Natural Scenes Dataset (Allen et al., 2022). We demonstrate that the texture statistics encoding model can predict the held-out responses of individual voxels in early retinotopic areas and higher-level category-selective areas. The ability of the model to reliably predict signal in higher visual cortex suggests that the representation of texture statistics features is widespread throughout the brain. Furthermore, using variance partitioning analyses, we identify which features are most uniquely predictive of brain responses and show that the contributions of higher-order texture features increase from early areas to higher areas on the ventral and lateral surfaces. We also demonstrate that patterns of sensitivity to texture statistics can be used to recover broad organizational axes within visual cortex, including dimensions that capture semantic image content. These results provide a key step forward in characterizing how midlevel feature representations emerge hierarchically across the visual system.
SIGNIFICANCE STATEMENT: Intermediate visual features, like texture, play an important role in cortical computations and may contribute to tasks like object and scene recognition. Here, we used a texture model proposed in past work to construct encoding models that predict the responses of neural populations in human visual cortex (measured with fMRI) to natural scene stimuli. We show that responses of neural populations at multiple levels of the visual system can be predicted by this model, and that the model is able to reveal an increase in the complexity of feature representations from early retinotopic cortex to higher areas of ventral and lateral visual cortex. These results support the idea that texture-like representations may play a broad underlying role in visual processing.
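As a generic illustration of what a single-voxel encoding-model fit of this kind involves — regularized regression from image-computable features to voxel responses, evaluated on held-out data — here is a small numpy sketch using simulated features and responses. The feature set, regularization scheme, and validation procedure in the paper differ; everything below is an assumption for demonstration only.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression weights for one voxel."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ y)

rng = np.random.default_rng(0)
n_images, n_feat = 400, 50
X = rng.normal(size=(n_images, n_feat))                 # texture-statistic features per image
true_w = rng.normal(size=n_feat)
y = X @ true_w + rng.normal(scale=2.0, size=n_images)   # simulated voxel response

# Fit on the first half of the images, evaluate held-out prediction on the rest.
w = fit_ridge(X[:200], y[:200], lam=10.0)
pred = X[200:] @ w
r = np.corrcoef(pred, y[200:])[0, 1]
print(f"held-out prediction r = {r:.2f}")
```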
Affiliation(s)
- Margaret M Henderson
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Michael J Tarr
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Leila Wehbe
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Psychology
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
22. Bowen JD, Alforque CV, Silver MA. Effects of involuntary and voluntary attention on critical spacing of visual crowding. J Vis 2023; 23:2. [PMID: 36862108] [PMCID: PMC9987171] [DOI: 10.1167/jov.23.3.2]
Abstract
Visual spatial attention can be allocated in two distinct ways: one that is voluntarily directed to behaviorally relevant locations in the world, and one that is involuntarily captured by salient external stimuli. Precueing spatial attention has been shown to improve perceptual performance on a number of visual tasks. However, the effects of spatial attention on visual crowding, defined as the reduction in the ability to identify target objects in clutter, are far less clear. In this study, we used an anticueing paradigm to separately measure the effects of involuntary and voluntary spatial attention on a crowding task. Each trial began with a brief peripheral cue that predicted that the crowded target would appear on the opposite side of the screen 80% of the time and on the same side of the screen 20% of the time. Subjects performed an orientation discrimination task on a target Gabor patch that was flanked by other similar Gabor patches with independent random orientations. For trials with a short stimulus onset asynchrony between cue and target, involuntary capture of attention led to faster response times and smaller critical spacing when the target appeared on the cue side. For trials with a long stimulus onset asynchrony, voluntary allocation of attention led to faster reaction times but no significant effect on critical spacing when the target appeared on the opposite side to the cue. We additionally found that the magnitudes of these cueing effects of involuntary and voluntary attention were not strongly correlated across subjects for either reaction time or critical spacing.
Affiliation(s)
- Joel D Bowen
- Vision Science Graduate Group, University of California Berkeley, Berkeley, CA, USA
- Carissa V Alforque
- Herbert Wertheim School of Optometry & Vision Science, University of California Berkeley, Berkeley, CA, USA
- Michael A Silver
- Vision Science Graduate Group, University of California Berkeley, Berkeley, CA, USA
- Herbert Wertheim School of Optometry & Vision Science, University of California Berkeley, Berkeley, CA, USA
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
23. Henry CA, Kohn A. Feature representation under crowding in macaque V1 and V4 neuronal populations. Curr Biol 2022; 32:5126-5137.e3. [PMID: 36379216] [PMCID: PMC9729449] [DOI: 10.1016/j.cub.2022.10.049]
Abstract
Visual perception depends strongly on spatial context. A profound example is visual crowding, whereby the presence of nearby stimuli impairs the discriminability of object features. Despite extensive work on perceptual crowding and the spatial integrative properties of visual cortical neurons, the link between these two aspects of visual processing remains unclear. To understand better the neural basis of crowding, we recorded activity simultaneously from neuronal populations in V1 and V4 of fixating macaque monkeys. We assessed the information available from the measured responses about the orientation of a visual target both for targets presented in isolation and amid distractors. Both single neuron and population responses had less information about target orientation when distractors were present. Information loss was moderate in V1 and more substantial in V4. Information loss could be traced to systematic divisive and additive changes in neuronal tuning. Additive and multiplicative changes in tuning were more severe in V4; in addition, tuning exhibited other, non-affine transformations that were greater in V4, further restricting the ability of a fixed sensory readout strategy to extract accurate feature information across displays. Our results provide a direct test of crowding effects at different stages of the visual hierarchy. They reveal how crowded visual environments alter the spiking activity of cortical populations by which sensory stimuli are encoded and connect these changes to established mechanisms of neuronal spatial integration.
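A toy numpy sketch of why divisive-plus-additive changes in tuning of the kind reported here reduce the orientation information carried by a neuron: scaling the tuning curve down and adding an offset shrinks the response difference between nearby orientations relative to the response variability. The Gaussian tuning curve, Poisson-like noise assumption, and specific gain and offset values are illustrative, not estimates from the recordings.

```python
import numpy as np

def tuning(theta, pref=90.0, amp=30.0, base=5.0, bw=20.0):
    """Gaussian orientation tuning curve (spikes/s)."""
    return base + amp * np.exp(-(theta - pref)**2 / (2 * bw**2))

def discriminability(resp_fn, theta=80.0, dtheta=5.0):
    """Crude d' for discriminating theta vs theta + dtheta under Poisson-like noise
    (variance equal to the mean rate)."""
    r1, r2 = resp_fn(theta), resp_fn(theta + dtheta)
    return abs(r2 - r1) / np.sqrt(0.5 * (r1 + r2))

isolated = lambda th: tuning(th)
# Distractors modelled as a divisive gain change plus an additive offset.
crowded = lambda th: 0.5 * tuning(th) + 8.0

print("d' isolated:", round(discriminability(isolated), 2))
print("d' crowded :", round(discriminability(crowded), 2))   # lower: information loss
```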
Affiliation(s)
- Christopher A Henry
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
- Adam Kohn
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
24. Tesileanu T, Piasini E, Balasubramanian V. Efficient processing of natural scenes in visual cortex. Front Cell Neurosci 2022; 16:1006703. [PMID: 36545653] [PMCID: PMC9760692] [DOI: 10.3389/fncel.2022.1006703]
Abstract
Neural circuits in the periphery of the visual, auditory, and olfactory systems are believed to use limited resources efficiently to represent sensory information by adapting to the statistical structure of the natural environment. This "efficient coding" principle has been used to explain many aspects of early visual circuits including the distribution of photoreceptors, the mosaic geometry and center-surround structure of retinal receptive fields, the excess OFF pathways relative to ON pathways, saccade statistics, and the structure of simple cell receptive fields in V1. We know less about the extent to which such adaptations may occur in deeper areas of cortex beyond V1. We thus review recent developments showing that the perception of visual textures, which depends on processing in V2 and beyond in mammals, is adapted in rats and humans to the multi-point statistics of luminance in natural scenes. These results suggest that central circuits in the visual brain are adapted for seeing key aspects of natural scenes. We conclude by discussing how adaptation to natural temporal statistics may aid in learning and representing visual objects, and propose two challenges for the future: (1) explaining the distribution of shape sensitivity in the ventral visual stream from the statistics of object shape in natural images, and (2) explaining cell types of the vertebrate retina in terms of feature detectors that are adapted to the spatio-temporal structures of natural stimuli. We also discuss how new methods based on machine learning may complement the normative, principles-based approach to theoretical neuroscience.
Collapse
Affiliation(s)
- Tiberiu Tesileanu
- Center for Computational Neuroscience, Flatiron Institute, New York, NY, United States
| | - Eugenio Piasini
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
| | - Vijay Balasubramanian
- Department of Physics and Astronomy, David Rittenhouse Laboratory, University of Pennsylvania, Philadelphia, PA, United States
- Santa Fe Institute, Santa Fe, NM, United States
| |
Collapse
|
25
|
Chung S, Brasel SA. Digital claustrophobia: Affective responses to digital design decisions. COMPUTERS IN HUMAN BEHAVIOR REPORTS 2022. [DOI: 10.1016/j.chbr.2022.100259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
|
26
|
Herzog MH, Sayim B. Crowding: Recent advances and perspectives. J Vis 2022; 22:15. [DOI: 10.1167/jov.22.12.15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Affiliation(s)
- Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
| | - Bilge Sayim
- Sciences Cognitives et Sciences Affectives (SCALab), CNRS, UMR 9193, University of Lille, Lille, France
- Institute of Psychology, University of Bern, Bern, Switzerland
| |
Collapse
|
27
|
Cicchini GM, D'Errico G, Burr DC. Crowding results from optimal integration of visual targets with contextual information. Nat Commun 2022; 13:5741. [PMID: 36180497 PMCID: PMC9525686 DOI: 10.1038/s41467-022-33508-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Accepted: 09/16/2022] [Indexed: 11/22/2022] Open
Abstract
Crowding is the inability to recognize an object in clutter, usually considered a fundamental low-level bottleneck to object recognition. Here we advance and test an alternative idea, that crowding, like predictive phenomena such as serial dependence, results from optimizing strategies that exploit redundancies in natural scenes. This notion leads to several testable predictions: crowding should be greatest for unreliable targets and reliable flankers; crowding-induced biases should be maximal when target and flankers have similar orientations, falling off for differences around 20°; flanker interference should be associated with higher precision in orientation judgements, leading to a lower overall error rate; effects should be maximal when the orientation of the target is near that of the average of the flankers, rather than that of individual flankers. Each of these predictions was supported, and all could be simulated with ideal-observer models that maximize performance. The results suggest that while crowding can affect object recognition, it may be better understood not as a processing bottleneck, but as a consequence of efficient exploitation of the spatial redundancies of the natural world.
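As a rough illustration of the ideal-observer logic invoked here, the following sketch combines a target estimate with the flanker average by inverse-variance weighting; the Gaussian-noise assumption and the numbers are hypothetical, not the authors' fitted model.

```python
import numpy as np

def integrate(target_deg, target_sd, flanker_mean_deg, flanker_sd):
    """Reliability-weighted (inverse-variance) combination of target and flanker estimates."""
    w_t, w_f = 1.0 / target_sd**2, 1.0 / flanker_sd**2
    combined = (w_t * target_deg + w_f * flanker_mean_deg) / (w_t + w_f)
    combined_sd = np.sqrt(1.0 / (w_t + w_f))   # integration improves precision
    return combined, combined_sd

# Unreliable target, reliable flankers: the estimate is biased toward the flanker mean,
# but its precision is better than that of the target signal alone.
print(integrate(target_deg=10.0, target_sd=8.0, flanker_mean_deg=30.0, flanker_sd=2.0))
```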
Collapse
Affiliation(s)
| | | | - David Charles Burr
- Institute of Neuroscience, CNR, via Moruzzi, 1, 56124, Pisa, Italy.
- Department of Neurosciences, Psychology, Drug Research and Child Health, University of Florence, viale Pieraccini, 6, 50139, Firenze, Italy.
| |
Collapse
|
28
|
Ensemble perception without phenomenal awareness of elements. Sci Rep 2022; 12:11922. [PMID: 35831387 PMCID: PMC9279487 DOI: 10.1038/s41598-022-15850-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Accepted: 06/30/2022] [Indexed: 11/09/2022] Open
Abstract
Humans efficiently recognize complex scenes by grouping multiple features and objects into ensembles. It has been suggested that ensemble processing does not require, or even impairs, conscious discrimination of individual element properties. The present study examined whether ensemble perception requires phenomenal awareness of elements. We asked observers to judge the mean orientation of a line-based texture pattern whose central region was made invisible by backward masks. Masks were composed of either a Mondrian pattern (Exp. 1) or an annular contour (Exp. 2), which, unlike the Mondrian, did not overlap spatially with elements in the central region. In the Mondrian-mask experiment, perceived mean orientation was determined only by visible elements outside the central region. However, in the annular-mask experiment, perceived mean orientation matched the mean orientation of all elements, including invisible elements within the central region. Results suggest that the visual system can compute spatial ensembles even without phenomenal awareness of stimuli.
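Since the task is to judge mean orientation, a short worked example of the axial (180°-periodic) circular mean may be useful; this is the generic formula, not the authors' analysis code.

```python
import numpy as np

def mean_orientation(theta_deg):
    """Circular mean for orientations, which repeat every 180 degrees:
    double the angles, average the unit vectors, then halve the resulting angle."""
    doubled = np.deg2rad(2.0 * np.asarray(theta_deg, dtype=float))
    return (np.rad2deg(np.angle(np.mean(np.exp(1j * doubled)))) / 2.0) % 180.0

# 170 deg behaves like -10 deg, so the mean is close to 7 deg rather than 67 deg.
print(mean_orientation([10.0, 20.0, 170.0]))
```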
Collapse
|
29
|
Shamsi F, Liu R, Kwon M. Foveal crowding appears to be robust to normal aging and glaucoma unlike parafoveal and peripheral crowding. J Vis 2022; 22:10. [PMID: 35848904 PMCID: PMC9308014 DOI: 10.1167/jov.22.8.10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2022] [Accepted: 06/17/2022] [Indexed: 11/24/2022] Open
Abstract
Visual crowding is the inability to recognize a target object in clutter. Previous studies have shown an increase in crowding in both parafoveal and peripheral vision in normal aging and glaucoma. Here, we ask whether there is any increase in foveal crowding in both normal aging and glaucomatous vision. Twenty-four patients with glaucoma and 24 age-matched normally sighted controls (mean age = 65 ± 7 vs. 60 ± 8 years old) participated in this study. For each subject, we measured the extent of foveal crowding using Pelli's foveal crowding paradigm (2016). We found that the average crowding zone was 0.061 degrees for glaucoma and 0.056 degrees for age-matched normal vision, respectively. These values fall into the range of foveal crowding zones (0.0125 degrees to 0.1 degrees) observed in young normal vision. We, however, did not find any evidence supporting increased foveal crowding in glaucoma (p = 0.375), at least in the early to moderate stages of glaucoma. In the light of previous studies on foveal crowding in normal young vision, we did not find any evidence supporting age-related changes in foveal crowding. Even if there is any, the effect appears to be rather inconsequential. Taken together, our findings suggest that, unlike parafoveal or peripheral crowding (2 degrees, 4 degrees, 8 degrees, and 10 degrees eccentricities), foveal crowding (<0.25 degrees eccentricity) appears to be less vulnerable to normal aging or moderate glaucomatous damage.
Collapse
Affiliation(s)
- Foroogh Shamsi
- Department of Psychology, Northeastern University, Boston, MA, USA
| | - Rong Liu
- Department of Psychology, Northeastern University, Boston, MA, USA
- Department of Ophthalmology and Visual Sciences, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Life Science and Medicine, University of Science and Technology of China, Hefei, China
| | - MiYoung Kwon
- Department of Psychology, Northeastern University, Boston, MA, USA
- Department of Ophthalmology and Visual Sciences, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| |
Collapse
|
30
|
Wolfe B, Sawyer BD, Rosenholtz R. Toward a Theory of Visual Information Acquisition in Driving. HUMAN FACTORS 2022; 64:694-713. [PMID: 32678682 PMCID: PMC9136385 DOI: 10.1177/0018720820939693] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/22/2019] [Accepted: 06/09/2020] [Indexed: 06/01/2023]
Abstract
OBJECTIVE: The aim of this study is to describe information acquisition theory, explaining how drivers acquire and represent the information they need.
BACKGROUND: While questions of what drivers are aware of underlie many questions in driver behavior, existing theories do not directly address how drivers in particular and observers in general acquire visual information. Understanding the mechanisms of information acquisition is necessary to build predictive models of drivers' representation of the world and can be applied beyond driving to a wide variety of visual tasks.
METHOD: We describe our theory of information acquisition, looking to questions in driver behavior and results from vision science research that speak to its constituent elements. We focus on the intersection of peripheral vision, visual attention, and eye movement planning and identify how an understanding of these visual mechanisms and processes in the context of information acquisition can inform more complete models of driver knowledge and state.
RESULTS: We set forth our theory of information acquisition, describing the gap in understanding that it fills and how existing questions in this space can be better understood using it.
CONCLUSION: Information acquisition theory provides a new and powerful way to study, model, and predict what drivers know about the world, reflecting our current understanding of visual mechanisms and enabling new theories, models, and applications.
APPLICATION: Using information acquisition theory to understand how drivers acquire, lose, and update their representation of the environment will aid development of driver assistance systems, semiautonomous vehicles, and road safety overall.
Collapse
|
31
|
Abstract
Peripheral vision is fundamental for many real-world tasks, including walking, driving, and aviation. Nonetheless, there has been no effort to connect these applied literatures to research in peripheral vision in basic vision science or sports science. To close this gap, we analyzed 60 relevant papers, chosen according to objective criteria. Applied research, with its real-world time constraints, complex stimuli, and performance measures, reveals new functions of peripheral vision. Peripheral vision is used to monitor the environment (e.g., road edges, traffic signs, or malfunctioning lights), in ways that differ from basic research. Applied research uncovers new actions that one can perform solely with peripheral vision (e.g., steering a car, climbing stairs). An important use of peripheral vision is that it helps compare the position of one’s body/vehicle to objects in the world. In addition, many real-world tasks require multitasking, and the fact that peripheral vision provides degraded but useful information means that tradeoffs are common in deciding whether to use peripheral vision or move one’s eyes. These tradeoffs are strongly influenced by factors like expertise, age, distraction, emotional state, task importance, and what the observer already knows. These tradeoffs make it hard to infer from eye movements alone what information is gathered from peripheral vision and what tasks we can do without it. Finally, we recommend three ways in which basic, sport, and applied science can benefit each other’s methodology, furthering our understanding of peripheral vision more generally.
Collapse
|
32
|
Lin Z, Gong M, Li X. On the relation between crowding and ensemble perception: Examining the role of attention. Psych J 2022; 11:804-813. [PMID: 35557502 DOI: 10.1002/pchj.559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2022] [Accepted: 04/10/2022] [Indexed: 11/06/2022]
Abstract
Ensemble perception of a crowd of stimuli is very accurate, even when individual stimuli are invisible due to crowding. The ability to perform high-precision ensemble perception may be an evolved compensatory mechanism for the limited attentional resolution caused by crowding. The relationship between crowding and ensemble coding is thus like two sides of the same coin, and attention may be a critical factor in their coexistence. The present study investigated whether crowding and ensemble coding are similarly modulated by attention, which can advance our understanding of their relation. Experiment 1 showed that diverting attention away from the target harmed performance in both crowding and ensemble perception tasks regardless of stimulus density, but crowding was more severely harmed. Experiment 2 showed that directing attention toward the target bar enhanced crowding-task performance regardless of stimulus density. Ensemble perception of high-density bars was also enhanced, but to a lesser extent, while ensemble perception of low-density bars was harmed. Together, our results indicate that crowding is strongly modulated by attention, whereas ensemble perception is only moderately modulated by attention, which conforms to the adaptive view.
Collapse
Affiliation(s)
- Zhen Lin
- School of Psychology, Jiangxi Normal University, Nanchang, China
| | - Mingliang Gong
- School of Psychology, Jiangxi Normal University, Nanchang, China
| | - Xiang Li
- School of Psychology, Jiangxi Normal University, Nanchang, China
| |
Collapse
|
33
|
Rummens K, Sayim B. Multidimensional feature interactions in visual crowding: When configural cues eliminate the polarity advantage. J Vis 2022; 22:2. [PMID: 35503508 PMCID: PMC9078080 DOI: 10.1167/jov.22.6.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Accepted: 03/21/2022] [Indexed: 11/24/2022] Open
Abstract
Crowding occurs when surrounding objects (flankers) impair target perception. A key property of crowding is the weaker interference when target and flankers strongly differ on a given dimension. For instance, identification of a target letter is usually superior with flankers of opposite versus the same contrast polarity as the target (the "polarity advantage"). High performance when target-flanker similarity is low has been attributed to the ungrouping of target and flankers. Here, we show that configural cues can override the usual advantage of low target-flanker similarity, and strong target-flanker grouping can reduce, rather than exacerbate, crowding. In Experiment 1, observers were presented with line triplets in the periphery and reported the tilt (left or right) of the central line. Target and flankers had the same (uniform condition) or opposite contrast polarity (alternating condition). Flanker configurations were either upright (||), unidirectionally tilted (\\ or //), or bidirectionally tilted (\/ or /\). Upright flankers yielded stronger crowding than unidirectional flankers, and weaker crowding than bidirectional flankers. Importantly, our results revealed a clear interaction between contrast polarity and flanker configuration. Triplets with upright and bidirectional flankers, but not unidirectional flankers, showed the polarity advantage. In Experiments 2 and 3, we showed that emergent features and redundancy masking (i.e., the reduction of the number of perceived items in repeating configurations) made it easier to discriminate between uniform triplets when flanker tilts were unidirectional (but not when bidirectional). We propose that the spatial configurations of uniform triplets with unidirectional flankers provided sufficient task-relevant information to enable a similar performance as with alternating triplets: strong target-flanker grouping alleviated crowding. We suggest that features which modulate crowding strength can interact non-additively, limiting the validity of typical crowding rules to contexts where only single, independent dimensions determine the effects of target-flanker similarity.
Collapse
Affiliation(s)
- Koen Rummens
- University of Bern, Institute of Psychology, Bern, Switzerland
| | - Bilge Sayim
- University of Bern, Institute of Psychology, Bern, Switzerland
- Université de Lille, CNRS, UMR 9193 - SCALab - Sciences Cognitives et Sciences Affectives, Lille, France
| |
Collapse
|
34
|
Abstract
Humans are exquisitely sensitive to the spatial arrangement of visual features in objects and scenes, but not in visual textures. Category-selective regions in the visual cortex are widely believed to underlie object perception, suggesting such regions should distinguish natural images of objects from synthesized images containing similar visual features in scrambled arrangements. Contrarily, we demonstrate that representations in category-selective cortex do not discriminate natural images from feature-matched scrambles but can discriminate images of different categories, suggesting a texture-like encoding. We find similar insensitivity to feature arrangement in Imagenet-trained deep convolutional neural networks. This suggests the need to reconceptualize the role of category-selective cortex as representing a basis set of complex texture-like features, useful for a myriad of behaviors.
The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of the visual cortex. These representations could support object vision by specifically representing objects, or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real-world objects, that is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses from category-selective regions, as well as a model of macaque inferotemporal cortex and Imagenet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to a lack of signal to noise, as all observer models could predict human performance in image categorization tasks. How then might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for natural object discrimination is available. Thus, our results suggest that the role of the human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
Collapse
|
35
|
Tamura H, Prokott KE, Fleming RW. Distinguishing mirror from glass: A "big data" approach to material perception. J Vis 2022; 22:4. [PMID: 35266961 PMCID: PMC8934559 DOI: 10.1167/jov.22.4.4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Distinguishing mirror from glass is a challenging visual inference, because both materials derive their appearance from their surroundings, yet we rarely experience difficulties in telling them apart. Very few studies have investigated how the visual system distinguishes reflections from refractions and to date, there is no image-computable model that emulates human judgments. Here we sought to develop a deep neural network that reproduces the patterns of visual judgments human observers make. To do this, we trained thousands of convolutional neural networks on more than 750,000 simulated mirror and glass objects, and compared their performance with human judgments, as well as alternative classifiers based on "hand-engineered" image features. For randomly chosen images, all classifiers and humans performed with high accuracy, and therefore correlated highly with one another. However, to assess how similar models are to humans, it is not sufficient to compare accuracy or correlation on random images. A good model should also predict the characteristic errors that humans make. We, therefore, painstakingly assembled a diagnostic image set for which humans make systematic errors, allowing us to isolate signatures of human-like performance. A large-scale, systematic search through feedforward neural architectures revealed that relatively shallow (three-layer) networks predicted human judgments better than any other models we tested. This is the first image-computable model that emulates human errors and succeeds in distinguishing mirror from glass, and hints that mid-level visual processing might be particularly important for the task.
Collapse
Affiliation(s)
- Hideki Tamura
- Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan
| | - Konrad Eugen Prokott
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
| | - Roland W Fleming
- Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
- Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany
| |
Collapse
|
36
|
Rideaux R, West RK, Wallis TSA, Bex PJ, Mattingley JB, Harrison WJ. Spatial structure, phase, and the contrast of natural images. J Vis 2022; 22:4. [PMID: 35006237 PMCID: PMC8762697 DOI: 10.1167/jov.22.1.4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Accepted: 11/25/2021] [Indexed: 11/24/2022] Open
Abstract
The sensitivity of the human visual system is thought to be shaped by environmental statistics. A major endeavor in vision science, therefore, is to uncover the image statistics that predict perceptual and cognitive function. When searching for targets in natural images, for example, it has recently been proposed that target detection is inversely related to the spatial similarity of the target to its local background. We tested this hypothesis by measuring observers' sensitivity to targets that were blended with natural image backgrounds. Targets were designed to have a spatial structure that was either similar or dissimilar to the background. Contrary to masking from similarity, we found that observers were most sensitive to targets that were most similar to their backgrounds. We hypothesized that a coincidence of phase alignment between target and background results in a local contrast signal that facilitates detection when target-background similarity is high. We confirmed this prediction in a second experiment. Indeed, we show that, by solely manipulating the phase of a target relative to its background, the target can be rendered easily visible or undetectable. Our study thus reveals that, in addition to its structural similarity, the phase of the target relative to the background must be considered when predicting detection sensitivity in natural images.
Collapse
Affiliation(s)
- Reuben Rideaux
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland, Australia
| | - Rebecca K West
- School of Psychology, University of Queensland, St. Lucia, Queensland, Australia
| | - Thomas S A Wallis
- Institut für Psychologie & Centre for Cognitive Science, Technische Universität Darmstadt, Darmstadt, Germany
| | - Peter J Bex
- Department of Psychology, Northeastern University, Boston, MA, USA
| | - Jason B Mattingley
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland, Australia
- School of Psychology, University of Queensland, St. Lucia, Queensland, Australia
| | - William J Harrison
- Queensland Brain Institute, University of Queensland, St. Lucia, Queensland, Australia
- School of Psychology, University of Queensland, St. Lucia, Queensland, Australia
| |
Collapse
|
37
|
Shamsi F, Liu R, Kwon M. Binocularly Asymmetric Crowding in Glaucoma and a Lack of Binocular Summation in Crowding. Invest Ophthalmol Vis Sci 2022; 63:36. [PMID: 35084432 PMCID: PMC8802025 DOI: 10.1167/iovs.63.1.36] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Purpose: Glaucoma is associated with progressive loss of retinal ganglion cells. Here we investigated the impact of glaucomatous damage on monocular and binocular crowding in parafoveal vision. We also examined the binocular summation of crowding to see whether crowding is alleviated under binocular viewing.
Methods: The study included 40 individuals with glaucoma and 24 age-similar normally sighted controls. For each subject, the magnitude of crowding was determined by the extent of the crowding zone. Crowding zones were measured binocularly in parafoveal vision (i.e., at 2° and 4° retinal eccentricities of the visual field). For a subgroup of glaucoma subjects (n = 17), the crowding zone was also measured monocularly for each eye.
Results: Compared with normal controls, individuals with glaucoma exhibited significantly larger crowding, that is, an enlargement of the crowding zone (an increase of 21%; P < 0.01). Moreover, we observed a lack of binocular summation (i.e., a binocular ratio of 1): binocular crowding was determined by the better eye. Hence, our results did not provide evidence supporting binocular summation of crowding in glaucomatous vision.
Conclusions: Our findings show that crowding is exacerbated in parafoveal vision in glaucoma and that binocularly asymmetric glaucoma seems to induce binocularly asymmetric crowding. Furthermore, the lack of binocular summation for crowding observed in glaucomatous vision, combined with the lack of binocular summation reported in a previous study on normal healthy vision, supports the view that crowding may start in the early stages of visual processing, at least before binocular integration takes place.
Collapse
Affiliation(s)
- Foroogh Shamsi
- Department of Psychology, Northeastern University, Boston, Massachusetts, United States
| | - Rong Liu
- Department of Psychology, Northeastern University, Boston, Massachusetts, United States
- Department of Ophthalmology and Visual Sciences, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- Department of Life Science and Medicine, University of Science and Technology of China, Hefei, China
| | - MiYoung Kwon
- Department of Psychology, Northeastern University, Boston, Massachusetts, United States
- Department of Ophthalmology and Visual Sciences, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
| |
Collapse
|
38
|
Theiss JD, Bowen JD, Silver MA. Spatial Attention Enhances Crowded Stimulus Encoding Across Modeled Receptive Fields by Increasing Redundancy of Feature Representations. Neural Comput 2021; 34:190-218. [PMID: 34710898 PMCID: PMC8693207 DOI: 10.1162/neco_a_01447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2021] [Accepted: 07/01/2021] [Indexed: 11/04/2022]
Abstract
Any visual system, biological or artificial, must make a trade-off between the number of units used to represent the visual environment and the spatial resolution of the sampling array. Humans and some other animals are able to allocate attention to spatial locations to reconfigure the sampling array of receptive fields (RFs), thereby enhancing the spatial resolution of representations without changing the overall number of sampling units. Here, we examine how representations of visual features in a fully convolutional neural network interact and interfere with each other in an eccentricity-dependent RF pooling array and how these interactions are influenced by dynamic changes in spatial resolution across the array. We study these feature interactions within the framework of visual crowding, a well-characterized perceptual phenomenon in which target objects in the visual periphery that are easily identified in isolation are much more difficult to identify when flanked by similar nearby objects. By separately simulating effects of spatial attention on RF size and on the density of the pooling array, we demonstrate that the increase in RF density due to attention is more beneficial than changes in RF size for enhancing target classification for crowded stimuli. Furthermore, by varying target/flanker spacing, as well as the spatial extent of attention, we find that feature redundancy across RFs has more influence on target classification than the fidelity of the feature representations themselves. Based on these findings, we propose a candidate mechanism by which spatial attention relieves visual crowding through enhanced feature redundancy that is mostly due to increased RF density.
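A minimal sketch of an eccentricity-dependent pooling array of the kind manipulated in this study is given below; the linear size-eccentricity rule, the Gaussian attention field, and every parameter value are illustrative assumptions rather than the network used in the paper.

```python
import numpy as np

def rf_sizes(ecc_deg, slope=0.2, intercept=0.5,
             attn_center=None, attn_gain=0.5, attn_width=3.0):
    """Pooling-region sizes that grow linearly with eccentricity.
    A Gaussian attention field shrinks RFs (i.e., increases sampling density)
    around the attended eccentricity."""
    ecc = np.asarray(ecc_deg, dtype=float)
    sizes = intercept + slope * ecc
    if attn_center is not None:
        shrink = attn_gain * np.exp(-(ecc - attn_center) ** 2 / (2.0 * attn_width ** 2))
        sizes = sizes * (1.0 - shrink)
    return sizes

ecc = np.linspace(0.0, 20.0, 6)
print(rf_sizes(ecc))                     # baseline pooling array
print(rf_sizes(ecc, attn_center=10.0))   # attention at 10 deg shrinks nearby pooling regions
```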
Collapse
Affiliation(s)
| | - Joel D Bowen
- University of California, Berkeley, CA 94720, U.S.A.
| | | |
Collapse
|
39
|
Li MS, Abbatecola C, Petro LS, Muckli L. Numerosity Perception in Peripheral Vision. Front Hum Neurosci 2021; 15:750417. [PMID: 34803635 PMCID: PMC8597708 DOI: 10.3389/fnhum.2021.750417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Accepted: 10/14/2021] [Indexed: 11/13/2022] Open
Abstract
Peripheral vision has different functional priorities for mammals than foveal vision. One of its roles is to monitor the environment while central vision is focused on the current task. From this perspective, becoming distracted too easily would be counterproductive, so the brain should react to behaviourally relevant changes. Gist processing is good for this purpose, and it is therefore not surprising that evidence from both functional brain imaging and behavioural research suggests a tendency to generalize and blend information in the periphery. This may be caused by the balance of perceptual influence in the periphery between bottom-up (i.e., sensory information) and top-down (i.e., prior or contextual information) processing channels. Here, we investigated this interaction behaviourally using a peripheral numerosity discrimination task with top-down and bottom-up manipulations. Participants compared numerosity between the left and right peripheries of a screen. Each periphery was divided into a centre and a surrounding area, only one of which was a task-relevant target region. Our top-down manipulation was the instruction about which area to attend (centre or surround). We varied signal strength by altering stimulus duration, i.e., the amount of information presented and processed (a combined bottom-up and recurrent top-down feedback factor). We found that numerosity perceived in target regions was affected by contextual information in neighbouring (but irrelevant) areas. This effect appeared as soon as stimulus duration allowed the task to be reliably performed and persisted even at the longest duration (1 s). We compared the pattern of results with an ideal-observer model and found a qualitative difference in the way centre and surround areas interacted perceptually in the periphery. When participants reported on the central area, the irrelevant surround affected the response as a weighted combination, consistent with the idea of a receptive field focused on the target area into which irrelevant surround stimulation leaks. When participants reported on the surround, the responses were best described by a model in which attention occasionally switches from the task-relevant surround to the task-irrelevant centre, consistent with a selection model of two competing streams of information. Overall, our results show that the influence of spatial context in the periphery is mandatory but task dependent.
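The two read-out schemes contrasted above, a weighted combination of target and surround versus occasional attentional switching, can be caricatured in a few lines; the weight and switch probability are arbitrary placeholders, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

def report_weighted(target_n, context_n, w_context=0.3):
    """Weighted-combination model: the irrelevant region leaks into the estimate."""
    return (1.0 - w_context) * target_n + w_context * context_n

def report_switching(target_n, context_n, p_switch=0.2):
    """Switching model: on a fraction of trials the report is based on the wrong region."""
    return context_n if rng.random() < p_switch else target_n

target, context = 20, 40        # hypothetical numerosities of the two regions
weighted = [report_weighted(target, context) for _ in range(10000)]
switching = [report_switching(target, context) for _ in range(10000)]
print(np.mean(weighted), np.std(weighted))     # shifted mean, no added trial-to-trial spread
print(np.mean(switching), np.std(switching))   # mixture: shifted mean and bimodal spread
```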
Collapse
Affiliation(s)
- Min Susan Li
- Centre for Cognitive Neuroimaging, School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom
| | - Clement Abbatecola
- Centre for Cognitive Neuroimaging, School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom
| | - Lucy S Petro
- Centre for Cognitive Neuroimaging, School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom
| | - Lars Muckli
- Centre for Cognitive Neuroimaging, School of Psychology and Neuroscience, University of Glasgow, Glasgow, United Kingdom
| |
Collapse
|
40
|
Bornet A, Choung OH, Doerig A, Whitney D, Herzog MH, Manassi M. Global and high-level effects in crowding cannot be predicted by either high-dimensional pooling or target cueing. J Vis 2021; 21:10. [PMID: 34812839 PMCID: PMC8626847 DOI: 10.1167/jov.21.12.10] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Accepted: 09/30/2021] [Indexed: 11/24/2022] Open
Abstract
In visual crowding, the perception of a target deteriorates in the presence of nearby flankers. Traditionally, target-flanker interactions have been considered as local, mostly deleterious, low-level, and feature specific, occurring when information is pooled along the visual processing hierarchy. Recently, a vast literature of high-level effects in crowding (grouping effects and face-holistic crowding in particular) led to a different understanding of crowding, as a global, complex, and multilevel phenomenon that cannot be captured or explained by simple pooling models. It was recently argued that these high-level effects may still be captured by more sophisticated pooling models, such as the Texture Tiling model (TTM). Unlike simple pooling models, the high-dimensional pooling stage of the TTM preserves rich information about a crowded stimulus and, in principle, this information may be sufficient to drive high-level and global aspects of crowding. In addition, it was proposed that grouping effects in crowding may be explained by post-perceptual target cueing. Here, we extensively tested the predictions of the TTM on the results of six different studies that highlighted high-level effects in crowding. Our results show that the TTM cannot explain any of these high-level effects, and that the behavior of the model is equivalent to a simple pooling model. In addition, we show that grouping effects in crowding cannot be predicted by post-perceptual factors, such as target cueing. Taken together, these results reinforce once more the idea that complex target-flanker interactions determine crowding and that crowding occurs at multiple levels of the visual hierarchy.
Collapse
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Oh-Hyeon Choung
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
| | - David Whitney
- Department of Psychology, University of California, Berkeley, California, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA
- Vision Science Group, University of California, Berkeley, California, USA
| | - Michael H Herzog
- Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Mauro Manassi
- School of Psychology, University of Aberdeen, King's College, Aberdeen, UK
| |
Collapse
|
41
|
Dong X, Gao Y, Dong J, Chantler MJ. The Importance of Phase to Texture Discrimination and Similarity. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3755-3768. [PMID: 32191889 DOI: 10.1109/tvcg.2020.2981063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, we investigate the importance of phase for texture discrimination and similarity estimation tasks. We first use two psychophysical experiments to investigate the relative importance of phase and magnitude spectra for human texture discrimination and similarity estimation. The results show that phase is more important to humans for both tasks. We further examine the ability of 51 computational feature sets to perform these two tasks. In contrast with the psychophysical experiments, it is observed that the magnitude data is more important to these computational feature sets than the phase data. We hypothesise that this inconsistency is due to the difference between the abilities of humans and the computational feature sets to utilise phase data. This motivates us to investigate the application of the 51 feature sets to phase-only images in addition to their use on the original data set. This investigation is extended to exploit Convolutional Neural Network (CNN) features. The results show that our feature fusion scheme improves the average performance of those feature sets for estimating humans' perceptual texture similarity. The superior performance should be attributed to the importance of phase to texture similarity.
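The manipulation at the heart of this question, recombining the magnitude spectrum of one texture with the phase spectrum of another, can be written in a few lines of numpy; this is a generic illustration, not the stimulus-generation code used in the article.

```python
import numpy as np

def swap_spectra(img_mag, img_phase):
    """Hybrid image: magnitude spectrum of img_mag, phase spectrum of img_phase.
    Perceptually, the hybrid usually resembles the phase donor."""
    F_mag = np.fft.fft2(img_mag)
    F_phase = np.fft.fft2(img_phase)
    hybrid = np.abs(F_mag) * np.exp(1j * np.angle(F_phase))
    return np.real(np.fft.ifft2(hybrid))

rng = np.random.default_rng(0)
tex_a = rng.standard_normal((128, 128))   # placeholders for two texture images
tex_b = rng.standard_normal((128, 128))
hybrid = swap_spectra(tex_a, tex_b)
```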
Collapse
|
42
|
Wu R, Wang B, Zhuo Y, Chen L. Topological dominance in peripheral vision. J Vis 2021; 21:19. [PMID: 34570176 PMCID: PMC8479572 DOI: 10.1167/jov.21.10.19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
The question of what peripheral vision is good for, especially in pattern recognition, is one of the most important and controversial issues in cognitive science. In a series of experiments, we provide substantial evidence that observers’ behavioral performance in the periphery is consistently superior to central vision for topological change detection, while nontopological change detection deteriorates with increasing eccentricity. These experiments generalize the topological account of object perception in the periphery to different kinds of topological changes (i.e., including introduction, disappearance, and change in number of holes) in comparison with a broad spectrum of geometric properties (e.g., luminance, similarity, spatial frequency, perimeter, and shape of the contour). Moreover, when the stimuli were scaled according to cortical magnification factor and the task difficulty was well controlled by adjusting luminance of the background, the advantage of topological change detection in the periphery remained. The observed advantage of topological change detection in the periphery supports the view that the topological definition of objects provides a coherent account for object perception in peripheral vision, allowing pattern recognition with limited acuity.
Collapse
Affiliation(s)
- Ruijie Wu
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
| | - Bo Wang
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Hefei Comprehensive National Science Center, Institute of Artificial Intelligence, Hefei, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yan Zhuo
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Lin Chen
- State Key Laboratory of Brain and Cognitive Science, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Hefei Comprehensive National Science Center, Institute of Artificial Intelligence, Hefei, China
- University of Chinese Academy of Sciences, Beijing, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China
| |
Collapse
|
43
|
Harrison WJ, McMaster JMV, Bays PM. Limited memory for ensemble statistics in visual change detection. Cognition 2021; 214:104763. [PMID: 34062339 PMCID: PMC7614705 DOI: 10.1016/j.cognition.2021.104763] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2020] [Revised: 05/02/2021] [Accepted: 05/03/2021] [Indexed: 11/23/2022]
Abstract
Accounts of working memory based on independent item representations may overlook a possible contribution of ensemble statistics, higher-order regularities of a scene such as the mean or variance of a visual attribute. Here we used change detection tasks to investigate the hypothesis that observers store ensemble statistics in working memory and use them to detect changes in the visual environment. We controlled changes to the ensemble mean or variance between memory and test displays across six experiments. We made specific predictions of observers' sensitivity using an optimal summation model that integrates evidence across separate items but does not detect changes in ensemble statistics. We found strong evidence that observers outperformed this model, but only when task difficulty was high, and only for changes in stimulus variance. Under these conditions, we estimated that the variance of items contributed to change detection sensitivity more strongly than any individual item in this case. In contrast, however, we found strong evidence against the hypothesis that the average feature value is stored in working memory: when the mean of memoranda changed, sensitivity did not differ from the optimal summation model, which was blind to the ensemble mean, in five out of six experiments. Our results reveal that change detection is primarily limited by uncertainty in the memory of individual features, but that memory for the variance of items can facilitate detection under a limited set of conditions that involve relatively high working memory demands.
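The optimal summation benchmark integrates evidence across independent item memories while ignoring ensemble statistics; the toy observer below captures that idea with summed absolute feature differences under Gaussian memory noise, a deliberate simplification of the published model rather than a reimplementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def summation_observer(memory, test, noise_sd=10.0, criterion=60.0):
    """Report 'change' when item-wise evidence, summed over the display, exceeds a criterion.
    The observer is blind to ensemble statistics such as the mean or variance of the items."""
    noisy_memory = memory + rng.normal(0.0, noise_sd, memory.shape)
    evidence = np.abs(test - noisy_memory)      # per-item change signal
    return evidence.sum() > criterion

memory = rng.uniform(0.0, 180.0, 6)             # six remembered feature values
test = memory.copy()
test[0] += 20.0                                  # one item changes by 20 units
print(summation_observer(memory, test))
```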
Collapse
Affiliation(s)
- William J Harrison
- Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK; Queensland Brain Institute, The University of Queensland, QBI Building 79, St Lucia, QLD 4072, Australia; School of Psychology, The University of Queensland, McElwain Building 24a, St Lucia, QLD 4072, Australia.
| | - Jessica M V McMaster
- Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| | - Paul M Bays
- Department of Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| |
Collapse
|
44
|
Ziemba CM, Simoncelli EP. Opposing effects of selectivity and invariance in peripheral vision. Nat Commun 2021; 12:4597. [PMID: 34321483 PMCID: PMC8319169 DOI: 10.1038/s41467-021-24880-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Accepted: 07/08/2021] [Indexed: 02/07/2023] Open
Abstract
Sensory processing necessitates discarding some information in service of preserving and reformatting more behaviorally relevant information. Sensory neurons seem to achieve this by responding selectively to particular combinations of features in their inputs, while averaging over or ignoring irrelevant combinations. Here, we expose the perceptual implications of this tradeoff between selectivity and invariance, using stimuli and tasks that explicitly reveal their opposing effects on discrimination performance. We generate texture stimuli with statistics derived from natural photographs, and ask observers to perform two different tasks: Discrimination between images drawn from families with different statistics, and discrimination between image samples with identical statistics. For both tasks, the performance of an ideal observer improves with stimulus size. In contrast, humans become better at family discrimination but worse at sample discrimination. We demonstrate through simulations that these behaviors arise naturally in an observer model that relies on a common set of physiologically plausible local statistical measurements for both tasks.
Collapse
Affiliation(s)
- Corey M Ziemba
- Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA.
- Center for Neural Science, New York University, New York, NY, USA.
| | - Eero P Simoncelli
- Center for Neural Science, New York University, New York, NY, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
| |
Collapse
|
45
|
Orima T, Motoyoshi I. Analysis and Synthesis of Natural Texture Perception From Visual Evoked Potentials. Front Neurosci 2021; 15:698940. [PMID: 34381330 PMCID: PMC8350323 DOI: 10.3389/fnins.2021.698940] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
The primate visual system analyzes statistical information in natural images and uses it for the immediate perception of scenes, objects, and surface materials. To investigate the dynamical encoding of image statistics in the human brain, we measured visual evoked potentials (VEPs) for 166 natural textures and their synthetic versions, and performed a reverse-correlation analysis of the VEPs and representative texture statistics of the image. The analysis revealed occipital VEP components strongly correlated with particular texture statistics. VEPs correlated with low-level statistics, such as subband SDs, emerged rapidly from 100 to 250 ms in a spatial frequency dependent manner. VEPs correlated with higher-order statistics, such as subband kurtosis and cross-band correlations, were observed at slightly later times. Moreover, these robust correlations enabled us to inversely estimate texture statistics from VEP signals via linear regression and to reconstruct texture images that appear similar to those synthesized with the original statistics. Additionally, we found significant differences in VEPs at 200-300 ms between some natural textures and their Portilla-Simoncelli (PS) synthesized versions, even though they shared almost identical texture statistics. This differential VEP was related to the perceptual "unnaturalness" of PS-synthesized textures. These results suggest that the visual cortex rapidly encodes image statistics hidden in natural textures specifically enough to predict the visual appearance of a texture, while it also represents high-level information beyond image statistics, and that electroencephalography can be used to decode these cortical signals.
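The inverse-estimation step, predicting texture statistics from VEP amplitudes by linear regression, can be sketched as follows; the array shapes are hypothetical and the ridge penalty is an added regularization choice, not necessarily the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 166 textures x 64 electrodes (VEP amplitude at one latency),
# and 166 textures x 10 image statistics to be decoded.
veps = rng.standard_normal((166, 64))
stats = rng.standard_normal((166, 10))

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X'X + lam*I)^(-1) X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

W = ridge_fit(veps, stats)
predicted_stats = veps @ W       # estimated texture statistics for each image
print(predicted_stats.shape)     # (166, 10)
```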
Collapse
Affiliation(s)
- Taiki Orima
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
| | - Isamu Motoyoshi
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
| |
Collapse
|
46
|
Okada K, Motoyoshi I. Human Texture Vision as Multi-Order Spectral Analysis. Front Comput Neurosci 2021; 15:692334. [PMID: 34381346 PMCID: PMC8349988 DOI: 10.3389/fncom.2021.692334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022] Open
Abstract
Texture information plays a critical role in the rapid perception of scenes, objects, and materials. Here, we propose a novel model in which visual texture perception is essentially determined by the 1st-order (2D-luminance) and 2nd-order (4D-energy) spectra. This model is an extension of the dimensionality of the Filter-Rectify-Filter (FRF) model, and it also corresponds to the frequency representation of the Portilla-Simoncelli (PS) statistics. We show that preserving two spectra and randomizing phases of a natural texture image result in a perceptually similar texture, strongly supporting the model. Based on only two single spectral spaces, this model provides a simpler framework to describe and predict texture representations in the primate visual system. The idea of multi-order spectral analysis is consistent with the hierarchical processing principle of the visual cortex, which is approximated by a multi-layer convolutional network.
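The phase-randomization test described here can be illustrated for the 1st-order (luminance) spectrum using the standard trick of borrowing the phases of a white-noise image, which keeps the spectrum Hermitian so the inverse transform is real; matching the 2nd-order (subband-energy) spectrum as well, as the full model requires, is not shown.

```python
import numpy as np

def phase_scramble(img, rng):
    """Randomize Fourier phases while preserving the luminance amplitude spectrum.
    Using the phases of a real-valued noise image keeps the spectrum Hermitian,
    so the inverse FFT is real up to numerical error."""
    noise = rng.standard_normal(img.shape)
    new_phase = np.angle(np.fft.fft2(noise))
    scrambled = np.abs(np.fft.fft2(img)) * np.exp(1j * new_phase)
    return np.real(np.fft.ifft2(scrambled))

rng = np.random.default_rng(0)
texture = rng.standard_normal((128, 128))     # placeholder for a natural texture image
scrambled = phase_scramble(texture, rng)
```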
Collapse
Affiliation(s)
- Kosuke Okada
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
| | - Isamu Motoyoshi
- Department of Life Sciences, The University of Tokyo, Tokyo, Japan
| |
Collapse
|
47
|
Tharmaratnam V, Patel M, Lowe MX, Cant JS. Shared cognitive mechanisms involved in the processing of scene texture and scene shape. J Vis 2021; 21:11. [PMID: 34269793 PMCID: PMC8297417 DOI: 10.1167/jov.21.7.11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Recent research has demonstrated that the parahippocampal place area represents both the shape and texture features of scenes, with the importance of each feature varying according to perceived scene category. Namely, shape features are predominantly more diagnostic to the processing of artificial human–made scenes, while shape and texture are equally diagnostic in natural scene processing. However, to date little is known regarding the degree of interactivity or independence observed in the processing of these scene features. Furthermore, manipulating the scope of visual attention (i.e., globally vs. locally) when processing ensembles of multiple objects—stimuli that share a functional neuroanatomical link with scenes—has been shown to affect their cognitive visual representation. It remains unknown whether manipulating the scope of attention impacts scene processing in a similar manner. Using the well-established Garner speeded-classification behavioral paradigm, we investigated the influence of both feature diagnosticity and the scope of visual attention on potential interactivity or independence in the shape and texture processing of artificial human–made scenes. The results revealed asymmetric interference between scene shape and texture processing, with the more diagnostic feature (i.e., shape) interfering with the less diagnostic feature (i.e., texture), but not vice versa. Furthermore, this interference was attenuated and enhanced with more local and global visual processing strategies, respectively. These findings suggest that scene shape and texture processing are mediated by shared cognitive mechanisms and that, although these representations are governed primarily via feature diagnosticity, they can nevertheless be influenced by the scope of visual attention.
Collapse
Affiliation(s)
| | | | - Matthew X Lowe
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada
| | - Jonathan S Cant
- Graduate Program in Psychology, University of Toronto, Toronto, ON, Canada
- Department of Psychology, University of Toronto Scarborough, Toronto, ON, Canada
| |
Collapse
|
48
|
Unraveling brain interactions in vision: The example of crowding. Neuroimage 2021; 240:118390. [PMID: 34271157 DOI: 10.1016/j.neuroimage.2021.118390] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 07/09/2021] [Accepted: 07/12/2021] [Indexed: 11/22/2022] Open
Abstract
Crowding, the impairment of target discrimination in clutter, is the standard situation in vision. Traditionally, crowding is explained with (feedforward) models, in which only neighboring elements interact, leading to a "bottleneck" at the earliest stages of vision. It is with this implicit prior that most functional magnetic resonance imaging (fMRI) studies approach the identification of the "neural locus" of crowding, searching for the earliest visual area in which the blood-oxygenation-level-dependent (BOLD) signal is suppressed under crowded conditions. Using this classic approach, we replicated previous findings of crowding-related BOLD suppression starting in V2 and increasing up the visual hierarchy. Surprisingly, under conditions of uncrowding, in which adding flankers improves performance, the BOLD signal was further suppressed. This suggests an important role for top-down connections, which is in line with global models of crowding. To discriminate between various possible models, we used dynamic causal modeling (DCM). We show that recurrent interactions between all visual areas, including higher-level areas like V4 and the lateral occipital complex (LOC), are crucial in crowding and uncrowding. Our results explain the discrepancies in previous findings: in a recurrent visual hierarchy, the crowding effect can theoretically be detected at any stage. Beyond crowding, we demonstrate the need for models like DCM to understand the complex recurrent processing which most likely underlies human perception in general.
Collapse
|
49
|
Bornet A, Doerig A, Herzog MH, Francis G, Van der Burg E. Shrinking Bouma's window: How to model crowding in dense displays. PLoS Comput Biol 2021; 17:e1009187. [PMID: 34228703 PMCID: PMC8284675 DOI: 10.1371/journal.pcbi.1009187] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 07/16/2021] [Accepted: 06/16/2021] [Indexed: 11/22/2022] Open
Abstract
In crowding, perception of a target deteriorates in the presence of nearby flankers. Traditionally, it is thought that visual crowding obeys Bouma's law, i.e., all elements within a certain distance interfere with the target, and that adding more elements always leads to stronger crowding. Crowding is predominantly studied using sparse displays (a target surrounded by a few flankers). However, many studies have shown that this approach leads to wrong conclusions about human vision. Van der Burg and colleagues proposed a paradigm to measure crowding in dense displays using genetic algorithms. Displays were selected and combined over several generations to maximize human performance. In contrast to Bouma's law, only the target's nearest neighbours affected performance. Here, we tested various models to explain these results. We used the same genetic algorithm, but instead of selecting displays based on human performance we selected displays based on the model's outputs. We found that all models based on the traditional feedforward pooling framework of vision were unable to reproduce human behaviour. In contrast, all models involving a dedicated grouping stage explained the results successfully. We show how traditional models can be improved by adding a grouping stage.
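A generic version of a display-evolving genetic algorithm is sketched below, with binary vectors standing in for dense displays and a placeholder fitness function in place of human or model performance; the encoding, operators, and parameters are all illustrative assumptions, not the procedure of Van der Burg and colleagues.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve_displays(fitness, n_items=32, pop_size=50, n_generations=100, p_mut=0.05):
    """Evolve binary 'displays' (1 = element present) to maximize a fitness function,
    which could stand in for human accuracy or the output of a crowding model."""
    pop = rng.integers(0, 2, (pop_size, n_items))
    for _ in range(n_generations):
        scores = np.array([fitness(d) for d in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]          # keep the best half
        pairs = rng.integers(0, len(parents), (pop_size, 2))        # pick parent pairs
        cut = rng.integers(1, n_items, pop_size)                    # one-point crossover
        children = np.where(np.arange(n_items) < cut[:, None],
                            parents[pairs[:, 0]], parents[pairs[:, 1]])
        flip = rng.random(children.shape) < p_mut                   # bit-flip mutation
        pop = np.where(flip, 1 - children, children)
    return pop[np.argmax([fitness(d) for d in pop])]

# Toy fitness: displays with fewer elements next to the target (indices 0-7) score higher.
best_display = evolve_displays(lambda d: -int(np.sum(d[:8])))
print(best_display)
```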
Collapse
Affiliation(s)
- Alban Bornet
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Adrien Doerig
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Michael H. Herzog
- Laboratory of Psychophysics, Brain Mind Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Gregory Francis
- Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, United States of America
| | - Erik Van der Burg
- TNO, Human Factors, Soesterberg, The Netherlands
- Brain and Cognition, University of Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
50
|
Redundancy between spectral and higher-order texture statistics for natural image segmentation. Vision Res 2021; 187:55-65. [PMID: 34217005 DOI: 10.1016/j.visres.2021.06.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 06/09/2021] [Accepted: 06/11/2021] [Indexed: 11/23/2022]
Abstract
Visual texture, defined by local image statistics, provides important information to the human visual system for perceptual segmentation. Second-order or spectral statistics (equivalent to the Fourier power spectrum) are a well-studied segmentation cue. However, the role of higher-order statistics (HOS) in segmentation remains unclear, particularly for natural images. Recent experiments indicate that, in peripheral vision, the HOS of the widely adopted Portilla-Simoncelli texture model are a weak segmentation cue compared to spectral statistics, despite the fact that both are necessary to explain other perceptual phenomena and to support high-quality texture synthesis. Here we test whether this discrepancy reflects a property of natural image statistics. First, we observe that differences in spectral statistics across segments of natural images are redundant with differences in HOS. Second, using linear and nonlinear classifiers, we show that each set of statistics individually affords high performance in natural scenes and texture segmentation tasks, but combining spectral statistics and HOS produces relatively small improvements. Third, we find that HOS improve segmentation for a subset of images, although these images are difficult to identify. We also find that different subsets of HOS improve segmentation to a different extent, in agreement with previous physiological and perceptual work. These results show that the HOS add modestly to spectral statistics for natural image segmentation. We speculate that tuning to natural image statistics under resource constraints could explain the weak contribution of HOS to perceptual segmentation in human peripheral vision.
Collapse
|