1
Skog E, Meese TS, Sargent IMJ, Ormerod A, Schofield AJ. Classification images for aerial images capture visual expertise for binocular disparity and a prior for lighting from above. J Vis 2024; 24:11. [PMID: 38607637 PMCID: PMC11019598 DOI: 10.1167/jov.24.4.11]
Abstract
Using a novel approach to classification images (CIs), we investigated the visual expertise of surveyors for luminance and binocular disparity cues simultaneously after screening for stereoacuity. Stereoscopic aerial images of hedges and ditches were classified in 10,000 trials by six trained remote sensing surveyors and six novices. Images were heavily masked with luminance and disparity noise simultaneously. Hedge and ditch images had reversed disparity on around half the trials meaning hedges became ditch-like and vice versa. The hedge and ditch images were also flipped vertically on around half the trials, changing the direction of the light source and completing a 2 × 2 × 2 stimulus design. CIs were generated by accumulating the noise textures associated with "hedge" and "ditch" classifications, respectively, and subtracting one from the other. Typical CIs had a central peak with one or two negative side-lobes. We found clear differences in the amplitudes and shapes of perceptual templates across groups and noise-type, with experts prioritizing binocular disparity and using this more effectively. Contrariwise, novices used luminance cues more than experts meaning that task motivation alone could not explain group differences. Asymmetries in the luminance CIs revealed individual differences for lighting interpretation, with experts less prone to assume lighting from above, consistent with their training on aerial images of UK scenes lit by a southerly sun. Our results show that (i) dual noise in images can be used to produce simultaneous CI pairs, (ii) expertise for disparity cues does not depend on stereoacuity, (iii) CIs reveal the visual strategies developed by experts, (iv) top-down perceptual biases can be overcome with long-term learning effects, and (v) CIs have practical potential for directing visual training.
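The accumulate-and-subtract step behind the CIs can be sketched compactly. The Python/NumPy example below is illustrative only (not the authors' code): it simulates an observer who weights two noise sources, standing in for the luminance and disparity noise, and then builds one CI per noise type from the trial-by-trial classifications. The template, cue weights, and internal noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, size = 10000, 32          # illustrative trial count and patch size

# Hypothetical observer template (centre-surround profile), used only to simulate responses.
y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
r2 = x ** 2 + y ** 2
template = np.exp(-r2 / 0.05) - 0.5 * np.exp(-r2 / 0.2)

def classification_image(noise, responses):
    """Mean noise on 'hedge' trials minus mean noise on 'ditch' trials."""
    return noise[responses == 1].mean(axis=0) - noise[responses == 0].mean(axis=0)

# Two independent noise fields per trial, standing in for luminance and disparity noise.
lum_noise = rng.normal(size=(n_trials, size, size))
disp_noise = rng.normal(size=(n_trials, size, size))

# Simulated observer: weights the two cues (disparity weighted more, as the experts did),
# adds internal noise, and classifies each trial.
decision = (0.3 * (lum_noise * template).sum(axis=(1, 2))
            + 0.7 * (disp_noise * template).sum(axis=(1, 2))
            + rng.normal(scale=5.0, size=n_trials))
responses = (decision > 0).astype(int)

ci_lum = classification_image(lum_noise, responses)
ci_disp = classification_image(disp_noise, responses)
print(ci_lum.shape, ci_disp.shape)   # each CI has the same dimensions as the stimulus
```

Because the two noise sources are statistically independent, the same set of classifications yields a separate CI for each cue, which is the "simultaneous CI pairs" idea in the abstract.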
Affiliation(s)
- Emil Skog
- School of Psychology, College of Health and Life Sciences, Aston University, Birmingham, B4 7ET, UK
- Aston Laboratory for Immersive Virtual Environments, College of Health and Life Sciences, Aston University, Birmingham, B4 7ET, UK
- Department of Health, Learning and Technology, Luleå University of Technology, Luleå, Sweden
- Timothy S Meese
- Aston Laboratory for Immersive Virtual Environments, College of Health and Life Sciences, Aston University, Birmingham, B4 7ET, UK
- https://research.aston.ac.uk/en/persons/tim-s-meese
- Isabel M J Sargent
- Ordnance Survey, Adanac Drive, Southampton, SO16 0AS, UK
- Electronics and Computer Science, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- http://www.os.uk/
- Andrew Ormerod
- Ordnance Survey, Adanac Drive, Southampton, SO16 0AS, UK
- http://www.os.uk/
- Andrew J Schofield
- School of Psychology, College of Health and Life Sciences, Aston University, Birmingham, B4 7ET, UK
- Aston Laboratory for Immersive Virtual Environments, College of Health and Life Sciences, Aston University, Birmingham, B4 7ET, UK
- https://research.aston.ac.uk/en/persons/andrew-schofield
2
DiMattina C, Baker CL. Modeling second-order boundary perception: A machine learning approach. PLoS Comput Biol 2019; 15:e1006829. [PMID: 30883556 PMCID: PMC6438569 DOI: 10.1371/journal.pcbi.1006829]
Abstract
Visual pattern detection and discrimination are essential first steps for scene analysis. Numerous human psychophysical studies have modeled visual pattern detection and discrimination by estimating linear templates for classifying noisy stimuli defined by spatial variations in pixel intensities. However, such methods are poorly suited to understanding sensory processing mechanisms for complex visual stimuli such as second-order boundaries defined by spatial differences in contrast or texture. We introduce a novel machine learning framework for modeling human perception of second-order visual stimuli, using image-computable hierarchical neural network models fit directly to psychophysical trial data. This framework is applied to modeling visual processing of boundaries defined by differences in the contrast of a carrier texture pattern, in two different psychophysical tasks: (1) boundary orientation identification, and (2) fine orientation discrimination. Cross-validation analysis is employed to optimize model hyper-parameters, and demonstrate that these models are able to accurately predict human performance on novel stimulus sets not used for fitting model parameters. We find that, like the ideal observer, human observers take a region-based approach to the orientation identification task, while taking an edge-based approach to the fine orientation discrimination task. How observers integrate contrast modulation across orientation channels is investigated by fitting psychophysical data with two models representing competing hypotheses, revealing a preference for a model which combines multiple orientations at the earliest possible stage. Our results suggest that this machine learning approach has much potential to advance the study of second-order visual processing, and we outline future steps towards generalizing the method to modeling visual segmentation of natural texture boundaries. This study demonstrates how machine learning methodology can be fruitfully applied to psychophysical studies of second-order visual processing. Many naturally occurring visual boundaries are defined by spatial differences in features other than luminance, for example by differences in texture or contrast. Quantitative models of such “second-order” boundary perception cannot be estimated using the standard regression techniques (known as “classification images”) commonly applied to “first-order”, luminance-defined stimuli. Here we present a novel machine learning approach to modeling second-order boundary perception using hierarchical neural networks. In contrast to previous quantitative studies of second-order boundary perception, we directly estimate network model parameters using psychophysical trial data. We demonstrate that our method can reveal different spatial summation strategies that human observers utilize for different kinds of second-order boundary perception tasks, and can be used to compare competing hypotheses of how contrast modulation is integrated across orientation channels. We outline extensions of the methodology to other kinds of second-order boundaries, including those in natural images.
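As a rough illustration of the general approach (fitting an image-computable, hierarchical model directly to trial-by-trial psychophysical responses by maximum likelihood), the Python/NumPy sketch below fits the pooling stage of a toy filter-rectify-pool model to simulated binary choices. The architecture, surrogate filter outputs, and optimisation settings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_filters = 5000, 8

# Stage 1 (fixed here): surrogate first-stage filter outputs to each noisy stimulus,
# followed by rectification, as in a filter-rectify-filter style architecture.
filter_out = rng.normal(size=(n_trials, n_filters))
rectified = np.abs(filter_out)

# Hypothetical "true" pooling weights, used only to simulate an observer's choices.
w_true = np.array([1.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0])
offset = 2.4                                   # keeps simulated probabilities off the ceiling
p_true = 1.0 / (1.0 + np.exp(-(rectified @ w_true - offset)))
choices = rng.binomial(1, p_true)

# Stage 2: estimate the pooling weights by maximising the Bernoulli likelihood of the
# trial-by-trial choices (plain gradient descent on the cross-entropy).
w, b, lr = np.zeros(n_filters), 0.0, 0.5
for _ in range(5000):
    p_hat = 1.0 / (1.0 + np.exp(-(rectified @ w + b)))
    w -= lr * rectified.T @ (p_hat - choices) / n_trials
    b -= lr * (p_hat - choices).mean()

print(np.round(w, 2))   # recovered weights should approximate w_true
```

The point of the exercise is that the estimated pooling weights play the role that a classification image plays for first-order stimuli: they summarise which model channels the observer's choices depend on.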
Affiliation(s)
- Christopher DiMattina
- Computational Perception Laboratory, Department of Psychology, Florida Gulf Coast University, Fort Myers, Florida, United States of America
- Curtis L. Baker
- McGill Vision Research Unit, Department of Ophthalmology, McGill University, Montreal, Quebec, Canada
3
Baker DH, Meese TS. Grid-texture mechanisms in human vision: Contrast detection of regular sparse micro-patterns requires specialist templates. Sci Rep 2016; 6:29764. [PMID: 27460430 PMCID: PMC4962084 DOI: 10.1038/srep29764]
Abstract
Previous work has shown that human vision performs spatial integration of luminance contrast energy, where signals are squared and summed (with internal noise) over area at detection threshold. We tested that model here in an experiment using arrays of micro-pattern textures that varied in overall stimulus area and sparseness of their target elements, where the contrast of each element was normalised for sensitivity across the visual field. We found a power-law improvement in performance with stimulus area, and a decrease in sensitivity with sparseness. While the contrast integrator model performed well when target elements constituted 50–100% of the target area (replicating previous results), observers outperformed the model when texture elements were sparser than this. This result required the inclusion of further templates in our model, selective for grids of various regular texture densities. By assuming a MAX operation across these noisy mechanisms the model also accounted for the increase in the slope of the psychometric function that occurred as texture density decreased. Thus, for the first time, mechanisms that are selective for texture density have been revealed at contrast detection threshold. We suggest that these mechanisms have a role to play in the perception of visual textures.
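The decision model described above (contrast energy pooled under a template, perturbed by internal noise, with a MAX across density-selective templates) can be sketched as follows. This Python/NumPy example is illustrative only; the toy stimulus, templates, area normalisation, and noise level are assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def prob_correct(stimulus, templates, sigma=0.2, n_sim=2000):
    """2AFC detection: MAX over noisy template responses, target interval vs blank interval."""
    # Contrast energy under each template, normalised by template area so that a template
    # matched to the sparse element grid gains a signal-to-noise advantage (a stand-in
    # for the per-mechanism internal noise in the full model).
    energy = np.array([(t * stimulus ** 2).sum() / t.sum() for t in templates])
    n_correct = 0
    for _ in range(n_sim):
        target_resp = (energy + rng.normal(scale=sigma, size=energy.size)).max()
        blank_resp = rng.normal(scale=sigma, size=energy.size).max()
        n_correct += target_resp > blank_resp
    return n_correct / n_sim

# Toy 16x16 micro-pattern array: target elements of fixed contrast on a sparse regular grid.
size, spacing, contrast = 16, 4, 0.5
stimulus = np.zeros((size, size))
stimulus[::spacing, ::spacing] = contrast

# Templates selective for regular grids of decreasing element density (full, half, quarter).
def grid(step):
    return (np.add.outer(np.arange(size) % step, np.arange(size) % step) == 0).astype(float)

templates = [np.ones((size, size)), grid(2), grid(4)]
print(prob_correct(stimulus, templates))
```

Taking the MAX across several noisy mechanisms is also what steepens the predicted psychometric function as the texture becomes sparser, as noted in the abstract.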
Affiliation(s)
- Daniel H Baker
- Department of Psychology, University of York, York, YO10 5DD, UK
- School of Life & Health Sciences, Aston University, Birmingham, B4 7ET, UK
- Tim S Meese
- School of Life & Health Sciences, Aston University, Birmingham, B4 7ET, UK
4
Abstract
Second-order visual mechanisms integrate spatially distributed local visual information. Their organization is traditionally described within the framework of the filter-rectify-filter model, in which second-order filters provide the ability to detect texture gradients. However, the question of whether these mechanisms are selective for the modulation dimension remains open. The aim of this investigation was to answer that question using visual evoked potentials (VEPs). Stimuli were textures consisting of staggered Gabor patches. The base texture was nonmodulated (NM). Three other textures were versions of the base texture sinusoidally modulated in one of three dimensions: contrast, orientation, or spatial frequency. EEG was recorded from 20 electrodes, and VEPs of 500 ms duration were obtained for each of the four textures. The VEP to the NM texture was then subtracted from the VEP to each modulated texture, yielding three difference waves (d-waves) for each electrode site. Each d-wave was averaged across all 48 observers. The resulting d-waves have a latency of about 200 ms and, in our view, reflect reactivation of the second-order filters through feedback connections. The d-waves for the different modulation dimensions were compared with each other in timing, amplitude, topography, and the localization of the sources of the underlying activity (estimated with sLORETA). We proceeded from the assumption that the d-wave (its first component) represents the functioning of the second-order visual mechanisms and activity changes at subsequent processing stages. The d-waves for the different modulation dimensions differed significantly in all parameters. These results indicate that spatial modulations of different texture parameters cause specific changes in brain activity, supporting the selectivity of second-order visual mechanisms for modulation dimension.
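The d-wave computation described above reduces to a simple subtraction-and-averaging pipeline. Below is a minimal sketch in Python/NumPy with synthetic placeholder data; the array shapes, sampling rate, and condition names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_chan, n_samp = 48, 20, 250     # observers, electrodes, samples in a 500 ms epoch

# Placeholder VEPs: condition -> (observers x electrodes x time) arrays of synthetic data.
conditions = ["NM", "contrast", "orientation", "spatial_frequency"]
vep = {c: rng.normal(size=(n_obs, n_chan, n_samp)) for c in conditions}

# d-wave: per-observer difference from the non-modulated baseline, then grand average.
d_wave = {c: (vep[c] - vep["NM"]).mean(axis=0)        # -> electrodes x time
          for c in conditions if c != "NM"}

for c, w in d_wave.items():
    peak_sample = np.abs(w).max(axis=0).argmax()       # latency of the largest deflection
    print(c, f"peak at ~{peak_sample * 2} ms")          # 2 ms per sample assuming 500 Hz sampling
```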
5
Meso AI, Chemla S. Perceptual fields reveal previously hidden dynamics of human visual motion sensitivity. J Neurophysiol 2014; 114:1360-3. [PMID: 25339713 DOI: 10.1152/jn.00698.2014]
Abstract
Motion sensitivity is a fundamental property of human vision. Although its neural correlates are normally only directly accessible with neurophysiological approaches, Neri (Neri P. J Neurosci 34: 8449-8491, 2014) proposed psychophysical reverse correlation to derive perceptual fields, revealing previously unseen dynamics of human motion detection. In this Neuro Forum article, these key findings are discussed and placed in a broader context, and the possible implications of spatial scale considerations for the interpretation of the findings and the proposed dynamic model are highlighted.
Affiliation(s)
- Andrew Isaac Meso
- Institut de Neurosciences de la Timone, UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France
- Sandrine Chemla
- Institut de Neurosciences de la Timone, UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France
6
Baker DH, Vilidaitė G. Broadband noise masks suppress neural responses to narrowband stimuli. Front Psychol 2014; 5:763. [PMID: 25076930 PMCID: PMC4098025 DOI: 10.3389/fpsyg.2014.00763]
Abstract
White pixel noise is widely used to estimate the level of internal noise in a system by injecting external variance into the detecting mechanism. Recent work (Baker and Meese, 2012) has provided psychophysical evidence that such noise masks might also cause suppression that could invalidate estimates of internal noise. Here we measure neural population responses directly, using steady-state visual evoked potentials, elicited by target stimuli embedded in different mask types. Sinusoidal target gratings of 1 c/deg flickered at 5 Hz, and were shown in isolation, or with superimposed orthogonal grating masks or 2D white noise masks, flickering at 7 Hz. Compared with responses to a blank screen, the Fourier amplitude at the target frequency increased monotonically as a function of target contrast when no mask was present. Both orthogonal and white noise masks caused rightward shifts of the contrast response function, providing evidence of contrast gain control suppression. We also calculated within-observer amplitude variance across trials. This increased in proportion to the target response, implying signal-dependent (i.e., multiplicative) noise at the system level, the implications of which we discuss for behavioral tasks. This measure of variance was reduced by both mask types, consistent with the changes in mean target response. An alternative variety of noise, which we term zero-dimensional noise, involves trial-by-trial jittering of the target contrast. This type of noise produced no gain control suppression, and increased the amplitude variance across trials.
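The basic SSVEP measures described above (the Fourier amplitude at the target's flicker frequency and its variance across trials) can be sketched as follows. This is an illustrative Python/NumPy example with a synthetic signal, not the authors' analysis code; the sampling rate, epoch length, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
fs, dur, f_target = 1000, 2.0, 5.0             # sampling rate (Hz), epoch length (s), target flicker (Hz)
t = np.arange(0, dur, 1 / fs)
n_trials = 100

def amplitude_at(freq, epoch, fs):
    """Single-sided Fourier amplitude of one epoch at the given frequency."""
    spectrum = np.fft.rfft(epoch) / len(epoch) * 2
    freqs = np.fft.rfftfreq(len(epoch), 1 / fs)
    return np.abs(spectrum[np.argmin(np.abs(freqs - freq))])

# Synthetic trials: a 5 Hz response of fixed amplitude buried in broadband noise.
epochs = 1.2 * np.sin(2 * np.pi * f_target * t) + rng.normal(scale=2.0, size=(n_trials, t.size))

amps = np.array([amplitude_at(f_target, ep, fs) for ep in epochs])
print(f"mean amplitude {amps.mean():.2f}, across-trial variance {amps.var():.3f}")
```

In the study, the same two quantities (mean amplitude at the tagged frequency and its across-trial variance) were compared across mask types to separate gain-control suppression from injected variance.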
7
Taylor CP, Bennett PJ, Sekuler AB. Evidence for adjustable bandwidth orientation channels. Front Psychol 2014; 5:578. [PMID: 24971069 PMCID: PMC4054014 DOI: 10.3389/fpsyg.2014.00578]
Abstract
The standard model of early vision claims that orientation and spatial frequency are encoded by multiple, quasi-independent channels with fixed spatial frequency and orientation bandwidths. The standard model was developed using detection and discrimination data from experiments that used deterministic patterns, such as Gabor patches and gratings, as stimuli. However, detection data from experiments using noise stimuli suggest that the visual system may use adjustable-bandwidth, rather than fixed-bandwidth, channels. In our previous work, we used classification images as a key piece of evidence against the hypothesis that pattern detection is based on the responses of channels with an adjustable spatial frequency bandwidth. Here we tested the hypothesis that channels with adjustable orientation bandwidths are used to detect two-dimensional, filtered noise targets that varied in orientation bandwidth and were presented in white noise. Consistent with our previous work on spatial frequency bandwidth, we found that detection thresholds were consistent with the hypothesis that observers sum information across a broad range of orientations nearly optimally: absolute efficiency for stimulus detection was 20-30% and approximately constant across a wide range of orientation bandwidths. Unlike what we found for spatial frequency bandwidth, the results of our classification image experiment were consistent with the hypothesis that the orientation bandwidth of internal filters is adjustable. Thus, for orientation summation, both detection thresholds and classification images support the adjustable-channels hypothesis. Classification images also revealed hallmarks of inhibition or suppression from uninformative spatial frequencies and/or orientations. This work highlights the limitations of the standard model of summation for orientation, which was chiefly developed with narrowband stimuli not presented in noise, stimuli that are arguably less naturalistic than the variable-bandwidth stimuli in noise used in our experiments. Finally, the disagreement between the results of our experiments on spatial frequency summation and the data presented in this paper suggests that orientation may be encoded more flexibly than spatial frequency.
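A minimal sketch (Python/NumPy, not the authors' stimulus code) of how a filtered-noise target with a chosen orientation bandwidth might be constructed and embedded in white noise; the frequency-domain box filter, bandwidth convention, and normalisation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
size = 128

def oriented_noise(size, centre_deg, bandwidth_deg, rng):
    """Filter white noise to a band of orientations (illustrative stimulus construction)."""
    noise = rng.normal(size=(size, size))
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    angle = np.degrees(np.arctan2(fy, fx)) % 180.0
    diff = np.abs((angle - centre_deg + 90.0) % 180.0 - 90.0)     # circular orientation distance
    mask = (diff <= bandwidth_deg / 2).astype(float)
    mask[0, 0] = 0.0                                              # remove the DC component
    filtered = np.fft.ifft2(np.fft.fft2(noise) * mask).real
    return filtered / filtered.std()

target = oriented_noise(size, centre_deg=45.0, bandwidth_deg=20.0, rng=rng)
stimulus = target + rng.normal(size=(size, size))                 # target embedded in white noise
print(stimulus.shape)
```

Varying `bandwidth_deg` while holding the white-noise mask fixed is the manipulation on which both the threshold and classification-image analyses in the abstract rest.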
Affiliation(s)
- Christopher P. Taylor
- Department of Psychology and Clinical Language Sciences, Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, UK
- Patrick J. Bennett
- Department of Psychology, Neuroscience, and Behaviour, McMaster University, Hamilton, ON, Canada
- Centre for Vision Research, York University, Toronto, ON, Canada
- Allison B. Sekuler
- Department of Psychology, Neuroscience, and Behaviour, McMaster University, Hamilton, ON, Canada
- Centre for Vision Research, York University, Toronto, ON, Canada