1. Perceptual dimensions of wood materials. J Vis 2024; 24:12. PMID: 38787569. DOI: 10.1167/jov.24.5.12.
Abstract
Materials exhibit an extraordinary range of visual appearances. Characterizing and quantifying appearance is important not only for basic research on perceptual mechanisms but also for computer graphics and a wide range of industrial applications. Although methods exist for capturing and representing the optical properties of materials and how they vary across surfaces (Haindl & Filip, 2013), the representations are typically very high-dimensional, and how these representations relate to subjective perceptual impressions of material appearance remains poorly understood. Here, we used a data-driven approach to characterizing the perceived appearance characteristics of 30 samples of wood veneer using a "visual fingerprint" that describes each sample as a multidimensional feature vector, with each dimension capturing a different aspect of the appearance. Fifty-six crowd-sourced participants viewed triplets of movies depicting different wood samples as the sample rotated. Their task was to report which of the two match samples was subjectively most similar to the test sample. In another online experiment, 45 participants rated 10 wood-related appearance characteristics for each of the samples. The results reveal a consistent embedding of the samples across both experiments and a set of nine perceptual dimensions capturing aspects including the roughness, directionality, and spatial scale of the surface patterns. We also showed that a weighted linear combination of 11 image statistics, inspired by the rating characteristics, predicts perceptual dimensions well.
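The paper's final analysis step lends itself to a compact sketch: fitting a weighted linear combination of image statistics to one perceptual dimension by least squares. The code below is an illustration under assumed data, not the authors' implementation; the random arrays stand in for the 11 image statistics and the embedding coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_stats = 30, 11                 # 30 veneers, 11 image statistics
X = rng.normal(size=(n_samples, n_stats))   # stand-in image statistics
y = rng.normal(size=n_samples)              # stand-in perceptual coordinates

# Fit the weights (plus an intercept) by ordinary least squares.
X1 = np.column_stack([np.ones(n_samples), X])
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

# How much variance the weighted combination explains.
y_hat = X1 @ w
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

With real data, `X` would hold the 11 statistics computed per sample and `y` one dimension of the perceptual embedding.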
2. High-level aftereffects reveal the role of statistical features in visual shape encoding. Curr Biol 2024; 34:1098-1106.e5. PMID: 38218184. PMCID: PMC10931819. DOI: 10.1016/j.cub.2023.12.039.
Abstract
Visual shape perception is central to many everyday tasks, from object recognition to grasping and handling tools.1,2,3,4,5,6,7,8,9,10 Yet how shape is encoded in the visual system remains poorly understood. Here, we probed shape representations using visual aftereffects - perceptual distortions that occur following extended exposure to a stimulus.11,12,13,14,15,16,17 Such effects are thought to be caused by adaptation in neural populations that encode both simple, low-level stimulus characteristics17,18,19,20 and more abstract, high-level object features.21,22,23 To tease these two contributions apart, we used machine-learning methods to synthesize novel shapes in a multidimensional shape space, derived from a large database of natural shapes.24 Stimuli were carefully selected such that low-level and high-level adaptation models made distinct predictions about the shapes that observers would perceive following adaptation. We found that adaptation along vector trajectories in the high-level shape space predicted shape aftereffects better than simple low-level processes. Our findings reveal the central role of high-level statistical features in the visual representation of shape. The findings also hint that human vision is attuned to the distribution of shapes experienced in the natural environment.
3. Distinct Neural Components of Visually Guided Grasping during Planning and Execution. J Neurosci 2023; 43:8504-8514. PMID: 37848285. PMCID: PMC10711727. DOI: 10.1523/jneurosci.0335-23.2023.
Abstract
Selecting suitable grasps on three-dimensional objects is a challenging visuomotor computation, which involves combining information about an object (e.g., its shape, size, and mass) with information about the actor's body (e.g., the optimal grasp aperture and hand posture for comfortable manipulation). Here, we used functional magnetic resonance imaging to investigate brain networks associated with these distinct aspects during grasp planning and execution. Human participants of either sex viewed and then executed preselected grasps on L-shaped objects made of wood and/or brass. By leveraging a computational approach that accurately predicts human grasp locations, we selected grasp points that disentangled the role of multiple grasp-relevant factors, that is, grasp axis, grasp size, and object mass. Representational Similarity Analysis revealed that grasp axis was encoded along dorsal-stream regions during grasp planning. Grasp size was first encoded in ventral stream areas during grasp planning, then in premotor regions during grasp execution. Object mass was encoded in ventral stream and (pre)motor regions only during grasp execution. Premotor regions further encoded visual predictions of grasp comfort, whereas the ventral stream encoded grasp comfort during execution, suggesting its involvement in haptic evaluation. These shifts in neural representations thus capture the sensorimotor transformations that allow humans to grasp objects.
SIGNIFICANCE STATEMENT: Grasping requires integrating object properties with constraints on hand and arm postures. Using a computational approach that accurately predicts human grasp locations by combining such constraints, we selected grasps on objects that disentangled the relative contributions of object mass, grasp size, and grasp axis during grasp planning and execution in a neuroimaging study. Our findings reveal a greater role of dorsal-stream visuomotor areas during grasp planning, and, surprisingly, increasing ventral stream engagement during execution. We propose that during planning, visuomotor representations initially encode grasp axis and size. Perceptual representations of object material properties become more relevant instead as the hand approaches the object and motor programs are refined with estimates of the grip forces required to successfully lift the object.
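As a rough sketch of the Representational Similarity Analysis logic, the following compares a hypothetical grasp-axis model RDM against a neural RDM built from (here, random) activity patterns. It illustrates the general method, not the authors' pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_conditions, n_voxels = 16, 200
patterns = rng.normal(size=(n_conditions, n_voxels))  # stand-in fMRI patterns

# Neural RDM: correlation distance between every pair of condition patterns.
neural_rdm = pdist(patterns, metric="correlation")

# Model RDM: predicted dissimilarity, e.g., difference in grasp-axis angle.
grasp_axis = rng.uniform(0, np.pi, size=(n_conditions, 1))
model_rdm = pdist(grasp_axis, metric="euclidean")

# RSA score: rank correlation between model and neural dissimilarities.
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```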
4. Where do the hypotheses come from? Data-driven learning in science and the brain. Behav Brain Sci 2023; 46:e386. PMID: 38054335. DOI: 10.1017/s0140525x23001565.
Abstract
Everyone agrees that testing hypotheses is important, but Bowers et al. provide scant details about where hypotheses about perception and brain function should come from. We suggest that the answer lies in considering how information about the outside world could be acquired - that is, learned - over the course of evolution and development. Deep neural networks (DNNs) provide one tool to address this question.
5. The eyes anticipate where objects will move based on their shape. Curr Biol 2023; 33:R894-R895. PMID: 37699342. DOI: 10.1016/j.cub.2023.07.028.
Abstract
Imagine staring into a clear river, starving, desperately searching for a fish to spear and cook. You see a dark shape lurking beneath the surface. It doesn't resemble any sort of fish you've encountered before - but you're hungry. To catch it, you need to anticipate which way it will move when you lunge for it, to compensate for your own sensory and motor processing delays1,2,3. Yet you know nothing about the behaviour of this creature, and do not know in which direction it will try to escape. What cues do you then use to drive such anticipatory responses? Fortunately, many species4, including humans, have the remarkable ability to predict the directionality of objects based on their shape - even if they are unfamiliar and so we cannot rely on semantic knowledge about their movements5. While it is known that such directional inferences can guide attention5, we do not yet fully understand how such causal inferences are made, or the extent to which they enable anticipatory behaviours. Does the oculomotor system, which moves our eyes to optimise visual input, use directional inferences from shape to anticipate upcoming motion direction? Such anticipation is necessary to stabilise the moving object on the high-resolution fovea of the retina while tracking the shape, a primary goal of the oculomotor system6, and to guide any future interactions7,8. Here, we leveraged a well-known behaviour of the oculomotor system: anticipatory smooth eye movements (ASEM), where an increase in eye velocity is observed in the direction of a stimulus' expected motion, before the stimulus actually moves3, to show that the oculomotor system extracts directional information from shape, and uses this inference to predict and anticipate upcoming motion.
6. Inferring shape transformations in a drawing task. Mem Cognit 2023. PMID: 37668880. DOI: 10.3758/s13421-023-01452-0.
Abstract
Many objects and materials in our environment are subject to transformations that alter their shape. For example, branches bend in the wind, ice melts, and paper crumples. Still, we recognize objects and materials across these changes, suggesting we can distinguish an object's original features from those caused by the transformations ("shape scission"). Yet, if we truly understand transformations, we should not only be able to identify their signatures but also actively apply the transformations to new objects (i.e., through imagination or mental simulation). Here, we investigated this ability using a drawing task. On a tablet computer, participants viewed a sample contour and its transformed version, and were asked to apply the same transformation to a test contour by drawing what the transformed test shape should look like. Thus, they had to (i) infer the transformation from the shape differences, (ii) envisage its application to the test shape, and (iii) draw the result. Our findings show that drawings were more similar to the ground truth transformed test shape than to the original test shape - demonstrating the inference and reproduction of transformations from observation. However, this was only observed for relatively simple shapes. The ability was also modulated by transformation type and magnitude but not by the similarity between sample and test shapes. Together, our findings suggest that we can distinguish between representations of original object shapes and their transformations, and can use visual imagery to mentally apply nonrigid transformations to observed objects, showing how we not only perceive but also 'understand' shape.
7. Visual perception: Contours that crack the ambiguity conundrum. Curr Biol 2023; 33:R760-R762. PMID: 37490860. DOI: 10.1016/j.cub.2023.06.024.
Abstract
A new study shows how the brain exploits the parts of images where surfaces curve out of view to recover both the three-dimensional shape and material properties of objects. This sheds light on a long-standing 'chicken-and-egg' problem in perception research.
8. Color and gloss constancy under diverse lighting environments. J Vis 2023; 23:8. PMID: 37432844. PMCID: PMC10351023. DOI: 10.1167/jov.23.7.8.
Abstract
When we look at an object, we simultaneously see how glossy or matte it is, how light or dark, and what color. Yet, at each point on the object's surface, both diffuse and specular reflections are mixed in different proportions, resulting in substantial spatial chromatic and luminance variations. To further complicate matters, this pattern changes radically when the object is viewed under different lighting conditions. The purpose of this study was to simultaneously measure our ability to judge color and gloss using an image set capturing diverse object and illuminant properties. Participants adjusted the hue, lightness, chroma, and specular reflectance of a reference object so that it appeared to be made of the same material as a test object. Critically, the two objects were presented under different lighting environments. We found that hue matches were highly accurate, except for under a chromatically atypical illuminant. Chroma and lightness constancy were generally poor, but these failures correlated well with simple image statistics. Gloss constancy was particularly poor, and these failures were only partially explained by reflection contrast. Importantly, across all measures, participants were highly consistent with one another in their deviations from constancy. Although color and gloss constancy hold well in simple conditions, the variety of lighting and shape in the real world presents significant challenges to our visual system's ability to judge intrinsic material properties.
9. Estimation of Contact Regions Between Hands and Objects During Human Multi-Digit Grasping. J Vis Exp 2023. PMID: 37154551. DOI: 10.3791/64877.
Abstract
To grasp an object successfully, we must select appropriate contact regions for our hands on the surface of the object. However, identifying such regions is challenging. This paper describes a workflow to estimate the contact regions from marker-based tracking data. Participants grasp real objects, while we track the 3D position of both the objects and the hand, including the fingers' joints. We first determine the joint Euler angles from a selection of tracked markers positioned on the back of the hand. Then, we use state-of-the-art hand mesh reconstruction algorithms to generate a mesh model of the participant's hand in the current pose and the 3D position. Using objects that were either 3D printed or 3D scanned - and are, thus, available as both real objects and mesh data - allows the hand and object meshes to be co-registered. In turn, this allows the estimation of approximate contact regions by calculating the intersections between the hand mesh and the co-registered 3D object mesh. The method may be used to estimate where and how humans grasp objects under a variety of conditions. Therefore, the method could be of interest to researchers studying visual and haptic perception, motor control, human-computer interaction in virtual and augmented reality, and robotics.
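The core geometric step - intersecting a co-registered hand mesh with the object mesh - can be sketched with the trimesh library as below. The primitive shapes and the 1 mm contact tolerance are stand-in assumptions, not values from the paper.

```python
import trimesh

# Stand-in geometry: in the study these are a posed hand mesh and a
# scanned/printed object mesh, co-registered in one coordinate frame.
obj = trimesh.creation.box(extents=[60.0, 60.0, 60.0])   # units: mm, assumed
hand = trimesh.creation.icosphere(radius=20.0)
hand.apply_translation([35.0, 0.0, 0.0])  # a "fingertip" touching one face

# Signed distance from the object surface to each hand vertex;
# trimesh's convention is positive for points inside the mesh.
d = trimesh.proximity.signed_distance(obj, hand.vertices)

# Count as contact any hand vertex inside, or within 1 mm of, the object.
contact_mask = d > -1.0
print(f"{contact_mask.sum()} of {len(hand.vertices)} hand vertices in contact")
```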
10. Corrigendum: Humans can visually judge grasp quality and refine their judgments through visual and haptic feedback. Front Neurosci 2022; 16:1088926. PMID: 36578823. PMCID: PMC9791929. DOI: 10.3389/fnins.2022.1088926.
Abstract
[This corrects the article DOI: 10.3389/fnins.2020.591898.].
11. Inferring shape transformations in a drawing task. J Vis 2022. DOI: 10.1167/jov.22.14.3329.
12. Humans abandon the preferred grip axis in favor of low torques in precision grip grasping. J Vis 2022. DOI: 10.1167/jov.22.14.4434.
13. Viewpoint similarity of 3D objects predicted by image-plane position shifts. J Vis 2022. DOI: 10.1167/jov.22.14.3886.
14. Part structure predicts superordinate categorization of animals and plants. J Vis 2022. DOI: 10.1167/jov.22.14.3228.
15. Asymmetric matching of color and gloss across different lighting environments. J Vis 2022. DOI: 10.1167/jov.22.14.3279.
16.
Abstract
The discovery of mental rotation was one of the most significant landmarks in experimental psychology, leading to the ongoing assumption that to visually compare objects from different three-dimensional viewpoints, we use explicit internal simulations of object rotations, to 'mentally adjust' one object until it matches the other1. These rotations are thought to be performed on three-dimensional representations of the object, by literal analogy to physical rotations. In particular, it is thought that an imagined object is continuously adjusted at a constant three-dimensional angular rotation rate from its initial orientation to the final orientation through all intervening viewpoints2. While qualitative theories have tried to account for this phenomenon3, to date there has been no explicit, image-computable model of the underlying processes. As a result, there is no quantitative account of why some object viewpoints appear more similar to one another than others when the three-dimensional angular difference between them is the same4,5. We reasoned that the specific pattern of non-uniformities in the perception of viewpoints can reveal the visual computations underlying mental rotation. We therefore compared human viewpoint perception with a model based on the kind of two-dimensional 'optical flow' computations that are thought to underlie motion perception in biological vision6, finding that the model reproduces the specific errors that participants make. This suggests that mental rotation involves simulating the two-dimensional retinal image change that would occur when rotating objects. When we compare objects, we do not do so in a distal three-dimensional representation as previously assumed, but by measuring how much the proximal stimulus would change if we watched the object rotate, capturing perspectival appearance changes7.
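One way to make the idea concrete: measure how much the proximal (2D) image changes between two viewpoints using dense optical flow, and take the mean flow magnitude as a dissimilarity proxy. This sketch uses OpenCV's Farneback flow on synthetic stand-in images; it is a simplified illustration, not the authors' model.

```python
import cv2
import numpy as np

# Stand-in views: in practice, two renderings of one object from nearby
# viewpoints; here, a random texture and a slightly shifted copy of it.
rng = np.random.default_rng(3)
a = (rng.random((128, 128)) * 255).astype(np.uint8)
b = np.roll(a, 2, axis=1)  # mimic a small image-plane displacement

# Dense 2D optical flow between the two views (Farneback's algorithm).
flow = cv2.calcOpticalFlowFarneback(a, b, None, pyr_scale=0.5, levels=3,
                                    winsize=15, iterations=3, poly_n=5,
                                    poly_sigma=1.2, flags=0)

# Mean flow magnitude: how much the retinal image would change while
# watching the object rotate between the two viewpoints.
dissimilarity = np.linalg.norm(flow, axis=2).mean()
print(f"viewpoint dissimilarity ~ {dissimilarity:.2f} px")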
17. Identifying specular highlights: Insights from deep learning. J Vis 2022; 22:6. PMID: 35713928. PMCID: PMC9206496. DOI: 10.1167/jov.22.7.6.
Abstract
Specular highlights are the most important image feature for surface gloss perception. Yet, recognizing whether a bright patch in an image is due to specular reflection or some other cause (e.g., texture marking) is challenging, and it remains unclear how the visual system reliably identifies highlights. There is currently no image-computable model that emulates human highlight identification, so here we sought to develop a neural network that reproduces observers' characteristic successes and failures. We rendered 179,085 images of glossy, undulating, textured surfaces. Given such images as input, a feedforward convolutional neural network was trained to output an image containing only the specular reflectance component. Participants viewed such images and reported whether or not specific pixels were highlights. The queried pixels were carefully selected to distinguish between ground truth and a simple thresholding of image intensity. The neural network outperformed the simple thresholding model - and ground truth - at predicting human responses. We then used a genetic algorithm to selectively delete connections within the neural network to identify variants of the network that approximated human judgments even more closely. The best resulting network shared 68% of the variance with human judgments - more than the unpruned network. As a first step toward interpreting the network, we then used representational similarity analysis to compare its inner representations to a wide variety of hand-engineered image features. We find that the network learns representations that are similar not only to directly image-computable predictors but also to more complex predictors such as intrinsic or geometric factors, as well as some indications of photo-geometrical constraints learned by the network. However, our network fails to replicate human response patterns to violations of photo-geometric constraints (rotated highlights) as described by other authors.
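The "simple thresholding of image intensity" baseline that the network was pitted against can be sketched in a few lines; the 95% quantile cutoff here is an arbitrary illustration, not the paper's criterion.

```python
import numpy as np

def threshold_highlights(image, quantile=0.95):
    """Baseline detector: flag the brightest pixels as highlights."""
    lum = image.mean(axis=2) if image.ndim == 3 else image
    return lum >= np.quantile(lum, quantile)

# Stand-in input; in the study this would be a rendered glossy surface.
img = np.random.rand(128, 128, 3)
mask = threshold_highlights(img)
print(f"{mask.mean():.1%} of pixels flagged as highlights")
```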
18. One-shot generalization in humans revealed through a drawing task. eLife 2022; 11:75485. PMID: 35536739. PMCID: PMC9090327. DOI: 10.7554/elife.75485.
Abstract
Humans have the amazing ability to learn new visual concepts from just a single exemplar. How we achieve this remains mysterious. State-of-the-art theories suggest observers rely on internal 'generative models', which not only describe observed objects, but can also synthesize novel variations. However, compelling evidence for generative models in human one-shot learning remains sparse. In most studies, participants merely compare candidate objects created by the experimenters, rather than generating their own ideas. Here, we overcame this key limitation by presenting participants with 2D 'Exemplar' shapes and asking them to draw their own 'Variations' belonging to the same class. The drawings reveal that participants inferred - and synthesized - genuine novel categories that were far more varied than mere copies. Yet, there was striking agreement between participants about which shape features were most distinctive, and these tended to be preserved in the drawn Variations. Indeed, swapping distinctive parts caused objects to swap apparent category. Our findings suggest that internal generative models are key to how humans generalize from single exemplars. When observers see a novel object for the first time, they identify its most distinctive features and infer a generative model of its shape, allowing them to mentally synthesize plausible variants.
19. Visual perception: Colour brings shape into stark relief. Curr Biol 2022; 32:R272-R273. PMID: 35349812. DOI: 10.1016/j.cub.2022.01.077.
Abstract
Recent research has uncovered a surprising new role of colour in the perception of three-dimensional shape. The brain is exquisitely sensitive to visual patterns emerging from the way different wavelengths interact with surfaces.
20. Scale ambiguities in material recognition. iScience 2022; 25:103970. PMID: 35281732. PMCID: PMC8914553. DOI: 10.1016/j.isci.2022.103970.
Abstract
Many natural materials have complex, multi-scale structures. Consequently, the inferred identity of a surface can vary with the assumed spatial scale of the scene: a plowed field seen from afar can resemble corduroy seen up close. We investigated this ‘material-scale ambiguity’ using 87 photographs of diverse materials (e.g., water, sand, stone, metal, and wood). Across two experiments, separate groups of participants (N = 72 adults) provided judgements of the material category depicted in each image, either with or without manipulations of apparent distance (by verbal instructions, or adding objects of familiar size). Our results demonstrate that these manipulations can cause identical images to be assigned to completely different material categories, depending on the assumed scale. Under challenging conditions, therefore, the categorization of materials is susceptible to simple manipulations of apparent distance, revealing a striking example of top-down effects in the interpretation of image features.
Highlights: Interpretations of ambiguous surface materials varied with assumed viewing distance. Causal manipulation of assumed viewing distance reproduced this effect. Canonical spatial scale can inform surface material inference under uncertainty.
21.
Abstract
Distinguishing mirror from glass is a challenging visual inference, because both materials derive their appearance from their surroundings, yet we rarely experience difficulties in telling them apart. Very few studies have investigated how the visual system distinguishes reflections from refractions, and to date there is no image-computable model that emulates human judgments. Here we sought to develop a deep neural network that reproduces the patterns of visual judgments human observers make. To do this, we trained thousands of convolutional neural networks on more than 750,000 simulated mirror and glass objects, and compared their performance with human judgments, as well as alternative classifiers based on "hand-engineered" image features. For randomly chosen images, all classifiers and humans performed with high accuracy, and therefore correlated highly with one another. However, to assess how similar models are to humans, it is not sufficient to compare accuracy or correlation on random images. A good model should also predict the characteristic errors that humans make. We, therefore, painstakingly assembled a diagnostic image set for which humans make systematic errors, allowing us to isolate signatures of human-like performance. A large-scale, systematic search through feedforward neural architectures revealed that relatively shallow (three-layer) networks predicted human judgments better than any other models we tested. This is the first image-computable model that emulates human errors and succeeds in distinguishing mirror from glass, and hints that mid-level visual processing might be particularly important for the task.
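For concreteness, a minimal three-convolutional-layer network of the general kind the architecture search favoured might look as follows in PyTorch. The layer sizes are illustrative assumptions, not the selected architecture.

```python
import torch
import torch.nn as nn

class ShallowNet(nn.Module):
    """A minimal three-convolutional-layer mirror-vs-glass classifier."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)  # two classes: mirror, glass

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = ShallowNet()(torch.randn(4, 3, 128, 128))  # a batch of 4 images
print(logits.shape)  # torch.Size([4, 2])
```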
22.
Abstract
Color constancy is our ability to perceive constant colors across varying illuminations. Here, we trained deep neural networks to be color constant and evaluated their performance with varying cues. Inputs to the networks consisted of two-dimensional images of simulated cone excitations derived from three-dimensional (3D) rendered scenes of 2,115 different 3D shapes, with spectral reflectances of 1,600 different Munsell chips, illuminated under 278 different natural illuminations. The models were trained to classify the reflectance of the objects. Testing was done with four new illuminations with equally spaced CIEL*a*b* chromaticities, two along the daylight locus and two orthogonal to it. High levels of color constancy were achieved with different deep neural networks, and constancy was higher along the daylight locus. When gradually removing cues from the scene, constancy decreased. Both ResNets and classical ConvNets of varying degrees of complexity performed well. However, DeepCC, our simplest sequential convolutional network, represented colors along the three color dimensions of human color vision, while ResNets showed a more complex representation.
23. Effects of visual and visual-haptic perception of material rigidity on reaching and grasping in the course of development. Acta Psychol (Amst) 2021; 221:103457. PMID: 34883348. DOI: 10.1016/j.actpsy.2021.103457.
Abstract
The development of material property perception for grasping objects is not well explored during early childhood. Therefore, we investigated infants', 3-year-old children's, and adults' unimanual grasping behavior and reaching kinematics for objects of different rigidity using a 3D motion capture system. In Experiment 1, 11-month-old infants and, for purposes of comparison, adults, and in Experiment 2, 3-year-old children were encouraged to lift relatively heavy objects with one of two handles differing in rigidity after visual (Condition 1) and visual-haptic exploration (Condition 2). Experiment 1 revealed that 11-month-olds, after visual object exploration, showed no significant material preference, and thus did not consider the material to facilitate grasping. After visual-haptic object exploration and when grasping the contralateral handles, infants showed an unexpected preference for the soft handles, which were harder to use to lift the object. In contrast, adults generally grasped the rigid handle, exploiting their knowledge about efficient and functional grasping in both conditions. Reaching kinematics were barely affected by rigidity, but rather by condition and age. Experiment 2 revealed that 3-year-olds no longer exhibit a preference for grasping soft handles, but still no adult-like preference for rigid handles in either condition. This suggests that material rigidity plays a minor role in infants' grasping behavior when only visual material information is available. Also, 3-year-olds seem to be at an intermediate stage in the development from (1) preferring the pleasant sensation of a soft fabric to (2) preferring the efficient rigid handle.
24. Gloss perception: Searching for a deep neural network that behaves like humans. J Vis 2021; 21:14. PMID: 34817568. PMCID: PMC8626854. DOI: 10.1167/jov.21.12.14.
Abstract
The visual computations underlying human gloss perception remain poorly understood, and to date there is no image-computable model that reproduces human gloss judgments independent of shape and viewing conditions. Such a model could provide a powerful platform for testing hypotheses about the detailed workings of surface perception. Here, we made use of recent developments in artificial neural networks to test how well we could recreate human responses in a high-gloss versus low-gloss discrimination task. We rendered >70,000 scenes depicting familiar objects made of either mirror-like or near-matte textured materials. We trained numerous classifiers to distinguish the two materials in our images - ranging from linear classifiers using simple pixel statistics to convolutional neural networks (CNNs) with up to 12 layers - and compared their classifications with human judgments. To determine which classifiers made the same kinds of errors as humans, we painstakingly identified a set of 60 images in which human judgments are consistently decoupled from ground truth. We then conducted a Bayesian hyperparameter search to identify which out of several thousand CNNs most resembled humans. We found that, although architecture has only a relatively weak effect, high correlations with humans are somewhat more typical in networks of shallower to intermediate depths (three to five layers). We also trained deep convolutional generative adversarial networks (DCGANs) of different depths to recreate images based on our high- and low-gloss database. Responses from human observers show that two layers in a DCGAN can recreate gloss recognizably for human observers. Together, our results indicate that human gloss classification can best be explained by computations resembling early to mid-level vision.
25. Probing human 3D shape perception with novel, but natural stimuli. J Vis 2021. DOI: 10.1167/jov.21.9.2966.
26. Visual Prediction of Bounce Trajectories. J Vis 2021. DOI: 10.1167/jov.21.9.2492.
27. Human judgments of relative 3D pose of novel complex objects. J Vis 2021. DOI: 10.1167/jov.21.9.2873.
28. Modelling local and global explanations for shape aftereffects with naturalistic novel stimuli. J Vis 2021. DOI: 10.1167/jov.21.9.2601.
29. Evolving visual representations from noise. J Vis 2021. DOI: 10.1167/jov.21.9.2544.
30. ‘Distinctiveness’ of parts in novel objects. J Vis 2021. DOI: 10.1167/jov.21.9.2236.
31. The mental representation of materials distilled from >1.5 million similarity judgements. J Vis 2021. DOI: 10.1167/jov.21.9.1981.
32. Material recognition and the role of assumed viewing distance. J Vis 2021. DOI: 10.1167/jov.21.9.1936.
33. Stability versus natural hand pose: Humans sacrifice their usual grasp configuration to choose stable grasp locations. J Vis 2021. DOI: 10.1167/jov.21.9.2360.
34. An image-computable model of human visual shape similarity. PLoS Comput Biol 2021; 17:e1008981. PMID: 34061825. PMCID: PMC8195351. DOI: 10.1371/journal.pcbi.1008981.
Abstract
Shape is a defining feature of objects, and human observers can effortlessly compare shapes to determine how similar they are. Yet, to date, no image-computable model can predict how visually similar or different shapes appear. Such a model would be an invaluable tool for neuroscientists and could provide insights into computations underlying human shape perception. To address this need, we developed a model (‘ShapeComp’), based on over 100 shape features (e.g., area, compactness, Fourier descriptors). When trained to capture the variance in a database of >25,000 animal silhouettes, ShapeComp accurately predicts human shape similarity judgments between pairs of shapes without fitting any parameters to human data. To test the model, we created carefully selected arrays of complex novel shapes using a Generative Adversarial Network trained on the animal silhouettes, which we presented to observers in a wide range of tasks. Our findings show that incorporating multiple ShapeComp dimensions facilitates the prediction of human shape similarity across a small number of shapes, and also captures much of the variance in the multiple arrangements of many shapes. ShapeComp outperforms both conventional pixel-based metrics and state-of-the-art convolutional neural networks, and can also be used to generate perceptually uniform stimulus sets, making it a powerful tool for investigating shape and object representations in the human brain.
The ability to describe and compare shapes is crucial in many scientific domains from visual object recognition to computational morphology and computer graphics. Across disciplines, considerable effort has been devoted to the study of shape and its influence on object recognition, yet an important stumbling block is the quantitative characterization of shape similarity. Here we develop a psychophysically validated model that takes as input an object's shape boundary and provides a high-dimensional output that can be used for predicting visual shape similarity. With this precise control of shape similarity, the model's description of shape is a powerful tool that can be used across the neurosciences and artificial intelligence to test the role of shape in perception and the brain.
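A few of the contour descriptors named above (area, compactness, Fourier descriptors) can be computed from a closed boundary as sketched here; this illustrates the feature family, not the ShapeComp code itself.

```python
import numpy as np

def shape_descriptors(contour, n_fourier=8):
    """contour: (N, 2) array of x, y points tracing a closed boundary."""
    x, y = contour[:, 0], contour[:, 1]
    # Polygon area via the shoelace formula.
    area = 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))
    # Perimeter, closing the contour back to its first point.
    steps = np.diff(contour, axis=0, append=contour[:1])
    perimeter = np.sum(np.linalg.norm(steps, axis=1))
    compactness = 4 * np.pi * area / perimeter**2   # equals 1.0 for a circle
    # Scale-normalised magnitudes of low-order Fourier descriptors.
    z = np.fft.fft(x + 1j * y)
    fd = np.abs(z[1:n_fourier + 1]) / (np.abs(z[1]) + 1e-12)
    return area, compactness, fd

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
area, compactness, fd = shape_descriptors(circle)
print(f"area ~ {area:.2f} (pi), compactness ~ {compactness:.3f}")
```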
35.
Abstract
One of the deepest insights in neuroscience is that sensory encoding should take advantage of statistical regularities. Humans’ visual experience contains many redundancies: Scenes mostly stay the same from moment to moment, and nearby image locations usually have similar colors. A visual system that knows which regularities shape natural images can exploit them to encode scenes compactly or guess what will happen next. Although these principles have been appreciated for more than 60 years, until recently it has been possible to convert them into explicit models only for the earliest stages of visual processing. But recent advances in unsupervised deep learning have changed that. Neural networks can be taught to compress images or make predictions in space or time. In the process, they learn the statistical regularities that structure images, which in turn often reflect physical objects and processes in the outside world. The astonishing accomplishments of unsupervised deep learning reaffirm the importance of learning statistical regularities for sensory coding and provide a coherent framework for how knowledge of the outside world gets into visual cortex.
36.
Abstract
How humans visually select where to grasp an object depends on many factors, including grasp stability and preferred grasp configuration. We examined how endpoints are selected when these two factors are brought into conflict: Do people favor stable grasps or do they prefer their natural grasp configurations? Participants reached to grasp one of three cuboids oriented so that its two corners were either aligned with, or rotated away from, each individual's natural grasp axis (NGA). All objects were made of brass (mass: 420 g), but the surfaces of their sides were manipulated to alter friction: 1) all brass; 2) two opposing sides covered with wood, the other two left as bare brass; or 3) two opposing sides covered with sandpaper, the two remaining brass sides smeared with Vaseline. Grasps were evaluated as either clockwise (thumb to the left of the finger in the frontal plane) or counterclockwise of the NGA. Grasp endpoints depended on both object orientation and surface material. For the all-brass object, grasps were bimodally distributed in the NGA-aligned condition but predominantly clockwise in the NGA-unaligned condition. These data reflected participants' natural grasp configuration independently of surface material. When grasping objects with different surface materials, endpoint selection changed: Participants sacrificed their usual grasp configuration to choose the more stable object sides. A model in which surface material shifts participants' preferred grip angle proportionally to the perceived friction of the surfaces accounts for our results. Our findings demonstrate that a stable grasp is more important than a biomechanically comfortable grasp configuration.
NEW & NOTEWORTHY: When grasping an object, humans can place their fingers at several positions on its surface. The selection of these endpoints depends on many factors, with two of the most important being grasp stability and grasp configuration. We put these two factors in conflict and examine which is considered more important. Our results highlight that humans are not reluctant to adopt unusual grasp configurations to satisfy grasp stability.
37. Scaling and discriminability of perceived gloss. J Opt Soc Am A Opt Image Sci Vis 2021; 38:203-210. PMID: 33690530. DOI: 10.1364/josaa.409454.
Abstract
While much attention has been given to understanding biases in gloss perception (e.g., changes in perceived reflectance as a function of lighting, shape, viewpoint, and other factors), here we investigated sensitivity to changes in surface reflectance. We tested how visual sensitivity to differences in specular reflectance varies as a function of the magnitude of specular reflectance. Stimuli consisted of renderings of glossy objects under natural illumination. Using maximum likelihood difference scaling (MLDS), we created a perceptual scaling of the specular reflectance parameter of the Ward reflectance model. Then, using the method of constant stimuli and a standard 2AFC procedure, we obtained psychometric functions for gloss discrimination across a range of reflectance values derived from the perceptual scale. Both methods demonstrate that discriminability is significantly diminished at high levels of specular reflectance, thus indicating that gloss sensitivity depends on the magnitude of change in the image produced by different reflectance values. Taken together, these experiments also suggest that internal sensory noise remains constant for suprathreshold and near-threshold intervals of specular reflectance, which supports the use of MLDS as a highly efficient method for evaluating gloss sensitivity.
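The second stage - estimating a discrimination threshold from 2AFC data - reduces to fitting a cumulative-Gaussian psychometric function. A minimal sketch with invented response proportions:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Proportion of "test glossier" responses vs. specular reflectance
# difference; the numbers are made up for illustration only.
delta = np.array([-0.04, -0.02, -0.01, 0.0, 0.01, 0.02, 0.04])
p_resp = np.array([0.05, 0.20, 0.35, 0.50, 0.68, 0.82, 0.97])

def psychometric(x, mu, sigma):
    # Cumulative Gaussian: probability of choosing "test glossier".
    return norm.cdf(x, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(psychometric, delta, p_resp, p0=[0.0, 0.02])
print(f"PSE = {mu:.3f}, threshold (sigma) = {sigma:.3f}")
```

The fitted sigma plays the role of the discrimination threshold, which the paper shows grows at high specular reflectance.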
38. Humans Can Visually Judge Grasp Quality and Refine Their Judgments Through Visual and Haptic Feedback. Front Neurosci 2021; 14:591898. PMID: 33510608. PMCID: PMC7835720. DOI: 10.3389/fnins.2020.591898.
Abstract
How humans visually select where to grasp objects is determined by the physical object properties (e.g., size, shape, weight), the degrees of freedom of the arm and hand, as well as the task to be performed. We recently demonstrated that human grasps are near-optimal with respect to a weighted combination of different cost functions that make grasps uncomfortable, unstable, or impossible, e.g., due to unnatural grasp apertures or large torques. Here, we ask whether humans can consciously access these rules. We test if humans can explicitly judge grasp quality derived from rules regarding grasp size, orientation, torque, and visibility. More specifically, we test if grasp quality can be inferred (i) by using visual cues and motor imagery alone, (ii) from watching grasps executed by others, and (iii) through performing grasps, i.e., receiving visual, proprioceptive and haptic feedback. Stimuli were novel objects made of 10 cubes of brass and wood (side length 2.5 cm) in various configurations. On each object, one near-optimal and one sub-optimal grasp were selected based on one cost function (e.g., torque), while the other constraints (grasp size, orientation, and visibility) were kept approximately constant or counterbalanced. Participants were visually cued to the location of the selected grasps on each object and verbally reported which of the two grasps was best. Across three experiments, participants were required to either (i) passively view the static objects and imagine executing the two competing grasps, (ii) passively view videos of other participants grasping the objects, or (iii) actively grasp the objects themselves. Our results show that, for a majority of tested objects, participants could already judge grasp optimality from simply viewing the objects and imagining grasping them, but were significantly better in the video and grasping sessions. These findings suggest that humans can determine grasp quality even without performing the grasp - perhaps through motor imagery - and can further refine their understanding of how to correctly grasp an object through sensorimotor feedback, as well as by passively viewing others grasp objects.
39. Searching for Strangely Shaped Cookies - Is Taking a Bite Out of a Cookie Similar to Occluding Part of It? Perception 2020; 50:140-153. PMID: 33377849. PMCID: PMC7879225. DOI: 10.1177/0301006620983729.
Abstract
Does recognizing the transformations that gave rise to an object’s retinal image contribute to early object recognition? It might, because finding a partially occluded object among similar objects that are not occluded is more difficult than finding an object that has the same retinal image shape without evident occlusion. If this is because the occlusion is recognized as such, we might see something similar for other transformations. We confirmed that it is difficult to find a cookie with a section missing when this was the result of occlusion. It is not more difficult to find a cookie from which a piece has been bitten off than to find one that was baked in a similar shape. On the contrary, the bite marks help detect the bitten cookie. Thus, biting off a part of a cookie has very different effects on visual search than occluding part of it. These findings do not support the idea that observers rapidly and automatically compensate for the ways in which objects’ shapes are transformed to give rise to the objects’ retinal images. They are easy to explain in terms of detecting characteristic features in the retinal image that such transformations may hide or create.
40.
Abstract
Object shape is an important cue to material identity and for the estimation of material properties. Shape features can affect material perception at different levels: at a microscale (surface roughness), mesoscale (textures and local object shape), or megascale (global object shape) level. Examples of local shape features include ripples in drapery, clots in viscous liquids, or spiraling creases in twisted objects. Here, we set out to test the role of such shape features in judgments of the material properties softness and weight. For this, we created a large number of novel stimuli with varying surface shape features. We show that those features have distinct effects on softness and weight ratings depending on their type, as well as their amplitude and frequency; for example, increasing the number and pointedness of spikes makes objects appear harder and heavier. By also asking participants to name familiar objects, materials, and transformations they associate with our stimuli, we can show that softness and weight judgments do not merely follow from semantic associations between particular stimuli and real-world object shapes. Rather, softness and weight are estimated from surface shape, presumably based on learned heuristics about the relationship between a particular expression of surface features and material properties. In line with this, we show that correlations between perceived softness or weight and surface curvature vary depending on the type of surface feature. We conclude that local shape features have to be considered when testing the effects of shape on the perception of material properties such as softness and weight.
41. Predicting precision grip grasp locations on three-dimensional objects. PLoS Comput Biol 2020; 16:e1008081. PMID: 32750070. PMCID: PMC7428291. DOI: 10.1371/journal.pcbi.1008081.
Abstract
We rarely experience difficulty picking up objects, yet of all potential contact points on the surface, only a small proportion yield effective grasps. Here, we present extensive behavioral data alongside a normative model that correctly predicts human precision grasping of unfamiliar 3D objects. We tracked participants' forefinger and thumb as they picked up objects of 10 wood and brass cubes configured to tease apart effects of shape, weight, orientation, and mass distribution. Grasps were highly systematic and consistent across repetitions and participants. We employed these data to construct a model which combines five cost functions related to force closure, torque, natural grasp axis, grasp aperture, and visibility. Even without free parameters, the model predicts individual grasps almost as well as different individuals predict one another's, but fitting weights reveals the relative importance of the different constraints. The model also accurately predicts human grasps on novel 3D-printed objects with more naturalistic geometries and is robust to perturbations in its key parameters. Together, the findings provide a unified account of how we successfully grasp objects of different 3D shape, orientation, mass, and mass distribution.
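The model's decision rule - a weighted sum of cost functions, minimized over candidate contact points - can be sketched as follows. The per-candidate costs here are random stand-ins; a real implementation would derive them from the object's mesh and the hand's geometry.

```python
import numpy as np

rng = np.random.default_rng(2)
n_candidates = 5000   # candidate (thumb, forefinger) contact-point pairs

# Stand-in penalties for the five constraints named in the paper.
costs = {
    "force_closure": rng.random(n_candidates),
    "torque": rng.random(n_candidates),
    "natural_grasp_axis": rng.random(n_candidates),
    "grasp_aperture": rng.random(n_candidates),
    "visibility": rng.random(n_candidates),
}

# Equal weights correspond to the parameter-free model; fitted weights
# reveal the relative importance of each constraint.
weights = {name: 1.0 for name in costs}
total = sum(w * costs[name] for name, w in weights.items())
best = int(np.argmin(total))
print(f"predicted grasp: candidate {best}, combined cost {total[best]:.2f}")
```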
42. Visual perception of liquids: Insights from deep neural networks. PLoS Comput Biol 2020; 16:e1008018. PMID: 32813688. PMCID: PMC7437867. DOI: 10.1371/journal.pcbi.1008018.
Abstract
Visually inferring material properties is crucial for many tasks, yet poses significant computational challenges for biological vision. Liquids and gels are particularly challenging due to their extreme variability and complex behaviour. We reasoned that measuring and modelling viscosity perception is a useful case study for identifying general principles of complex visual inferences. In recent years, artificial Deep Neural Networks (DNNs) have yielded breakthroughs in challenging real-world vision tasks. However, to model human vision, the emphasis lies not on best possible performance, but on mimicking the specific pattern of successes and errors humans make. We trained a DNN to estimate the viscosity of liquids using 100,000 simulations depicting liquids with sixteen different viscosities interacting in ten different scenes (stirring, pouring, splashing, etc.). We find that a shallow feedforward network trained for only 30 epochs predicts mean observer performance better than most individual observers. This is the first successful image-computable model of human viscosity perception. Further training improved accuracy, but predicted human perception less well. We analysed the network's features using representational similarity analysis (RSA) and a range of image descriptors (e.g. optic flow, colour saturation, GIST). This revealed clusters of units sensitive to specific classes of feature. We also find a distinct population of units that are poorly explained by hand-engineered features, but which are particularly important both for physical viscosity estimation, and for the specific pattern of human responses. The final layers represent many distinct stimulus characteristics - not just viscosity, which the network was trained on. Retraining the fully-connected layer with a reduced number of units achieves practically identical performance, but results in representations focused on viscosity, suggesting that network capacity is a crucial parameter determining whether artificial or biological neural networks use distributed vs. localized representations.
43.
Abstract
Visually inferring the elasticity of a bouncing object poses a challenge to the visual system: The observable behavior of the object depends on its elasticity but also on extrinsic factors, such as its initial position and velocity. Estimating elasticity requires disentangling these different contributions to the observed motion. We created 2-second simulations of a cube bouncing in a room and varied the cube's elasticity in 10 steps. The cube's initial position, orientation, and velocity were varied randomly to obtain three random samples for each level of elasticity. We systematically limited the visual information by creating three versions of each stimulus: (a) a full rendering of the scene, (b) the cube in a completely black environment, and (c) a rigid version of the cube following the same trajectories but without rotating or deforming (also in a completely black environment). Thirteen observers rated the apparent elasticity of the cubes and the typicality of their motion. Generally, stimuli were judged as less typical if they showed rigid motion without rotations, highly elastic cubes, or unlikely events. Overall, elasticity judgments correlated strongly with the true elasticity but did not show perfect constancy. Yet, importantly, we found similar results for all three stimulus conditions, despite significant differences in their apparent typicality. This suggests that the trajectory alone contains the information required to make elasticity judgments.
44.
Abstract
Human observers are remarkably good at perceiving constant object color across illumination changes. However, there are numerous other factors that can modulate surface appearance, such as aging, bleaching, staining, or soaking. Despite this, we are often able to identify material properties across such transformations. Little is known about how and to what extent we can compensate for the accompanying color transformations. Here we investigated whether humans could reproduce the original color of bleached fabrics. We treated 12 different fabric samples with a commercial bleaching product. Bleaching increased luminance and decreased saturation. We presented photographs of the original and bleached samples on a computer screen and asked observers to match the fabric colors to an adjustable matching disk. Different groups of observers produced matches for original and bleached samples. One group of observers were instructed to match the color of the bleached samples as they were before bleaching (i.e., compensate for the effects of bleaching); another, to accurately match color appearance. Observers did compensate significantly for the effects of bleaching when instructed to do so, but not in the appearance match condition. Results of a second experiment suggest that observers achieve color consistency, at least in part, through a strategy based on local spatial differences within the bleached samples. According to the results of a third experiment, these local spatial differences are likely to be the perceptual image cues that allow participants to determine whether a sample is bleached. When the effect of bleaching was limited or uniformly distributed across a sample's surface, observers were uncertain about the bleaching magnitude and seemed to apply cognitive strategies to achieve color consistency.
45. A dataset for evaluating one-shot categorization of novel object classes. Data Brief 2020; 29:105302. PMID: 32140517. PMCID: PMC7044642. DOI: 10.1016/j.dib.2020.105302.
Abstract
With the advent of deep convolutional neural networks, machines now rival humans in terms of object categorization. The neural networks solve categorization with a hierarchical organization that shares a striking resemblance to their biological counterpart, leading to their status as a standard model of object recognition in biological vision. Despite training on thousands of images of object categories, however, machine-learning networks are poorer generalizers, often fooled by adversarial images containing very simple image manipulations that humans easily recognize as false. Humans, on the other hand, can generalize object classes from very few samples. Here we provide a dataset of novel object classifications in humans. We gathered thousands of crowd-sourced human responses to novel objects embedded either with 1 or 16 context sample(s). Human decisions and stimuli together have the potential to be re-used (1) as a tool to better understand the nature of the gap in category learning from few samples between human and machine, and (2) as a benchmark of generalization across machine learning networks.
46.
Abstract
Materials with complex appearances, like textiles and foodstuffs, pose challenges for conventional theories of vision. But recent advances in unsupervised deep learning provide a framework for explaining how we learn to see them. We suggest that perception does not involve estimating physical quantities like reflectance or lighting. Instead, representations emerge from learning to encode and predict the visual input as efficiently and accurately as possible. Neural networks can be trained to compress natural images or to predict frames in movies without 'ground truth' data about the outside world. Yet, to succeed, such systems may automatically discover how to disentangle distal causal factors. Such 'statistical appearance models' potentially provide a coherent explanation of both failures and successes in perception.
47. One-shot categorization of novel object classes in humans. Vision Res 2019; 165:98-108. PMID: 31707254. DOI: 10.1016/j.visres.2019.09.005.
Abstract
One aspect of human vision unmatched by machines is the capacity to generalize from few samples. Observers tend to know when novel objects are in the same class despite large differences in shape, material or viewpoint. A major challenge in studying such generalization is that participants can see each novel sample only once. To overcome this, we used crowdsourcing to obtain responses from 500 human observers on 20 novel object classes, with each stimulus compared to 1 or 16 related objects. The results reveal that humans generalize from sparse data in highly systematic ways that depend on the number and variance of the samples. We compared human responses to 'ShapeComp', an image-computable model based on >100 shape descriptors, and 'AlexNet', a convolutional neural network that roughly matches humans at recognizing 1000 categories of real-world objects. With 16 samples, the models were consistent with human responses without free parameters. Thus, when there are a sufficient number of samples, observers rely on shallow but efficient processes based on a fixed set of features. With 1 sample, however, the models required different feature weights for each object. This suggests that one-shot categorization involves more sophisticated processes that actively identify the unique characteristics underlying each object class.
48. Unsupervised Neural Networks Learn Idiosyncrasies of Human Gloss Perception. J Vis 2019. DOI: 10.1167/19.10.213.