1
Liao C, Sawayama M, Xiao B. Probing the Link Between Vision and Language in Material Perception Using Psychophysics and Unsupervised Learning. bioRxiv 2024:2024.01.25.577219. [PMID: 38328102; PMCID: PMC10849714; DOI: 10.1101/2024.01.25.577219]
Abstract
We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to express our subjective understanding of visual input and communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression in material perception to understand how visual features relate to semantic representations. We use deep generative networks to construct an expandable image space to systematically create materials of well-defined and ambiguous categories. From such a space, we sampled diverse stimuli and compared the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language on a categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among materials morphed between known categories. Moreover, visual judgments exhibit more individual differences than verbal descriptions. Our results show that while verbal descriptions capture material qualities at a coarse level, they may not fully convey the visual features that characterize a material's optical properties. Analyzing the image representations of materials obtained from various pre-trained, data-rich deep neural networks, we find that the similarity structures of human visual judgments align more closely with those of a text-guided visual-semantic model than with purely vision-based models. Our findings suggest that while semantic representations facilitate material categorization, non-semantic visual features also play a significant role in discriminating materials at a finer level. This work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for quantitatively evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.
Affiliation(s)
- Chenxi Liao
- American University, Department of Neuroscience, Washington, DC, 20016, USA
- Masataka Sawayama
- The University of Tokyo, Graduate School of Information Science and Technology, Tokyo, 113-0033, Japan
- Bei Xiao
- American University, Department of Computer Science, Washington, DC, 20016, USA
2
Ujitoko Y, Kawabe T. Perceptual judgments for the softness of materials under indentation. Sci Rep 2022; 12:1761. [PMID: 35110650; PMCID: PMC8810927; DOI: 10.1038/s41598-022-05864-x]
Abstract
Humans can judge the softness of elastic materials from visual cues alone. However, the factors contributing to the judgment of visual softness are not yet fully understood. We conducted a psychophysical experiment to determine which factors and motion features contribute to the apparent softness of materials. Observers watched video clips in which materials were indented from the top surface to a certain depth and reported the apparent softness of the materials. The depth and speed of indentation were systematically manipulated. Material compliance, a physical characteristic, was also controlled. We found that higher indentation speeds resulted in larger softness rating scores, and the variation with indentation speed was successfully explained by image motion speed. Indentation depth had a powerful effect on the softness rating scores, and the variation with indentation depth was consistently explained by motion features related to overall deformation. Higher material compliance resulted in higher softness rating scores, and this variation with material compliance could likewise be explained by overall deformation. We conclude that the brain makes visual judgments about the softness of materials under indentation on the basis of motion speed and deformation magnitude.
Affiliation(s)
- Yusuke Ujitoko
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, 243-0198, Japan.
- Takahiro Kawabe
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, 243-0198, Japan
3
Sawayama M, Dobashi Y, Okabe M, Hosokawa K, Koumura T, Saarela TP, Olkkonen M, Nishida S. Visual discrimination of optical material properties: A large-scale study. J Vis 2022; 22:17. [PMID: 35195670; PMCID: PMC8883156; DOI: 10.1167/jov.22.2.17]
Abstract
The complex visual processing involved in perceiving object materials can be better elucidated by taking a variety of research approaches. Sharing stimulus and response data is an effective strategy for making the results of different studies directly comparable, and it can help researchers with different backgrounds enter the field. Here, we constructed a database containing several sets of material images annotated with visual discrimination performance. We created the material images using physically based computer graphics techniques and conducted psychophysical experiments with them in both laboratory and crowdsourcing settings. The observer's task was to discriminate materials on one of six dimensions (gloss contrast, gloss distinctness of image, translucent vs. opaque, metal vs. plastic, metal vs. glass, and glossy vs. painted). Illumination consistency and object geometry were also varied. We used a nonverbal procedure (an oddity task) applicable to diverse use cases, such as cross-cultural, cross-species, clinical, or developmental studies. Results showed that material discrimination depended on illumination and geometry, and that the ability to discriminate the spatial consistency of specular highlights in glossiness perception showed larger individual differences than the other tasks. In addition, analysis of visual features showed that the parameters of higher-order color texture statistics can partially, but not completely, explain task performance. The results obtained through crowdsourcing were highly correlated with those obtained in the laboratory, suggesting that our database can be used even when the experimental conditions are not strictly controlled in the laboratory. Several projects using our dataset are underway.
Affiliation(s)
- Masataka Sawayama
- Inria, Bordeaux, France
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
- Yoshinori Dobashi
- Information Media Environment Laboratory, Hokkaido University, Hokkaido, Japan
- Prometech CG Research, Tokyo, Japan
- Makoto Okabe
- Department of Mathematical and Systems Engineering, Graduate School of Engineering, Shizuoka University, Shizuoka, Japan
- Kenchi Hosokawa
- Advanced Comprehensive Research Organization, Teikyo University, Tokyo, Japan
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
- Takuya Koumura
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
- Toni P Saarela
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Maria Olkkonen
- Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Shin'ya Nishida
- Cognitive Informatics Lab, Graduate School of Informatics, Kyoto University, Kyoto, Japan
- NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Kanagawa, Japan
4
Nagle F, Johnston A. Recognising the dynamic form of fire. Sci Rep 2021; 11:10566. [PMID: 34011973; PMCID: PMC8134437; DOI: 10.1038/s41598-021-89453-4]
Abstract
Encoding and recognising complex natural sequences provides a challenge for human vision. We found that observers could recognise a previously presented segment of a video of a hearth fire when it was embedded in a longer sequence. Recognition performance declined when the test video was spatially inverted, but not when it was hue-reversed or temporally reversed. Sampled motion degraded forwards/reversed playback discrimination, indicating that observers were sensitive to the asymmetric pattern of motion of flames. For brief targets, performance increased with target length. More generally, performance depended on the relative lengths of the target and embedding sequence. Increased errors with embedding sequence length were driven by positive responses to non-target sequences (false alarms) rather than omissions. Taken together, these observations favour interpreting performance in terms of an incremental decision-making model based on a sequential statistical analysis in which evidence accrues for one of two alternatives. We also suggest that prediction could provide a means of generating and evaluating evidence in a sequential analysis model.
Affiliation(s)
- Fintan Nagle
- CoMPLEX, University College London, London, WC1E 6BT, UK
- Imperial College, Exhibition Road, London, SW7 2AZ, UK
- Alan Johnston
- CoMPLEX, University College London, London, WC1E 6BT, UK
- School of Psychology, University of Nottingham, Nottingham, NG7 2RD, UK
5
Abstract
When an elastic material (e.g., fabric) is horizontally stretched (or compressed), the material is compressed (or extended) vertically, the so-called Poisson effect. In a different case of the Poisson effect, when an elastic material (e.g., rubber) is vertically squashed, the material is horizontally extended. In both cases, the visual system receives image deformations involving horizontal expansion and vertical compression. How does the brain disentangle the two cases and accurately distinguish stretching from squashing events? Manipulating the relative magnitude of deformation of a square between the horizontal and vertical dimensions of two-dimensional stimuli, we asked observers to judge the force direction in the stimuli. Specifically, participants reported whether the square was stretched or squashed. In general, participants' judgments depended on the relative deformation magnitude. We also examined the anisotropic effect of deformation direction (i.e., horizontal vs. vertical stretching or squashing) and found that participants' judgments were strongly biased toward horizontal stretching. We also observed that an asymmetric deformation pattern, which indicated the specific context of force direction, was a strong cue to the force direction judgment. We suggest that the brain judges force direction in the Poisson effect on the basis of assumptions about the relationship between image deformation and force direction, in addition to the relative image deformation magnitudes between the horizontal and vertical dimensions.
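The geometric ambiguity described in this abstract follows from linear elasticity, where transverse strain equals -ν times axial strain (ν being Poisson's ratio). A minimal sketch, assuming a rectangle under uniaxial strain (function and parameter names are illustrative, not from the study):

```python
# Sketch of the Poisson effect on a rectangle, assuming linear elasticity:
# transverse strain = -nu * axial strain (nu = Poisson's ratio; nu = 0.5
# for incompressible materials such as rubber).
def deformed_dimensions(width, height, axial_strain, nu=0.5, axis="horizontal"):
    """Return (new_width, new_height) of a rectangle under uniaxial strain.

    axial_strain > 0 stretches along `axis`; the transverse dimension
    responds with strain -nu * axial_strain.
    """
    transverse_strain = -nu * axial_strain
    if axis == "horizontal":
        return width * (1 + axial_strain), height * (1 + transverse_strain)
    return width * (1 + transverse_strain), height * (1 + axial_strain)

# Horizontal stretching and vertical squashing both produce horizontal
# expansion plus vertical compression -- the image-level ambiguity the
# study asks observers to resolve.
stretched = deformed_dimensions(1.0, 1.0, 0.2, nu=0.5, axis="horizontal")
squashed = deformed_dimensions(1.0, 1.0, -0.2, nu=0.5, axis="vertical")
```

Both calls yield a wider, shorter rectangle, which is why relative deformation magnitude alone underdetermines the force direction.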
Affiliation(s)
- Takahiro Kawabe
- Human Information Science Laboratories, NTT Communication Science Laboratories, Tokyo, Japan
6
Schmid AC, Boyaci H, Doerschner K. Dynamic dot displays reveal material motion network in the human brain. Neuroimage 2020; 228:117688. [PMID: 33385563; DOI: 10.1016/j.neuroimage.2020.117688]
Abstract
There is growing research interest in the neural mechanisms underlying the recognition of material categories and properties. This field, however, is more recent and less developed than investigations of the neural mechanisms underlying object and scene category recognition. Motion is particularly important for the perception of non-rigid materials, but the neural basis of non-rigid material motion remains unexplored. Using fMRI, we investigated which brain regions respond preferentially to material motion versus other types of motion. We introduce a new database of stimuli, dynamic dot materials, that are animations of moving dots that induce vivid percepts of various materials in motion, e.g. flapping cloth, liquid waves, wobbling jelly. Control stimuli were scrambled versions of these same animations and rigid three-dimensional rotating dots. Results showed that isolating material motion properties with dynamic dots (in contrast with other kinds of motion) activates a network of cortical regions in both ventral and dorsal visual pathways, including areas normally associated with the processing of surface properties and shape, and extending to somatosensory and premotor cortices. We suggest that such a widespread preference for material motion is due to strong associations between stimulus properties. For example, viewing dots moving in a specific pattern not only elicits percepts of material motion; one perceives a flexible, non-rigid shape, identifies the object as a cloth flapping in the wind, infers the object's weight under gravity, and anticipates how it would feel to reach out and touch the material. These results are a first important step in mapping out the cortical architecture and dynamics of material-related motion processing.
Affiliation(s)
- Alexandra C Schmid
- Department of Psychology, Justus Liebig University Giessen, Giessen 35394, Germany.
- Huseyin Boyaci
- Department of Psychology, Justus Liebig University Giessen, Giessen 35394, Germany
- Department of Psychology, A.S. Brain Research Center, and National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
- Katja Doerschner
- Department of Psychology, Justus Liebig University Giessen, Giessen 35394, Germany
- Department of Psychology, A.S. Brain Research Center, and National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara 06800, Turkey
7
Abstract
Many objects that we encounter have typical material qualities: spoons are hard, pillows are soft, and Jell-O dessert is wobbly. Over a lifetime of experiences, strong associations between an object and its typical material properties may be formed, and these associations not only include how glossy, rough, or pink an object is, but also how it behaves under force: we expect knocked over vases to shatter, popped bike tires to deflate, and gooey grilled cheese to hang between two slices of bread when pulled apart. Here we ask how such rich visual priors affect the visual perception of material qualities and present a particularly striking example of expectation violation. In a cue conflict design, we pair computer-rendered familiar objects with surprising material behaviors (a linen curtain shattering, a porcelain teacup wrinkling, etc.) and find that material qualities are not solely estimated from the object's kinematics (i.e., its physical [atypical] motion while shattering, wrinkling, wobbling etc.); rather, material appearance is sometimes “pulled” toward the “native” motion, shape, and optical properties that are associated with this object. Our results, in addition to patterns we find in response time data, suggest that visual priors about materials can set up high-level expectations about complex future states of an object and show how these priors modulate material appearance.
Affiliation(s)
- Katja Doerschner
- Justus Liebig University, Giessen, Germany
- Bilkent University, Ankara, Turkey
8
A Computational Mechanism for Seeing Dynamic Deformation. eNeuro 2020; 7:ENEURO.0278-19.2020. [PMID: 32169883; PMCID: PMC7189489; DOI: 10.1523/eneuro.0278-19.2020]
Abstract
Human observers perceptually discriminate the dynamic deformation of materials in the real world. However, the psychophysical and neural mechanisms responsible for the perception of dynamic deformation have not been fully elucidated. Using a deforming bar as the stimulus, we showed that the spatial frequency of deformation was a critical determinant of deformation perception. Simulating the response of direction-selective units (i.e., MT pattern motion cells) to the stimuli, we found that the perception of dynamic deformation was well explained by assuming a higher-order mechanism that monitors the spatial pattern of direction responses. Our model with the higher-order mechanism also successfully explained the appearance of a visual illusion wherein a static bar apparently deforms against a tilted drifting grating. In particular, it was the lower spatial frequencies in this pattern that contributed most strongly to the deformation perception. Finally, by manipulating the luminance of the static bar, we observed that the mechanism for the illusory deformation was more sensitive to luminance cues than to contrast cues.
9
Kawabe T. Mid-Air Action Contributes to Pseudo-Haptic Stiffness Effects. IEEE Transactions on Haptics 2020; 13:18-24. [PMID: 31880559; DOI: 10.1109/toh.2019.2961883]
Abstract
Pseudo-haptic feedback takes advantage of cross-modal integration between vision and haptics. Previous studies have shown that object stiffness can be rendered with pseudo-haptic feedback accompanied by external haptic inputs. This article explored whether pseudo-haptic feedback is feasible with a mid-air action in which no external haptic input is given. On each trial of the experiments, participants performed a mid-air action, laterally moving their hands as if horizontally stretching an object in the display. In synchrony with the hands' motion, the object deformed horizontally. The magnitude of object deformation varied with the horizontal distance between the participant's hands (i.e., the hand distance). The ratio of deformation magnitude to hand distance (i.e., the deformation-to-distance ratio) was controlled: with a larger ratio, a smaller hand distance produced the maximum level of object deformation. The Poisson's ratio was also controlled; a higher Poisson's ratio produced a larger magnitude of vertical deformation. Participants were asked to report the stiffness of the objects on a five-point rating scale. The stiffness rating decreased with the deformation-to-distance ratio and with the Poisson's ratio. The results indicate that pseudo-haptic stiffness can be rendered with a mid-air action by manipulating the deformation-to-distance ratio and the Poisson's ratio.
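The mapping described in this abstract, hand distance scaled by a deformation-to-distance ratio, with vertical deformation given by Poisson's ratio, can be sketched as follows. This is a minimal illustration under stated assumptions (a linear ramp capped at a maximum deformation); function and parameter names are hypothetical, not from the paper:

```python
# Sketch of a pseudo-haptic deformation mapping: horizontal deformation
# grows linearly with hand distance (scaled by the deformation-to-distance
# ratio) up to a cap, and vertical deformation follows from Poisson's ratio.
def object_deformation(hand_distance, ratio, max_deformation=1.0, nu=0.5):
    """Map mid-air hand distance to rendered (horizontal, vertical) deformation.

    With a larger `ratio`, a smaller hand distance already reaches the
    maximum horizontal deformation; `nu` (Poisson's ratio) sets how much
    the object compresses vertically as it stretches horizontally.
    """
    horizontal = min(hand_distance * ratio, max_deformation)
    vertical = -nu * horizontal  # compression orthogonal to the stretch
    return horizontal, vertical

# A larger ratio saturates the deformation at a smaller hand distance,
# which the study links to lower perceived stiffness.
low_ratio = object_deformation(0.5, ratio=1.0)
high_ratio = object_deformation(0.5, ratio=4.0)
```

The design intuition: if a small hand movement already produces the full visual deformation, the object appears more compliant, so stiffness ratings fall as the ratio rises.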