
Probing the link between vision and language in material perception using psychophysics and unsupervised learning.

Authors :
Liao, Chenxi
Sawayama, Masataka
Xiao, Bei
Source :
PLoS Computational Biology; 10/3/2024, Vol. 20 Issue 10, p1-23, 23p
Publication Year :
2024

Abstract

We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to describe what we see and communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression to understand how visual features relate to semantic representations in human cognition. We use deep generative models to generate images of realistic materials. Interpolating between the generative models enables us to systematically create material appearances in both well-defined and ambiguous categories. Using these stimuli, we compare the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language at the categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among ambiguous materials morphed between known categories. Moreover, visual judgments exhibit more individual differences than verbal descriptions. Our results show that while verbal descriptions capture material qualities at a coarse level, they may not fully convey the visual nuances of material appearances. Analyzing the image representations of materials obtained from various pre-trained deep neural networks, we find that similarity structures in human visual judgments align more closely with those of vision-language models than with purely vision-based models. Our work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.

Author summary: Materials are the building blocks of our environment, granting access to a wide array of visual experiences. The immense diversity, complexity, and versatility of materials present challenges in verbal articulation. To what extent can words convey the richness of visual material perception? What are the salient attributes for communicating about materials? We address these questions by measuring both visual material similarity judgments and free-form verbal descriptions. We use AI models to create a diverse array of plausible visual appearances of familiar and unfamiliar materials. Our findings reveal a moderate vision-language correlation within individual participants, yet a notable discrepancy persists between the two modalities. While verbal descriptions capture material qualities at a coarse categorical level, precise alignment between vision and language at the individual stimulus level is still lacking. These results highlight that visual representations of materials are richer than verbalized semantic features, underscoring the differential roles of language and vision in perception. Lastly, we discover that deep neural networks pre-trained on large-scale datasets can predict human visual similarities at a coarse level, suggesting the general visual representations learned by these networks carry perceptually relevant information for material-relevant tasks.
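The abstract describes two kinds of analysis: correlating similarity structures across modalities and aligning them with an unsupervised method. The sketch below is not the authors' actual pipeline; it is a minimal illustration, assuming hypothetical dissimilarity matrices for the same set of material images, of (1) a representational-similarity-style Spearman correlation between a "vision" and a "language" dissimilarity matrix, and (2) one common unsupervised alignment approach, Gromov-Wasserstein optimal transport via the POT library (the paper's exact alignment method and data are not reproduced here).

```python
# Minimal sketch: comparing two similarity structures for the same items.
# All data here are randomly generated placeholders, not the study's data.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_items = 20  # hypothetical number of material images

# Hypothetical symmetric dissimilarity matrices (zero diagonal), one derived
# from visual similarity judgments and one from verbal descriptions.
vision = squareform(rng.random(n_items * (n_items - 1) // 2))
language = squareform(rng.random(n_items * (n_items - 1) // 2))

# (1) Correlation of the upper triangles of the two dissimilarity matrices.
iu = np.triu_indices(n_items, k=1)
rho, p = spearmanr(vision[iu], language[iu])
print(f"vision-language correlation: rho={rho:.3f}, p={p:.3g}")

# (2) Unsupervised alignment with Gromov-Wasserstein optimal transport
# (one standard choice; requires `pip install pot`).
import ot
p_w = ot.unif(n_items)  # uniform weights over items in each modality
q_w = ot.unif(n_items)
T = ot.gromov.gromov_wasserstein(vision, language, p_w, q_w, 'square_loss')

# Fraction of items whose strongest match in the other modality is itself:
# high values indicate image-to-image alignment, not just categorical overlap.
matching_rate = np.mean(T.argmax(axis=1) == np.arange(n_items))
print(f"one-to-one matching rate: {matching_rate:.2f}")
```

With real behavioral matrices, a sizable correlation in step (1) alongside a low matching rate in step (2) would mirror the abstract's pattern: agreement at the coarse categorical level but misalignment at the level of individual stimuli.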

Details

Language :
English
ISSN :
1553-734X
Volume :
20
Issue :
10
Database :
Complementary Index
Journal :
PLoS Computational Biology
Publication Type :
Academic Journal
Accession number :
180086292
Full Text :
https://doi.org/10.1371/journal.pcbi.1012481