1
Packard PA, Soto-Faraco S. Crossmodal semantic congruence and rarity improve episodic memory. Mem Cognit 2025. PMID: 39971892. DOI: 10.3758/s13421-024-01659-9.
Abstract
Semantic congruence across sensory modalities at encoding has been shown to improve memory performance over a short time span. However, the beneficial effect of crossmodal congruence is less well established for episodic memories over longer retention periods. This gap in knowledge is particularly wide for crossmodal semantic congruence under incidental encoding conditions, a process that is especially relevant in everyday life. Here, we present the results of a series of four experiments (total N = 232) using the dual-process signal detection model to examine crossmodal semantic effects on recollection and familiarity. In Experiment 1, we established the beneficial effect of crossmodal semantics in younger adults: hearing congruent, compared with incongruent, object sounds during the incidental encoding of object images increased recollection and familiarity after 48 h. In Experiment 2, we reproduced and extended this finding in a sample of older participants (50-65 years old): older people displayed a comparable crossmodal congruence effect, despite a selective decline in recollection relative to younger adults. In Experiment 3, we showed that crossmodal facilitation is resilient to large imbalances between the frequencies of congruent versus incongruent events (from 10% to 90%): although rare events are more memorable than frequent ones overall, the impact of this rarity effect on the crossmodal benefit was small and affected only familiarity. Collectively, these findings reveal a robust crossmodal semantic congruence effect for incidentally encoded visual stimuli over a long retention span, bearing the hallmarks of episodic memory enhancement.
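For reference, the dual-process signal detection (DPSD) model mentioned above treats recollection as a threshold process, succeeding with probability R, and familiarity as an equal-variance signal detection process indexed by d'. In its standard form (Yonelinas, 1994), the model predicts recognition responses as:

```latex
% DPSD: recollection (threshold R) plus familiarity (equal-variance SDT, d')
% \Phi is the standard normal CDF; c is the response criterion
\begin{aligned}
P(\text{``old''} \mid \text{old}) &= R + (1 - R)\,\Phi(d' - c) \\
P(\text{``old''} \mid \text{new}) &= \Phi(-c)
\end{aligned}
```

Estimates of R (recollection) and d' (familiarity) are typically obtained by fitting these equations to confidence-rating ROC curves; the abstract does not detail the fitting procedure used here.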
Affiliation(s)
- Pau Alexander Packard
- Center for Brain and Cognition, Universitat Pompeu Fabra, Carrer de Ramon Trias Fargas, 25-27, 08005, Barcelona, Spain
- Salvador Soto-Faraco
- Center for Brain and Cognition, Universitat Pompeu Fabra, Carrer de Ramon Trias Fargas, 25-27, 08005, Barcelona, Spain.
- Institució Catalana de Recerca I Estudis Avançats, ICREA, Barcelona, Spain.
2
Kvasova D, Coll L, Stewart T, Soto-Faraco S. Crossmodal semantic congruence guides spontaneous orienting in real-life scenes. Psychol Res 2024; 88:2138-2148. PMID: 39105825. DOI: 10.1007/s00426-024-02018-8.
Abstract
In real-world scenes, different objects and events are often interconnected within a rich web of semantic relationships. These semantic links help parse information efficiently and make sense of the sensory environment. It has been shown that, during goal-directed search, hearing the characteristic sound of an everyday object helps find the corresponding object in artificial visual search arrays as well as in naturalistic, real-life videoclips. However, whether crossmodal semantic congruence also triggers orienting during spontaneous, non-goal-directed observation is unknown. Here, we investigated whether crossmodal semantic congruence can attract spontaneous, overt visual attention when viewing naturalistic, dynamic scenes. We used eye-tracking whilst participants (N = 45) watched video clips presented alongside sounds of varying semantic relatedness to objects present within the scene. We found that characteristic sounds increased the probability of looking at, the number of fixations to, and the total dwell time on semantically corresponding visual objects, compared with when the same scenes were presented with semantically neutral sounds or with background noise only. Interestingly, hearing object sounds that did not match any object in the scene led to increased visual exploration. These results suggest that crossmodal semantic information affects spontaneous gaze on realistic scenes, and therefore how information is sampled. Our findings extend beyond known effects of object-based crossmodal interactions with simple stimulus arrays and shed new light on the role that audio-visual semantic relationships play in the perception of everyday-life scenarios.
Affiliation(s)
- Daria Kvasova
- Center for Brain and Cognition, Department of Communication and Information Technologies, Universitat Pompeu Fabra, Carrer de Ramón Trias i Fargas 25-27, Barcelona, 08005, Spain
- Llucia Coll
- Multiple Sclerosis Centre of Catalonia (Cemcat), Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona, Spain
- Travis Stewart
- Center for Brain and Cognition, Department of Communication and Information Technologies, Universitat Pompeu Fabra, Carrer de Ramón Trias i Fargas 25-27, Barcelona, 08005, Spain
- Salvador Soto-Faraco
- Center for Brain and Cognition, Department of Communication and Information Technologies, Universitat Pompeu Fabra, Carrer de Ramón Trias i Fargas 25-27, Barcelona, 08005, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys, 23, Barcelona, 08010, Spain.
3
Gao M, Zhu W, Drewes J. The temporal dynamics of conscious and unconscious audio-visual semantic integration. Heliyon 2024; 10:e33828. PMID: 39055801. PMCID: PMC11269866. DOI: 10.1016/j.heliyon.2024.e33828.
Abstract
We compared the time course of crossmodal semantic effects induced by naturalistic sounds and by spoken words on the processing of visual stimuli, whether visible or suppressed from awareness through continuous flash suppression. We found that, under visible conditions, spoken words elicited audio-visual semantic effects over a longer range of SOAs (-1000, -500, and -250 ms) than naturalistic sounds (-500 and -250 ms). Performance was generally better with auditory primes, but more so with congruent stimuli. Spoken words presented well in advance (-1000, -500 ms) outperformed naturalistic sounds; the opposite was true for (near-)simultaneous presentations. Congruent spoken words also yielded better categorization performance than congruent naturalistic sounds. The audio-visual semantic congruency effect still occurred with suppressed visual stimuli, although without significant variation in the temporal patterns between auditory types. These findings indicate that: (1) semantically congruent auditory input can enhance visual processing performance, even when the visual stimulus is imperceptible to conscious awareness; (2) the temporal dynamics are contingent on the auditory type only when the visual stimulus is visible; and (3) audiovisual semantic integration requires sufficient time for processing auditory information.
Affiliation(s)
- Mingjie Gao
- School of Information Science, Yunnan University, Kunming, China
- Weina Zhu
- School of Information Science, Yunnan University, Kunming, China
- Jan Drewes
- Institute of Brain and Psychological Sciences, Sichuan Normal University, Chengdu, China
4
Wegner-Clemens K, Malcolm GL, Shomstein S. Predicting attentional allocation in real-world environments: The need to investigate crossmodal semantic guidance. Wiley Interdiscip Rev Cogn Sci 2024; 15:e1675. PMID: 38243393. DOI: 10.1002/wcs.1675.
Abstract
Real-world environments are multisensory, meaningful, and highly complex. To parse these environments efficiently, a subset of this information must be selected both within and across modalities. However, the bulk of attention research has been conducted within single sensory modalities, with a particular focus on vision. Visual attention research has made great strides, with over a century of work methodically identifying the mechanisms that allow us to select critical visual information. Spatial attention, attention to features, and object-based attention have all been studied extensively. More recently, research has established semantics (meaning) as a key component of attentional allocation in real-world scenes, with the meaning of an item or environment affecting visual attentional selection. However, a full understanding of how semantic information modulates real-world attention requires studying more than vision in isolation. The world provides semantic information across all senses, but with this extra information comes greater complexity. Here, we summarize research on visual attention (including semantics-based visual attention) and crossmodal attention, and argue for the importance of studying crossmodal semantic guidance of attention. This article is categorized under: Psychology > Attention; Psychology > Perception and Psychophysics.
Affiliation(s)
- Kira Wegner-Clemens
- Psychological and Brain Sciences, George Washington University, Washington, DC, USA
- Sarah Shomstein
- Psychological and Brain Sciences, George Washington University, Washington, DC, USA
5
Salselas I, Pereira F, Sousa E. Inducing visual attention through audiovisual stimuli: Can synchronous sound be a salient event? Perception 2024; 53:31-43. PMID: 37872670. PMCID: PMC10798022. DOI: 10.1177/03010066231208127.
Abstract
We present experimental research exploring how spatial attention may be biased by auditory stimuli. In particular, we investigate how synchronous sound and image may affect attention and increase the saliency of an audiovisual event. We designed and implemented an experimental study in which subjects, wearing an eye-tracking system, were examined with respect to their gaze toward the audiovisual stimuli on display. The audiovisual stimuli were tailored specifically for this experiment, consisting of videos that contrasted in terms of Synch Points (i.e., moments where a visual event is associated with a visible trigger movement synchronous with its corresponding sound). While consistency across audiovisual sensory modalities proved to be an attention-drawing feature on its own, when combined with synchrony it clearly strengthened the bias, triggering orienting (i.e., focal attention) toward the particular scene containing the Synch Point. Consequently, the results revealed synchrony to be a saliency factor, contributing to the strengthening of focal attention.
Affiliation(s)
- Emanuel Sousa
- Centro de Computação Gráfica, Portugal
- Centro Algoritmi, Portugal
6
Zhao S, Wang C, Chen M, Zhai M, Leng X, Zhao F, Feng C, Feng W. Cross-modal enhancement of spatially unpredictable visual target discrimination during the attentional blink. Atten Percept Psychophys 2023; 85:2178-2195. PMID: 37312000. DOI: 10.3758/s13414-023-02739-9.
Abstract
The attentional blink can be substantially reduced by delivering a task-irrelevant sound synchronously with the second target (T2) embedded in a rapid serial visual presentation stream, an effect that is further modulated by the semantic congruency between the sound and T2. The present study extended the cross-modal boost during the attentional blink, and its modulation by audiovisual semantic congruency, into the spatial domain by showing that a spatially uninformative, semantically congruent (but not incongruent) sound could even improve the discrimination of a spatially unpredictable T2 during the attentional blink. T2-locked event-related potential (ERP) data showed that the early cross-modal P195 difference component (184-234 ms) over the occipital scalp contralateral to the T2 location was larger preceding accurate than inaccurate discriminations of semantically congruent, but not incongruent, audiovisual T2s. Interestingly, the N2pc component (194-244 ms), associated with visual-spatial attentional allocation, was enlarged for incongruent audiovisual T2s relative to congruent audiovisual and unisensory visual T2s only when they were accurately discriminated. These ERP findings suggest that the spatially extended cross-modal boost during the attentional blink involves an early cross-modal interaction strengthening the perceptual processing of T2, without any sound-induced enhancement of visual-spatial attentional allocation toward T2. In contrast, the absence of an accuracy decrease in response to semantically incongruent audiovisual T2s may originate from the semantic mismatch capturing extra visual-spatial attentional resources toward T2.
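For context on the ERP measures above: difference components such as the N2pc are conventionally isolated as the contralateral-minus-ipsilateral difference wave at posterior electrodes (e.g., PO7/PO8), averaged within the component's time window. A minimal sketch of that standard computation, using hypothetical epoch arrays rather than the authors' data:

```python
import numpy as np

# Hypothetical single-subject epochs (trials x samples), 500 Hz sampling,
# stimulus onset at sample 100 (i.e., a 200-ms pre-stimulus baseline).
rng = np.random.default_rng(0)
fs, onset = 500, 100
contra = rng.normal(size=(200, 400))  # electrode contralateral to T2
ipsi = rng.normal(size=(200, 400))    # electrode ipsilateral to T2

# Difference wave: contralateral minus ipsilateral trial average
n2pc_wave = contra.mean(axis=0) - ipsi.mean(axis=0)

# Mean amplitude in the 194-244 ms window reported in the abstract
t0 = onset + int(0.194 * fs)
t1 = onset + int(0.244 * fs)
print(f"N2pc mean amplitude: {n2pc_wave[t0:t1].mean():.3f} (arbitrary units)")
```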
Affiliation(s)
- Song Zhao
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China
- Chongzhi Wang
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China
- Minran Chen
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China
- Mengdie Zhai
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China
- Xuechen Leng
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China
- Fan Zhao
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China
- Chengzhi Feng
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China.
- Wenfeng Feng
- Department of Psychology, School of Education, Soochow University, Suzhou, 215123, Jiangsu, China.
- Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, 215123, Jiangsu, China.
7
Yuan Y, He X, Yue Z. Working memory load modulates the processing of audiovisual distractors: A behavioral and event-related potentials study. Front Integr Neurosci 2023; 17:1120668. PMID: 36908504. PMCID: PMC9995450. DOI: 10.3389/fnint.2023.1120668.
Abstract
The interplay between different modalities can help us perceive stimuli more effectively. However, very few studies have focused on how multisensory distractors affect task performance. Using behavioral and event-related potential (ERP) techniques, the present study examined whether multisensory audiovisual distractors attract attention more effectively than unisensory distractors, and whether such processing is modulated by working memory load. Across three experiments, n-back tasks (1-back and 2-back) were adopted with peripheral auditory, visual, or audiovisual distractors. The visual and auditory distractors were white discs and pure tones (Experiments 1 and 2) or pictures and sounds of animals (Experiment 3), respectively. Behavioral results in Experiment 1 showed a significant interference effect under high working memory load but not under low load: responses to central letters with audiovisual distractors were significantly slower than to letters without distractors, whereas no significant difference was found between the unisensory-distractor and no-distractor conditions. Similarly, ERP results in Experiments 2 and 3 showed that integration occurred only under high load, with an early integration for simple audiovisual distractors (240-340 ms) and a late integration for complex audiovisual distractors (440-600 ms). These findings suggest that multisensory distractors can be integrated and effectively attract attention away from the main task (an interference effect), and that this effect is pronounced only under high working memory load.
Affiliation(s)
- Yichen Yuan
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Xiang He
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Zhenzhu Yue
- Department of Psychology, Sun Yat-sen University, Guangzhou, China
8
How much is a cow like a meow? A novel database of human judgements of audiovisual semantic relatedness. Atten Percept Psychophys 2022; 84:1317-1327. PMID: 35449432. DOI: 10.3758/s13414-022-02488-1.
Abstract
Semantic information about objects, events, and scenes influences how humans perceive, interact with, and navigate the world. The semantic information about any object or event can be highly complex and frequently draws on multiple sensory modalities, which makes it difficult to quantify. Past studies have primarily relied either on a simplified binary classification of semantic relatedness based on category, or on algorithmic values derived from text corpora rather than from human perceptual experience and judgement. With the aim of further accelerating research into multisensory semantics, we created a constrained audiovisual stimulus set and derived similarity ratings between items within three categories (animals, instruments, household items). A set of 140 participants provided similarity judgements between sounds and images. Participants either heard a sound (e.g., a meow) and judged which of two pictures of objects (e.g., a picture of a dog and a duck) it was more similar to, or saw a picture (e.g., a picture of a duck) and selected which of two sounds it was more similar to (e.g., a bark or a meow). The judgements were then used to calculate similarity values for any given crossmodal pair. An additional 140 participants provided word judgements used to calculate the similarity of word-word pairs. The derived similarity judgements reflect a range of semantic similarities across the three categories and items, and highlight similarities and differences among similarity judgements between modalities. We make the derived similarity values available to the research community in database format as a measure of semantic relatedness for cognitive psychology experiments, enabling more robust studies of semantics in audiovisual environments.
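The abstract does not specify how the similarity values were derived from the two-alternative judgements; one plausible derivation, sketched here with made-up trials, scores each sound-image pair by the proportion of trials on which that image was chosen over its competitor:

```python
from collections import defaultdict

# Hypothetical 2AFC trials: (sound cue, image chosen, image rejected)
trials = [
    ("meow", "cat", "duck"),
    ("meow", "cat", "dog"),
    ("meow", "dog", "duck"),
]

def choice_proportions(trials):
    """Similarity of each (sound, image) pair = fraction of trials in which
    the image was picked as more similar to the sound when it was offered."""
    chosen = defaultdict(int)
    offered = defaultdict(int)
    for sound, picked, rejected in trials:
        chosen[(sound, picked)] += 1
        offered[(sound, picked)] += 1
        offered[(sound, rejected)] += 1
    return {pair: chosen[pair] / offered[pair] for pair in offered}

print(choice_proportions(trials))
# {('meow', 'cat'): 1.0, ('meow', 'duck'): 0.0, ('meow', 'dog'): 0.5}
```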
9
Abstract
Increasing research has revealed that spatially uninformative sounds facilitate the early processing of visual stimuli. This study examined crossmodal interactions between semantically congruent stimuli by assessing whether the presentation of event-related characteristic sounds facilitated or interfered with visual search for corresponding event scenes in pictures. The search array consisted of four images: one target and three non-target pictures. Auditory stimuli were presented in synchronization with picture onset using three types of sounds: a sound congruent with the target, a sound congruent with a distractor, or a control sound. The control sound varied across six experiments, alternating between a sound unrelated to the search stimuli, white noise, and no sound. Participants were required to localize the target position swiftly while ignoring the sound. Localization responses were faster when a sound semantically related to the target was played. Furthermore, when a sound was semantically related to a distractor picture, response times were longer. When the distractor-congruent sound was used, participants incorrectly localized the distractor position more often than chance level. These findings were replicated in experiments that ruled out the possibility of participants learning picture-sound pairs during the visual tasks (i.e., the possibility of brief training during the experiments). Overall, event-related crossmodal interactions occur on the basis of semantic representations, and audiovisual associations may develop through long-term experience rather than brief laboratory training.
10
Almadori E, Mastroberardino S, Botta F, Brunetti R, Lupiáñez J, Spence C, Santangelo V. Crossmodal semantic congruence interacts with object contextual consistency in complex visual scenes to enhance short-term memory performance. Brain Sci 2021; 11:1206. PMID: 34573227. PMCID: PMC8467083. DOI: 10.3390/brainsci11091206.
Abstract
Object sounds can enhance the attentional selection and perceptual processing of semantically related visual stimuli. However, it is currently unknown whether crossmodal semantic congruence also affects post-perceptual stages of information processing, such as short-term memory (STM), and whether this effect is modulated by the object's consistency with the background visual scene. In two experiments, participants viewed everyday visual scenes for 500 ms while listening to an object sound, which could either be semantically related to the object that served as the STM target at retrieval or not. This defined crossmodal semantically cued versus uncued targets. The target was either in- or out-of-context with respect to the background visual scene. After a maintenance period of 2000 ms, the target was presented in isolation against a neutral background, in either the same or a different spatial position as in the original scene. Participants judged whether the object's position was the same or different and then rated their confidence in the response. The results revealed greater accuracy when judging the spatial position of targets paired with a semantically congruent object sound at encoding. This crossmodal facilitatory effect was modulated by whether the target object was in- or out-of-context with respect to the background scene, with out-of-context targets reducing the facilitatory effect of object sounds. Overall, these findings suggest that the presence of the object sound at encoding facilitated the selection and processing of semantically related visual stimuli, but that this effect depends on the semantic configuration of the visual scene.
Affiliation(s)
- Erika Almadori
- Neuroimaging Laboratory, IRCCS Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy
- Serena Mastroberardino
- Department of Psychology, School of Medicine & Psychology, Sapienza University of Rome, Via dei Marsi 78, 00185 Rome, Italy
- Fabiano Botta
- Department of Experimental Psychology and Mind, Brain, and Behavior Research Center (CIMCYC), University of Granada, 18071 Granada, Spain
- Riccardo Brunetti
- Cognitive and Clinical Psychology Laboratory, Department of Human Sciences, Università Europea di Roma, 00163 Roma, Italy
- Juan Lupiáñez
- Department of Experimental Psychology and Mind, Brain, and Behavior Research Center (CIMCYC), University of Granada, 18071 Granada, Spain
- Charles Spence
- Department of Experimental Psychology, Oxford University, Oxford OX2 6GG, UK
- Valerio Santangelo
- Neuroimaging Laboratory, IRCCS Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy
- Department of Philosophy, Social Sciences & Education, University of Perugia, Piazza G. Ermini, 1, 06123 Perugia, Italy
11
Wöhner S, Jescheniak JD, Mädebach A. Semantic interference is not modality specific: Evidence from sound naming with distractor pictures. Q J Exp Psychol (Hove) 2020; 73:2290-2308. PMID: 32640868. DOI: 10.1177/1747021820943130.
Abstract
In three experiments, participants named environmental sounds (e.g., produced the word "sheep" in response to the sound of bleating) in the presence of distractor pictures. In Experiment 1, we observed faster responses in sound naming with congruent pictures (e.g., a sheep; congruency facilitation) and slower responses with semantically related pictures (e.g., a donkey; semantic interference), each compared with unrelated pictures (e.g., a violin). In Experiments 2 and 3, we replicated these effects and used a psychological refractory period approach (combining an arrow-decision or letter-rotation task as Task 1 with sound naming as Task 2) to investigate the locus of the effects. Congruency facilitation was underadditive with dual-task interference, suggesting that it arises, at least in part, during pre-central processing stages in sound naming (i.e., sound identification). In contrast, semantic interference was additive with dual-task interference, suggesting that it arises during central (or post-central) processing stages (i.e., response selection or later processes). These results demonstrate the feasibility of sound-naming tasks for chronometric investigations of word production. Furthermore, they highlight that semantic interference is not restricted to target pictures and distractor words but can be observed with quite different target-distractor configurations. The experiments support the view that congruency facilitation and semantic interference reflect general cognitive mechanisms involved in word production. These results are discussed in the context of the debate about semantic-lexical selection mechanisms in word production.
Affiliation(s)
- Stefan Wöhner
- Institut für Psychologie - Wilhelm Wundt, Universität Leipzig, Leipzig, Germany
- Jörg D Jescheniak
- Institut für Psychologie - Wilhelm Wundt, Universität Leipzig, Leipzig, Germany
- Andreas Mädebach
- Institut für Psychologie - Wilhelm Wundt, Universität Leipzig, Leipzig, Germany
- Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain